A backend service for monitoring API uptime/health, tracking incidents, and dispatching alerts.
- Express + TypeScript API server
- Monitor management endpoints
- Incident tracking endpoints
- Alert channel endpoints
- Auth (register/login with JWT)
- Swagger docs UI
- Background workers using BullMQ + Redis for checks/aggregation/flush jobs
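The aggregation jobs compute per-monitor stats from stored check results. The README does not say which metrics are stored, so this sketch assumes two common ones: uptime percentage and mean response time. `HealthResult`, `MonitorStats`, and `aggregateStats` are hypothetical names, not the project's actual code.

```typescript
// Hypothetical shape of one persisted health-check result.
type HealthStatus = "UP" | "DOWN";

interface HealthResult {
  status: HealthStatus;
  responseTimeMs: number; // assumed to be recorded for every check
}

// Hypothetical per-monitor stats record.
interface MonitorStats {
  totalChecks: number;
  upChecks: number;
  uptimePercent: number | null; // 0-100, two decimals; null when there are no checks
  avgResponseTimeMs: number | null; // mean over UP checks only; null when none
}

function aggregateStats(results: HealthResult[]): MonitorStats {
  const totalChecks = results.length;
  const up = results.filter((r) => r.status === "UP");
  const upChecks = up.length;
  // Round to two decimals, e.g. 2 of 3 checks UP -> 66.67.
  const uptimePercent =
    totalChecks === 0 ? null : Math.round((upChecks / totalChecks) * 10000) / 100;
  const avgResponseTimeMs =
    upChecks === 0 ? null : up.reduce((sum, r) => sum + r.responseTimeMs, 0) / upChecks;
  return { totalChecks, upChecks, uptimePercent, avgResponseTimeMs };
}
```

Averaging latency over UP checks only keeps failed or timed-out requests from skewing the mean; whether the real worker does this is not stated.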
- User authentication with `POST /auth/register` and `POST /auth/login` (JWT + cookie support)
- Protected monitor APIs: create, list, read, update, delete, and monitor controls (`start`, `pause`, `resume`)
- Scheduled health checks using BullMQ job schedulers with retry/backoff behavior
- Async health-result persistence pipeline using Redis Streams + a DB flush worker
- Incident lifecycle APIs: create, list open incidents, get by ID, acknowledge, resolve, delete
- Alert channel management (CRUD + test endpoint)
- Webhook notifications for incident created, acknowledged, and resolved events
- Stats aggregation worker for active monitors on a recurring schedule
- Swagger/OpenAPI docs at `/api-docs` and `/docs`
- Development utility route: `POST /api/dev/clear-db` (non-production only)
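The incident lifecycle and its three webhook events can be pictured as a small state machine. This is a minimal in-memory sketch, assuming the statuses `OPEN`, `ACKNOWLEDGED`, and `RESOLVED` (only `OPEN` appears in the flow diagram) and hypothetical event names such as `incident.created`; `IncidentTracker` and its methods are illustrative, not the project's incident service. The `notify` callback stands in for the alert service that sends webhooks.

```typescript
// Assumed statuses: only "OPEN" appears in the README's diagram.
type IncidentStatus = "OPEN" | "ACKNOWLEDGED" | "RESOLVED";
// Hypothetical event names for the three webhook notifications.
type IncidentEvent = "incident.created" | "incident.acknowledged" | "incident.resolved";

interface Incident {
  id: number;
  monitorId: number;
  status: IncidentStatus;
  failureCount: number; // bumped on each failed check while unresolved
}

type Notify = (event: IncidentEvent, incident: Incident) => void;

class IncidentTracker {
  private incidents: Incident[] = [];
  private nextId = 1;
  private readonly notify: Notify;

  // `notify` stands in for the alert service that dispatches webhooks.
  constructor(notify: Notify) {
    this.notify = notify;
  }

  private findUnresolved(monitorId: number): Incident | undefined {
    return this.incidents.find((i) => i.monitorId === monitorId && i.status !== "RESOLVED");
  }

  // Failed check: increment the unresolved incident, or open a new one and notify.
  recordFailure(monitorId: number): Incident {
    const existing = this.findUnresolved(monitorId);
    if (existing) {
      existing.failureCount += 1;
      return existing;
    }
    const incident: Incident = { id: this.nextId++, monitorId, status: "OPEN", failureCount: 1 };
    this.incidents.push(incident);
    this.notify("incident.created", incident);
    return incident;
  }

  // Manual acknowledge: only an OPEN incident can move to ACKNOWLEDGED.
  acknowledge(id: number): Incident {
    const incident = this.incidents.find((i) => i.id === id);
    if (!incident || incident.status !== "OPEN") {
      throw new Error(`Incident ${id} is not OPEN`);
    }
    incident.status = "ACKNOWLEDGED";
    this.notify("incident.acknowledged", incident);
    return incident;
  }

  // Successful check: resolve the monitor's unresolved incident, if one exists.
  resolveForMonitor(monitorId: number): Incident | undefined {
    const incident = this.findUnresolved(monitorId);
    if (!incident) return undefined;
    incident.status = "RESOLVED";
    this.notify("incident.resolved", incident);
    return incident;
  }
}
```

Repeated failures increment a single incident instead of opening duplicates, so each outage produces one `created` notification rather than one per failed check.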
sequenceDiagram
autonumber
participant U as User/Client
participant T as Monitored API
participant API as Express API
participant DB as PostgreSQL
participant Q as BullMQ + Redis
participant HW as Health Worker
participant RS as Redis Stream
participant DW as DB Flush Worker
participant IW as Incident Service
participant AS as Alert Service
participant WH as Webhook Endpoint
participant SW as Stats Worker
U->>API: POST /auth/login
API->>DB: Validate user credentials
API-->>U: JWT token (+ cookie)
U->>API: POST /api/monitors
API->>DB: Insert monitor
API-->>U: Monitor created
U->>API: POST /api/monitors/start/:id
API->>DB: Set monitor active
API->>Q: upsert monitor scheduler
API->>Q: schedule stats aggregation
API-->>U: Started
loop Every check_interval
Q->>HW: Run monitor job
HW->>T: HTTP check to target URL
alt Check success
HW->>RS: add health result (UP)
HW->>IW: resolve open incident (if exists)
else Check fails after retries
HW->>RS: add health result (DOWN)
HW->>IW: create/increment OPEN incident
IW->>AS: incident event
AS->>WH: send webhook notification
HW->>DB: set monitor inactive
end
end
DW->>RS: read pending health results
DW->>DB: persist check results/history
Q->>SW: recurring stats-aggregation job
SW->>DB: calculate and store monitor stats
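The `alt` branch in the diagram (retry the check, then either resolve or open an incident and deactivate the monitor) can be sketched in plain TypeScript. BullMQ provides its own `attempts`/`backoff` job options; this sketch only shows the idea without the library. `runCheckWithRetries`, `applyOutcome`, and the `CheckSideEffects` callbacks are hypothetical names, and the doubling delay (`baseDelayMs * 2^(n-1)`) is an assumed backoff policy.

```typescript
type CheckOutcome = { status: "UP" | "DOWN"; attempts: number };

// Delay before retry n (1-based): baseDelayMs * 2^(n-1), i.e. exponential backoff.
function backoffDelay(retry: number, baseDelayMs: number): number {
  return baseDelayMs * Math.pow(2, retry - 1);
}

async function runCheckWithRetries(
  check: () => Promise<boolean>, // e.g. an HTTP request to the monitored URL
  maxAttempts: number,
  baseDelayMs: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
): Promise<CheckOutcome> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let ok = false;
    try {
      ok = await check();
    } catch {
      ok = false; // network errors count as a failed attempt
    }
    if (ok) return { status: "UP", attempts: attempt };
    if (attempt < maxAttempts) await sleep(backoffDelay(attempt, baseDelayMs));
  }
  return { status: "DOWN", attempts: maxAttempts };
}

// Hypothetical hooks for the side effects shown in the diagram.
interface CheckSideEffects {
  recordResult(status: "UP" | "DOWN"): void; // stand-in for appending to the Redis Stream
  resolveIncident(): void;
  openOrIncrementIncident(): void;
  deactivateMonitor(): void;
}

// Mirrors the diagram: always record the result, then branch on the outcome.
function applyOutcome(outcome: CheckOutcome, fx: CheckSideEffects): void {
  fx.recordResult(outcome.status);
  if (outcome.status === "UP") {
    fx.resolveIncident();
  } else {
    fx.openOrIncrementIncident();
    fx.deactivateMonitor();
  }
}
```

Injecting `sleep` keeps the retry loop testable without real waits.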
- `server/`: main backend code (TypeScript)
- `server/src/app.ts`: API server entrypoint
- `server/src/queues/workers/`: background worker processes
- `api.README.md`: detailed endpoint reference
- `xx/FactoryAi_agent/`: unrelated experiment/prototype assets
- Node.js 18+
- PostgreSQL database
- Redis instance (for BullMQ queues)
Install dependencies (run in order, starting from the repository root):

- `npm install`
- `cd server`
- `npm install`

Create `server/.env` with at least:
- `DATABASE_URL=postgresql://user:password@host:5432/dbname`
- `JWT_SECRET=change-me`
- `NODE_ENV=development`
- `LOG_LEVEL=debug`

Notes:
- `DATABASE_URL` is required by `server/src/db/db_config.ts`.
- `JWT_SECRET` currently falls back to `super-secret` if unset; set your own value in real environments.
- Redis credentials are currently hardcoded in `server/src/config/redis.ts`; move them to environment variables before production use.
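One way to address both notes is a single config loader that fails fast. This is a sketch under assumptions, not the project's `db_config.ts` or `redis.ts`: the `REDIS_HOST`, `REDIS_PORT`, and `REDIS_PASSWORD` variable names, the `info` log-level default, and the choice to refuse the `super-secret` fallback only in production are all hypothetical.

```typescript
type Env = Record<string, string | undefined>;

interface AppConfig {
  databaseUrl: string;
  jwtSecret: string;
  nodeEnv: string;
  logLevel: string;
  redis: { host: string; port: number; password?: string };
}

// Treats missing and empty values alike and fails fast with the variable name.
function requireEnv(env: Env, key: string): string {
  const value = env[key];
  if (!value) throw new Error(`Missing required environment variable ${key}`);
  return value;
}

function loadConfig(env: Env): AppConfig {
  const nodeEnv = env["NODE_ENV"] ?? "development";
  // Keeps the current "super-secret" fallback for non-production use only.
  const jwtSecret =
    nodeEnv === "production"
      ? requireEnv(env, "JWT_SECRET")
      : (env["JWT_SECRET"] ?? "super-secret");
  // Hypothetical REDIS_* variables replacing the hardcoded values in redis.ts.
  const redisPort = Number(env["REDIS_PORT"] ?? "6379"); // 6379 is Redis's default port
  if (!Number.isInteger(redisPort) || redisPort <= 0) {
    throw new Error(`Invalid REDIS_PORT: ${env["REDIS_PORT"]}`);
  }
  return {
    databaseUrl: requireEnv(env, "DATABASE_URL"),
    jwtSecret,
    nodeEnv,
    logLevel: env["LOG_LEVEL"] ?? "info", // assumed default
    redis: {
      host: env["REDIS_HOST"] ?? "localhost",
      port: redisPort,
      password: env["REDIS_PASSWORD"],
    },
  };
}
```

At startup this would be called once, for example as `loadConfig(process.env)`, so a misconfigured deployment stops immediately instead of failing on the first database or Redis call.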
From `server/`, run `npm run dev`. This starts:

- API server (`src/app.ts`) on `http://localhost:3000`
- Health-check worker
- DB flush worker
- Stats aggregator worker
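The DB flush worker drains health results that the health worker appended to a Redis Stream. The sketch below shows the batch-then-acknowledge pattern with an in-memory stand-in for the stream; `PendingStream`, `flushOnce`, and the batch size are hypothetical, and a real consumer group needs explicit handling of pending entries rather than simply re-reading the head of a list.

```typescript
interface StreamEntry<T> {
  id: string;
  data: T;
}

// In-memory stand-in for a Redis Stream read through a consumer group:
// entries stay pending until acknowledged.
class PendingStream<T> {
  private entries: StreamEntry<T>[] = [];
  private seq = 0;

  add(data: T): string {
    this.seq += 1;
    const id = String(this.seq); // real stream IDs look like "<ms>-<seq>"
    this.entries.push({ id, data });
    return id;
  }

  read(count: number): StreamEntry<T>[] {
    return this.entries.slice(0, count);
  }

  ack(ids: string[]): void {
    const done = new Set(ids);
    this.entries = this.entries.filter((e) => !done.has(e.id));
  }

  get size(): number {
    return this.entries.length;
  }
}

// One flush cycle: read a batch, write it in one call, ack only after the write succeeds.
// If `persist` throws, nothing is acked and the same entries are retried next cycle.
async function flushOnce<T>(
  stream: PendingStream<T>,
  batchSize: number,
  persist: (rows: T[]) => Promise<void>,
): Promise<number> {
  const batch = stream.read(batchSize);
  if (batch.length === 0) return 0;
  await persist(batch.map((e) => e.data)); // e.g. one multi-row INSERT
  stream.ack(batch.map((e) => e.id));
  return batch.length;
}
```

Acknowledging after the write gives at-least-once delivery: a crash between the insert and the ack can re-insert a batch, so the real table may need an idempotency key.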
Production build and start:

- `npm run build`
- `npm start`

When running locally:

- Swagger UI: `http://localhost:3000/api-docs`
- Redirect alias: `http://localhost:3000/docs`
- Full written API reference: `api.README.md`
Main routes:

- `POST /auth/register`
- `POST /auth/login`
- `GET/POST/... /api/monitors`
- `GET/POST/... /api/incidents`
- `GET/POST/... /api/v1/alert-channels`
- `GET /profile` (protected)
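Protected routes such as `GET /profile` accept the JWT from login, and the server supports both a token and a cookie. This sketch shows one way a request's token could be located before verification; the cookie name `token` and the rule that the `Authorization: Bearer` header wins over the cookie are assumptions, and signature verification with `JWT_SECRET` is not shown.

```typescript
// Hypothetical cookie name; check the auth middleware for the real one.
const AUTH_COOKIE = "token";

// Returns the raw JWT from "Authorization: Bearer <jwt>" or the auth cookie, else null.
function extractToken(headers: { authorization?: string; cookie?: string }): string | null {
  const auth = headers.authorization;
  if (auth && auth.startsWith("Bearer ")) {
    const token = auth.slice("Bearer ".length).trim();
    if (token) return token;
  }
  if (headers.cookie) {
    // Cookie header format: "name1=value1; name2=value2"
    for (const part of headers.cookie.split(";")) {
      const eq = part.indexOf("=");
      if (eq === -1) continue;
      const name = part.slice(0, eq).trim();
      if (name === AUTH_COOKIE) return decodeURIComponent(part.slice(eq + 1).trim());
    }
  }
  return null;
}
```

In an Express middleware this could be called as `extractToken(req.headers)`, responding with 401 when it returns `null`.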
- Remove hardcoded Redis host/password from source code.
- Replace default JWT fallback secret.
- Add `.env.example` for easier onboarding.
- Consider updating the root `README.md` to point to this file and `api.README.md`.