
feat(alerts): Bridge alerting to PagerDuty / Slack / Discord [ENG-361] #391

Open: islandbitcoin wants to merge 3 commits into tmp/bridge-rebase-pr-ready from eng-361-bridge-alerts



@islandbitcoin
Contributor

ENG-361 — Wire Bridge alerts to PagerDuty / Slack (+ Discord)

Adds an AlertService (src/services/alerts) and wires it into the Bridge failure points so ops gets paged/informed when Bridge integration signals fail.

Routing

  • critical → page (PagerDuty) + inform (Slack/Mattermost, Discord)
  • warning → inform only
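
A minimal sketch of the routing rule above. The names `Severity`, `Destination`, and `destinationsFor` are illustrative assumptions and are not taken from the AlertService in this PR.

```typescript
// Hypothetical types: the real AlertService may model these differently.
type Severity = "critical" | "warning";
type Destination = "pagerduty" | "slack" | "discord";

// critical: page (PagerDuty) and inform (Slack/Mattermost, Discord).
// warning: inform only.
function destinationsFor(severity: Severity): Destination[] {
  const inform: Destination[] = ["slack", "discord"];
  return severity === "critical" ? ["pagerduty", ...inform] : inform;
}
```

Keeping the rule in one pure function makes the severity-to-destination mapping easy to unit test without touching any network code.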

Best-effort + fire-and-forget — a failing or unconfigured destination never blocks or fails the webhook/request path. A destination with no configured credential is silently skipped.
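
The best-effort behaviour described above could look roughly like the following. This is a sketch under assumptions: `Sender` and `fireAndForget` are hypothetical names, and an unconfigured destination is modelled as `undefined`.

```typescript
// Hypothetical sender type: resolves on delivery, rejects on failure.
type Sender = (message: string) => Promise<void>;

// Starts every configured sender and returns immediately without awaiting.
// Failures are only passed to `log`, so the caller's request path is never
// blocked or failed by a broken destination.
function fireAndForget(
  senders: Array<Sender | undefined>,
  message: string,
  log: (err: unknown) => void = console.error,
): number {
  let started = 0;
  for (const send of senders) {
    if (!send) continue; // unconfigured destination: silently skipped
    started += 1;
    // Wrapping in Promise.resolve().then(...) also captures synchronous throws.
    void Promise.resolve()
      .then(() => send(message))
      .catch(log);
  }
  return started; // number of deliveries attempted
}
```

The return value is only there to make the sketch observable in tests; a real caller would ignore it.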

Alert sources wired

| Source | Severity | Location |
| --- | --- | --- |
| ERPNext audit-write failure (deposit + transfer completed/failed) | critical | bridge/webhook-server/routes/{deposit,transfer}.ts |
| Bridge webhook processing exception | critical | same routes (catch) |
| Bridge API outage (5xx / timeout / network) | critical | bridge/client.ts |
| IBEX error on a Bridge↔IBEX movement | warning | deferred: on-receive.ts is general LN/onchain receive handling, not the Bridge↔IBEX path; needs the exact call site |

4xx from Bridge are normal rejections and are not alerted.
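
As an illustration of the rule that outages alert while 4xx rejections do not, here is a small classifier. The `BridgeFailure` shape is an assumption made for this sketch; the actual error handling in bridge/client.ts may be structured differently.

```typescript
// Hypothetical failure shape, not the real type used by bridge/client.ts.
type BridgeFailure =
  | { kind: "http"; status: number }
  | { kind: "timeout" }
  | { kind: "network" };

// 5xx, timeouts, and network errors indicate an outage and should alert.
// 4xx responses are normal rejections and should not.
function shouldAlert(failure: BridgeFailure): boolean {
  if (failure.kind === "http") {
    return failure.status >= 500 && failure.status <= 599;
  }
  return true; // timeout or network failure
}
```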

⚙️ Setup — env values (required for alerts to actually deliver)

The wiring ships dormant; each destination activates only when its env var is set. If none are set, alerting is a no-op (no errors, no delivery).

| Env var | Destination | Value |
| --- | --- | --- |
| ALERT_PAGERDUTY_ROUTING_KEY | PagerDuty | Events API v2 integration/routing key |
| ALERT_SLACK_WEBHOOK_URL | Slack or Mattermost | Incoming-webhook URL |
| ALERT_DISCORD_WEBHOOK_URL | Discord | Channel webhook URL |
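
One way to read these variables so that unset destinations stay dormant is sketched below. `AlertConfig`, `loadAlertConfig`, and `isDormant` are hypothetical names, and treating an empty or whitespace-only value as unset is an assumption of this sketch rather than documented behaviour.

```typescript
// Hypothetical config shape; only the env var names come from the PR.
interface AlertConfig {
  pagerDutyRoutingKey?: string;
  slackWebhookUrl?: string;
  discordWebhookUrl?: string;
}

// Takes the environment as a parameter so it can be tested without process.env.
function loadAlertConfig(env: Record<string, string | undefined>): AlertConfig {
  // Assumption: empty or whitespace-only values count as "not configured".
  const read = (name: string): string | undefined => {
    const value = env[name]?.trim();
    return value ? value : undefined;
  };
  return {
    pagerDutyRoutingKey: read("ALERT_PAGERDUTY_ROUTING_KEY"),
    slackWebhookUrl: read("ALERT_SLACK_WEBHOOK_URL"),
    discordWebhookUrl: read("ALERT_DISCORD_WEBHOOK_URL"),
  };
}

// True when no destination is configured, i.e. alerting is a no-op.
function isDormant(config: AlertConfig): boolean {
  return (
    !config.pagerDutyRoutingKey &&
    !config.slackWebhookUrl &&
    !config.discordWebhookUrl
  );
}
```

In a Node service this would typically be called once at startup as `loadAlertConfig(process.env)`.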

How to obtain each:

  • PagerDuty: Service → Integrations → Add integration → Events API v2 → copy the Integration Key.
  • Slack: App → Incoming Webhooks → Activate → Add New Webhook to Workspace → pick channel → copy URL. (Mattermost works too — same { text } payload.)
  • Discord: Channel → Edit Channel → Integrations → Webhooks → New Webhook → Copy Webhook URL.
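
For orientation, the request bodies each destination expects can be sketched as follows. PagerDuty Events API v2 takes a `routing_key`, an `event_action`, and a `payload` with `summary`, `source`, and `severity`, posted to https://events.pagerduty.com/v2/enqueue; Slack and Mattermost incoming webhooks take `{ text }`; Discord webhooks take `{ content }`. The `Alert` type, the builder function names, and the message format are assumptions for illustration, not the PR's implementation.

```typescript
// Hypothetical alert shape used only in this sketch.
interface Alert {
  summary: string;
  source: string;
}

// Body for POST https://events.pagerduty.com/v2/enqueue (Events API v2).
function pagerDutyBody(routingKey: string, alert: Alert) {
  return {
    routing_key: routingKey,
    event_action: "trigger" as const,
    payload: {
      summary: alert.summary,
      source: alert.source,
      severity: "critical" as const, // only critical alerts page
    },
  };
}

// Slack and Mattermost incoming webhooks accept the same { text } payload.
function slackBody(alert: Alert) {
  return { text: `[${alert.source}] ${alert.summary}` };
}

// Discord webhooks use `content` for the message text.
function discordBody(alert: Alert) {
  return { content: `[${alert.source}] ${alert.summary}` };
}
```

Each body would be sent as JSON with a `Content-Type: application/json` header.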

Where to set:

  • Local dev → .env (and .env.ci for CI).
  • Staging / prod → deployment env vars / secrets, alongside MATTERMOST_WEBHOOK_URL. Treat all three as secrets.

Full guide: docs/bridge-integration/ALERTING.md.

Verifying in staging (acceptance)

  1. Set ALERT_PAGERDUTY_ROUTING_KEY + ALERT_SLACK_WEBHOOK_URL in staging.
  2. Simulate a Bridge webhook failure (force an ERPNext audit-write error, or replay a malformed transfer webhook).
  3. Confirm that PagerDuty pages on-call and that a Slack message posts within ~1 min.

Notes

  • Diff is purely additive (+213 / -0 across the source files; new AlertService + 3 config lines + wiring). Touched files type-check; new files lint clean. (The branch has pre-existing eslint/prettier debt in the files I touched — not introduced here.)
  • Chatwoot was evaluated and dropped (support tool, not a paging channel).

🤖 Generated with Claude Code

bobodread876 and others added 3 commits June 5, 2026 18:18
…361]

Severity-routed best-effort alert fan-out: critical pages PagerDuty + informs
Slack/Mattermost + Discord; warning informs only. Each sender no-ops when its
env credential is unset. Config: ALERT_PAGERDUTY_ROUTING_KEY / ALERT_SLACK_WEBHOOK_URL
/ ALERT_DISCORD_WEBHOOK_URL.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fire-and-forget alertBridge() at the Bridge failure points (alongside existing
logging), all critical/page:
- ERPNext audit-write failures (deposit + transfer completed/failed)
- Bridge webhook processing exceptions (deposit + transfer catch)
- Bridge API outage in client.request(): 5xx / timeout / network (4xx not alerted)

IBEX-error source deferred: on-receive.ts is general LN/onchain receive handling,
not the Bridge<->IBEX movement path; needs the exact call site (warning sev).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@linear

linear Bot commented Jun 6, 2026

ENG-361


2 participants