Feat/slack waituntil diagnostics by amikofalvy · Pull Request #2023 · inkeep/agents

amikofalvy · 2026-02-16T00:20:25Z

No description provided.

…nd Trigger handlers Add instrumentation to diagnose whether Vercel function instances are being suspended between dispatch and background execution. Logs `dispatchDelayMs` to measure the gap between when work is queued via waitUntil and when the async handler actually starts executing. Warns when waitUntil is unavailable (fire-and-forget) and when delays exceed 5 seconds, indicating possible instance suspension. Co-authored-by: Cursor <cursoragent@cursor.com>

…solution The `getWaitUntil()` utility in agents-core dynamically imports `@vercel/functions`, but the package was only declared as a dependency in agents-api. With pnpm's strict dependency isolation, agents-core could not resolve it at runtime, causing `ERR_MODULE_NOT_FOUND` and making all waitUntil-based background work (Slack mentions, webhook triggers) silently fall back to untracked fire-and-forget execution. Co-authored-by: Cursor <cursoragent@cursor.com>

vercel · 2026-02-16T00:20:29Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agents-api	Ready	Preview, Comment	Feb 16, 2026 0:20am
agents-docs	Ready	Preview, Comment	Feb 16, 2026 0:20am
agents-manage-ui	Ready	Preview, Comment	Feb 16, 2026 0:20am

changeset-bot · 2026-02-16T00:20:30Z

⚠️ No Changeset found

Latest commit: dc7061c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

claude

PR Review Summary

(5) Total Issues | Risk: Low

🟡 Minor (5) 🟡

🟡 1) multi-file Hardcoded timing thresholds without named constants

files:

agents-api/src/domains/run/services/TriggerService.ts:635
packages/agents-work-apps/src/slack/services/events/app-mention.ts:82,93,141,307

Issue: The PR introduces magic numbers for timing thresholds — 5000ms for dispatch delay warnings (suspension detection) and 3000ms for slow operation warnings — without extracting them to named constants or documenting why these values were chosen.

Why: Magic numbers in alerting thresholds are a common source of noisy alerts or missed detections. Without documentation, future maintainers won't know if 5 seconds was chosen based on Vercel's serverless characteristics, observed P99 latencies, or another rationale. This makes tuning for different deployment contexts difficult.

Fix: Extract thresholds to named constants with brief documentation:

// In a shared constants file or at the top of each file:
/** Threshold for warning about possible serverless instance suspension.
 *  5s chosen based on typical Vercel cold start being <2s; delays beyond
 *  this suggest the instance was suspended mid-execution. */
const DISPATCH_DELAY_WARNING_THRESHOLD_MS = 5000;

/** Threshold for warning about slow I/O operations. */
const SLOW_OPERATION_THRESHOLD_MS = 3000;

Refs:

🟡 2) package.json Version specifier differs from agents-api

Issue: The new @vercel/functions dependency uses ^1.6.0 but agents-api/package.json specifies ^1.4.0 for the same package.

Why: While pnpm's lockfile ensures consistent resolution, having different version specifiers across packages can cause confusion during maintenance and version bumps.

Fix: Align the version specifier with agents-api (using ^1.4.0) or update both to use the same version floor.

Refs:

Inline Comments:

🟡 Minor: knip.config.ts:8 Stale knip config comment about @vercel/functions
🟡 Minor: app-mention.ts:75 Span attribute uses inconsistent naming convention
🟡 Minor: @vercel__functions/index.d.ts:1-6 Type declaration comment is now stale

💭 Consider (2) 💭

💭 1) events.ts Missing waitUntil warning for non-app_mention handlers

Issue: The PR adds a warning log when waitUntil is unavailable for app_mention events, but other background work handlers in the same file (modal interactions, shortcuts, etc.) don't have the same warning pattern.

Why: Creates inconsistent diagnostic coverage — some fire-and-forget scenarios will be logged while others silently proceed untracked.

Fix: Either apply the same warning pattern to all waitUntil usages for consistency, or document that app_mention is intentionally the only instrumented path (perhaps it's the most critical).

💭 2) events.ts:230 Log timing differs from TriggerService pattern

Issue: The new success log fires after calling waitUntil(), but TriggerService.ts logs before calling it. Either approach is valid, but this creates a subtle inconsistency.

Fix: Consider aligning with the TriggerService pattern (log before call) for consistency, or keep as-is if the "registered" phrasing intentionally indicates post-registration.

💡 APPROVE WITH SUGGESTIONS

Summary: This is a well-structured observability enhancement that adds valuable diagnostic instrumentation for detecting Vercel serverless instance suspension. The core implementation is sound — tracking dispatch-to-execution delays and logging slow operations will help debug production issues. The suggestions above are minor cleanup items (stale comments, magic numbers, naming conventions) that would improve maintainability but don't block the functional value of these changes. Nice work adding this debugging capability! 🎉

Discarded (7)

Location	Issue	Reason Discarded
`package.json`	@vercel/functions as prod dependency increases bundle	Invalid — this is the fix; the previous dynamic-import-only pattern caused module resolution failures in pnpm's strict isolation
`app-mention.ts`	3000ms thresholds may cause noise on cold starts	Speculative — the thresholds seem reasonable and this is the intended diagnostic purpose
`pnpm-lock.yaml`	Lockfile changes	Proportional to the dependency addition, no issues
`package.json`	License concern	Apache-2.0 is compatible, no action needed
`.changeset/`	Missing changeset	Changeset already exists covering this change
Multi-file	Timing field naming pattern (`dispatchDelayMs` vs `durationMs`)	Informational — the new descriptive naming is appropriate for diagnostic logs with multiple timings
Multi-file	Consistent warning message pattern	Positive observation — no action needed

Reviewers (4)

Reviewer	Returned	Main Findings	Consider	Inline Comments	Discarded
`pr-review-sre`	5	1	0	1	3
`pr-review-devops`	7	1	0	2	4
`pr-review-consistency`	5	0	2	0	3
`pr-review-standards`	0	0	0	0	0
Total	17	2	2	3	10

claude · 2026-02-16T00:25:50Z

packages/agents-work-apps/src/slack/services/events/app-mention.ts

    span.setAttribute(SLACK_SPAN_KEYS.IS_IN_THREAD, Boolean(threadTs && threadTs !== messageTs));
    if (threadTs) span.setAttribute(SLACK_SPAN_KEYS.THREAD_TS, threadTs);
    if (messageTs) span.setAttribute(SLACK_SPAN_KEYS.MESSAGE_TS, messageTs);
+    if (dispatchDelayMs !== undefined) span.setAttribute('dispatch_delay_ms', dispatchDelayMs);


🟡 Minor: Span attribute uses inconsistent naming convention

Issue: The new span attribute 'dispatch_delay_ms' uses snake_case as a string literal, but the established pattern in this file uses the SLACK_SPAN_KEYS constants with dot-notation (e.g., SLACK_SPAN_KEYS.TEAM_ID → 'slack.team_id').

Why: Inconsistent span attribute naming makes observability governance harder — queries and dashboards may miss this attribute if they expect the slack.* namespace. Using constants also prevents typos and enables IDE autocomplete.

Fix: Add the attribute to SLACK_SPAN_KEYS in tracer.ts and use the constant:

// In tracer.ts, add to SLACK_SPAN_KEYS: DISPATCH_DELAY_MS: 'slack.dispatch_delay_ms', // Then here: if (dispatchDelayMs !== undefined) span.setAttribute(SLACK_SPAN_KEYS.DISPATCH_DELAY_MS, dispatchDelayMs);

Refs:

packages/agents-work-apps/src/slack/tracer.ts:21-41 — existing SLACK_SPAN_KEYS pattern

amikofalvy · 2026-02-18T23:32:02Z

not relevant anymore

amikofalvy and others added 2 commits February 15, 2026 12:26

claude bot reviewed Feb 16, 2026

View reviewed changes

github-actions bot deleted a comment from claude bot Feb 16, 2026

amikofalvy closed this Feb 18, 2026

amikofalvy deleted the feat/slack-waituntil-diagnostics branch February 18, 2026 23:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/slack waituntil diagnostics#2023

Feat/slack waituntil diagnostics#2023
amikofalvy wants to merge 2 commits intomainfrom
feat/slack-waituntil-diagnostics

amikofalvy commented Feb 16, 2026

Uh oh!

vercel bot commented Feb 16, 2026

Uh oh!

changeset-bot bot commented Feb 16, 2026

Uh oh!

claude bot left a comment

Uh oh!

claude bot Feb 16, 2026

Uh oh!

amikofalvy commented Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

amikofalvy commented Feb 16, 2026

Uh oh!

vercel bot commented Feb 16, 2026

Uh oh!

changeset-bot bot commented Feb 16, 2026

⚠️ No Changeset found

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🟡 Minor (5) 🟡

💭 Consider (2) 💭

💡 APPROVE WITH SUGGESTIONS

Uh oh!

claude bot Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

amikofalvy commented Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant