Skip to content

Feat/slack waituntil diagnostics#2023

Closed
amikofalvy wants to merge 2 commits intomainfrom
feat/slack-waituntil-diagnostics
Closed

Feat/slack waituntil diagnostics#2023
amikofalvy wants to merge 2 commits intomainfrom
feat/slack-waituntil-diagnostics

Conversation

@amikofalvy
Copy link
Collaborator

No description provided.

amikofalvy and others added 2 commits February 15, 2026 12:26
…nd Trigger handlers

Add instrumentation to diagnose whether Vercel function instances are
being suspended between dispatch and background execution. Logs
`dispatchDelayMs` to measure the gap between when work is queued via
waitUntil and when the async handler actually starts executing. Warns
when waitUntil is unavailable (fire-and-forget) and when delays exceed
5 seconds, indicating possible instance suspension.

Co-authored-by: Cursor <cursoragent@cursor.com>
…solution

The `getWaitUntil()` utility in agents-core dynamically imports
`@vercel/functions`, but the package was only declared as a dependency
in agents-api. With pnpm's strict dependency isolation, agents-core
could not resolve it at runtime, causing `ERR_MODULE_NOT_FOUND` and
making all waitUntil-based background work (Slack mentions, webhook
triggers) silently fall back to untracked fire-and-forget execution.

Co-authored-by: Cursor <cursoragent@cursor.com>
@vercel
Copy link

vercel bot commented Feb 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Feb 16, 2026 0:20am
agents-docs Ready Ready Preview, Comment Feb 16, 2026 0:20am
agents-manage-ui Ready Ready Preview, Comment Feb 16, 2026 0:20am

Request Review

@changeset-bot
Copy link

changeset-bot bot commented Feb 16, 2026

⚠️ No Changeset found

Latest commit: dc7061c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(5) Total Issues | Risk: Low

🟡 Minor (5) 🟡

🟡 1) multi-file Hardcoded timing thresholds without named constants

files:

  • agents-api/src/domains/run/services/TriggerService.ts:635
  • packages/agents-work-apps/src/slack/services/events/app-mention.ts:82,93,141,307

Issue: The PR introduces magic numbers for timing thresholds — 5000ms for dispatch delay warnings (suspension detection) and 3000ms for slow operation warnings — without extracting them to named constants or documenting why these values were chosen.

Why: Magic numbers in alerting thresholds are a common source of noisy alerts or missed detections. Without documentation, future maintainers won't know if 5 seconds was chosen based on Vercel's serverless characteristics, observed P99 latencies, or another rationale. This makes tuning for different deployment contexts difficult.

Fix: Extract thresholds to named constants with brief documentation:

// In a shared constants file or at the top of each file:
/** Threshold for warning about possible serverless instance suspension.
 *  5s chosen based on typical Vercel cold start being <2s; delays beyond
 *  this suggest the instance was suspended mid-execution. */
const DISPATCH_DELAY_WARNING_THRESHOLD_MS = 5000;

/** Threshold for warning about slow I/O operations. */
const SLOW_OPERATION_THRESHOLD_MS = 3000;

Refs:

🟡 2) package.json Version specifier differs from agents-api

Issue: The new @vercel/functions dependency uses ^1.6.0 but agents-api/package.json specifies ^1.4.0 for the same package.

Why: While pnpm's lockfile ensures consistent resolution, having different version specifiers across packages can cause confusion during maintenance and version bumps.

Fix: Align the version specifier with agents-api (using ^1.4.0) or update both to use the same version floor.

Refs:

Inline Comments:

  • 🟡 Minor: knip.config.ts:8 Stale knip config comment about @vercel/functions
  • 🟡 Minor: app-mention.ts:75 Span attribute uses inconsistent naming convention
  • 🟡 Minor: @vercel__functions/index.d.ts:1-6 Type declaration comment is now stale

💭 Consider (2) 💭

💭 1) events.ts Missing waitUntil warning for non-app_mention handlers

Issue: The PR adds a warning log when waitUntil is unavailable for app_mention events, but other background work handlers in the same file (modal interactions, shortcuts, etc.) don't have the same warning pattern.

Why: Creates inconsistent diagnostic coverage — some fire-and-forget scenarios will be logged while others silently proceed untracked.

Fix: Either apply the same warning pattern to all waitUntil usages for consistency, or document that app_mention is intentionally the only instrumented path (perhaps it's the most critical).

💭 2) events.ts:230 Log timing differs from TriggerService pattern

Issue: The new success log fires after calling waitUntil(), but TriggerService.ts logs before calling it. Either approach is valid, but this creates a subtle inconsistency.

Fix: Consider aligning with the TriggerService pattern (log before call) for consistency, or keep as-is if the "registered" phrasing intentionally indicates post-registration.


💡 APPROVE WITH SUGGESTIONS

Summary: This is a well-structured observability enhancement that adds valuable diagnostic instrumentation for detecting Vercel serverless instance suspension. The core implementation is sound — tracking dispatch-to-execution delays and logging slow operations will help debug production issues. The suggestions above are minor cleanup items (stale comments, magic numbers, naming conventions) that would improve maintainability but don't block the functional value of these changes. Nice work adding this debugging capability! 🎉

Discarded (7)
Location Issue Reason Discarded
package.json @vercel/functions as prod dependency increases bundle Invalid — this is the fix; the previous dynamic-import-only pattern caused module resolution failures in pnpm's strict isolation
app-mention.ts 3000ms thresholds may cause noise on cold starts Speculative — the thresholds seem reasonable and this is the intended diagnostic purpose
pnpm-lock.yaml Lockfile changes Proportional to the dependency addition, no issues
package.json License concern Apache-2.0 is compatible, no action needed
.changeset/ Missing changeset Changeset already exists covering this change
Multi-file Timing field naming pattern (dispatchDelayMs vs durationMs) Informational — the new descriptive naming is appropriate for diagnostic logs with multiple timings
Multi-file Consistent warning message pattern Positive observation — no action needed
Reviewers (4)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-sre 5 1 0 0 1 0 3
pr-review-devops 7 1 0 0 2 0 4
pr-review-consistency 5 0 2 0 0 0 3
pr-review-standards 0 0 0 0 0 0 0
Total 17 2 2 0 3 0 10

span.setAttribute(SLACK_SPAN_KEYS.IS_IN_THREAD, Boolean(threadTs && threadTs !== messageTs));
if (threadTs) span.setAttribute(SLACK_SPAN_KEYS.THREAD_TS, threadTs);
if (messageTs) span.setAttribute(SLACK_SPAN_KEYS.MESSAGE_TS, messageTs);
if (dispatchDelayMs !== undefined) span.setAttribute('dispatch_delay_ms', dispatchDelayMs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Span attribute uses inconsistent naming convention

Issue: The new span attribute 'dispatch_delay_ms' uses snake_case as a string literal, but the established pattern in this file uses the SLACK_SPAN_KEYS constants with dot-notation (e.g., SLACK_SPAN_KEYS.TEAM_ID'slack.team_id').

Why: Inconsistent span attribute naming makes observability governance harder — queries and dashboards may miss this attribute if they expect the slack.* namespace. Using constants also prevents typos and enables IDE autocomplete.

Fix: Add the attribute to SLACK_SPAN_KEYS in tracer.ts and use the constant:

// In tracer.ts, add to SLACK_SPAN_KEYS:
DISPATCH_DELAY_MS: 'slack.dispatch_delay_ms',

// Then here:
if (dispatchDelayMs !== undefined) span.setAttribute(SLACK_SPAN_KEYS.DISPATCH_DELAY_MS, dispatchDelayMs);

Refs:

@github-actions github-actions bot deleted a comment from claude bot Feb 16, 2026
@amikofalvy
Copy link
Collaborator Author

not relevant anymore

@amikofalvy amikofalvy closed this Feb 18, 2026
@amikofalvy amikofalvy deleted the feat/slack-waituntil-diagnostics branch February 18, 2026 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant