-
Notifications
You must be signed in to change notification settings - Fork 212
Refine timestamps in spans and recording alignment #982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: 8b6aaed The changes in this PR will be included in the next version bump. This PR includes changesets to release 17 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds explicit span startTime support and propagates OpenTelemetry context through voice flows; implements event-driven first-frame playback timestamps, silence-padding and timing alignment in recorder IO; tracks Future rejection state and wires playback-started events across audio outputs. Changes
Sequence DiagramsequenceDiagram
participant VAD as VoiceDetector
participant AA as AgentActivity
participant SH as SpeechHandle
participant AS as AgentSession
participant TR as Tracer
participant Gen as Generation
participant AO as AudioOutput
VAD->>TR: startSpan("user_turn", { startTime: now - speechDuration })
VAD->>AA: onStartOfSpeech(VADEvent)
AA->>SH: store _agentTurnContext
AA->>AS: _updateUserState('speaking', speechStartTime, otelContext)
AS->>TR: startSpan("user_speaking", { startTime: speechStartTime, context: otelContext })
AA->>Gen: start generation (carry otelContext)
Gen->>AO: forward audio (attach listener)
AO->>AO: first emitted frame -> emit EVENT_PLAYBACK_STARTED(createdAt)
AO->>Gen: playbackStarted(createdAt)
Gen->>Gen: resolve firstFrameFut with createdAt
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2eb8d02b56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8f38e2c44b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@agents/src/voice/agent_activity.ts`:
- Around line 640-646: onStartOfSpeech computes speechStartTime by subtracting
VADEvent.speechDuration from Date.now() but speechDuration is in seconds while
Date.now() is milliseconds; update the subtraction in onStartOfSpeech to convert
ev.speechDuration to milliseconds (multiply by 1000) before subtracting, so the
timestamp passed to this.agentSession._updateUserState('speaking', ...) is
correct.
In `@agents/src/voice/recorder_io/recorder_io.ts`:
- Around line 693-711: captureFrame sets _startedWallTime and
_lastSpeechStartTime unconditionally while only pushing frames into accFrames
when this.recorderIO.recording is true; move the initialization of
_startedWallTime and _lastSpeechStartTime so they only occur when recording is
active (i.e., inside the same this.recorderIO.recording branch that pushes into
accFrames) to ensure timestamps align with when frames are actually recorded,
leaving the await this.nextInChain.captureFrame and await super.captureFrame
calls unchanged.
🧹 Nitpick comments (2)
agents/src/voice/agent_activity.ts (2)
1229-1231: Consider logging the actual error for debugging purposes.The catch handler assumes the rejection is always due to cancellation, but other errors might occur. Logging the error would help with debugging unexpected failures.
♻️ Suggested improvement
textOut.firstTextFut.await .then(() => onFirstFrame()) - .catch(() => this.logger.debug('firstTextFut cancelled before first frame')); + .catch((e) => this.logger.debug({ error: e }, 'firstTextFut rejected before first frame'));
1686-1697: Consider extracting the duplicate filtering logic.This filtering logic is duplicated at lines 1486-1493. While acceptable, extracting to a helper function would reduce duplication.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
agents/src/voice/agent_activity.tsagents/src/voice/recorder_io/recorder_io.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'
Files:
agents/src/voice/recorder_io/recorder_io.tsagents/src/voice/agent_activity.ts
**/*.{ts,tsx}?(test|example|spec)
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
When testing inference LLM, always use full model names from
agents/src/inference/models.ts(e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')
Files:
agents/src/voice/recorder_io/recorder_io.tsagents/src/voice/agent_activity.ts
**/*.{ts,tsx}?(test|example)
📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)
Initialize logger before using any LLM functionality with
initializeLogger({ pretty: true })from '@livekit/agents'
Files:
agents/src/voice/recorder_io/recorder_io.tsagents/src/voice/agent_activity.ts
🧬 Code graph analysis (1)
agents/src/voice/agent_activity.ts (2)
agents/src/vad.ts (1)
VADEvent(24-56)agents/src/llm/chat_context.ts (1)
FunctionCallOutput(284-350)
🔇 Additional comments (12)
agents/src/voice/agent_activity.ts (6)
7-7: LGTM!The import alias
otelContextforcontextis clear and helps distinguish OpenTelemetry context from other context references in the codebase.
1174-1175: LGTM!Good pattern for capturing the OTel context at task entry and propagating it through
onFirstFrameto_updateAgentState. This ensures accurate span parent-child relationships across async boundaries.Also applies to: 1220-1225
1486-1493: LGTM!Good fix to prevent duplicate
FunctionCallentries in session history. The filtering ensures onlyFunctionCallOutputitems are added here sinceFunctionCallitems were already added byonToolExecutionStarted.
1517-1520: LGTM!Good naming improvement using the
InSsuffix to explicitly indicate the unit is seconds, addressing previous feedback about unit clarity.
1318-1319: LGTM!Consistent application of the OTel context capture and first-frame callback patterns in
_pipelineReplyTaskImpl.Also applies to: 1419-1424, 1436-1438, 1443-1445
1765-1766: LGTM!Consistent implementation of OTel context capture and first-frame handling in
_realtimeGenerationTaskImpl.Also applies to: 1804-1808, 1896-1903
agents/src/voice/recorder_io/recorder_io.ts (6)
125-129: LGTM!Passing the last speech end time to
takeBufenables proper alignment between input and output recordings.
139-152: LGTM!Correct logic for returning the minimum of input/output start times, with proper handling of undefined cases.
562-600: LGTM!Good improvements to playback finish handling:
- Properly handles pause state when calculating finish time
- Clamps playback position to actual speech window
- Tracks last speech timing for future padding decisions
- Logs warning when speech start time is missing
603-621: LGTM!Good adoption of the
InSsuffix convention for variables representing seconds. This makes the code much easier to reason about and addresses previous feedback about unit clarity.
731-735: LGTM!Updated
createSilenceFrameto usedurationInSparameter name, consistent with the seconds-based naming convention used throughout the file.
680-685: LGTM!Properly appends trailing silence to the buffer when needed, with correct ms-to-seconds conversion.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
|
||
| export interface PlaybackFinishedEvent { | ||
| // How much of the audio was played back | ||
| /** How much of the audio was played back, in seconds */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lukasIO I'm going to keep the naming of playbackPositon for this PR. Otherwise, if will trigger a lot of renamings to playbackPositionInS, which I will do in a different PR.
Summary
This PR ports the Python PR #4131 (AGT-2316) to TypeScript, refining timestamp accuracy for telemetry spans and improving recording alignment.
Changes
Telemetry Timestamp Accuracy
speechDurationfrom detection time, rather than recording when VAD triggeredstartTimeparameter support totracer.startSpan()to allow backdating spansRecording Alignment
recorder_io.ts: Added_lastSpeechEndTimeand_lastSpeechStartTimetracking for proper audio alignmenttakeBuf()now supportspadSinceparameter to prepend silence frames when neededEvent Propagation
PlaybackStartedEventinterface andEVENT_PLAYBACK_STARTEDconstant toio.tsParticipantAudioOutputnow emitsplaybackStartedevent when first audio frame is capturedgeneration.tslistens for playback events to resolvefirstFrameFutwith accurate timestampOTel Context Propagation
_agentTurnContexttoSpeechHandleto maintain proper span hierarchyBug Fix: Duplicate Tool Calls
FunctionCallentries in session history by filteringtoolsMessagesto only addFunctionCallOutputitems (sinceFunctionCallitems are already added byonToolExecutionStarted)Utilities
rejectedproperty toFutureclass to check if a future was rejectedFiles Changed
telemetry/traces.tsstartTimetoStartSpanOptions, pass directly to OTel SDKvoice/io.tsPlaybackStartedEvent,EVENT_PLAYBACK_STARTED,onPlaybackStarted()voice/room_io/_output.tsplaybackStartedon first frame capturevoice/generation.tsplaybackStarted, resolvefirstFrameFutwith timestampvoice/audio_recognition.tsspeechDurationvoice/agent_session.tsstartTimeandotelContextto state update methodsvoice/agent_activity.ts_agentTurnContext, fix duplicate tool callsvoice/speech_handle.ts_agentTurnContextpropertyvoice/recorder_io/recorder_io.tsutils.tsrejectedgetter toFutureclassTesting
Summary by CodeRabbit
Enhancements
New Features
Other
✏️ Tip: You can customize this high-level summary in your review settings.