Conversation

@toubatbrian
Contributor

@toubatbrian toubatbrian commented Jan 16, 2026

Summary

This PR ports the Python PR #4131 (AGT-2316) to TypeScript, refining timestamp accuracy for telemetry spans and improving recording alignment.

Changes

Telemetry Timestamp Accuracy

  • User speech timing: Calculate accurate speech start time by subtracting speechDuration from detection time, rather than recording when VAD triggered
  • Agent speech timing: Track when audio playback actually starts (first frame captured) instead of when generation begins
  • Span start times: Added startTime parameter support to tracer.startSpan() to allow backdating spans
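The backdating arithmetic for user speech can be sketched as below. This is a minimal illustration, not the code from this PR: `speechStartTimeMs` is a hypothetical helper, and it assumes `speechDuration` is reported in seconds while the detection time is an epoch timestamp in milliseconds (as returned by `Date.now()`).

```typescript
// Hypothetical helper; the name and signature are illustrative, not from the PR.
// Assumption: speechDurationInS is in seconds, detectedAtMs is epoch milliseconds.
function speechStartTimeMs(detectedAtMs: number, speechDurationInS: number): number {
  // Convert seconds to milliseconds before subtracting so both operands share a unit.
  return detectedAtMs - speechDurationInS * 1000;
}

// Example: VAD fires after it has already observed half a second of speech.
const exampleDetectedAtMs = 1_700_000_000_000;
const exampleStartMs = speechStartTimeMs(exampleDetectedAtMs, 0.5);
console.log(exampleDetectedAtMs - exampleStartMs); // difference in milliseconds
```

A value computed this way is the kind of timestamp the description refers to when it mentions passing `startTime` to `tracer.startSpan()` to backdate a span.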

Recording Alignment

  • recorder_io.ts: Added _lastSpeechEndTime and _lastSpeechStartTime tracking for proper audio alignment
  • Silence padding: takeBuf() now supports padSince parameter to prepend silence frames when needed
  • Recording start time: Now returns the minimum of input/output start times for accurate alignment
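The two alignment rules above (pad with silence since a reference time, and take the earlier of the input and output start times) might look roughly like this. All names here are hypothetical stand-ins rather than the actual `recorder_io.ts` code, and 16-bit mono PCM is assumed for the silence buffer.

```typescript
// Hypothetical helpers sketching the alignment arithmetic; names are not from the PR.

/** Earliest of the two optional start times (epoch ms), or undefined if neither is set. */
function alignedRecordingStart(
  inputStartedAt?: number,
  outputStartedAt?: number,
): number | undefined {
  if (inputStartedAt === undefined) return outputStartedAt;
  if (outputStartedAt === undefined) return inputStartedAt;
  return Math.min(inputStartedAt, outputStartedAt);
}

/** Number of silent samples needed to cover the gap between padSince and the first buffered frame. */
function silenceSamples(padSinceMs: number, firstFrameAtMs: number, sampleRate: number): number {
  const gapInS = (firstFrameAtMs - padSinceMs) / 1000;
  if (gapInS <= 0) return 0; // the buffer already starts at or before padSince
  return Math.round(gapInS * sampleRate);
}

/** A silence frame is zero-filled PCM (16-bit mono assumed in this sketch). */
function createSilence(numSamples: number): Int16Array {
  return new Int16Array(numSamples);
}
```

Expressing the gap in seconds before multiplying by the sample rate keeps the unit conversion in one place, which matches the seconds-based naming the review asks for elsewhere in this thread.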

Event Propagation

  • Added PlaybackStartedEvent interface and EVENT_PLAYBACK_STARTED constant to io.ts
  • ParticipantAudioOutput now emits playbackStarted event when first audio frame is captured
  • generation.ts listens for playback events to resolve firstFrameFut with accurate timestamp
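The event flow described above can be approximated with a small stand-in. This sketch does not use the real `AudioOutput` or `Future` classes; `SketchAudioOutput`, its listener list, and the injectable `now` argument are assumptions made so the example stays self-contained.

```typescript
// Minimal stand-in for an audio output that reports when playback starts.
// Names are illustrative; the PR's AudioOutput has a richer interface.
type PlaybackStartedListener = (ev: { createdAt: number }) => void;

class SketchAudioOutput {
  private listeners: PlaybackStartedListener[] = [];
  private firstFrameEmitted = false;
  private samplesCaptured = 0;

  onPlaybackStarted(listener: PlaybackStartedListener): void {
    this.listeners.push(listener);
  }

  captureFrame(frame: Int16Array, now: number = Date.now()): void {
    if (!this.firstFrameEmitted) {
      // Only the first frame of a playout segment triggers the event.
      this.firstFrameEmitted = true;
      for (const listener of this.listeners) listener({ createdAt: now });
    }
    this.samplesCaptured += frame.length; // forwarding to the sink is omitted
  }

  flush(): void {
    // Reset so the next segment emits its own playbackStarted event.
    this.firstFrameEmitted = false;
  }
}

// A consumer could turn the first event into a promise that resolves with the timestamp.
function firstFrameTimestamp(output: SketchAudioOutput): Promise<number> {
  return new Promise((resolve) => output.onPlaybackStarted((ev) => resolve(ev.createdAt)));
}
```

Resolving with the event's `createdAt` rather than the time the listener runs keeps the recorded timestamp tied to frame capture, which is the accuracy goal stated in the summary.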

OTel Context Propagation

  • Added _agentTurnContext to SpeechHandle to maintain proper span hierarchy
  • Agent state updates now pass OTel context for correct parent-child relationships

Bug Fix: Duplicate Tool Calls

  • Fixed duplicate FunctionCall entries in session history by filtering toolsMessages to only add FunctionCallOutput items (since FunctionCall items are already added by onToolExecutionStarted)
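As a rough illustration of the filter described above, the sketch below keeps only output items from a mixed list. The item shapes are simplified stand-ins and `outputsOnly` is a hypothetical helper; the real `FunctionCall` and `FunctionCallOutput` types in the codebase carry more fields.

```typescript
// Simplified stand-ins for chat items; field names here are assumptions.
type SketchFunctionCall = { type: 'function_call'; callId: string; name: string };
type SketchFunctionCallOutput = { type: 'function_call_output'; callId: string; output: string };
type SketchToolMessage = SketchFunctionCall | SketchFunctionCallOutput;

/** Keep only outputs; the matching calls are assumed to be in the history already. */
function outputsOnly(toolsMessages: SketchToolMessage[]): SketchFunctionCallOutput[] {
  return toolsMessages.filter(
    (m): m is SketchFunctionCallOutput => m.type === 'function_call_output',
  );
}
```

Using a type-guard predicate in `filter` narrows the result type, so callers appending to the history cannot accidentally re-add a call item.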

Utilities

  • Added rejected property to Future class to check if a future was rejected
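A future that exposes its rejection state could be structured along these lines. This is a sketch under assumptions, not the `Future` class from `utils.ts`: `SketchFuture` is a hypothetical name, and the no-op `catch` is added here only to keep the example from raising unhandled-rejection errors.

```typescript
// A minimal future with a `rejected` getter; illustrative only.
class SketchFuture<T> {
  private _done = false;
  private _rejected = false;
  private _resolve!: (value: T) => void;
  private _reject!: (reason: Error) => void;
  readonly await: Promise<T>;

  constructor() {
    // The executor runs synchronously, so both callbacks are assigned before use.
    this.await = new Promise<T>((resolve, reject) => {
      this._resolve = resolve;
      this._reject = reject;
    });
    // Sketch-only safeguard: swallow rejections nobody is awaiting.
    this.await.catch(() => undefined);
  }

  get done(): boolean {
    return this._done;
  }

  get rejected(): boolean {
    return this._rejected;
  }

  resolve(value: T): void {
    if (this._done) return;
    this._done = true;
    this._resolve(value);
  }

  reject(error: Error): void {
    if (this._done) return;
    this._done = true;
    this._rejected = true; // recorded so callers can tell failure apart from success
    this._reject(error);
  }
}
```

With a flag like this, a caller can check `fut.done && !fut.rejected` synchronously instead of awaiting the promise just to learn how it settled.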

Files Changed

| File | Changes |
| --- | --- |
| telemetry/traces.ts | Added startTime to StartSpanOptions, pass directly to OTel SDK |
| voice/io.ts | Added PlaybackStartedEvent, EVENT_PLAYBACK_STARTED, onPlaybackStarted() |
| voice/room_io/_output.ts | Emit playbackStarted on first frame capture |
| voice/generation.ts | Listen for playbackStarted, resolve firstFrameFut with timestamp |
| voice/audio_recognition.ts | Calculate accurate speech start time with speechDuration |
| voice/agent_session.ts | Pass startTime and otelContext to state update methods |
| voice/agent_activity.ts | Propagate timestamps, set _agentTurnContext, fix duplicate tool calls |
| voice/speech_handle.ts | Added _agentTurnContext property |
| voice/recorder_io/recorder_io.ts | Added speech timing tracking, silence padding, aligned recording start |
| utils.ts | Added rejected getter to Future class |

Testing

  • Verified telemetry spans now have accurate start times
  • Confirmed no duplicate function calls in Agent Insights transcript
  • All existing tests pass

Summary by CodeRabbit

  • Enhancements

    • Improved voice timing and synchronization (better speech start/end alignment and playback-position accuracy).
    • More consistent context and timing propagation across voice workflows for more reliable voice responses.
    • Smarter silence padding and recording/playback alignment to reduce glitches.
  • New Features

    • Explicit span startTime support for telemetry traces.
    • Playback-started event and first-frame timestamp propagation for precise playback indicators.
  • Other

    • Exposed rejection status for internal async operations (rejected getter).

@changeset-bot
changeset-bot bot commented Jan 16, 2026

🦋 Changeset detected

Latest commit: 8b6aaed

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages
| Name | Type |
| --- | --- |
| @livekit/agents | Patch |
| @livekit/agents-plugin-anam | Patch |
| @livekit/agents-plugin-baseten | Patch |
| @livekit/agents-plugin-bey | Patch |
| @livekit/agents-plugin-cartesia | Patch |
| @livekit/agents-plugin-deepgram | Patch |
| @livekit/agents-plugin-elevenlabs | Patch |
| @livekit/agents-plugin-google | Patch |
| @livekit/agents-plugin-inworld | Patch |
| @livekit/agents-plugin-livekit | Patch |
| @livekit/agents-plugin-neuphonic | Patch |
| @livekit/agents-plugin-openai | Patch |
| @livekit/agents-plugin-resemble | Patch |
| @livekit/agents-plugin-rime | Patch |
| @livekit/agents-plugin-silero | Patch |
| @livekit/agents-plugins-test | Patch |
| @livekit/agents-plugin-xai | Patch |

@coderabbitai
coderabbitai bot commented Jan 16, 2026

Warning

Rate limit exceeded

@toubatbrian has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 49 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between fef7fd0 and 8b6aaed.

📒 Files selected for processing (1)
  • agents/src/voice/io.ts

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds explicit span startTime support and propagates OpenTelemetry context through voice flows; implements event-driven first-frame playback timestamps, silence-padding and timing alignment in recorder IO; tracks Future rejection state and wires playback-started events across audio outputs.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Telemetry | agents/src/telemetry/traces.ts | Added startTime?: number to StartSpanOptions and propagated it into tracer.startSpan calls. |
| Futures / Utilities | agents/src/utils.ts | Added private #rejected flag and public rejected getter on Future<T>; reject() sets the flag. |
| OTEL Context & Speech Flow | agents/src/voice/agent_activity.ts, agents/src/voice/agent_session.ts, agents/src/voice/speech_handle.ts, agents/src/voice/audio_recognition.ts | Capture and propagate OpenTelemetry Context (_agentTurnContext), compute speech start times from VAD events, pass startTime into span creation for user_turn/user_speaking/agent_speaking, and adjust onStartOfSpeech/first-frame callback flows. |
| Audio Playback First-Frame Timing | agents/src/voice/io.ts, agents/src/voice/generation.ts | Added AudioOutput.EVENT_PLAYBACK_STARTED, PlaybackStartedEvent and onPlaybackStarted(createdAt) handler; changed first-frame future to Future<number> and resolve it with playback-start event timestamp. |
| Room & Avatar Output First-Frame Emission | agents/src/voice/room_io/_output.ts, agents/src/voice/avatar/datastream_io.ts | Track firstFrameEmitted and invoke onPlaybackStarted(Date.now()) on first emitted frame; reset flag on playout/flush for new sessions. |
| Recorder IO: Buffering & Silence Padding | agents/src/voice/recorder_io/recorder_io.ts | Pass last-speech-end into input buffering (takeBuf(padSince?)), pad input with silence when needed, compute recordingStartedAt from input/output, track _lastSpeechStartTime/_lastSpeechEndTime, and align playback timing/finish handling (seconds-based durations). |
| Minor / Examples / Lint | agents/src/voice/generation.ts, examples/src/*, .changeset/* | Adjusted imports to mix value/type imports; updated callsites for numeric first-frame futures; added ESLint directives in example CLIs; added changeset note. |

Sequence Diagram

sequenceDiagram
    participant VAD as VoiceDetector
    participant AA as AgentActivity
    participant SH as SpeechHandle
    participant AS as AgentSession
    participant TR as Tracer
    participant Gen as Generation
    participant AO as AudioOutput

    VAD->>TR: startSpan("user_turn", { startTime: now - speechDuration })
    VAD->>AA: onStartOfSpeech(VADEvent)
    AA->>SH: store _agentTurnContext
    AA->>AS: _updateUserState('speaking', speechStartTime, otelContext)
    AS->>TR: startSpan("user_speaking", { startTime: speechStartTime, context: otelContext })

    AA->>Gen: start generation (carry otelContext)
    Gen->>AO: forward audio (attach listener)
    AO->>AO: first emitted frame -> emit EVENT_PLAYBACK_STARTED(createdAt)
    AO->>Gen: playbackStarted(createdAt)
    Gen->>Gen: resolve firstFrameFut with createdAt

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰
I hop when spans begin on cue,
First frames chime with timestamps true,
Context tucked in every thread,
Silence padded, timing fed—
A tiny rabbit stamps "all's new!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Refine timestamps in spans and recording alignment' directly aligns with the PR's primary objectives of improving telemetry timestamp accuracy and recording alignment. |


Comment @coderabbitai help to get the list of available commands and usage tips.

@toubatbrian changed the title from "Refine timestamps in spans and recording alignment" to "[AGT-2450] Refine timestamps in spans and recording alignment" on Jan 16, 2026
@toubatbrian changed the title from "[AGT-2450] Refine timestamps in spans and recording alignment" to "https://linear.app/livekit/issue/AGT-2450/refine-timestamps-in-spans-and-recording-alignment" on Jan 16, 2026
@toubatbrian changed the title from "https://linear.app/livekit/issue/AGT-2450/refine-timestamps-in-spans-and-recording-alignment" to "Refine timestamps in spans and recording alignment" on Jan 16, 2026
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2eb8d02b56

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@toubatbrian toubatbrian requested a review from lukasIO January 16, 2026 21:41
@toubatbrian
Contributor Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f38e2c44b

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@agents/src/voice/agent_activity.ts`:
- Around line 640-646: onStartOfSpeech computes speechStartTime by subtracting
VADEvent.speechDuration from Date.now() but speechDuration is in seconds while
Date.now() is milliseconds; update the subtraction in onStartOfSpeech to convert
ev.speechDuration to milliseconds (multiply by 1000) before subtracting, so the
timestamp passed to this.agentSession._updateUserState('speaking', ...) is
correct.

In `@agents/src/voice/recorder_io/recorder_io.ts`:
- Around line 693-711: captureFrame sets _startedWallTime and
_lastSpeechStartTime unconditionally while only pushing frames into accFrames
when this.recorderIO.recording is true; move the initialization of
_startedWallTime and _lastSpeechStartTime so they only occur when recording is
active (i.e., inside the same this.recorderIO.recording branch that pushes into
accFrames) to ensure timestamps align with when frames are actually recorded,
leaving the await this.nextInChain.captureFrame and await super.captureFrame
calls unchanged.
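The guard suggested for `captureFrame` could take roughly the following shape. This is a hypothetical sketch, not the actual `recorder_io.ts` implementation: the class, its fields, and the injectable `now` argument are simplified stand-ins, and forwarding to the next output in the chain is omitted.

```typescript
// Hypothetical sketch of the suggested guard; names are illustrative only.
class SketchRecorderOutput {
  recording = false;
  startedWallTime?: number;
  lastSpeechStartTime?: number;
  private accFrames: Int16Array[] = [];

  captureFrame(frame: Int16Array, now: number = Date.now()): void {
    if (this.recording) {
      // Initialize timestamps only once frames are actually being accumulated,
      // so they line up with the first recorded frame.
      if (this.startedWallTime === undefined) this.startedWallTime = now;
      if (this.lastSpeechStartTime === undefined) this.lastSpeechStartTime = now;
      this.accFrames.push(frame);
    }
    // Forwarding the frame downstream would happen here regardless of recording state.
  }

  get bufferedFrames(): number {
    return this.accFrames.length;
  }
}
```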
🧹 Nitpick comments (2)
agents/src/voice/agent_activity.ts (2)

1229-1231: Consider logging the actual error for debugging purposes.

The catch handler assumes the rejection is always due to cancellation, but other errors might occur. Logging the error would help with debugging unexpected failures.

♻️ Suggested improvement
       textOut.firstTextFut.await
         .then(() => onFirstFrame())
-        .catch(() => this.logger.debug('firstTextFut cancelled before first frame'));
+        .catch((e) => this.logger.debug({ error: e }, 'firstTextFut rejected before first frame'));

1686-1697: Consider extracting the duplicate filtering logic.

This filtering logic is duplicated at lines 1486-1493. While acceptable, extracting to a helper function would reduce duplication.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f38e2c and 6a77734.

📒 Files selected for processing (2)
  • agents/src/voice/agent_activity.ts
  • agents/src/voice/recorder_io/recorder_io.ts
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

  • agents/src/voice/recorder_io/recorder_io.ts
  • agents/src/voice/agent_activity.ts
**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

  • agents/src/voice/recorder_io/recorder_io.ts
  • agents/src/voice/agent_activity.ts
**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

  • agents/src/voice/recorder_io/recorder_io.ts
  • agents/src/voice/agent_activity.ts
🧬 Code graph analysis (1)
agents/src/voice/agent_activity.ts (2)
agents/src/vad.ts (1)
  • VADEvent (24-56)
agents/src/llm/chat_context.ts (1)
  • FunctionCallOutput (284-350)
🔇 Additional comments (12)
agents/src/voice/agent_activity.ts (6)

7-7: LGTM!

The import alias otelContext for context is clear and helps distinguish OpenTelemetry context from other context references in the codebase.


1174-1175: LGTM!

Good pattern for capturing the OTel context at task entry and propagating it through onFirstFrame to _updateAgentState. This ensures accurate span parent-child relationships across async boundaries.

Also applies to: 1220-1225


1486-1493: LGTM!

Good fix to prevent duplicate FunctionCall entries in session history. The filtering ensures only FunctionCallOutput items are added here since FunctionCall items were already added by onToolExecutionStarted.


1517-1520: LGTM!

Good naming improvement using the InS suffix to explicitly indicate the unit is seconds, addressing previous feedback about unit clarity.


1318-1319: LGTM!

Consistent application of the OTel context capture and first-frame callback patterns in _pipelineReplyTaskImpl.

Also applies to: 1419-1424, 1436-1438, 1443-1445


1765-1766: LGTM!

Consistent implementation of OTel context capture and first-frame handling in _realtimeGenerationTaskImpl.

Also applies to: 1804-1808, 1896-1903

agents/src/voice/recorder_io/recorder_io.ts (6)

125-129: LGTM!

Passing the last speech end time to takeBuf enables proper alignment between input and output recordings.


139-152: LGTM!

Correct logic for returning the minimum of input/output start times, with proper handling of undefined cases.


562-600: LGTM!

Good improvements to playback finish handling:

  • Properly handles pause state when calculating finish time
  • Clamps playback position to actual speech window
  • Tracks last speech timing for future padding decisions
  • Logs warning when speech start time is missing

603-621: LGTM!

Good adoption of the InS suffix convention for variables representing seconds. This makes the code much easier to reason about and addresses previous feedback about unit clarity.


731-735: LGTM!

Updated createSilenceFrame to use durationInS parameter name, consistent with the seconds-based naming convention used throughout the file.


680-685: LGTM!

Properly appends trailing silence to the buffer when needed, with correct ms-to-seconds conversion.

@toubatbrian toubatbrian requested a review from lukasIO January 19, 2026 22:17

 export interface PlaybackFinishedEvent {
-  // How much of the audio was played back
+  /** How much of the audio was played back, in seconds */

@toubatbrian toubatbrian Jan 19, 2026

@lukasIO I'm going to keep the name playbackPosition for this PR. Otherwise, it will trigger a lot of renames to playbackPositionInS, which I will do in a separate PR.

3 participants