Refine timestamps in spans and recording alignment #982

toubatbrian · 2026-01-16T21:36:23Z

Summary

This PR ports the Python PR #4131 (AGT-2316) to TypeScript, refining timestamp accuracy for telemetry spans and improving recording alignment.

Changes

Telemetry Timestamp Accuracy

- User speech timing: Calculate accurate speech start time by subtracting speechDuration from detection time, rather than recording when VAD triggered
- Agent speech timing: Track when audio playback actually starts (first frame captured) instead of when generation begins
- Span start times: Added startTime parameter support to tracer.startSpan() to allow backdating spans
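
For reviewers, a minimal sketch of the backdating arithmetic described above. The names `VadStartEvent`, `speechDurationInS` and `speechStartTimeMs` are hypothetical and not taken from the PR; the sketch assumes the VAD duration is reported in seconds while wall-clock timestamps are in milliseconds, as a later review comment in this thread suggests.

```typescript
// Hypothetical shape: the real VADEvent in agents/src/vad.ts has more fields.
interface VadStartEvent {
  /** Speech already observed when the event fired, in seconds (assumed unit). */
  speechDurationInS: number;
}

/** Estimated wall-clock time (ms since epoch) at which the user started speaking. */
function speechStartTimeMs(ev: VadStartEvent, detectedAtMs: number): number {
  // Convert seconds to milliseconds before subtracting from a Date.now()-style value.
  return detectedAtMs - ev.speechDurationInS * 1000;
}

// Fixed detection time so the example is deterministic.
const detectedAt = 1700000000500;
const startTime = speechStartTimeMs({ speechDurationInS: 0.25 }, detectedAt);
// startTime would then be handed to span creation as the span's start time.
```

The result is 250 ms earlier than the detection time, which is the value a backdated `user_speaking` span would start at under these assumptions.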

Recording Alignment

- recorder_io.ts: Added _lastSpeechEndTime and _lastSpeechStartTime tracking for proper audio alignment
- Silence padding: takeBuf() now supports padSince parameter to prepend silence frames when needed
- Recording start time: Now returns the minimum of input/output start times for accurate alignment
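
A rough sketch of the two alignment ideas above, assuming millisecond timestamps and 16-bit PCM samples. The helper names (`recordingStartedAt`, `silenceSampleCount`, `prependSilence`) are illustrative only and do not mirror the actual `recorder_io.ts` implementation.

```typescript
/** Earlier of two optional start times (ms), or undefined if neither side has started. */
function recordingStartedAt(inputStartMs?: number, outputStartMs?: number): number | undefined {
  if (inputStartMs === undefined) return outputStartMs;
  if (outputStartMs === undefined) return inputStartMs;
  return Math.min(inputStartMs, outputStartMs);
}

/** Number of zero samples needed to cover the gap between padSince and the first buffered frame. */
function silenceSampleCount(padSinceMs: number, firstFrameMs: number, sampleRate: number): number {
  const gapInS = Math.max(0, (firstFrameMs - padSinceMs) / 1000);
  return Math.round(gapInS * sampleRate);
}

/** Returns a new buffer with `count` silent samples in front of `samples`. */
function prependSilence(samples: Int16Array, count: number): Int16Array {
  const out = new Int16Array(count + samples.length); // typed arrays are zero-initialised
  out.set(samples, count);
  return out;
}

// 500 ms gap at 16 kHz needs 8000 samples of silence.
const padded = prependSilence(new Int16Array([1, 2]), silenceSampleCount(1000, 1500, 16000));
```

Clamping the gap at zero means a `padSince` later than the first frame adds no padding rather than failing.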

Event Propagation

- Added PlaybackStartedEvent interface and EVENT_PLAYBACK_STARTED constant to io.ts
- ParticipantAudioOutput now emits playbackStarted event when first audio frame is captured
- generation.ts listens for playback events to resolve firstFrameFut with accurate timestamp
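
The first-frame emission pattern can be sketched roughly as follows. `SketchAudioOutput` and `addPlaybackStartedListener` are hypothetical stand-ins; the real classes in `io.ts` and `_output.ts` use their own event plumbing, and only the `firstFrameEmitted` flag and the reset on flush are taken from the walkthrough in this thread.

```typescript
type PlaybackStartedListener = (ev: { createdAt: number }) => void;

class SketchAudioOutput {
  private listeners = new Set<PlaybackStartedListener>();
  private firstFrameEmitted = false;

  /** Registers a listener; this registration API is an assumption for the sketch. */
  addPlaybackStartedListener(listener: PlaybackStartedListener): void {
    this.listeners.add(listener);
  }

  /** Emits playback-started only for the first frame of each playout segment. */
  captureFrame(_frame: Int16Array, now: number = Date.now()): void {
    if (!this.firstFrameEmitted) {
      this.firstFrameEmitted = true;
      this.listeners.forEach((listener) => listener({ createdAt: now }));
    }
  }

  /** Resets the flag so the next segment reports its own first frame. */
  flush(): void {
    this.firstFrameEmitted = false;
  }
}

const output = new SketchAudioOutput();
const seen: number[] = [];
output.addPlaybackStartedListener((ev) => {
  seen.push(ev.createdAt);
});
output.captureFrame(new Int16Array(160), 1000); // first frame: emits
output.captureFrame(new Int16Array(160), 1020); // same segment: no event
output.flush();
output.captureFrame(new Int16Array(160), 2000); // new segment: emits again
```

A listener like the one above is where a first-frame future could be resolved with `createdAt`, instead of with the time generation began.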

OTel Context Propagation

- Added _agentTurnContext to SpeechHandle to maintain proper span hierarchy
- Agent state updates now pass OTel context for correct parent-child relationships

Bug Fix: Duplicate Tool Calls

Fixed duplicate FunctionCall entries in session history by filtering toolsMessages to only add FunctionCallOutput items (since FunctionCall items are already added by onToolExecutionStarted)
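
As an illustration of the filtering described above, here is a simplified sketch. The `ChatItem` union and `outputsOnly` helper are hypothetical; the actual `FunctionCall` and `FunctionCallOutput` classes live in `agents/src/llm/chat_context.ts` and carry more fields.

```typescript
// Simplified stand-ins for the real chat items.
type ChatItem =
  | { type: 'function_call'; callId: string; name: string }
  | { type: 'function_call_output'; callId: string; output: string };

/** Keeps only tool outputs, assuming the matching calls were already recorded elsewhere. */
function outputsOnly(items: ChatItem[]): ChatItem[] {
  return items.filter((item) => item.type === 'function_call_output');
}

const toolsMessages: ChatItem[] = [
  { type: 'function_call', callId: 'a', name: 'lookup' },
  { type: 'function_call_output', callId: 'a', output: 'ok' },
];
const toAppend = outputsOnly(toolsMessages);
```

Only the output item survives, so the history ends up with one call entry (added when tool execution started) and one output entry.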

Utilities

Added rejected property to Future class to check if a future was rejected
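
A minimal sketch of what a rejection flag on a future might look like. `SketchFuture` is illustrative only; the walkthrough later in this thread says the real `Future<T>` uses a private `#rejected` field, whereas this sketch uses the `private` keyword to stay target-agnostic.

```typescript
class SketchFuture<T> {
  private _rejected = false;
  private _resolve!: (value: T) => void;
  private _reject!: (reason: Error) => void;
  readonly await: Promise<T>;

  constructor() {
    this.await = new Promise<T>((resolve, reject) => {
      this._resolve = resolve;
      this._reject = reject;
    });
    // Swallow the rejection on a derived promise so an un-awaited future does not
    // surface as an unhandled rejection; callers awaiting `await` still see the error.
    this.await.catch(() => undefined);
  }

  /** True once reject() has been called. */
  get rejected(): boolean {
    return this._rejected;
  }

  resolve(value: T): void {
    this._resolve(value);
  }

  reject(reason: Error): void {
    this._rejected = true;
    this._reject(reason);
  }
}

const pending = new SketchFuture<number>();
const cancelled = new SketchFuture<number>();
cancelled.reject(new Error('cancelled'));
```

A synchronous flag like this lets callers branch on cancellation without attaching another `.catch` handler.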

Files Changed

File	Changes
`telemetry/traces.ts`	Added `startTime` to `StartSpanOptions`, pass directly to OTel SDK
`voice/io.ts`	Added `PlaybackStartedEvent`, `EVENT_PLAYBACK_STARTED`, `onPlaybackStarted()`
`voice/room_io/_output.ts`	Emit `playbackStarted` on first frame capture
`voice/generation.ts`	Listen for `playbackStarted`, resolve `firstFrameFut` with timestamp
`voice/audio_recognition.ts`	Calculate accurate speech start time with `speechDuration`
`voice/agent_session.ts`	Pass `startTime` and `otelContext` to state update methods
`voice/agent_activity.ts`	Propagate timestamps, set `_agentTurnContext`, fix duplicate tool calls
`voice/speech_handle.ts`	Added `_agentTurnContext` property
`voice/recorder_io/recorder_io.ts`	Added speech timing tracking, silence padding, aligned recording start
`utils.ts`	Added `rejected` getter to `Future` class

Testing

- Verified telemetry spans now have accurate start times
- Confirmed no duplicate function calls in Agent Insights transcript
- All existing tests pass

Summary by CodeRabbit

Enhancements
- Improved voice timing and synchronization (better speech start/end alignment and playback-position accuracy).
- More consistent context and timing propagation across voice workflows for more reliable voice responses.
- Smarter silence padding and recording/playback alignment to reduce glitches.
New Features
- Explicit span startTime support for telemetry traces.
- Playback-started event and first-frame timestamp propagation for precise playback indicators.
Other
- Exposed rejection status for internal async operations (rejected getter).


changeset-bot · 2026-01-16T21:36:27Z

🦋 Changeset detected

Latest commit: 8b6aaed

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 17 packages

Name	Type
@livekit/agents	Patch
@livekit/agents-plugin-anam	Patch
@livekit/agents-plugin-baseten	Patch
@livekit/agents-plugin-bey	Patch
@livekit/agents-plugin-cartesia	Patch
@livekit/agents-plugin-deepgram	Patch
@livekit/agents-plugin-elevenlabs	Patch
@livekit/agents-plugin-google	Patch
@livekit/agents-plugin-inworld	Patch
@livekit/agents-plugin-livekit	Patch
@livekit/agents-plugin-neuphonic	Patch
@livekit/agents-plugin-openai	Patch
@livekit/agents-plugin-resemble	Patch
@livekit/agents-plugin-rime	Patch
@livekit/agents-plugin-silero	Patch
@livekit/agents-plugins-test	Patch
@livekit/agents-plugin-xai	Patch


coderabbitai · 2026-01-16T21:36:35Z


📥 Commits

Reviewing files that changed from the base of the PR and between fef7fd0 and 8b6aaed.

📒 Files selected for processing (1)

agents/src/voice/io.ts

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough


Adds explicit span startTime support and propagates OpenTelemetry context through voice flows; implements event-driven first-frame playback timestamps, silence-padding and timing alignment in recorder IO; tracks Future rejection state and wires playback-started events across audio outputs.

Changes

Cohort / File(s)	Summary
Telemetry `agents/src/telemetry/traces.ts`	Added `startTime?: number` to `StartSpanOptions` and propagated it into tracer.startSpan calls.
Futures / Utilities `agents/src/utils.ts`	Added private `#rejected` flag and public `rejected` getter on `Future<T>`; `reject()` sets the flag.
OTEL Context & Speech Flow `agents/src/voice/agent_activity.ts`, `agents/src/voice/agent_session.ts`, `agents/src/voice/speech_handle.ts`, `agents/src/voice/audio_recognition.ts`	Capture and propagate OpenTelemetry Context (`_agentTurnContext`), compute speech start times from VAD events, pass `startTime` into span creation for `user_turn`/`user_speaking`/`agent_speaking`, and adjust onStartOfSpeech/first-frame callback flows.
Audio Playback First-Frame Timing `agents/src/voice/io.ts`, `agents/src/voice/generation.ts`	Added `AudioOutput.EVENT_PLAYBACK_STARTED`, `PlaybackStartedEvent` and `onPlaybackStarted(createdAt)` handler; changed first-frame future to `Future<number>` and resolve it with playback-start event timestamp.
Room & Avatar Output First-Frame Emission `agents/src/voice/room_io/_output.ts`, `agents/src/voice/avatar/datastream_io.ts`	Track `firstFrameEmitted` and invoke `onPlaybackStarted(Date.now())` on first emitted frame; reset flag on playout/flush for new sessions.
Recorder IO: Buffering & Silence Padding `agents/src/voice/recorder_io/recorder_io.ts`	Pass last-speech-end into input buffering (`takeBuf(padSince?)`), pad input with silence when needed, compute `recordingStartedAt` from input/output, track `_lastSpeechStartTime`/`_lastSpeechEndTime`, and align playback timing/finish handling (seconds-based durations).
Minor / Examples / Lint `agents/src/voice/generation.ts`, `examples/src/`, `.changeset/`	Adjusted imports to mix value/type imports; updated callsites for numeric first-frame futures; added ESLint directives in example CLIs; added changeset note.

Sequence Diagram

sequenceDiagram
    participant VAD as VoiceDetector
    participant AA as AgentActivity
    participant SH as SpeechHandle
    participant AS as AgentSession
    participant TR as Tracer
    participant Gen as Generation
    participant AO as AudioOutput

    VAD->>TR: startSpan("user_turn", { startTime: now - speechDuration })
    VAD->>AA: onStartOfSpeech(VADEvent)
    AA->>SH: store _agentTurnContext
    AA->>AS: _updateUserState('speaking', speechStartTime, otelContext)
    AS->>TR: startSpan("user_speaking", { startTime: speechStartTime, context: otelContext })

    AA->>Gen: start generation (carry otelContext)
    Gen->>AO: forward audio (attach listener)
    AO->>AO: first emitted frame -> emit EVENT_PLAYBACK_STARTED(createdAt)
    AO->>Gen: playbackStarted(createdAt)
    Gen->>Gen: resolve firstFrameFut with createdAt

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰
I hop when spans begin on cue,
First frames chime with timestamps true,
Context tucked in every thread,
Silence padded, timing fed—
A tiny rabbit stamps "all's new!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Refine timestamps in spans and recording alignment' directly aligns with the PR's primary objectives of improving telemetry timestamp accuracy and recording alignment.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2eb8d02b56


agents/src/voice/generation.ts

toubatbrian · 2026-01-16T21:44:34Z

@codex

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f38e2c44b


agents/src/telemetry/traces.ts

agents/src/voice/recorder_io/recorder_io.ts

agents/src/voice/agent_activity.ts

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@agents/src/voice/agent_activity.ts`:
- Around line 640-646: onStartOfSpeech computes speechStartTime by subtracting
VADEvent.speechDuration from Date.now() but speechDuration is in seconds while
Date.now() is milliseconds; update the subtraction in onStartOfSpeech to convert
ev.speechDuration to milliseconds (multiply by 1000) before subtracting, so the
timestamp passed to this.agentSession._updateUserState('speaking', ...) is
correct.

In `@agents/src/voice/recorder_io/recorder_io.ts`:
- Around line 693-711: captureFrame sets _startedWallTime and
_lastSpeechStartTime unconditionally while only pushing frames into accFrames
when this.recorderIO.recording is true; move the initialization of
_startedWallTime and _lastSpeechStartTime so they only occur when recording is
active (i.e., inside the same this.recorderIO.recording branch that pushes into
accFrames) to ensure timestamps align with when frames are actually recorded,
leaving the await this.nextInChain.captureFrame and await super.captureFrame
calls unchanged.
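
To make the second suggestion concrete, a hedged sketch of gating the timestamp initialization on the recording flag. `SketchRecorderOutput` and its fields are simplified, hypothetical stand-ins for the class in `recorder_io.ts`; forwarding to the next output in the chain is only indicated by a comment.

```typescript
class SketchRecorderOutput {
  recording = false;
  startedWallTime?: number;
  accFrames: Int16Array[] = [];

  captureFrame(frame: Int16Array, now: number = Date.now()): void {
    if (this.recording) {
      // Initialise the start timestamp only when the frame is actually recorded,
      // so it lines up with the first entry in accFrames.
      if (this.startedWallTime === undefined) {
        this.startedWallTime = now;
      }
      this.accFrames.push(frame);
    }
    // Forwarding the frame downstream would happen here regardless of recording state.
  }
}

const rec = new SketchRecorderOutput();
rec.captureFrame(new Int16Array(160), 1000); // not recording: nothing is stamped
const stampBeforeRecording = rec.startedWallTime;
rec.recording = true;
rec.captureFrame(new Int16Array(160), 2000); // first recorded frame sets the timestamp
```

With the guard in place, frames that pass through before recording begins no longer pull the start timestamp earlier than the recorded audio.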

🧹 Nitpick comments (2)

agents/src/voice/agent_activity.ts (2)
1229-1231: Consider logging the actual error for debugging purposes.

The catch handler assumes the rejection is always due to cancellation, but other errors might occur. Logging the error would help with debugging unexpected failures.
♻️ Suggested improvement
       textOut.firstTextFut.await
         .then(() => onFirstFrame())
-        .catch(() => this.logger.debug('firstTextFut cancelled before first frame'));
+        .catch((e) => this.logger.debug({ error: e }, 'firstTextFut rejected before first frame'));
1686-1697: Consider extracting the duplicate filtering logic.

This filtering logic is duplicated at lines 1486-1493. While acceptable, extracting to a helper function would reduce duplication.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f38e2c and 6a77734.

📒 Files selected for processing (2)

agents/src/voice/agent_activity.ts
agents/src/voice/recorder_io/recorder_io.ts

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

agents/src/voice/recorder_io/recorder_io.ts
agents/src/voice/agent_activity.ts

**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

agents/src/voice/recorder_io/recorder_io.ts
agents/src/voice/agent_activity.ts

**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

agents/src/voice/recorder_io/recorder_io.ts
agents/src/voice/agent_activity.ts

🧬 Code graph analysis (1)

agents/src/voice/agent_activity.ts (2)

agents/src/vad.ts (1)

VADEvent (24-56)

agents/src/llm/chat_context.ts (1)

FunctionCallOutput (284-350)

🔇 Additional comments (12)

agents/src/voice/agent_activity.ts (6)

7-7: LGTM!

The import alias otelContext for context is clear and helps distinguish OpenTelemetry context from other context references in the codebase.

1174-1175: LGTM!

Good pattern for capturing the OTel context at task entry and propagating it through onFirstFrame to _updateAgentState. This ensures accurate span parent-child relationships across async boundaries.

Also applies to: 1220-1225

1486-1493: LGTM!

Good fix to prevent duplicate FunctionCall entries in session history. The filtering ensures only FunctionCallOutput items are added here since FunctionCall items were already added by onToolExecutionStarted.

1517-1520: LGTM!

Good naming improvement using the InS suffix to explicitly indicate the unit is seconds, addressing previous feedback about unit clarity.

1318-1319: LGTM!

Consistent application of the OTel context capture and first-frame callback patterns in _pipelineReplyTaskImpl.

Also applies to: 1419-1424, 1436-1438, 1443-1445

1765-1766: LGTM!

Consistent implementation of OTel context capture and first-frame handling in _realtimeGenerationTaskImpl.

Also applies to: 1804-1808, 1896-1903

agents/src/voice/recorder_io/recorder_io.ts (6)

125-129: LGTM!

Passing the last speech end time to takeBuf enables proper alignment between input and output recordings.

139-152: LGTM!

Correct logic for returning the minimum of input/output start times, with proper handling of undefined cases.

562-600: LGTM!

Good improvements to playback finish handling:

Properly handles pause state when calculating finish time

Clamps playback position to actual speech window

Tracks last speech timing for future padding decisions

Logs warning when speech start time is missing

603-621: LGTM!

Good adoption of the InS suffix convention for variables representing seconds. This makes the code much easier to reason about and addresses previous feedback about unit clarity.

731-735: LGTM!

Updated createSilenceFrame to use durationInS parameter name, consistent with the seconds-based naming convention used throughout the file.

680-685: LGTM!

Properly appends trailing silence to the buffer when needed, with correct ms-to-seconds conversion.


agents/src/voice/agent_activity.ts

agents/src/voice/recorder_io/recorder_io.ts

toubatbrian · 2026-01-19T22:43:09Z

agents/src/voice/io.ts


 export interface PlaybackFinishedEvent {
-  // How much of the audio was played back
+  /** How much of the audio was played back, in seconds */


@lukasIO I'm going to keep the naming of playbackPosition for this PR. Otherwise, it will trigger a lot of renames to playbackPositionInS, which I will do in a separate PR.

save

2eb8d02

toubatbrian changed the title ~~Refine timestamps in spans and recording alignment~~ [AGT-2450] Refine timestamps in spans and recording alignment Jan 16, 2026

toubatbrian changed the title ~~[AGT-2450] Refine timestamps in spans and recording alignment~~ https://linear.app/livekit/issue/AGT-2450/refine-timestamps-in-spans-and-recording-alignment Jan 16, 2026

toubatbrian changed the title ~~https://linear.app/livekit/issue/AGT-2450/refine-timestamps-in-spans-and-recording-alignment~~ Refine timestamps in spans and recording alignment Jan 16, 2026

chatgpt-codex-connector bot reviewed Jan 16, 2026

agents/src/voice/generation.ts

Create lazy-spies-worry.md

324d4dc

toubatbrian requested a review from lukasIO January 16, 2026 21:41

Update datastream_io.ts

8f38e2c

chatgpt-codex-connector bot reviewed Jan 16, 2026

agents/src/telemetry/traces.ts

lukasIO reviewed Jan 19, 2026


toubatbrian added 2 commits January 19, 2026 14:09

fix review comments

c8c7ae5

Update recorder_io.ts

6a77734

coderabbitai bot reviewed Jan 19, 2026

agents/src/voice/agent_activity.ts

agents/src/voice/recorder_io/recorder_io.ts

Update recorder_io.ts

fc79680

toubatbrian requested a review from lukasIO January 19, 2026 22:17

toubatbrian added 5 commits January 19, 2026 14:22

fix lint

bd20934

Merge branch 'main' into brian/refine-ts-recording

6293994

Update traces.ts

c77fbae

fix lint

fef7fd0

Update io.ts

8b6aaed

toubatbrian commented Jan 19, 2026
