feat: full-duplex voice, faster MiniMax model, README rewrite #24
Open
tangxiya-star wants to merge 1 commit into
Agent voice loop:
- Default to MiniMax-M2.1-highspeed at reasoning_effort "low" (snappiest measured combo: ~200 think tokens / ~6s vs ~280 / ~8s on M2.7).
- Add extra_body reasoning_split so the model's <think> chain-of-thought can never leak into the spoken reply.
- Make proactive announcements interruptible (allow_interruptions=True) so the contractor can talk over a stacked-up announcement.

iOS:
- Keep the mic open full-duplex for the whole session instead of closing it while the agent speaks (half-duplex left the user inaudible). Echo is handled by .voiceChat AEC + the agent's interruption-word gating.

Repo:
- Rewrite README into the full project overview (loop, architecture, stack).
- Track agent/livekit.toml (LiveKit Cloud agent config).
- Remove empty vision/ placeholder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Three independent improvements, all cleanly rebased on top of latest main.

Agent voice loop (agent/src/agent.py)
- Default to MiniMax-M2.1-highspeed at reasoning_effort: "low", the snappiest combo we measured for voice TTFA (~200 think tokens / ~6s per verdict vs ~280 / ~8s on M2.7-highspeed).
- Add reasoning_split: true so the model's <think> chain-of-thought goes to a separate field and can never leak into the spoken reply (we'd seen fragments like "The user…" escape the SDK's tag stripping and get spoken aloud).
- Make proactive announcements interruptible (allow_interruptions=True). Blocking interruptions left the contractor unable to get a word in while announcements stacked up.

iOS full-duplex mic (app-ios/.../VoiceAgentSession.swift)
- Keep the mic open for the whole session instead of closing it while the agent speaks (half-duplex left the user inaudible). Echo is handled by .voiceChat AVAudioSession AEC plus the agent's interruption-word gating. Pairs with the interruptible announcements above.

Repo
- Track agent/livekit.toml (LiveKit Cloud agent config).
- Remove the empty vision/ placeholder.

Notes
- Rebased onto origin/main; a stale local events.py/test_events.py raw-number experiment was dropped in favor of main's newer word-spelling reading (which already lists every span and matches the agent prompt).
- Local Xcode project settings (DEVELOPMENT_TEAM, .xiya bundle id) were intentionally excluded.

Test
pytest: 58 pass. One live-API behavioral test (test_calls_building_code_tool_for_code_question) is flaky: it passed 3/3 on re-run, but the model occasionally asks "what city?" instead of calling the tool.

🤖 Generated with Claude Code
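On the flaky behavioral test: a clean re-run is weak evidence on its own. Assuming each run fails independently with probability p, the chance of n straight passes is (1 - p)^n, so a test that fails one run in five still passes 3/3 about half the time (0.8^3 = 0.512). The helpers below (hypothetical names, same independence assumption) compute that probability and how many runs it takes to see at least one failure with a chosen confidence.

```python
import math


def prob_all_pass(fail_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass."""
    return (1.0 - fail_rate) ** runs


def runs_to_expose(fail_rate: float, confidence: float) -> int:
    """Smallest n such that P(at least one failure in n runs) >= confidence."""
    if not 0.0 < fail_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("fail_rate and confidence must be in (0, 1)")
    # Solve 1 - (1 - p)^n >= c  =>  n >= log(1 - c) / log(1 - p)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


if __name__ == "__main__":
    print(round(prob_all_pass(0.2, 3), 3))  # 0.512
    print(runs_to_expose(0.2, 0.95))        # 14
```

Under these assumptions, a one-in-five failure rate needs about 14 runs before a failure shows up with 95% confidence, which is why a short passing streak does not rule out flakiness.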