feat: full-duplex voice, faster MiniMax model, README rewrite by tangxiya-star · Pull Request #24 · ericwang520/GreenTag

tangxiya-star · 2026-06-07T09:51:36Z

What

Three independent improvements, all cleanly rebased on top of latest main.

Agent voice loop (`agent/src/agent.py`)

- Default model → MiniMax-M2.1-highspeed at reasoning_effort: "low", the fastest combination we measured for voice time to first audio (TTFA): ~200 think tokens / ~6s per verdict vs ~280 / ~8s on M2.7-highspeed.
- reasoning_split: true, so the model's <think> chain-of-thought goes to a separate field and can never leak into the spoken reply (we'd seen fragments like "The user…" escape the SDK's tag stripping and get spoken aloud).
- Interruptible announcements (allow_interruptions=True): with interruptions blocked, the contractor could not get a word in while announcements stacked up.
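
As a rough illustration of the settings above, here is a minimal sketch of the request options plus a defensive `<think>` stripper. The helper names (`build_llm_options`, `strip_think`) are hypothetical and not taken from `agent/src/agent.py`; only the option keys and values come from this PR. Placing `reasoning_effort` at the top level next to `extra_body` is an assumption, and the stripper is not part of this change, just one way to guard against the leak described.

```python
import re


def build_llm_options(model="MiniMax-M2.1-highspeed"):
    """Return request options matching the settings described in this PR.

    Hypothetical helper: the real agent may pass these values differently.
    """
    return {
        "model": model,
        "reasoning_effort": "low",
        # Extra request-body field asking for the reasoning to be returned
        # separately from the content that gets spoken.
        "extra_body": {"reasoning_split": True},
    }


# Matches a <think> block up to its closing tag, or to the end of the text
# when the closing tag never arrives (e.g. a truncated stream).
_THINK_BLOCK = re.compile(r"<think>.*?(?:</think>|\Z)", re.DOTALL)


def strip_think(text):
    """Defensive fallback: drop <think> spans before text reaches TTS."""
    return _THINK_BLOCK.sub("", text).strip()
```

With `reasoning_split` enabled the stripper should never have anything to remove; it only illustrates the kind of leak that tag stripping alone can miss when a tag is left unclosed.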

iOS full-duplex mic (`app-ios/.../VoiceAgentSession.swift`)

Keep the mic open for the whole session instead of closing it whenever the agent speaks. The old half-duplex rule made the user inaudible for most of the conversation. Echo is handled by the .voiceChat AVAudioSession AEC plus the agent's interruption-word gating. Pairs with the interruptible announcements above.
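
The PR does not spell out how the agent's interruption-word gating works. The sketch below assumes it is a simple word-count threshold: user speech only counts as an interruption once the transcript contains enough words, so short echo fragments or filler sounds are ignored. The function name and the `min_words` default are hypothetical.

```python
def should_interrupt(transcript: str, min_words: int = 2) -> bool:
    """Return True when the transcript is long enough to count as an interruption.

    Assumed gating rule for illustration only; the threshold is not from the PR.
    """
    words = transcript.split()
    return len(words) >= min_words
```

For example, `should_interrupt("uh")` is `False`, while `should_interrupt("wait stop")` is `True`.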

Repo

- README rewritten into the full project overview (demo loop, architecture diagram, stack, what's working).
- Track agent/livekit.toml (LiveKit Cloud agent config).
- Remove the empty vision/ placeholder.

Notes

- Rebased onto origin/main; a stale local events.py/test_events.py raw-number experiment was dropped in favor of main's newer word-spelling reading (which already lists every span and matches the agent prompt).
- Personal Xcode code-signing changes (DEVELOPMENT_TEAM, .xiya bundle id) were intentionally excluded.

Test

pytest: 58 pass. One live-API behavioral test (test_calls_building_code_tool_for_code_question) is flaky — it passed 3/3 on re-run; the model occasionally asks "what city?" instead of calling the tool.
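
One possible way to keep a flaky live-API test like this from failing a run on a single bad sample is to retry it a few times. This is only a sketch and is not part of this PR; the decorator name `retry_flaky` is hypothetical, and retrying hides the underlying nondeterminism rather than fixing it.

```python
import functools


def retry_flaky(times=3):
    """Re-run a test up to `times` times, passing on the first success."""

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(times):
                try:
                    return test_fn(*args, **kwargs)
                except AssertionError as err:
                    # Remember the failure and try again.
                    last_error = err
            # Every attempt failed: surface the last assertion error.
            raise last_error

        return wrapper

    return decorator
```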

🤖 Generated with Claude Code

Agent voice loop:
- Default to MiniMax-M2.1-highspeed at reasoning_effort "low" (snappiest measured combo: ~200 think tokens / ~6s vs ~280 / ~8s on M2.7).
- Add extra_body reasoning_split so the model's <think> chain-of-thought can never leak into the spoken reply.
- Make proactive announcements interruptible (allow_interruptions=True) so the contractor can talk over a stacked-up announcement.

iOS:
- Keep the mic open full-duplex for the whole session instead of closing it while the agent speaks (half-duplex left the user inaudible). Echo is handled by .voiceChat AEC + the agent's interruption-word gating.

Repo:
- Rewrite README into the full project overview (loop, architecture, stack).
- Track agent/livekit.toml (LiveKit Cloud agent config).
- Remove empty vision/ placeholder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

tangxiya-star wants to merge 1 commit into main from feat/fullduplex-voice-readme
