Add DeepSeek agent tool cassette regressions by gold-silver-copper · Pull Request #2009 · 0xPlaygrounds/rig

gold-silver-copper · 2026-07-04T05:14:56Z

Summary

Adds a new cassette-backed DeepSeek long-session regression suite under tests/providers/deepseek/agent_tool_sessions.rs with 10 live-recorded fixtures.
Covers sequential and parallel multi-tool agent loops, streaming/non-streaming parity, raw streamed tool-call aggregation, long caller-owned history with tool-result continuation, tool choice modes, reasoning metadata/deltas/usage, chat-vs-reasoner aliases, and JSON object response format.
Updates the existing required zero-arg streaming-tool cassette because Rig's DeepSeek provider now sends the supported tool_choice: "required" wire value when thinking is explicitly disabled.

Bugs found and fixed

DeepSeek-specific tool choice serialization: Rig was reusing OpenRouter's tool-choice shape, which serialized required and specific choices as null for DeepSeek. DeepSeek chat completions expects the OpenAI-compatible wire format: "required", "none", or { "type": "function", "function": { "name": "..." } }.
- Fix: add a tool-choice conversion specific to DeepSeek chat completions.
- Quirk preserved: DeepSeek thinking mode rejects required and specific tool choices with the error "Thinking mode does not support this tool_choice", so Rig sends them only when the caller explicitly disables thinking via {"thinking":{"type":"disabled"}}; otherwise it keeps the previous, safe null behavior that the extractor cassettes rely on.
Provider response/usage preservation: DeepSeek raw responses now preserve top-level id, model, object, and system_fingerprint, and public usage now includes completion_tokens_details.reasoning_tokens.
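
The tool-choice rule described above can be sketched as a small pure function. This is an illustration only, not the code in this PR: the `ToolChoice` enum, the `deepseek_tool_choice` name, and the `thinking_disabled` flag are hypothetical, the `"auto"` mapping is an assumption, and the JSON is built by hand to keep the sketch dependency-free (it assumes tool names need no JSON escaping).

```rust
/// Hypothetical stand-in for a provider-agnostic tool-choice setting.
#[derive(Debug, Clone, PartialEq)]
enum ToolChoice {
    Auto,
    None,
    Required,
    Specific { name: String },
}

/// Returns the JSON text for the `tool_choice` request field.
/// `thinking_disabled` is true only when the caller explicitly sent
/// {"thinking":{"type":"disabled"}}.
fn deepseek_tool_choice(choice: &ToolChoice, thinking_disabled: bool) -> String {
    match choice {
        // Assumption: auto maps to the OpenAI-compatible "auto" value.
        ToolChoice::Auto => "\"auto\"".to_string(),
        ToolChoice::None => "\"none\"".to_string(),
        // Forced choices are only sent when thinking is off.
        ToolChoice::Required if thinking_disabled => "\"required\"".to_string(),
        ToolChoice::Specific { name } if thinking_disabled => format!(
            "{{\"type\":\"function\",\"function\":{{\"name\":\"{}\"}}}}",
            name
        ),
        // Thinking mode rejects forced choices, so fall back to null.
        ToolChoice::Required | ToolChoice::Specific { .. } => "null".to_string(),
    }
}
```

With thinking disabled, `Required` yields `"required"`; with thinking left at its default, the same choice yields `null`, matching the preserved quirk.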

New scenarios

sequential_complex_tool_calls_nonstreaming: four ordered tools with empty args, nested JSON, arrays, escaping/backslashes/newlines/unicode.
sequential_complex_tool_calls_streaming: streaming parity for the same four-tool loop.
parallel_tool_calls_single_turn_nonstreaming: two zero-arg tools in a single assistant turn.
parallel_tool_calls_single_turn_streaming: streaming parity for parallel tool calls.
raw_stream_complex_tool_call_deltas_have_object_arguments: streamed fragmented tool args are reassembled as JSON objects.
long_history_replay_with_tool_result_continuation: replays user/assistant/tool-call/tool-result history and requires that no new tool calls are made.
tool_choice_required_specific_and_none: validates required, specific-tool, and no-tools behavior.
reasoning_enabled_preserves_reasoning_content_deltas_and_usage: validates reasoning_content, streaming reasoning deltas before text, and reasoning-token accounting.
chat_alias_vs_reasoner_alias_behavior: locks down deepseek-chat non-reasoning vs deepseek-reasoner reasoning behavior.
json_object_response_format_roundtrip: validates DeepSeek JSON object mode through response_format.
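
For the raw-stream scenario, the aggregation being tested looks roughly like the sketch below: argument fragments are concatenated per tool-call index until each call holds one complete JSON object. The `ToolCallDelta` and `ToolCall` types and the `aggregate_tool_calls` function are hypothetical, not Rig's types, and normalizing empty arguments to `{}` for zero-arg tools is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of one streamed tool-call fragment.
struct ToolCallDelta<'a> {
    index: usize,
    name: Option<&'a str>,
    arguments: &'a str,
}

#[derive(Debug, Default, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

/// Concatenates argument fragments per tool-call index, in index order.
fn aggregate_tool_calls(deltas: &[ToolCallDelta<'_>]) -> Vec<ToolCall> {
    let mut calls: BTreeMap<usize, ToolCall> = BTreeMap::new();
    for delta in deltas {
        let call = calls.entry(delta.index).or_default();
        if let Some(name) = delta.name {
            call.name = name.to_string();
        }
        call.arguments.push_str(delta.arguments);
    }
    calls
        .into_values()
        .map(|mut call| {
            // Assumption: a zero-arg call with no fragments becomes `{}`.
            if call.arguments.trim().is_empty() {
                call.arguments = "{}".to_string();
            }
            call
        })
        .collect()
}
```

Keying on the index rather than on arrival order lets interleaved fragments from parallel tool calls reassemble correctly.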

Models used

deepseek-v4-flash: primary stable/inexpensive model for tool calling, streaming, JSON mode, and thinking-enabled reasoning.
deepseek-chat: isolated alias behavior coverage for non-thinking chat mode.
deepseek-reasoner: isolated alias behavior coverage for reasoning mode.

Inspiration references used

inspirations/pydantic-ai/tests/providers/test_deepseek.py
inspirations/pydantic-ai/pydantic_ai_slim/pydantic_ai/providers/deepseek.py
inspirations/pydantic-ai/pydantic_ai_slim/pydantic_ai/profiles/deepseek.py
inspirations/vercel-ai-sdk/packages/deepseek/src/chat/deepseek-chat-language-model.ts
inspirations/vercel-ai-sdk/packages/deepseek/src/chat/deepseek-chat-language-model.test.ts
inspirations/vercel-ai-sdk/packages/deepseek/src/chat/deepseek-prepare-tools.ts
inspirations/vercel-ai-sdk/packages/deepseek/src/chat/convert-to-deepseek-chat-messages.ts
inspirations/vercel-ai-sdk/packages/deepseek/src/chat/convert-to-deepseek-usage.ts
inspirations/langchain/libs/partners/deepseek/langchain_deepseek/chat_models.py
inspirations/langchain/libs/partners/deepseek/tests/unit_tests/test_chat_models.py
inspirations/langchain/libs/partners/deepseek/tests/integration_tests/test_chat_models.py

Recording / replay commands run

Record:

RIG_PROVIDER_TEST_MODE=record cargo test -p rig --all-features --test deepseek agent_tool_sessions -- --nocapture --test-threads=1
RIG_PROVIDER_TEST_MODE=record cargo test -p rig --all-features --test deepseek json_object_response_format_roundtrip -- --nocapture --test-threads=1
RIG_PROVIDER_TEST_MODE=record cargo test -p rig --all-features --test deepseek raw_stream_emits_required_zero_arg_tool_call -- --nocapture --test-threads=1

Replay without credentials:

env -u DEEPSEEK_API_KEY cargo test -p rig --all-features --test deepseek agent_tool_sessions -- --nocapture --test-threads=1
env -u DEEPSEEK_API_KEY cargo test -p rig --all-features --test deepseek deepseek:: -- --nocapture --test-threads=1
env -u DEEPSEEK_API_KEY cargo test -p rig --all-features --test deepseek cassette_safety -- --nocapture --test-threads=1
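
The commands above imply that replay is the default and recording is opt-in through RIG_PROVIDER_TEST_MODE. A minimal sketch of that selection, assuming any value other than `record` falls back to replay; `CassetteMode`, `resolve_mode`, and `mode_from_env` are hypothetical names, not the harness's actual API:

```rust
/// Hypothetical cassette mode for provider tests.
#[derive(Debug, PartialEq)]
enum CassetteMode {
    Record,
    Replay,
}

/// Maps the raw variable value to a mode.
/// Assumption: unset or unrecognized values mean replay.
fn resolve_mode(raw: Option<&str>) -> CassetteMode {
    match raw.map(|value| value.trim().to_ascii_lowercase()) {
        Some(value) if value == "record" => CassetteMode::Record,
        _ => CassetteMode::Replay,
    }
}

/// Reads RIG_PROVIDER_TEST_MODE from the process environment.
fn mode_from_env() -> CassetteMode {
    resolve_mode(std::env::var("RIG_PROVIDER_TEST_MODE").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the default easy to test without mutating process state.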

Other validation:

cargo fmt --check
cargo test -p rig-core providers::deepseek -- --nocapture
cargo clippy --all-targets --all-features
cargo test

Manual cassette inspection notes

Reviewed new fixtures under tests/cassettes/deepseek/agent_tool_sessions/ and the updated streaming_tools/raw_stream_emits_required_zero_arg_tool_call.yaml.
Confirmed expected /chat/completions request bodies for thinking, tool_choice, parallel_tool_calls, response_format, tool calls/results, stream options, and alias models.
Confirmed responses preserve expected finish_reason, model, id, system_fingerprint, reasoning_content, completion_tokens_details.reasoning_tokens, prompt cache details, and streamed tool-call deltas.
Ran cassette safety and an additional grep for authorization/API-key/cookie/bearer markers; no secrets or credential headers were present.
Existing cassette churn is limited to the DeepSeek required-tool-choice cassette affected by the bug fix.
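
The extra grep for credential markers can be approximated in code. This sketch is not the cassette_safety test itself: `SECRET_MARKERS` and `find_secret_markers` are hypothetical, and the marker list is taken from the note above rather than from the repository.

```rust
/// Markers named in the inspection note, matched case-insensitively.
const SECRET_MARKERS: [&str; 4] = ["authorization", "api-key", "cookie", "bearer"];

/// Returns (1-based line number, marker) for every marker found in the text.
fn find_secret_markers(cassette: &str) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for (idx, line) in cassette.lines().enumerate() {
        let lower = line.to_ascii_lowercase();
        for marker in SECRET_MARKERS {
            if lower.contains(marker) {
                hits.push((idx + 1, marker));
            }
        }
    }
    hits
}
```

A clean cassette yields an empty list; any hit points at the line to inspect by hand, since a substring match can also flag harmless text.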

Known gaps / non-goals

No multimodal tests: Rig’s DeepSeek provider exposes completion/model-listing only, not image/audio/multimodal capabilities.
No server-side/native DeepSeek tools: Rig currently exercises local tool calls over chat completions.
No exact prose assertions; tests assert structural behavior, tool names/args/order, deterministic tool output markers, JSON structure, reasoning presence/order, and usage metadata.
The full cargo test run passes the new DeepSeek coverage but still fails on an unrelated, pre-existing OpenAI cassette replay mismatch in openai::cassette::permission_control::permission_control_streaming_example.

Add DeepSeek agent tool cassette regressions

6b7b0d8

gold-silver-copper added this pull request to the merge queue Jul 4, 2026

Merged via the queue into main with commit e80a486 Jul 4, 2026
6 checks passed

github-actions Bot mentioned this pull request Jul 4, 2026

chore: release v0.40.0 #1923

Open

gold-silver-copper merged 1 commit into main from test/deepseek-cassette-regression