
Add DeepSeek agent tool cassette regressions#2009

Merged
gold-silver-copper merged 1 commit into main from test/deepseek-cassette-regression on Jul 4, 2026

@gold-silver-copper

Contributor

Summary

  • Adds a new cassette-backed DeepSeek long-session regression suite under tests/providers/deepseek/agent_tool_sessions.rs with 10 live-recorded fixtures.
  • Covers sequential and parallel multi-tool agent loops, streaming/non-streaming parity, raw streamed tool-call aggregation, long caller-owned history with tool-result continuation, tool choice modes, reasoning metadata/deltas/usage, chat-vs-reasoner aliases, and JSON object response format.
  • Updates the existing required zero-arg streaming-tool cassette because Rig's DeepSeek provider now sends the supported tool_choice: "required" wire value when thinking is explicitly disabled.

Bugs found and fixed

  • DeepSeek-specific tool choice serialization: Rig was reusing OpenRouter's tool-choice shape, which serialized required/specific choices as null for DeepSeek. DeepSeek chat completions expects OpenAI-compatible chat wire format such as "required", "none", or { "type": "function", "function": { "name": "..." } }.
    • Fix: add DeepSeek chat-completions-specific tool-choice conversion.
    • Quirk preserved: DeepSeek thinking mode rejects required/specific tool choice ("Thinking mode does not support this tool_choice"), so Rig sends required/specific choices only when callers explicitly disable thinking via {"thinking":{"type":"disabled"}}; otherwise it keeps the previous safe null behavior used by extractor cassettes.
  • Provider response/usage preservation: DeepSeek raw responses now preserve top-level id, model, object, and system_fingerprint, and public usage now includes completion_tokens_details.reasoning_tokens.
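
The tool-choice rule described in this section can be sketched roughly as follows. This is a minimal illustration, not Rig's actual code: the `ToolChoice` enum, the `deepseek_tool_choice_json` function, and the `thinking_disabled` flag are hypothetical names, and the handling of `Auto` and `None` under thinking mode is an assumption based on the OpenAI-compatible wire format.

```rust
/// Hypothetical provider-agnostic tool choice (not Rig's real type).
#[derive(Debug, Clone, PartialEq)]
enum ToolChoice {
    Auto,
    None,
    Required,
    Specific(String),
}

/// Renders the JSON value for the `tool_choice` request field.
/// `thinking_disabled` stands for the caller having sent
/// {"thinking":{"type":"disabled"}} alongside the request.
fn deepseek_tool_choice_json(choice: &ToolChoice, thinking_disabled: bool) -> String {
    match choice {
        // Assumption: "auto" and "none" are sent regardless of thinking mode.
        ToolChoice::Auto => "\"auto\"".to_string(),
        ToolChoice::None => "\"none\"".to_string(),
        ToolChoice::Required if thinking_disabled => "\"required\"".to_string(),
        // Assumes the tool name needs no JSON escaping (a plain ASCII identifier).
        ToolChoice::Specific(name) if thinking_disabled => format!(
            r#"{{"type":"function","function":{{"name":"{}"}}}}"#,
            name
        ),
        // Thinking mode rejects required/specific choices, so fall back to null.
        ToolChoice::Required | ToolChoice::Specific(_) => "null".to_string(),
    }
}

fn main() {
    let specific = ToolChoice::Specific("get_weather".to_string());
    println!("{}", deepseek_tool_choice_json(&ToolChoice::Auto, false));
    println!("{}", deepseek_tool_choice_json(&ToolChoice::None, false));
    println!("{}", deepseek_tool_choice_json(&ToolChoice::Required, true));
    println!("{}", deepseek_tool_choice_json(&specific, true));
    println!("{}", deepseek_tool_choice_json(&specific, false));
}
```

Keeping the null fallback in a single match arm makes the thinking-mode quirk explicit, so a later change to DeepSeek's behavior would touch one place.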

New scenarios

  • sequential_complex_tool_calls_nonstreaming: four ordered tools with empty args, nested JSON, arrays, escaping/backslashes/newlines/unicode.
  • sequential_complex_tool_calls_streaming: streaming parity for the same four-tool loop.
  • parallel_tool_calls_single_turn_nonstreaming: two zero-arg tools in a single assistant turn.
  • parallel_tool_calls_single_turn_streaming: streaming parity for parallel tool calls.
  • raw_stream_complex_tool_call_deltas_have_object_arguments: streamed fragmented tool args are reassembled as JSON objects.
  • long_history_replay_with_tool_result_continuation: replays user/assistant/tool-call/tool-result history and forces no new tools.
  • tool_choice_required_specific_and_none: validates required, specific-tool, and no-tools behavior.
  • reasoning_enabled_preserves_reasoning_content_deltas_and_usage: validates reasoning_content, streaming reasoning deltas before text, and reasoning-token accounting.
  • chat_alias_vs_reasoner_alias_behavior: locks down deepseek-chat non-reasoning vs deepseek-reasoner reasoning behavior.
  • json_object_response_format_roundtrip: validates DeepSeek JSON object mode through response_format.
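
As an illustration of what the raw-stream scenario checks, the sketch below reassembles fragmented tool-call arguments by their `index`, which is how OpenAI-compatible streaming chunks identify a tool call. The `ToolCallDelta` and `ToolCall` types and the `aggregate_tool_calls` function are hypothetical, and defaulting empty arguments to `{}` for zero-arg tools is an assumption rather than confirmed Rig behavior.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of one streamed tool-call delta: an index plus
/// optional id, name, and argument fragment.
struct ToolCallDelta<'a> {
    index: usize,
    id: Option<&'a str>,
    name: Option<&'a str>,
    arguments: Option<&'a str>,
}

/// A fully reassembled tool call; `arguments` holds the joined JSON text.
#[derive(Debug, Default, PartialEq)]
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

/// Groups deltas by index and concatenates argument fragments in arrival order.
fn aggregate_tool_calls(deltas: &[ToolCallDelta<'_>]) -> Vec<ToolCall> {
    let mut calls: BTreeMap<usize, ToolCall> = BTreeMap::new();
    for delta in deltas {
        let call = calls.entry(delta.index).or_default();
        if let Some(id) = delta.id {
            call.id = id.to_string();
        }
        if let Some(name) = delta.name {
            call.name = name.to_string();
        }
        if let Some(fragment) = delta.arguments {
            call.arguments.push_str(fragment);
        }
    }
    calls
        .into_values()
        .map(|mut call| {
            // Assumption: a zero-arg tool call is normalized to an empty object.
            if call.arguments.trim().is_empty() {
                call.arguments = "{}".to_string();
            }
            call
        })
        .collect()
}

fn main() {
    let deltas = [
        ToolCallDelta { index: 0, id: Some("call_0"), name: Some("lookup"), arguments: Some("{\"city\":") },
        ToolCallDelta { index: 1, id: Some("call_1"), name: Some("ping"), arguments: None },
        ToolCallDelta { index: 0, id: None, name: None, arguments: Some("\"Oslo\"}") },
    ];
    for call in aggregate_tool_calls(&deltas) {
        println!("{} {} {}", call.id, call.name, call.arguments);
    }
}
```

The sketch only joins the text; a real implementation would then parse the joined string into a JSON object before handing it to the tool.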

Models used

  • deepseek-v4-flash: primary stable/inexpensive model for tool calling, streaming, JSON mode, and thinking-enabled reasoning.
  • deepseek-chat: isolated alias behavior coverage for non-thinking chat mode.
  • deepseek-reasoner: isolated alias behavior coverage for reasoning mode.

Inspiration references used

  • inspirations/pydantic-ai/tests/providers/test_deepseek.py
  • inspirations/pydantic-ai/pydantic_ai_slim/pydantic_ai/providers/deepseek.py
  • inspirations/pydantic-ai/pydantic_ai_slim/pydantic_ai/profiles/deepseek.py
  • inspirations/vercel-ai-sdk/packages/deepseek/src/chat/deepseek-chat-language-model.ts
  • inspirations/vercel-ai-sdk/packages/deepseek/src/chat/deepseek-chat-language-model.test.ts
  • inspirations/vercel-ai-sdk/packages/deepseek/src/chat/deepseek-prepare-tools.ts
  • inspirations/vercel-ai-sdk/packages/deepseek/src/chat/convert-to-deepseek-chat-messages.ts
  • inspirations/vercel-ai-sdk/packages/deepseek/src/chat/convert-to-deepseek-usage.ts
  • inspirations/langchain/libs/partners/deepseek/langchain_deepseek/chat_models.py
  • inspirations/langchain/libs/partners/deepseek/tests/unit_tests/test_chat_models.py
  • inspirations/langchain/libs/partners/deepseek/tests/integration_tests/test_chat_models.py

Recording / replay commands run

Record:

RIG_PROVIDER_TEST_MODE=record cargo test -p rig --all-features --test deepseek agent_tool_sessions -- --nocapture --test-threads=1
RIG_PROVIDER_TEST_MODE=record cargo test -p rig --all-features --test deepseek json_object_response_format_roundtrip -- --nocapture --test-threads=1
RIG_PROVIDER_TEST_MODE=record cargo test -p rig --all-features --test deepseek raw_stream_emits_required_zero_arg_tool_call -- --nocapture --test-threads=1

Replay without credentials:

env -u DEEPSEEK_API_KEY cargo test -p rig --all-features --test deepseek agent_tool_sessions -- --nocapture --test-threads=1
env -u DEEPSEEK_API_KEY cargo test -p rig --all-features --test deepseek deepseek:: -- --nocapture --test-threads=1
env -u DEEPSEEK_API_KEY cargo test -p rig --all-features --test deepseek cassette_safety -- --nocapture --test-threads=1

Other validation:

cargo fmt --check
cargo test -p rig-core providers::deepseek -- --nocapture
cargo clippy --all-targets --all-features
cargo test

Manual cassette inspection notes

  • Reviewed new fixtures under tests/cassettes/deepseek/agent_tool_sessions/ and the updated streaming_tools/raw_stream_emits_required_zero_arg_tool_call.yaml.
  • Confirmed expected /chat/completions request bodies for thinking, tool_choice, parallel_tool_calls, response_format, tool calls/results, stream options, and alias models.
  • Confirmed responses preserve expected finish_reason, model, id, system_fingerprint, reasoning_content, completion_tokens_details.reasoning_tokens, prompt cache details, and streamed tool-call deltas.
  • Ran cassette safety and an additional grep for authorization/API-key/cookie/bearer markers; no secrets or credential headers were present.
  • Existing cassette churn is limited to the DeepSeek required-tool-choice cassette affected by the bug fix.
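
The extra marker search mentioned above was a grep; an equivalent check could be sketched in Rust as below. This is not the repository's cassette_safety test: the `MARKERS` list and the `find_secret_markers` function are hypothetical, and the sample cassette text is made up for illustration.

```rust
/// Markers mirror the ones named in the inspection notes; the list is illustrative.
const MARKERS: [&str; 4] = ["authorization", "api-key", "cookie", "bearer"];

/// Returns (1-based line number, marker) for every marker found,
/// matching case-insensitively.
fn find_secret_markers(cassette: &str) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for (i, line) in cassette.lines().enumerate() {
        let lower = line.to_lowercase();
        for marker in MARKERS {
            if lower.contains(marker) {
                hits.push((i + 1, marker));
            }
        }
    }
    hits
}

fn main() {
    // Made-up cassette snippets, not taken from the real fixtures.
    let clean = "request:\n  uri: /chat/completions\n  headers:\n    content-type: application/json\n";
    let leaky = "headers:\n  Authorization: Bearer sk-REDACTED\n";
    println!("{:?}", find_secret_markers(clean));
    println!("{:?}", find_secret_markers(leaky));
}
```

A substring scan like this is deliberately broad; it can flag harmless text that happens to contain a marker, which is acceptable for a manual review aid.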

Known gaps / non-goals

  • No multimodal tests: Rig’s DeepSeek provider exposes completion/model-listing only, not image/audio/multimodal capabilities.
  • No server-side/native DeepSeek tools: Rig currently exercises local tool calls over chat completions.
  • No exact prose assertions; tests assert structural behavior, tool names/args/order, deterministic tool output markers, JSON structure, reasoning presence/order, and usage metadata.
  • A full cargo test run passes the new DeepSeek coverage but still fails on an unrelated, pre-existing OpenAI cassette replay mismatch in openai::cassette::permission_control::permission_control_streaming_example.

@gold-silver-copper gold-silver-copper added this pull request to the merge queue Jul 4, 2026
Merged via the queue into main with commit e80a486 Jul 4, 2026
6 checks passed