Add Mistral agent tool cassette regressions#2010

Merged: gold-silver-copper merged 3 commits into main from test/mistral-cassette-regression on Jul 4, 2026

@gold-silver-copper

Contributor

Summary

  • Adds a new cassette-backed Mistral long-session regression suite under tests/providers/mistral/agent_tool_sessions.rs with 9 live-recorded fixtures.
  • Covers sequential and parallel multi-tool agent loops, real SSE streaming/non-streaming parity, streamed tool-call aggregation, long caller-owned chat history with assistant tool calls + tool results, tool choice modes, JSON object mode, JSON-schema structured output, and usage/metadata preservation.
  • Registers Mistral with the shared cassette harness and cassette safety checks.

Bugs found and fixed

  • Mistral streaming was buffered/non-streaming: CompletionModel::stream called the normal completion path and converted the final response into stream items. This PR implements real Mistral SSE chat-completions streaming using the shared OpenAI chat-completions-compatible stream state machine and records stream: true fixtures.
  • Mistral tool choice serialization was wrong/incomplete: Rig reused OpenAI's required, whereas Mistral uses any; Rig also rejected a specific tool choice even though Mistral accepts the OpenAI-style function object. This PR serializes Required as "any", supports Auto/None, and supports a single specific tool choice as { "type": "function", "function": { "name": "..." } }.
  • Native structured output was ignored: output_schema only logged a warning. This PR maps Rig output schemas to Mistral response_format: { type: "json_schema", json_schema: ... } and adds a live-recorded JSON-schema cassette.
  • max_tokens was not sent: Mistral completion requests now include max_tokens when configured.
  • Usage conversion could misreport output tokens: converted Rig usage now uses Mistral completion_tokens directly instead of total_tokens - prompt_tokens.
  • Provider response preservation: converted responses now preserve message_id from the raw Mistral response id and tests assert raw id, model, finish_reason, and usage survive.
  • Valid unusual response shapes: assistant content: null and array content (e.g. text plus thinking parts) now deserialize without failing; text parts are preserved and unsupported thinking parts are ignored, consistent with existing behavior.
  • Unstable Mistral tool-result names in replay requests: Rig does not store a stable tool name separately from tool-result ids, and generated ids made cassette request bodies nondeterministic. Mistral accepts tool results keyed by tool_call_id, so the provider now omits the optional name field for tool-result messages.
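
The tool choice mapping described above can be sketched as follows. This is a minimal, std-only illustration and not Rig's actual code: the `ToolChoice` enum, `escape_json`, and `mistral_tool_choice` are hypothetical names, and the tool name `get_weather` is made up. Only the wire values ("auto", "none", "any", and the function object) come from the description above.

```rust
/// Hypothetical stand-in for a provider-agnostic tool choice (not Rig's real type).
#[derive(Debug, Clone, PartialEq)]
enum ToolChoice {
    Auto,
    None,
    Required,
    Specific(String),
}

/// Escape quotes and backslashes so a tool name is safe inside a JSON string.
/// Assumption: tool names contain no control characters.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            other => out.push(other),
        }
    }
    out
}

/// Render the JSON value of the `tool_choice` request field.
fn mistral_tool_choice(choice: &ToolChoice) -> String {
    match choice {
        ToolChoice::Auto => "\"auto\"".to_string(),
        ToolChoice::None => "\"none\"".to_string(),
        // Mistral uses "any" where OpenAI uses "required".
        ToolChoice::Required => "\"any\"".to_string(),
        // A single specific tool uses the OpenAI-style function object.
        ToolChoice::Specific(name) => format!(
            "{{\"type\":\"function\",\"function\":{{\"name\":\"{}\"}}}}",
            escape_json(name)
        ),
    }
}

fn main() {
    let choices = [
        ToolChoice::Auto,
        ToolChoice::None,
        ToolChoice::Required,
        ToolChoice::Specific("get_weather".to_string()),
    ];
    for choice in &choices {
        println!("{:?} -> {}", choice, mistral_tool_choice(choice));
    }
}
```

A real provider would more likely derive this through a serialization library; the hand-built strings here only keep the sketch dependency-free.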

New cassette scenarios

  • sequential_complex_tool_calls_nonstreaming: five ordered tools covering empty args, nested JSON, arrays, escaped strings, optional/nullable fields, and usage/tool-history checks.
  • sequential_complex_tool_calls_streaming: real SSE streaming parity for the same long multi-turn tool loop.
  • parallel_tool_calls_single_turn_nonstreaming: two zero-arg tools in one assistant turn.
  • parallel_tool_calls_single_turn_streaming: real SSE streaming parity for parallel tool calls.
  • raw_stream_complex_tool_call_deltas_have_object_arguments: raw streaming tool-call output reassembles into JSON object arguments.
  • long_history_replay_with_tool_result_continuation: replays system/user/assistant/tool-call/tool-result history and forces no new tools.
  • tool_choice_auto_any_specific_and_none: validates auto, Mistral any, specific function-object choice, and none.
  • json_object_response_format_roundtrip: validates response_format: { type: "json_object" }.
  • json_schema_structured_output_roundtrip: validates native Mistral JSON-schema response format from Rig output_schema.
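
To illustrate what the raw-stream scenario checks, here is a rough sketch of reassembling streamed tool-call fragments into complete argument strings. It assumes OpenAI-style deltas keyed by a per-call `index`, where the id and name arrive once and the arguments arrive as string fragments. The `ToolCallDelta` and `ToolCall` types, the `aggregate` function, and the tool names are hypothetical and are not the shared stream state machine the PR uses.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of one streamed tool-call fragment.
struct ToolCallDelta {
    index: usize,
    id: Option<String>,
    name: Option<String>,
    arguments: String,
}

/// A fully reassembled tool call; `arguments` holds the joined JSON text.
#[derive(Debug, Default, PartialEq)]
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

/// Merge fragments by index, concatenating argument text in arrival order.
fn aggregate(deltas: &[ToolCallDelta]) -> Vec<ToolCall> {
    let mut calls: BTreeMap<usize, ToolCall> = BTreeMap::new();
    for delta in deltas {
        let call = calls.entry(delta.index).or_default();
        if let Some(id) = &delta.id {
            call.id = id.clone();
        }
        if let Some(name) = &delta.name {
            call.name = name.clone();
        }
        call.arguments.push_str(&delta.arguments);
    }
    // BTreeMap iteration yields calls ordered by index.
    calls.into_values().collect()
}

fn main() {
    let deltas = vec![
        ToolCallDelta {
            index: 0,
            id: Some("call_a".into()),
            name: Some("lookup_order".into()),
            arguments: "{\"order_id\":".into(),
        },
        ToolCallDelta {
            index: 1,
            id: Some("call_b".into()),
            name: Some("get_time".into()),
            arguments: "{}".into(),
        },
        ToolCallDelta {
            index: 0,
            id: None,
            name: None,
            arguments: "\"A-17\"}".into(),
        },
    ];
    for call in aggregate(&deltas) {
        println!("{} {} {}", call.id, call.name, call.arguments);
    }
}
```

Once the stream ends, each joined `arguments` string can be parsed as JSON, which is the point at which a test can assert that the arguments form an object rather than a raw string.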

Models used

  • mistral-small-latest for all scenarios: current stable Mistral model with tool calling, streaming, JSON object mode, JSON-schema structured output, and low enough cost for cassette recording.

Inspiration references used

  • inspirations/vercel-ai-sdk/packages/mistral/src/mistral-prepare-tools.ts
  • inspirations/vercel-ai-sdk/packages/mistral/src/mistral-chat-language-model.ts
  • inspirations/vercel-ai-sdk/packages/mistral/src/mistral-chat-language-model.test.ts
  • inspirations/vercel-ai-sdk/packages/mistral/src/convert-to-mistral-chat-messages.test.ts
  • inspirations/pydantic-ai/pydantic_ai_slim/pydantic_ai/models/mistral.py
  • inspirations/pydantic-ai/tests/models/test_mistral.py
  • inspirations/pydantic-ai/tests/providers/test_mistral.py
  • inspirations/langchain/libs/partners/mistralai/langchain_mistralai/chat_models.py
  • inspirations/langchain/libs/partners/mistralai/tests/unit_tests/test_chat_models.py
  • inspirations/langchain/libs/partners/mistralai/tests/integration_tests/test_chat_models.py
  • inspirations/semantic-kernel/python/semantic_kernel/connectors/ai/mistral_ai/services/mistral_ai_chat_completion.py
  • inspirations/semantic-kernel/python/tests/unit/connectors/ai/mistral_ai/services/test_mistralai_chat_completion.py

Recording / replay commands run

Record:

RIG_PROVIDER_TEST_MODE=record cargo test -p rig --all-features --test mistral agent_tool_sessions -- --nocapture --test-threads=1
RIG_PROVIDER_TEST_MODE=record cargo test -p rig --all-features --test mistral long_history_replay_with_tool_result_continuation -- --nocapture --test-threads=1

Replay without credentials:

env -u MISTRAL_API_KEY cargo test -p rig --all-features --test mistral agent_tool_sessions -- --nocapture --test-threads=1
env -u MISTRAL_API_KEY cargo test -p rig --all-features --test mistral mistral -- --nocapture --test-threads=1
env -u MISTRAL_API_KEY cargo test -p rig --all-features --test mistral cassette_safety -- --nocapture --test-threads=1

Other validation:

cargo fmt --check
cargo test -p rig-core providers::mistral -- --nocapture
cargo clippy --all-targets --all-features
cargo test

Manual cassette inspection notes

  • Reviewed all fixtures under tests/cassettes/mistral/agent_tool_sessions/.
  • Confirmed expected /v1/chat/completions request bodies for tools, tool_choice, parallel_tool_calls, stream: true, JSON object mode, JSON schema, and long history with assistant tool calls/tool results.
  • Confirmed responses preserve expected finish_reason, model, response ids, usage (prompt_tokens, completion_tokens, total_tokens, cache details), and streamed tool-call/text deltas.
  • Confirmed tool-result replay requests omit unstable generated name values while preserving tool_call_id.
  • Ran cassette safety and an additional grep for API-key/authorization/bearer/cookie markers; no secrets were present.
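
The extra marker grep mentioned above could look roughly like the sketch below. It is not the repository's cassette safety check: the `find_secret_markers` function and the exact marker spellings are assumptions chosen to match the categories listed in the note.

```rust
/// Assumed marker spellings; matching is case-insensitive.
const MARKERS: [&str; 5] = ["api-key", "api_key", "authorization", "bearer", "cookie"];

/// Return (1-based line number, marker) for each marker found on each line.
fn find_secret_markers(text: &str) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for (i, line) in text.lines().enumerate() {
        let lower = line.to_lowercase();
        for marker in MARKERS {
            if lower.contains(marker) {
                hits.push((i + 1, marker));
            }
        }
    }
    hits
}

fn main() {
    // A made-up fixture fragment with a header that should never be recorded.
    let fixture = "POST /v1/chat/completions\nAuthorization: Bearer abc\ncontent-type: application/json";
    for (line, marker) in find_secret_markers(fixture) {
        println!("line {}: found marker '{}'", line, marker);
    }
}
```

A substring scan like this can flag harmless text that merely mentions one of the words, so each hit still needs a manual look.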

Known gaps / non-goals

  • No multimodal Mistral cassette coverage in this PR; Rig’s current Mistral chat conversion only supports text user content.
  • No Mistral reasoning cassette; Rig currently skips unsupported assistant-history reasoning for this provider.
  • No exact prose assertions; tests assert structural behavior, tool names/args/order, deterministic tool output markers, JSON structure, metadata, and usage.
  • A broad cargo test run hits the existing OpenAI cassette failure in openai::cassette::permission_control::permission_control_streaming_example (request body mismatch against /v1/responses); Mistral coverage passes before that unrelated failure.

@gold-silver-copper gold-silver-copper added this pull request to the merge queue Jul 4, 2026
Merged via the queue into main with commit 95b37c6 Jul 4, 2026
6 checks passed