Add Mistral agent tool cassette regressions by gold-silver-copper · Pull Request #2010 · 0xPlaygrounds/rig

gold-silver-copper · 2026-07-04T06:25:57Z

Summary

Adds a new cassette-backed Mistral long-session regression suite under tests/providers/mistral/agent_tool_sessions.rs with 9 live-recorded fixtures.
Covers sequential and parallel multi-tool agent loops, real SSE streaming/non-streaming parity, streamed tool-call aggregation, long caller-owned chat history with assistant tool calls + tool results, tool choice modes, JSON object mode, JSON-schema structured output, and usage/metadata preservation.
Registers Mistral with the shared cassette harness and cassette safety checks.

Bugs found and fixed

Mistral streaming was buffered/non-streaming: CompletionModel::stream called the normal completion path and converted the final response into stream items. This PR implements real Mistral SSE chat-completions streaming using the shared OpenAI chat-completions-compatible stream state machine and records stream: true fixtures.
Mistral tool choice serialization was wrong and incomplete: Rig reused OpenAI's required value, whereas Mistral expects any; Rig also rejected a specific tool choice even though Mistral accepts the OpenAI-style function object. This PR serializes Required as "any", supports Auto/None, and supports a single specific tool choice as { "type": "function", "function": { "name": "..." } }.
Native structured output was ignored: output_schema only logged a warning. This PR maps Rig output schemas to Mistral response_format: { type: "json_schema", json_schema: ... } and adds a live-recorded JSON-schema cassette.
max_tokens was not sent: Mistral completion requests now include max_tokens when configured.
Usage conversion could misreport output tokens: converted Rig usage now uses Mistral completion_tokens directly instead of total_tokens - prompt_tokens.
Provider response preservation: converted responses now preserve message_id from the raw Mistral response id and tests assert raw id, model, finish_reason, and usage survive.
Valid but unusual response shapes: assistant content: null and array content (e.g. text plus thinking parts) now deserialize without failing; text parts are preserved and unsupported thinking parts are ignored, consistent with existing behavior.
Unstable Mistral tool-result names in replay requests: Rig does not store a stable tool name separately from tool-result ids, and generated ids made cassette request bodies nondeterministic. Mistral accepts tool results keyed by tool_call_id, so the provider now omits the optional name field for tool-result messages.
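
The tool choice mapping described above can be sketched as follows. This is a minimal std-only illustration, not the code in this PR: the ToolChoice enum, mistral_tool_choice_json, and escape_json are hypothetical names, and the real provider presumably serializes through its existing JSON types rather than building strings by hand.

```rust
/// Hypothetical stand-in for Rig's tool-choice type; names are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub enum ToolChoice {
    Auto,
    None,
    Required,
    Specific(String),
}

/// Renders the `tool_choice` request field as JSON text.
/// `Required` becomes "any" (Mistral's spelling) instead of OpenAI's "required".
pub fn mistral_tool_choice_json(choice: &ToolChoice) -> String {
    match choice {
        ToolChoice::Auto => "\"auto\"".to_string(),
        ToolChoice::None => "\"none\"".to_string(),
        ToolChoice::Required => "\"any\"".to_string(),
        // A single specific tool uses the OpenAI-style function object.
        ToolChoice::Specific(name) => format!(
            "{{\"type\":\"function\",\"function\":{{\"name\":\"{}\"}}}}",
            escape_json(name)
        ),
    }
}

/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

Calling mistral_tool_choice_json(&ToolChoice::Required) yields the JSON string "any", and a Specific choice yields the function object with the tool name filled in.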

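The usage fix listed among the bugs can be illustrated with a small sketch. The struct and function names here (MistralUsage, Usage, convert_usage, derived_output_tokens) are hypothetical, and the example numbers assume a response whose total_tokens is not simply the sum of prompt and completion tokens; that assumption is only there to show how the two approaches can diverge.

```rust
/// Token counts as they appear in a Mistral response `usage` object.
#[derive(Debug, Clone, Copy)]
pub struct MistralUsage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub total_tokens: u64,
}

/// Hypothetical provider-agnostic usage record; not Rig's actual type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub total_tokens: u64,
}

/// Copies `completion_tokens` directly, as the PR describes.
pub fn convert_usage(raw: MistralUsage) -> Usage {
    Usage {
        input_tokens: raw.prompt_tokens,
        output_tokens: raw.completion_tokens,
        total_tokens: raw.total_tokens,
    }
}

/// The earlier derivation, kept only for comparison.
/// `saturating_sub` is used here so the sketch cannot underflow.
pub fn derived_output_tokens(raw: MistralUsage) -> u64 {
    raw.total_tokens.saturating_sub(raw.prompt_tokens)
}
```

With prompt_tokens = 10, completion_tokens = 5, and total_tokens = 20, the direct conversion reports 5 output tokens while the derived value is 10.
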
New cassette scenarios

sequential_complex_tool_calls_nonstreaming: five ordered tools covering empty args, nested JSON, arrays, escaped strings, optional/nullable fields, and usage/tool-history checks.
sequential_complex_tool_calls_streaming: real SSE streaming parity for the same long multi-turn tool loop.
parallel_tool_calls_single_turn_nonstreaming: two zero-arg tools in one assistant turn.
parallel_tool_calls_single_turn_streaming: real SSE streaming parity for parallel tool calls.
raw_stream_complex_tool_call_deltas_have_object_arguments: raw streaming tool-call output reassembles into JSON object arguments.
long_history_replay_with_tool_result_continuation: replays system/user/assistant/tool-call/tool-result history and forces no new tools.
tool_choice_auto_any_specific_and_none: validates auto, Mistral any, specific function-object choice, and none.
json_object_response_format_roundtrip: validates response_format: { type: "json_object" }.
json_schema_structured_output_roundtrip: validates native Mistral JSON-schema response format from Rig output_schema.
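
The streamed tool-call scenarios rely on reassembling argument fragments into one JSON object per call. The sketch below shows one way such aggregation could work, assuming OpenAI-style deltas keyed by index; ToolCallDelta, ToolCall, aggregate_tool_calls, and looks_like_json_object are hypothetical names, not the shared stream state machine used by this PR.

```rust
use std::collections::BTreeMap;

/// One streamed tool-call fragment. Hypothetical shape modeled on
/// OpenAI-style chat-completions deltas; not Rig's actual type.
#[derive(Debug, Clone)]
pub struct ToolCallDelta {
    pub index: usize,
    pub id: Option<String>,
    pub name: Option<String>,
    pub arguments: String,
}

/// A fully reassembled tool call.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String,
}

/// Merges fragments by `index`, concatenating argument text in arrival order.
/// The result is ordered by index, so parallel calls keep their positions.
pub fn aggregate_tool_calls(deltas: &[ToolCallDelta]) -> Vec<ToolCall> {
    let mut calls: BTreeMap<usize, ToolCall> = BTreeMap::new();
    for delta in deltas {
        let call = calls.entry(delta.index).or_default();
        if let Some(id) = &delta.id {
            call.id = id.clone();
        }
        if let Some(name) = &delta.name {
            call.name = name.clone();
        }
        call.arguments.push_str(&delta.arguments);
    }
    calls.into_values().collect()
}

/// Cheap shape check only; a real implementation would parse the text as JSON.
pub fn looks_like_json_object(arguments: &str) -> bool {
    let trimmed = arguments.trim();
    trimmed.starts_with('{') && trimmed.ends_with('}')
}
```

Interleaved fragments for two parallel calls are merged independently, so each call ends up with its own complete argument string.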

Models used

mistral-small-latest for all scenarios: current stable Mistral model with tool calling, streaming, JSON object mode, JSON-schema structured output, and low enough cost for cassette recording.

Inspiration references used

inspirations/vercel-ai-sdk/packages/mistral/src/mistral-prepare-tools.ts
inspirations/vercel-ai-sdk/packages/mistral/src/mistral-chat-language-model.ts
inspirations/vercel-ai-sdk/packages/mistral/src/mistral-chat-language-model.test.ts
inspirations/vercel-ai-sdk/packages/mistral/src/convert-to-mistral-chat-messages.test.ts
inspirations/pydantic-ai/pydantic_ai_slim/pydantic_ai/models/mistral.py
inspirations/pydantic-ai/tests/models/test_mistral.py
inspirations/pydantic-ai/tests/providers/test_mistral.py
inspirations/langchain/libs/partners/mistralai/langchain_mistralai/chat_models.py
inspirations/langchain/libs/partners/mistralai/tests/unit_tests/test_chat_models.py
inspirations/langchain/libs/partners/mistralai/tests/integration_tests/test_chat_models.py
inspirations/semantic-kernel/python/semantic_kernel/connectors/ai/mistral_ai/services/mistral_ai_chat_completion.py
inspirations/semantic-kernel/python/tests/unit/connectors/ai/mistral_ai/services/test_mistralai_chat_completion.py

Recording / replay commands run

Record:

RIG_PROVIDER_TEST_MODE=record cargo test -p rig --all-features --test mistral agent_tool_sessions -- --nocapture --test-threads=1
RIG_PROVIDER_TEST_MODE=record cargo test -p rig --all-features --test mistral long_history_replay_with_tool_result_continuation -- --nocapture --test-threads=1

Replay without credentials:

env -u MISTRAL_API_KEY cargo test -p rig --all-features --test mistral agent_tool_sessions -- --nocapture --test-threads=1
env -u MISTRAL_API_KEY cargo test -p rig --all-features --test mistral mistral -- --nocapture --test-threads=1
env -u MISTRAL_API_KEY cargo test -p rig --all-features --test mistral cassette_safety -- --nocapture --test-threads=1

Other validation:

cargo fmt --check
cargo test -p rig-core providers::mistral -- --nocapture
cargo clippy --all-targets --all-features
cargo test

Manual cassette inspection notes

Reviewed all fixtures under tests/cassettes/mistral/agent_tool_sessions/.
Confirmed expected /v1/chat/completions request bodies for tools, tool_choice, parallel_tool_calls, stream: true, JSON object mode, JSON schema, and long history with assistant tool calls/tool results.
Confirmed responses preserve expected finish_reason, model, response ids, usage (prompt_tokens, completion_tokens, total_tokens, cache details), and streamed tool-call/text deltas.
Confirmed tool-result replay requests omit unstable generated name values while preserving tool_call_id.
Ran cassette safety and an additional grep for API-key/authorization/bearer/cookie markers; no secrets were present.
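
The additional marker search mentioned above could look roughly like this in code. It is a sketch only: the marker list is taken from the note above, while find_secret_markers and the case-insensitive substring matching are assumptions and do not describe the actual cassette safety checks.

```rust
/// Markers taken from the inspection note; the matching strategy is an assumption.
const SECRET_MARKERS: [&str; 4] = ["api-key", "authorization", "bearer", "cookie"];

/// Returns every marker that occurs in the cassette text, ignoring case.
/// An empty result means none of the listed markers were found.
pub fn find_secret_markers(cassette_text: &str) -> Vec<&'static str> {
    let lowered = cassette_text.to_lowercase();
    SECRET_MARKERS
        .iter()
        .copied()
        .filter(|marker| lowered.contains(*marker))
        .collect()
}
```

A test over the fixture directory could read each cassette file and assert that find_secret_markers returns an empty list.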

Known gaps / non-goals

No multimodal Mistral cassette coverage in this PR; Rig’s current Mistral chat conversion only supports text user content.
No Mistral reasoning cassette; Rig currently skips unsupported assistant-history reasoning for this provider.
No exact prose assertions; tests assert structural behavior, tool names/args/order, deterministic tool output markers, JSON structure, metadata, and usage.
A broad cargo test run stops at the existing OpenAI cassette failure in openai::cassette::permission_control::permission_control_streaming_example (request body mismatch against /v1/responses); the Mistral tests pass before that unrelated failure.

gold-silver-copper added 3 commits July 3, 2026 23:24

Add Mistral agent tool cassette regressions (217bcae)
Normalize Mistral streaming cassette scrubbing (142e1c7)
Rerecord Mistral streaming cassettes (8301c0f)

gold-silver-copper added this pull request to the merge queue Jul 4, 2026

Merged via the queue into main with commit 95b37c6 Jul 4, 2026
6 checks passed

github-actions Bot mentioned this pull request Jul 4, 2026

chore: release v0.40.0 #1923

Open

gold-silver-copper merged 3 commits into main from test/mistral-cassette-regression
