feat(agent)!: hook system v2 — composable middleware by gold-silver-copper · Pull Request #2012 · 0xPlaygrounds/rig

gold-silver-copper · 2026-07-04T23:00:56Z

Hook system v2 — composable middleware

Evolves Rig's agent hook system into a composable middleware layer that supports serious production use cases — RAG/context injection, guardrails, request shaping, telemetry, tool policy, multi-turn orchestration — before the vector-store/RAG rework, and without touching vector stores. Major breaking change; no deprecated aliases (renamed once, correctly).

Update — round 3 (tool-execution semantics)

A third round redesigned tool-execution streaming for the cleanest, most correct semantics (breaking, per the mandate to prefer a small breaking change over surprising behavior):

Split model-emitted tool call from execution start. MultiTurnStreamItem::StreamAssistantItem(StreamedAssistantContent::ToolCall) now reports the tool call the model emitted — surfaced when the model turn is committed, whether or not Rig executes it. A new MultiTurnStreamItem::ToolExecutionStart { tool_call, internal_call_id } marks that Rig has started executing a tool, emitted only after the tool passed its ToolCall hook checks and its body actually runs (never for a dropped, hook-skipped, or invalid-recovery call). run_single_tool now returns ToolCallOutcome { content, executed } so the driver can tell a real run from a Flow::Skip. All three events (model call → execution start → result) correlate via internal_call_id. This is LangGraph's distinction between model-emitted tool calls and tool-execution lifecycle events.
Atomic per-batch commit/surface. drive_tool_calls collects tool outcomes instead of streaming them one-by-one. Successful ToolExecutionStart + ToolResult items are surfaced (in call order) and committed to history only after the whole batch settles successfully. On the first hook termination / fail-closed error the batch fails fast — stop new, drop not-yet-started concurrent siblings, drain in-flight, surface the deterministic lowest call-index error, surface no successful items, commit nothing (no orphan execution-start events, no partial history). Sequential and concurrent share the atomic path; run() and stream() return the same terminal reason. This matches OpenAI Agents' bounded concurrent execution that commits/surfaces only after the batch settles. (Previously the concurrent path streamed each result as its tool completed, in completion order.)
Local tool-choice validation. After per-turn patches and active_tools filtering, allowed_tool_names_for_choice validates the effective request before the provider call: ToolChoice::Required with no advertised tool (no executable tool and no synthetic output tool) and ToolChoice::Specific naming a tool not in the effective advertised set are local request errors with no provider round-trip. When a per-turn active_tools allow-list caused the incompatibility, the error says so and suggests setting a compatible tool_choice in the same RequestPatch. Structured-output Tool mode with no real tools still works when the synthetic output tool satisfies the choice. This is Pydantic AI's explicit local validation for impossible tool_choice/tool-set combinations.

Round-3 tests: split taxonomy + atomic ordering on both surfaces (stream_emits_model_tool_calls_then_atomic_execution_items, ..._results_in_call_order_after_batch_settles_...); concurrent_termination_surfaces_no_execution_items (no orphan start, no successful result, side effect ran but suppressed); Fix-3 unit tests + no-provider-call integration tests (required_with_empty_active_tools_..., specific_naming_filtered_out_tool_...); the anthropic streaming-tools cassette updated to assert call-order surfacing.

A 4-dimension adversarial review of round-3 confirmed 7 findings (no logic bugs in the atomic batch, split, or validation — the machinery was sound): ToolExecutionStart now carries the effective (hook-rewritten) tool call rather than the model's original — matching its doc and preventing a RewriteArgs redaction from leaking the original args (run_single_tool returns ToolExecution::Executed(effective_call) | Skipped); a Flow::Skip result is now surfaced as a ToolResult (no ToolExecutionStart) rather than committed-but-not-streamed; the active_tools error hint is shown only when the filter actually dropped a satisfying tool (not for a plain typo); and several doc-scoping/wording fixes (atomicity is not scoped to concurrency > 1; no completion-order claims). Regression tests added for each. Full rig-core lib 1138 pass; anthropic/openai/gemini cassette suites (322) pass.

Update — round 2 (correctness hardening)

A second adversarial review round, informed by how Pydantic AI / OpenAI Agents / LangGraph handle these cases, tightened four behaviors:

Flow::Terminate from a tool hook is now turn-wide fail-fast (was run-all-then-decide). Sequential execution (tool_concurrency == 1, the default) surfaces the terminate immediately and never starts the remaining sibling tools. Concurrent execution (> 1) drops not-yet-started siblings — a shared terminating flag makes them skip — while already-in-flight siblings are drained (so no detached task is left and the lowest call-index terminate reason still wins deterministically; a dropped sibling always has a higher index than the terminator that dropped it). No post-termination successful result is surfaced or committed. This avoids the Semantic-Kernel fail-open where every dispatched tool runs to completion before termination is honored, matching Pydantic AI / OpenAI Agents' cancel-or-drain-on-failure.
Tracing/redaction ordering guarantee. gen_ai.tool.call.result is recorded on the span only after the ToolResult hook runs — the redacted replacement on RewriteResult, the raw output on Keep, and nothing on Terminate — so a redaction guardrail never leaks the raw output to the trace or logs. The first ToolResult hook still observes the tool's actual output. (OpenAI Agents applies tool-output guardrails before tracing / tool-end / model-visible output for the same reason.)
Canonical ModelTurnFinished.content guarantee. On the streaming surface, ModelTurnFinished.content now carries the canonical committed content from StreamedTurn::finish (reasoning → text → tool-call ordering), matching what is recorded into run history — not the raw stream.choice aggregate. The raw choice is retained for the raw/final stream item. This mirrors Vercel AI SDK / LangGraph separating raw stream events from normalized final state; the blocking surface already used the committed resp.choice.
Scratchpad concurrency semantics documented. Removed the "never contends" claim. Scratchpad/HookContext now document that at tool_concurrency > 1 the ToolCall/ToolResult hooks for different tools share one HookContext and can run concurrently; Scratchpad::update is race-free per operation but imposes no deterministic ordering across concurrent tool hooks — store commutative/idempotent state or key per-tool state by the call id / internal call id (as LangGraph/OpenAI Agents/Pydantic AI treat run context as shared runtime state, not an ordered log under parallel tool execution).

Round-2 tests: sequential fail-fast (tool B side effect must not run) and concurrent drop-not-yet-started + drain-in-flight (both surfaces, Notify-synchronized for determinism, 5/5 non-flaky); redaction non-leak via a span-field capturing subscriber; canonical ModelTurnFinished ordering. The redaction and canonical tests were verified to fail without their fix. Also: git diff --check clean, () observes nothing, CHANGELOG signature includes ctx, streaming transcript-untouched assertion added.

Shortcomings addressed → what fixed them

Shortcoming (before)	Fix
Hooks short-circuit on the first non-`Continue`, so a RAG hook's request patch skips a later tool-policy/telemetry hook	Accumulating dispatch: on `CompletionCall`, every hook runs and their patches merge
`RequestOverride` is narrow, replacement-only, no per-hook merge semantics	`RequestPatch` with documented per-field merge rules
Hooks can't inject context/RAG documents	`RequestPatch::extra_context: Vec<Document>`
No run-scoped identity/state; multi-turn orchestration needs ad-hoc interior mutability	`HookContext` (run id, turn, streaming flag, agent name, shared `Scratchpad`)
Rewrites single-shot (only the first `RewriteArgs`/`RewriteResult` wins)	Chained rewrites across the stack
Streaming tool-only / reasoning-only turns fire no lifecycle event	`StepEvent::ModelTurnFinished` (once per accepted turn, both surfaces)

New dispatch contract

A HookStack combines hooks in registration order, and how their Flow results combine depends on the event:

CompletionCall — accumulate & merge. Every hook runs; each Flow::PatchRequest is merged in registration order into one effective patch. A mergeable patch does not short-circuit later hooks. Flow::Terminate stops the stack; any unsupported flow fails closed (accumulated patch discarded).
ToolCall / ToolResult — chain. Every hook runs; a RewriteArgs/RewriteResult is threaded into the next hook's event so rewrites compose (redact → truncate). Skip/Terminate are terminal mid-chain. The first result hook still observes the tool's actual output.
Every other event — first non-Continue wins (observe-only / recovery: CompletionResponse, ModelTurnFinished, InvalidToolCall, streamed deltas).

Nesting composes: a HookStack pushed as a hook returns its own net flow (a merged patch / threaded rewrite / terminal action), which the outer stack folds in again. Register observe-only hooks before steering hooks (a Terminate short-circuits the stack).

`RequestPatch` merge semantics (per field)

Field	hook ⊕ hook (registration order)	patch → baseline
`extra_context`	append (earlier hooks' docs first)	append after static + dynamic context
`additional_params`	shallow-merge top-level keys, later hook wins	shallow-merge onto baseline params
`preamble`	last writer wins (warns on conflict)	replaces
`temperature`, `max_tokens`, `tool_choice`	last writer wins (warns on conflict)	replaces
`active_tools`	set intersection (warns when empty)	narrows the advertised set
`history`	last writer wins (warns on conflict)	replaces the messages sent this turn

Patches are per-turn and non-sticky — CompletionCall re-fires each turn and re-resolves from the agent baseline.

`extra_context` — document ordering

Document order in the completion request is static → dynamic (vector-store) → hook extras, hook extras in registration order. The RAG query text is unchanged. Per-turn and non-sticky; works identically on run() and stream().

`history` view

RequestPatch::history replaces the prior messages sent to the provider for the turn (context-window compaction / summarization). The persisted transcript and run state are untouched, and RAG's query text still derives from the original history — only what is sent changes.

`HookContext`

Passed by & to every on_event: run_id(), turn() (advanced per turn by the driver), is_streaming(), agent_name(), and a shared Scratchpad (interior-mutable type-map: insert/get/update/remove/contains) so cooperating hooks share run-scoped state without their own Arc<Mutex<…>>. It is a driver construct — nothing from it reaches the sans-IO AgentRun.

Streaming/non-streaming

run() and stream() share one drive loop (drive_agent), so every new semantic (patch accumulation, extra_context, history, chained rewrites) lands once and behaves identically — verified on both surfaces in tests. ModelTurnFinished fires exactly once per accepted turn on both surfaces, including a streamed tool-only turn (which fires no StreamResponseFinish); it is suppressed for turns recovered via invalid-tool-call retry/repair/skip, and fires after the medium-specific raw event when one fires. CompletionResponse (blocking) and StreamResponseFinish (streaming, text turns only) are retained for the raw provider payload.

Decision-point resolutions

Blind merge on CompletionCall (hooks see the agent baseline, not earlier hooks' patches): simpler, keeps StepEvent: Copy, sufficient because patches are declarative with documented conflict rules.
active_tools intersects rather than last-writer-wins: it is an allow-list guardrail, so two narrowing hooks must compose as narrowing; widening is a config error.
Scalars / preamble / history: last-writer-wins with a tracing::warn! — silent conflict is the Semantic Kernel wart; a warning keeps composition debuggable. Additive guidance belongs in extra_context, not preamble concatenation.
ModelTurnFinished fires after the raw event: consistent "raw before normalized" ordering across surfaces.
HookContext.turn is an AtomicUsize (built once per run, advanced per turn): gives a coherent turn to non-CompletionCall events; Sync for &HookContext across awaits.

Inspiration references

Studied under /Users/kisaczka/Desktop/code/many_rigs/inspirations/:

Semantic Kernel (docs/decisions/0043-filters-exception-handling.md, KernelFunctionFromPrompt.InvokeStreamingCoreAsync): its ADR admits forgetting await next() silently disables downstream filters, and it shipped a real streaming/non-streaming short-circuit asymmetry. → kept Rig's typed Flow + fail-closed model; enforced both-surface parity with tests.
LangGraph (chat_agent_executor.py llm_input_messages vs messages; pregel/main.py invoke = stream): the ephemeral per-call view vs committed edit → RequestPatch::history; one drive loop → parity guarantee.
Vercel AI SDK (generate-text.ts prepareStep, util/notify.ts): per-field replace-vs-merge documented explicitly; observers return void and are error-isolated; naming churn (onFinish→onEnd) → renamed once with no aliases.
OpenAI Agents (tool_guardrails.py): rich enum outcomes over booleans; run-scoped RunContextWrapper → HookContext.
LangChain (middleware/types.py, factory.py): first-registered-is-outermost onion, immutable request + override; classic callbacks' untyped-kwargs pain → typed StepEvent.
pydantic-ai (capabilities/abstract.py): a run-scoped RunContext on every hook is table stakes; but its ~30-method mega-trait is a tax Rig avoids by keeping the single on_event + match-arm model.

Tests

cargo test -p rig-core --lib agent — 226 passed (220 existing + 6 new hook-v2 integration tests), 0 failed.
New hook.rs unit tests: 16 (accumulation, terminate short-circuit, nested-stack composition, per-field merge rules incl. empty intersection, scratchpad).
New runner.rs integration tests (both surfaces): extra_context after static context, append order, per-turn non-sticky, history override (transcript untouched), ModelTurnFinished once per accepted turn incl. tool-only, chained RewriteArgs+RewriteResult.
Provider cassette hook tests (replay): anthropic request_override/tool_call_rewrite_args/tool_result_rewrite (6), openai request_hook/permission_control (3), gemini agent_run_streamed (6) — all pass on both surfaces.
cargo fmt --check, cargo clippy -p rig-core --lib --tests --all-features (0 warnings), cargo doc -p rig-core --no-deps --all-features (0 warnings). All 5 hook examples + agent_with_durable_approval cargo check clean.

A 5-dimension adversarial review (API/design, merge-semantics edge cases, streaming parity, regressions, deep dispatch correctness) surfaced no correctness/parity/regression issues; the only confirmed findings were doc-only — the four public add_hook rustdocs and the foundational CHANGELOG bullet still carried the pre-v2 "first non-Continue short-circuits the rest" blanket claim (false for the new accumulate/chain cases) — now corrected to state the event-dependent composition and point to the module docs.

Known limitations & follow-ups (not in this PR)

RunStarted/RunFinished { outcome } observe-only lifecycle events (so telemetry sees hook-initiated terminate) — deferred.
An observer registration mode (error-isolated, never-skipped) if ordering guidance proves insufficient — deferred; the accumulation change already prevents ordinary patches from skipping later hooks.
Transcript-visible tool-call repair vs execution-only arg rewrite — documented need.
Tool-definition rewriting / per-turn tool injection via the patch — RequestPatch is #[non_exhaustive], so a future tools field is non-breaking.
The vector-store/RAG migration itself: dynamic_context reimplemented as a bundled hook using extra_context.

Non-goals (unchanged)

No vector-store crate/VectorStoreIndex removal; dynamic_context untouched; no RAG example migration; no generic Retriever trait; no second observer trait or declarative ordering engine.

Evolve the agent hook system into a composable middleware layer ahead of the vector-store/RAG rework, without touching vector stores. - HookContext: run-scoped context (run_id, turn, is_streaming, agent_name, shared Scratchpad) passed to every on_event; breaks the trait signature. - Mergeable request patches: RequestOverride -> RequestPatch, Flow::OverrideRequest -> Flow::PatchRequest. CompletionCall patches from every hook accumulate and merge in registration order (no more first-patch short-circuit); documented per-field merge rules. - RequestPatch::extra_context: per-turn passive-RAG document injection, appended after static + dynamic context. - RequestPatch::history: per-turn replacement of the messages sent to the provider; transcript untouched, RAG keys off the original history. - Chained tool rewrites: RewriteArgs/RewriteResult compose across a HookStack. - StepEvent::ModelTurnFinished: normalized once-per-turn event on both surfaces (covers streamed tool-only turns). All merge/patch/context/history/event semantics land once in the shared drive loop, so run() and stream() behave identically. Fail-closed handling, invalid-tool-call recovery, and run-all-then-decide tool execution unchanged.

…ook v2 Review found the pre-v2 'first non-Continue short-circuits the rest' blanket claim still on all four public add_hook rustdocs and the foundational CHANGELOG bullet — false for CompletionCall (accumulate) and ToolCall/ToolResult (chain). State the event-dependent composition and point to the hook module docs.

… ModelTurnFinished Address the second adversarial review round: - Flow::Terminate from a tool hook is now turn-wide fail-fast. Sequential execution stops before starting remaining siblings; concurrent execution drops not-yet-started siblings (a shared flag makes them skip) while draining already-in-flight ones, so no new side effect runs after a termination and the lowest call-index terminate reason still wins deterministically. Avoids the Semantic-Kernel run-all-then-decide fail-open. - Tool-result redaction: gen_ai.tool.call.result is recorded only AFTER the ToolResult hook runs (replacement on RewriteResult, raw on Keep, nothing on Terminate), so a redaction hook never leaks the raw output to the trace/logs. - Streaming ModelTurnFinished.content now carries the canonical committed StreamedTurn::choice (reasoning->text->tool ordering), not the raw stream.choice aggregate; raw is kept for final stream behavior. - Scratchpad/HookContext docs: drop "never contends"; document that at tool_concurrency > 1 tool hooks for different tools share the context and run concurrently, update is race-free per op but imposes no ordering; recommend commutative/idempotent state or keying by call id. - Docs/test hygiene: CHANGELOG old signature includes ctx; anthropic request_override wording; () observes nothing; streaming transcript-untouched assertion. Tests: sequential + concurrent fail-fast (both surfaces), redaction non-leak (span capture), canonical ModelTurnFinished ordering — all verified to fail without their fix. Full agent suite 229 passed.

…; fix () observes doc Round-2 review follow-ups (both low-severity accuracy fixes): - The concurrent drop test's narrative was wrong: a synchronous terminator meant no sibling ran at all, so it never exercised the in-flight-drain vs beyond-window-drop distinction it described. Rewrite with a Notify so tc1 is genuinely in flight (drains, called contains 1) while tc2 beyond the window is dropped (called never contains 2) and tc0's body never runs. Renamed accordingly; 8/8 non-flaky. - Correct the () hook observes() doc: observes gates only the two streaming delta events, not 'every event' (non-delta events dispatch unconditionally).

…al tool-choice validation Round-3 redesign for correctness and cleaner semantics (breaking): 1. Split model-emitted tool call from execution start. StreamAssistantItem (StreamedAssistantContent::ToolCall) now reports the tool call the MODEL emitted (at turn commit, whether or not it runs). A new MultiTurnStreamItem::ToolExecutionStart marks that Rig actually started executing a tool (after hook checks) — never for a dropped/hook-skipped/ invalid-recovery call. run_single_tool now reports `executed` to distinguish a real run from a Flow::Skip. 2. Atomic per-batch commit/surface. drive_tool_calls collects tool outcomes instead of streaming them one-by-one; execution-start + result items are surfaced (in call order) and committed only after the whole batch settles OK. On the first termination/fail-closed error the batch fails fast (stop new, drop not-yet-started concurrent siblings, drain in-flight, lowest call-index error) with no successful items surfaced and no history commit — no orphan execution-start events. Sequential and concurrent share the atomic path; results now surface in call order (was completion order for concurrent). 3. Local tool-choice validation. allowed_tool_names_for_choice now validates the effective request before the provider call: Required with no advertised tool (executable or output tool) and Specific naming a non-advertised tool are local errors, with an active_tools-aware hint suggesting a compatible tool_choice in the same RequestPatch. Structured-output Tool mode with no real tools still works via the synthetic output tool. Tests: split taxonomy + atomic ordering (both surfaces), no-orphan-execution-start on concurrent termination, no-successful-result/no-commit on termination, Fix-3 unit + no-provider-call integration tests. Full lib 1135 pass; anthropic/openai/ gemini cassette suites (322) pass.

…esults, precise active_tools hint Address the round-3 adversarial review (7 confirmed findings): - ToolExecutionStart now carries the EFFECTIVE (hook-rewritten) tool call, not the model's original — matching its doc and preventing a RewriteArgs redaction from leaking the original args. run_single_tool returns ToolExecution::Executed(effective_call) | Skipped. - A Flow::Skip tool result is now surfaced as a ToolResult (no ToolExecutionStart) instead of being committed-but-not-streamed — the stream matches history again. - The active_tools error hint is shown only when the filter actually dropped a tool that would have satisfied the choice, not for a plain typo (threads the pre-filter tool set into allowed_tool_names_for_choice). - Docs: ToolExecutionStart/StreamUserItem and the tool_concurrency method docs no longer scope atomicity to concurrency>1 or claim completion-order surfacing; ToolExecutionStart timing reworded to the atomic-settle model. Tests: ToolExecutionStart-carries-rewritten-args, hook-skip-surfaces-result-without- execution-start, specific-typo-not-blamed-on-active_tools. Full lib 1138 pass; anthropic/openai/gemini cassette suites (322) pass.

…::large_enum_variant) The Executed(ToolCall) variants dwarfed the empty Skipped/Preresolved variants; box the ToolCall so the enums stay small. Surfaced under the rig package's feature set (cargo clippy -p rig --all-features).

- Correct AgentHook::on_event default doc: the default observes() returns true (not "observes nothing"); () skips only the high-frequency delta dispatch and still receives every other event, returning Continue. - Add unit_hook_observes_no_event_kind guarding the () observes override. - Rename first_terminate_short_circuits_on_observe_only_events (it dispatched a chained ToolCall) to ..._on_chained_tool_call; add a real observe-only (_ => arm) terminate test via TextDelta. - Assert ModelTurnFinished is suppressed on recovered turns on both the blocking and streaming surfaces (its own guard, distinct from the response-finish guards), with cross-driver parity. - Add runner_add_hook_appends_to_agent_default_hooks proving runner/prompt add_hook appends to (not replaces) the agent's default hooks. - Fix broken intra-doc link ToolCallOutcome::executed -> ::execution.

…ut-tool warning Deep-review follow-ups on hook system v2: - Fix stale AgentRunner::tool_concurrency streaming doc. It still described the pre-atomic-batch behavior ("emits each ToolResult as its tool finishes ... completion order"); the driver now surfaces ToolExecutionStart + ToolResult stream items in call order only after the whole batch settles. This matches the implementation and the sibling StreamingPromptRequest::tool_concurrency doc that links to it. - Fix Scratchpad doc that referenced "clones of a run's HookContext" — HookContext is not Clone (holds an AtomicUsize; it is shared by &-reference). Attribute the sharing to the shared &HookContext / Scratchpad clones instead. - completion.rs: the committed Tool-mode stall warning fired falsely when tool_choice = Specific names the synthetic output tool. allowed_tool_names_for_choice accepts and advertises such a choice (the output tool is callable), but tool_choice_permits_output_tool treats every Specific as forbidding. Add a name-aware output_tool_callable helper used only at the warning site (resolve_output_mode keeps the coarser check, since the output-tool name is not known there yet) and a unit test.

…ool-mode finalization, fix stale changelog Deep-review follow-ups on structured-output Tool-mode interactions: - ModelTurnFinished.content: doc said "as recorded into the run", but on a Tool-mode output-tool finalization turn the run persists the turn as assistant text (structured output) with the tool call dropped, while both drivers fire the event with the model-emitted content (including the output-tool call). The content is the model's committed turn output (consistent across surfaces, fired at turn commit before finalization). Correct the field doc to state this with an explicit Tool-mode caveat, and add a both-surfaces regression test. - StreamAssistantItem: the contract promised a complete ToolCall item for every model tool call whether or not Rig executes it, but an output-tool call finalizes the run directly (bypassing drive_tool_calls), so its complete item is never emitted (only deltas + FinalResponse); invalid-recovery calls are the same shape. Narrow the doc to document both carve-outs, and add a streaming regression test. - CHANGELOG: two Unreleased lines still described streaming ToolResult items in completion order, contradicting the atomic call-order-after-settle behavior. Update them to match.

A comparative study of LangChain, LangGraph, OpenAI Agents, pydantic-ai, Semantic Kernel, and the Vercel AI SDK surfaced three scoped, pure-doc improvements to the hook system (no API or behavior change): - Flow::Skip / Flow::skip: document that `reason` is delivered to the model verbatim as the tool result, so it doubles as a prompt — tell the model the tool did not run and not to retry, or it may re-emit the identical call (mirrors LangChain's HITL reject/respond feedback). - Flow::retry: add a worked example building corrective feedback from InvalidToolCallContext::available_tools (closes an asymmetric doc gap — the rewrite_args/rewrite_result constructors already had examples; mirrors LangGraph's INVALID_TOOL_NAME_ERROR_TEMPLATE). - hook module docs: add a "Why a returned Flow, not a next()-style middleware" rationale, citing the silent-skip footgun that a next()/middleware model carries (Semantic Kernel ADR 0043) and which Rig's fail-closed returned-Flow design prevents.

) * test(gemini): live cassette hook-system stress suite Adds a Gemini cassette-backed stress suite (tests/providers/gemini/cassette/hook_stress.rs) that exercises the merged hook system (v2, 0xPlaygrounds#2012) across long, realistic multi-turn workflows recorded against real Gemini and replayed deterministically: - HookContext identity (stable run_id, advancing turn, is_streaming, agent_name) and a shared Scratchpad threaded across two cooperating hooks and many turns. - RequestPatch: extra_context injection + active_tools narrowing + temperature, proven by downstream effects (the injected fact reaches the answer; the filtered-out tool never executes). - Chained tool lifecycle: RewriteArgs -> observe -> RewriteResult redaction, with paired positive/negative assertions (the marker reaches the model; the raw result does not; the transcript keeps the model's original args). - Streaming lifecycle ordering (tool call -> execution start -> tool result -> final response) and is_streaming parity vs the blocking surface. - Per-turn atomic call/result pairing and the Skip zero-execution invariant. Assertions follow tools_support's loose-assertion convention (exact equality only for rig-synthesized values), so the cassettes survive re-recording; hooks are deterministic so replay requests stay byte-identical. Cassettes are auto-scrubbed and safety-checked by the recorder (key -> [REDACTED]; ids/signatures placeholdered) and were reviewed manually. No hook-system bugs surfaced: every scenario passed on the first live recording, confirming the documented behavior. Inspired by how LangChain, LangGraph, OpenAI Agents, pydantic-ai, Semantic Kernel, and the Vercel AI SDK test their hook/middleware/guardrail systems (ordered breadcrumb logs; proving mutations via downstream effects; paired positive+negative redaction; zero-downstream skip invariants). * test(gemini): expand hook-system stress suite (+24 live cassette tests) Broadens the Gemini hook-system stress suite from 6 to 30 recorded workflows across four themed cassette files, driven by a shared deterministic-fixtures module (hook_stress_support.rs): - hook_stress_context (8): HookContext identity (stable run_id, advancing turn, is_streaming, agent_name incl. the unset case); a shared Scratchpad written by one hook and read by a second, growing across turns; internal_call_id correlation of ToolCall/ToolResult; two observe-only hooks both firing; add_hook appending across builder + request; CompletionCall patch accumulation; and active_tools set-intersection across two narrowing hooks. - hook_stress_patch (4): preamble override; tool_choice=Required (first turn only); per-turn history replacement injecting a prior fact; multi-field patch (preamble + extra_context). - hook_stress_tools (6): single-key and chained RewriteArgs; chained RewriteResult (redact -> wrap) and truncation; Terminate from a ToolResult (post-execution); and model-driven recovery from a tool error. - hook_stress_streaming (6): TextDelta / StreamResponseFinish / ModelTurnFinished on streaming; ModelTurnFinished on tool turns; RewriteResult redaction reaching the FinalResponse; active_tools narrowing and Skip on the streaming driver; and blocking-vs-streaming answer parity (two cassettes). All recorded live against Gemini and replaying deterministically; every patch / rewrite / skip effect is proven by a downstream-observable change, and model-shaped values use loose assertions (exact only for rig-synthesized ones). Cassettes are auto-scrubbed + safety-checked by the recorder and were reviewed. A footgun surfaced along the way: forcing tool_choice=Required on *every* turn loops until max_turns (each turn re-forces a tool call). Captured with a first-turn-only patch fixture (FirstTurnPatch) so the intended pattern is shown. No hook-system bugs found; every scenario confirms the documented behavior.

gold-silver-copper added 11 commits July 4, 2026 15:48

gold-silver-copper added this pull request to the merge queue Jul 5, 2026

Merged via the queue into main with commit e43089a Jul 5, 2026
6 checks passed

gold-silver-copper mentioned this pull request Jul 5, 2026

test(gemini): live cassette hook-system stress suite #2013

Merged

github-actions Bot mentioned this pull request Jul 4, 2026

chore: release v0.40.0 #1923

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(agent)!: hook system v2 — composable middleware#2012

feat(agent)!: hook system v2 — composable middleware#2012
gold-silver-copper merged 11 commits into
mainfrom
feat/hook-system-v2

gold-silver-copper commented Jul 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

gold-silver-copper commented Jul 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Hook system v2 — composable middleware

Update — round 3 (tool-execution semantics)

Update — round 2 (correctness hardening)

Shortcomings addressed → what fixed them

New dispatch contract

RequestPatch merge semantics (per field)

extra_context — document ordering

history view

HookContext

Streaming/non-streaming

Decision-point resolutions

Inspiration references

Tests

Known limitations & follow-ups (not in this PR)

Non-goals (unchanged)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

gold-silver-copper commented Jul 4, 2026 •

edited

Loading

`RequestPatch` merge semantics (per field)

`extra_context` — document ordering

`history` view

`HookContext`