feat: olla mocking skills for claude #174

Merged
thushan merged 7 commits into main from feature/olla-mock
Jun 11, 2026

@thushan thushan commented Jun 11, 2026

Owner

Ports the nightly and regression scripts we use to test tensorfoundry.io products to Olla, with some tweaks.

  • /olla-validate --quick - 5–10 min gate after major changes
  • /olla-validate --nightly - multi-hour pre-release gate (chaos, soak, Sherpa pass, forced-translation pass, benchmarks)
  • No flag - prompts for the depth

Summary by CodeRabbit

  • Documentation

    • Added a comprehensive Validation Harness guide and area runbooks (routing, resilience, observability, OpenAI/Anthropic) with quick/nightly workflows and reporting.
  • New Features

    • Introduced a local multi-protocol mock backend for end-to-end validation with configurable fault-injection and control endpoints.
    • Added a command to run the mock backend for local validation.
  • Tests

    • Added extensive test coverage for streaming, non-streaming, error modes and aggregate stats.
    • Included validation configs for general and tight-limits scenarios.
  • Chores

    • Wired validation harness results reporting and test run bookkeeping.

coderabbitai Bot commented Jun 11, 2026

Walkthrough

This PR adds an agent-driven end-to-end validation harness: SKILL orchestration, per-area validation checklists, a multi-protocol mock LLM backend (ollamock) with runtime fault injection and streaming support, test suite and harness configs, and documentation + mkdocs navigation.

Changes

End-to-End Validation Harness

| Layer | File(s) | Summary |
| --- | --- | --- |
| Validation Skill Orchestration | `.claude/skills/olla-validate/SKILL.md` | Eight-phase orchestration: static gate, build, fleet boot, Wave 1 parallel read-only agents, Wave 2 resilience (fault injection allowed), nightly sequential extensions (limits, translation-forced Anthropic, Sherpa quick-check, soak+chaos, optional bench), mandatory teardown, and report aggregation with verdict gating. Defines agent roles, JSON report contract, and escalation rules. |
| Validation Area Checklists | `.claude/skills/olla-validate/areas/*.md` | Six area runbooks (Anthropic, OpenAI API, core-routing, limits-failures, observability, resilience) describe read-only quick checks and expanded nightly scenarios covering streaming, routing, limits, failover, stats, and translation-forced validation. |
| Mock Backend Behaviour & Control Plane | `test/cmd/ollamock/behaviour.go` | Behaviour config and Mode enum (ok, error, flaky, hang, slow), deterministic seeded RNG, partial-PATCH merge semantics, request stats store, and HTTP control endpoints for behaviour, reset, and stats. |
| Mock Backend Handlers & Routes | `test/cmd/ollamock/handlers.go` | HTTP mux wiring, request wrapper that records per-path stats and sets X-Ollamock-Instance, applyBehaviour short-circuiting (health gating, hang, deterministic/flaky errors, latency, malformed JSON), and model-listing handlers for Ollama/LM Studio/OpenAI/Lemonade plus health/root endpoints. |
| Streaming Implementation | `test/cmd/ollamock/streaming.go` | Protocol-specific streaming: Ollama NDJSON (done:true record), OpenAI SSE with data: [DONE], Anthropic event sequence; TTFT/TPS pacing, mid-stream truncation (DropMidStream), abrupt-close attempts, and SSE/NDJSON helpers. |
| Main CLI and Process Management | `test/cmd/ollamock/main.go` | Server entrypoint with CLI flags (addr, name, models, ttft-ms, tps, stream-chunks), parseModels helper, background ListenAndServe, signal handling, graceful shutdown. |
| Tests and README | `test/cmd/ollamock/ollamock_test.go`, `test/cmd/ollamock/README.md` | httptest suite covering model listings, non-stream and streaming flows (SSE/NDJSON parsing and termination), behaviour injection tests, stats tests, and documentation for running, flags, endpoints, behaviour modes, control-plane API, and curl examples. |
| Harness Configuration | `test/validate/config.validate.yaml`, `test/validate/config.validate.limits.yaml` | Main harness config with seven static mock endpoints, discovery/model-registry/unifier settings, Anthropic translator enabled (passthrough), and a tight-limits config for request/size/rate boundary testing. |
| Documentation & Navigation | `CLAUDE.md`, `docs/content/development/validation.md`, `docs/content/development/testing.md`, `docs/mkdocs.yml`, `AGENTS.md` | CLAUDE.md updated with validation harness docs; new validation.md explains harness topology, modes, and ollamock; testing.md tree updated; mkdocs.yml adds nav entry; AGENTS.md now references CLAUDE.md. |
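
The "partial-PATCH merge semantics" listed for `behaviour.go` can be pictured with a minimal sketch. Only the five mode names come from the summary above; the `Behaviour` and `BehaviourPatch` types, their fields, the JSON keys, and `applyPatch` are hypothetical and may not match the PR. The idea shown is that pointer fields in the patch let an absent JSON key mean "leave this setting unchanged".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Mode values are the ones named in the PR summary; all other names in this
// file are assumptions made for illustration.
type Mode string

const (
	ModeOK    Mode = "ok"
	ModeError Mode = "error"
	ModeFlaky Mode = "flaky"
	ModeHang  Mode = "hang"
	ModeSlow  Mode = "slow"
)

// Behaviour is a hypothetical snapshot of the mock's current settings.
type Behaviour struct {
	Mode      Mode `json:"mode"`
	LatencyMS int  `json:"latency_ms"`
}

// BehaviourPatch uses pointers so that a missing key decodes to nil.
type BehaviourPatch struct {
	Mode      *Mode `json:"mode"`
	LatencyMS *int  `json:"latency_ms"`
}

// applyPatch decodes a JSON patch and overwrites only the fields it contains.
func applyPatch(cur Behaviour, body []byte) (Behaviour, error) {
	var p BehaviourPatch
	if err := json.Unmarshal(body, &p); err != nil {
		return cur, err
	}
	if p.Mode != nil {
		cur.Mode = *p.Mode
	}
	if p.LatencyMS != nil {
		cur.LatencyMS = *p.LatencyMS
	}
	return cur, nil
}

func main() {
	cur := Behaviour{Mode: ModeOK, LatencyMS: 250}
	next, err := applyPatch(cur, []byte(`{"mode":"slow"}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", next) // latency is untouched because the patch omitted it
}
```

With this shape, an explicit `"latency_ms": 0` still applies, which a plain non-pointer struct could not distinguish from an omitted key.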

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

enhancement, documentation, experimental

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: olla mocking skills for claude' directly addresses the main change: adding Claude skills for Olla mocking/validation harness with multiple run modes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
test/cmd/ollamock/streaming.go (2)

580-601: 💤 Low value

Consider using context.Context directly rather than interface.

The functions applyTTFT and applyTPS accept interface{ Done() <-chan struct{} } to access the Done() channel. Since context.Context is the standard interface for cancellation and all callers pass r.Context(), using context.Context directly would be clearer and more idiomatic.

♻️ Proposed change
-func applyTTFT(ctx interface{ Done() <-chan struct{} }, ttftMS int) {
+func applyTTFT(ctx context.Context, ttftMS int) {
 	if ttftMS <= 0 {
 		return
 	}
 	select {
 	case <-ctx.Done():
 	case <-time.After(time.Duration(ttftMS) * time.Millisecond):
 	}
 }

-func applyTPS(ctx interface{ Done() <-chan struct{} }, tps int) {
+func applyTPS(ctx context.Context, tps int) {
 	if tps <= 0 {
 		return
 	}
 	delay := time.Duration(1000/tps) * time.Millisecond
 	select {
 	case <-ctx.Done():
 	case <-time.After(delay):
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/cmd/ollamock/streaming.go` around lines 580 - 601, Change the parameter
type for applyTTFT and applyTPS from the generic interface{ Done() <-chan
struct{} } to the concrete context.Context: update func signatures applyTTFT(ctx
context.Context, ttftMS int) and applyTPS(ctx context.Context, tps int), import
the context package, and ensure all callers passing r.Context() keep working (no
other behavior changes needed since context.Context exposes Done()); this makes
the code idiomatic and clearer about cancellation semantics.

18-32: ⚡ Quick win

Remove dead code and misleading comment.

Lines 20–26 contain dead code: two decoders that are created but never used to decode the request body. The comment on lines 27–28 mentions "re-parse permissively" but there's only a single parse operation. This appears to be leftover experimental code.

♻️ Proposed cleanup
 func parseInferenceRequest(r *http.Request) (inferenceRequest, error) {
 	var req inferenceRequest
-	dec := json.NewDecoder(r.Body)
-	dec.DisallowUnknownFields()
-
-	// We only need the two top-level fields; unknown fields come from real
-	// clients that send messages, tools, temperature etc. Allow them silently.
-	dec2 := json.NewDecoder(r.Body)
-	_ = dec2
-	// Re-parse permissively — DisallowUnknownFields was too strict for real
-	// client payloads. Use a map-based approach instead.
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil && err != io.EOF {
 		return inferenceRequest{}, err
 	}
 	return req, nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/cmd/ollamock/streaming.go` around lines 18 - 32, In
parseInferenceRequest: remove the dead/unused variables dec and dec2 and the
misleading comment about "re-parse permissively"; instead keep a single
permissive decode using json.NewDecoder(r.Body).Decode(&req) (with the same err
!= io.EOF handling) and drop the DisallowUnknownFields call so the function only
performs one decode pass into inferenceRequest.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/olla-validate/areas/observability.md:
- Around line 13-15: Update the endpoint name list used by the GET
/internal/status/endpoints check to match the SKILL.md topology: replace
mock-vllm-a with mock-vllm-e, mock-litellm-b with mock-litellm-f, and
mock-llamacpp-d with mock-llamacpp-g so the expected seven endpoints
(mock-openai-a/b, mock-vllm-e, mock-litellm-f, mock-ollama-c, mock-lmstudio-d,
mock-llamacpp-g) match the actual /internal/status/endpoints output.

In @.claude/skills/olla-validate/areas/resilience.md:
- Around line 89-91: Update the "Final state assertion" text so the mock count
matches the topology: replace the phrase "all four mocks" with "all seven mocks"
(the assertion that all mocks report default behaviour via `GET
/_mock/behaviour` and that all 7 endpoints are healthy and `/internal/status`
returns 200 must check seven mocks). Ensure any nearby references in the same
paragraph to mock count are updated to "seven" so the final assertion
consistently verifies all seven mocks.

In @.claude/skills/olla-validate/SKILL.md:
- Line 178: Documentation and test instructions incorrectly reference "four
mocks" for the reset and verification steps; update all occurrences of the reset
instruction and final assertion that say "four mocks" (e.g., the POST
/_mock/reset step and the final confirmation sentence) to reflect seven mocks
(ports 19431–19437) so both the orchestration and area validation steps reset
and verify all seven mocks instead of four. Ensure the wording for POST
/_mock/reset and the final assertions explicitly state "seven mocks (ports
19431–19437)".
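
Resetting all seven mocks rather than four could be sketched as follows. The port range and the `POST /_mock/reset` path come from the comment above; the `localhost` host, the helper names, and treating any status of 300 or higher as a failure are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
)

// resetURLs builds one /_mock/reset URL per consecutive mock port.
// The localhost host is assumed; the review only gives the port range.
func resetURLs(firstPort, count int) []string {
	urls := make([]string, 0, count)
	for i := 0; i < count; i++ {
		urls = append(urls, fmt.Sprintf("http://localhost:%d/_mock/reset", firstPort+i))
	}
	return urls
}

// resetAll posts to every URL and returns the first failure, if any.
func resetAll(client *http.Client, urls []string) error {
	for _, u := range urls {
		resp, err := client.Post(u, "application/json", nil)
		if err != nil {
			return fmt.Errorf("reset %s: %w", u, err)
		}
		resp.Body.Close()
		if resp.StatusCode >= 300 {
			return fmt.Errorf("reset %s: status %d", u, resp.StatusCode)
		}
	}
	return nil
}

func main() {
	urls := resetURLs(19431, 7)
	fmt.Println(len(urls), urls[0], urls[len(urls)-1])
	// resetAll(http.DefaultClient, urls) would contact the running mocks.
}
```

Deriving the list from a first port and a count keeps the reset step and the final assertion in sync if the topology changes again.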

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fa233c9a-e858-43ca-97c8-30aa4102245a

📥 Commits

Reviewing files that changed from the base of the PR and between 455e501 and a94ce1c.

📒 Files selected for processing (19)
  • .claude/skills/olla-validate/SKILL.md
  • .claude/skills/olla-validate/areas/anthropic.md
  • .claude/skills/olla-validate/areas/core-routing.md
  • .claude/skills/olla-validate/areas/limits-failures.md
  • .claude/skills/olla-validate/areas/observability.md
  • .claude/skills/olla-validate/areas/openai-api.md
  • .claude/skills/olla-validate/areas/resilience.md
  • CLAUDE.md
  • docs/content/development/testing.md
  • docs/content/development/validation.md
  • docs/mkdocs.yml
  • test/cmd/ollamock/README.md
  • test/cmd/ollamock/behaviour.go
  • test/cmd/ollamock/handlers.go
  • test/cmd/ollamock/main.go
  • test/cmd/ollamock/ollamock_test.go
  • test/cmd/ollamock/streaming.go
  • test/validate/config.validate.limits.yaml
  • test/validate/config.validate.yaml

Comment thread .claude/skills/olla-validate/areas/observability.md Outdated
Comment thread .claude/skills/olla-validate/areas/resilience.md Outdated
Comment thread .claude/skills/olla-validate/SKILL.md Outdated

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
.claude/skills/olla-validate/SKILL.md (1)

188-190: 💤 Low value

Awkward phrasing: "return to healthy".

"return to healthy" is grammatically awkward. Consider "return to health" or "become healthy again".

✏️ Suggested rewrite
-After wave 2: reset all mock behaviours, re-confirm all endpoints return to
-healthy within 60s (this is itself a recovery assertion - record it; health
+After wave 2: reset all mock behaviours, re-confirm all endpoints return to
+health within 60s (this is itself a recovery assertion - record it; health
 probes tick globally every 30s regardless of per-endpoint check_interval).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/olla-validate/SKILL.md around lines 188 - 190, The phrase
"return to healthy" in the sentence starting with "After wave 2: reset all mock
behaviours, re-confirm all endpoints return to healthy within 60s..." is
awkward; change it to a clearer phrasing such as "return to health", "become
healthy again", or "are healthy again" and update that sentence in SKILL.md (the
line beginning "After wave 2: reset all mock behaviours...") so it reads
smoothly while preserving the timing/health-probe details.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 71acfc22-8d7a-4ec5-b1ea-b787ba13b64b

📥 Commits

Reviewing files that changed from the base of the PR and between a94ce1c and a3fc426.

📒 Files selected for processing (14)
  • .claude/skills/olla-validate/SKILL.md
  • .claude/skills/olla-validate/areas/anthropic.md
  • .claude/skills/olla-validate/areas/core-routing.md
  • .claude/skills/olla-validate/areas/limits-failures.md
  • .claude/skills/olla-validate/areas/observability.md
  • .claude/skills/olla-validate/areas/openai-api.md
  • .claude/skills/olla-validate/areas/resilience.md
  • AGENTS.md
  • CLAUDE.md
  • test/cmd/ollamock/README.md
  • test/cmd/ollamock/behaviour.go
  • test/cmd/ollamock/handlers.go
  • test/cmd/ollamock/ollamock_test.go
  • test/cmd/ollamock/streaming.go
✅ Files skipped from review due to trivial changes (6)
  • AGENTS.md
  • .claude/skills/olla-validate/areas/anthropic.md
  • .claude/skills/olla-validate/areas/observability.md
  • .claude/skills/olla-validate/areas/core-routing.md
  • test/cmd/ollamock/README.md
  • .claude/skills/olla-validate/areas/resilience.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • .claude/skills/olla-validate/areas/openai-api.md
  • test/cmd/ollamock/ollamock_test.go
  • test/cmd/ollamock/behaviour.go
  • test/cmd/ollamock/streaming.go

@thushan thushan merged commit 1959286 into main Jun 11, 2026
8 checks passed
@thushan thushan deleted the feature/olla-mock branch June 11, 2026 11:32
@coderabbitai coderabbitai Bot mentioned this pull request Jun 13, 2026