feat: olla mocking skills for claude by thushan · Pull Request #174 · thushan/olla

thushan · 2026-06-11T10:26:07Z

Ports the nightly/regression scripts we use to test tensorfoundry.io products to Olla, with a few tweaks.

/olla-validate --quick - 5–10 min gate after major changes
/olla-validate --nightly - multi-hour pre-release gate (chaos, soak, Sherpa pass, forced-translation pass, benchmarks)
No flag - prompts for the depth

Summary by CodeRabbit

Documentation
- Added a comprehensive Validation Harness guide and area runbooks (routing, resilience, observability, OpenAI/Anthropic) with quick/nightly workflows and reporting.
New Features
- Introduced a local multi-protocol mock backend for end-to-end validation with configurable fault-injection and control endpoints.
- Added a command to run the mock backend for local validation.
Tests
- Added extensive test coverage for streaming, non-streaming, error modes and aggregate stats.
- Included validation configs for general and tight-limits scenarios.
Chores
- Wired validation harness results reporting and test run bookkeeping.
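
For a sense of what the streaming-termination coverage involves, here is a minimal sketch of a check for an OpenAI-style SSE body, which the walkthrough says ends with `data: [DONE]`. The function name `readSSE` and the sample payloads are illustrative assumptions, not code from this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readSSE collects the payload of every "data:" line in an SSE body and
// reports whether the "[DONE]" sentinel arrived before the stream ended.
// Hypothetical helper for illustration only.
func readSSE(body string) (chunks []string, done bool) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators, comments, and other SSE fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			return chunks, true
		}
		chunks = append(chunks, payload)
	}
	return chunks, false
}

func main() {
	full := "data: {\"delta\":\"hi\"}\n\ndata: [DONE]\n\n"
	cut := "data: {\"delta\":\"hi\"}\n\n" // simulates a mid-stream drop
	for _, body := range []string{full, cut} {
		chunks, done := readSSE(body)
		fmt.Printf("chunks=%d terminated=%v\n", len(chunks), done)
	}
}
```

A test for a truncated stream would assert that `done` is false, which distinguishes a clean finish from a dropped connection.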

coderabbitai · 2026-06-11T10:26:31Z

Walkthrough

This PR adds an agent-driven end-to-end validation harness: SKILL orchestration, per-area validation checklists, a multi-protocol mock LLM backend (ollamock) with runtime fault injection and streaming support, test suite and harness configs, and documentation + mkdocs navigation.

Changes

End-to-End Validation Harness

| Layer / File(s) | Summary |
|---|---|
| Validation Skill Orchestration `.claude/skills/olla-validate/SKILL.md` | Eight-phase orchestration: static gate, build, fleet boot, Wave 1 parallel read-only agents, Wave 2 resilience (fault injection allowed), nightly sequential extensions (limits, translation-forced Anthropic, Sherpa quick-check, soak+chaos, optional bench), mandatory teardown, and report aggregation with verdict gating. Defines agent roles, JSON report contract, and escalation rules. |
| Validation Area Checklists `.claude/skills/olla-validate/areas/*.md` | Six area runbooks (Anthropic, OpenAI API, core-routing, limits-failures, observability, resilience) describe read-only quick checks and expanded nightly scenarios covering streaming, routing, limits, failover, stats, and translation-forced validation. |
| Mock Backend Behaviour & Control Plane `test/cmd/ollamock/behaviour.go` | Behaviour config and Mode enum (`ok`,`error`,`flaky`,`hang`,`slow`), deterministic seeded RNG, partial-PATCH merge semantics, request stats store, and HTTP control endpoints for behaviour, reset, and stats. |
| Mock Backend Handlers & Routes `test/cmd/ollamock/handlers.go` | HTTP mux wiring, request wrapper that records per-path stats and sets `X-Ollamock-Instance`, applyBehaviour short-circuiting (health gating, hang, deterministic/flaky errors, latency, malformed JSON), and model-listing handlers for Ollama/LM Studio/OpenAI/Lemonade plus health/root endpoints. |
| Streaming Implementation `test/cmd/ollamock/streaming.go` | Protocol-specific streaming: Ollama NDJSON (done:true record), OpenAI SSE with `data: [DONE]`, Anthropic event sequence; TTFT/TPS pacing, mid-stream truncation (DropMidStream), abrupt-close attempts, and SSE/NDJSON helpers. |
| Main CLI and Process Management `test/cmd/ollamock/main.go` | Server entrypoint with CLI flags (addr, name, models, ttft-ms, tps, stream-chunks), parseModels helper, background ListenAndServe, signal handling, graceful shutdown. |
| Tests and README `test/cmd/ollamock/ollamock_test.go`, `test/cmd/ollamock/README.md` | httptest suite covering model listings, non-stream and streaming flows (SSE/NDJSON parsing and termination), behaviour injection tests, stats tests, and documentation for running, flags, endpoints, behaviour modes, control-plane API, and curl examples. |
| Harness Configuration `test/validate/config.validate.yaml`, `test/validate/config.validate.limits.yaml` | Main harness config with seven static mock endpoints, discovery/model-registry/unifier settings, Anthropic translator enabled (passthrough), and a tight-limits config for request/size/rate boundary testing. |
| Documentation & Navigation `CLAUDE.md`, `docs/content/development/validation.md`, `docs/content/development/testing.md`, `docs/mkdocs.yml`, `AGENTS.md` | CLAUDE.md updated with validation harness docs; new validation.md explains harness topology, modes, and ollamock; testing.md tree updated; mkdocs.yml adds nav entry; AGENTS.md now references CLAUDE.md. |
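
The "partial-PATCH merge semantics" mentioned for `behaviour.go` can be pictured roughly as follows. This is a sketch under assumptions: only the five mode names come from the walkthrough, while the `Behaviour` fields, JSON keys, and function names are hypothetical and may not match the real file.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Mode uses the five mode names listed in the walkthrough.
type Mode string

const (
	ModeOK    Mode = "ok"
	ModeError Mode = "error"
	ModeFlaky Mode = "flaky"
	ModeHang  Mode = "hang"
	ModeSlow  Mode = "slow"
)

// Behaviour is a hypothetical config; the fields besides Mode are assumptions.
type Behaviour struct {
	Mode      Mode
	LatencyMS int
}

// behaviourPatch uses pointer fields so an absent key (nil) can be told apart
// from an explicit zero value such as "latency_ms": 0.
type behaviourPatch struct {
	Mode      *Mode `json:"mode,omitempty"`
	LatencyMS *int  `json:"latency_ms,omitempty"`
}

// applyPatch decodes a JSON patch and overwrites only the fields it names.
func applyPatch(cur Behaviour, raw string) (Behaviour, error) {
	var p behaviourPatch
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return cur, err
	}
	if p.Mode != nil {
		cur.Mode = *p.Mode
	}
	if p.LatencyMS != nil {
		cur.LatencyMS = *p.LatencyMS
	}
	return cur, nil
}

func main() {
	cur := Behaviour{Mode: ModeOK, LatencyMS: 50}
	next, err := applyPatch(cur, `{"mode":"slow"}`)
	if err != nil {
		fmt.Println("patch failed:", err)
		return
	}
	fmt.Printf("%+v\n", next) // LatencyMS is untouched by the patch
}
```

Pointer fields are one common way to get merge semantics in Go; a map-based merge would work as well.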

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

enhancement, documentation, experimental

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: olla mocking skills for claude' directly addresses the main change: adding Claude skills for Olla mocking/validation harness with multiple run modes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

test/cmd/ollamock/streaming.go (2)

580-601: 💤 Low value

Consider using context.Context directly rather than interface.

The functions applyTTFT and applyTPS accept interface{ Done() <-chan struct{} } to access the Done() channel. Since context.Context is the standard interface for cancellation and all callers pass r.Context(), using context.Context directly would be clearer and more idiomatic.

♻️ Proposed change

-func applyTTFT(ctx interface{ Done() <-chan struct{} }, ttftMS int) {
+func applyTTFT(ctx context.Context, ttftMS int) {
 	if ttftMS <= 0 {
 		return
 	}
 	select {
 	case <-ctx.Done():
 	case <-time.After(time.Duration(ttftMS) * time.Millisecond):
 	}
 }

-func applyTPS(ctx interface{ Done() <-chan struct{} }, tps int) {
+func applyTPS(ctx context.Context, tps int) {
 	if tps <= 0 {
 		return
 	}
 	delay := time.Duration(1000/tps) * time.Millisecond
 	select {
 	case <-ctx.Done():
 	case <-time.After(delay):
 	}
 }
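
To illustrate why taking `context.Context` matters here, the sketch below shows a cancellable delay in the same select-based style. The names `waitOrCancel`, `tokenDelay`, and `demo` are hypothetical. One observation about the snippet as quoted: `time.Duration(1000/tps)` uses integer division, so any `tps` above 1000 yields a zero delay; the sketch divides `time.Second` instead. Whether that matters for ollamock's intended range is not stated in the review.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitOrCancel blocks for d unless ctx is cancelled first.
// It returns true only if the full delay elapsed.
func waitOrCancel(ctx context.Context, d time.Duration) bool {
	if d <= 0 {
		return true
	}
	t := time.NewTimer(d)
	defer t.Stop() // release the timer if we leave early
	select {
	case <-ctx.Done():
		return false
	case <-t.C:
		return true
	}
}

// tokenDelay converts a tokens-per-second rate into a per-token delay.
// Dividing time.Second keeps sub-millisecond precision for high rates.
func tokenDelay(tps int) time.Duration {
	if tps <= 0 {
		return 0
	}
	return time.Second / time.Duration(tps)
}

// demo runs one uninterrupted wait and one wait on a cancelled context.
func demo() (completed, interrupted bool) {
	completed = waitOrCancel(context.Background(), 10*time.Millisecond)

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // simulate a client that disconnected
	interrupted = !waitOrCancel(ctx, time.Hour)
	return completed, interrupted
}

func main() {
	completed, interrupted := demo()
	fmt.Println("completed:", completed, "interrupted:", interrupted)
	fmt.Println("delay at 2000 tps:", tokenDelay(2000))
}
```

Using `time.NewTimer` with `Stop` instead of `time.After` is a small design choice that avoids leaving a pending timer behind when the request is cancelled early.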

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/cmd/ollamock/streaming.go` around lines 580 - 601, Change the parameter
type for applyTTFT and applyTPS from the generic interface{ Done() <-chan
struct{} } to the concrete context.Context: update func signatures applyTTFT(ctx
context.Context, ttftMS int) and applyTPS(ctx context.Context, tps int), import
the context package, and ensure all callers passing r.Context() keep working (no
other behavior changes needed since context.Context exposes Done()); this makes
the code idiomatic and clearer about cancellation semantics.

18-32: ⚡ Quick win

Remove dead code and misleading comment.

Lines 20–26 contain unused code (two decoder declarations that are never used). The comment on lines 27–28 mentions "re-parse permissively" but there's only a single parse operation. This appears to be leftover experimental code.

♻️ Proposed cleanup

 func parseInferenceRequest(r *http.Request) (inferenceRequest, error) {
 	var req inferenceRequest
-	dec := json.NewDecoder(r.Body)
-	dec.DisallowUnknownFields()
-
-	// We only need the two top-level fields; unknown fields come from real
-	// clients that send messages, tools, temperature etc. Allow them silently.
-	dec2 := json.NewDecoder(r.Body)
-	_ = dec2
-	// Re-parse permissively — DisallowUnknownFields was too strict for real
-	// client payloads. Use a map-based approach instead.
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil && err != io.EOF {
 		return inferenceRequest{}, err
 	}
 	return req, nil
 }
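
The difference between the strict and permissive decode paths can be seen in a small standalone sketch. The struct fields `Model` and `Stream` are assumptions (the review only says "two top-level fields"), and the helpers take a string rather than an `*http.Request` to keep the example self-contained.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// inferenceRequest is a stand-in; the real field set is not shown in the review.
type inferenceRequest struct {
	Model  string `json:"model"`
	Stream bool   `json:"stream"`
}

// decodePermissive ignores unknown fields and tolerates an empty body.
func decodePermissive(body string) (inferenceRequest, error) {
	var req inferenceRequest
	err := json.NewDecoder(strings.NewReader(body)).Decode(&req)
	if err != nil && !errors.Is(err, io.EOF) {
		return inferenceRequest{}, err
	}
	return req, nil
}

// decodeStrict rejects any field that inferenceRequest does not declare.
func decodeStrict(body string) (inferenceRequest, error) {
	var req inferenceRequest
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&req); err != nil {
		return inferenceRequest{}, err
	}
	return req, nil
}

func main() {
	body := `{"model":"demo","stream":true,"temperature":0.2}`

	req, err := decodePermissive(body)
	fmt.Println("permissive:", req, err)

	_, err = decodeStrict(body)
	fmt.Println("strict error:", err)
}
```

A second reason the original double-decoder code was suspect: a request body can only be read once, so a second decoder over the same `r.Body` would only see whatever the first one left unread.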

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/cmd/ollamock/streaming.go` around lines 18 - 32, In
parseInferenceRequest: remove the dead/unused variables dec and dec2 and the
misleading comment about "re-parse permissively"; instead keep a single
permissive decode using json.NewDecoder(r.Body).Decode(&req) (with the same err
!= io.EOF handling) and drop the DisallowUnknownFields call so the function only
performs one decode pass into inferenceRequest.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.claude/skills/olla-validate/areas/observability.md:
- Around line 13-15: Update the endpoint name list used by the GET
/internal/status/endpoints check to match the SKILL.md topology: replace
mock-vllm-a with mock-vllm-e, mock-litellm-b with mock-litellm-f, and
mock-llamacpp-d with mock-llamacpp-g so the expected seven endpoints
(mock-openai-a/b, mock-vllm-e, mock-litellm-f, mock-ollama-c, mock-lmstudio-d,
mock-llamacpp-g) match the actual /internal/status/endpoints output.

In @.claude/skills/olla-validate/areas/resilience.md:
- Around line 89-91: Update the "Final state assertion" text so the mock count
matches the topology: replace the phrase "all four mocks" with "all seven mocks"
(the assertion that all mocks report default behaviour via `GET
/_mock/behaviour` and that all 7 endpoints are healthy and `/internal/status`
returns 200 must check seven mocks). Ensure any nearby references in the same
paragraph to mock count are updated to "seven" so the final assertion
consistently verifies all seven mocks.

In @.claude/skills/olla-validate/SKILL.md:
- Line 178: Documentation and test instructions incorrectly reference "four
mocks" for the reset and verification steps; update all occurrences of the reset
instruction and final assertion that say "four mocks" (e.g., the POST
/_mock/reset step and the final confirmation sentence) to reflect seven mocks
(ports 19431–19437) so both the orchestration and area validation steps reset
and verify all seven mocks instead of four. Ensure the wording for POST
/_mock/reset and the final assertions explicitly state "seven mocks (ports
19431–19437)".
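
As an illustration of what "reset all seven mocks" amounts to, the following sketch builds the reset URL for each port in 19431–19437 and posts to it. The host `localhost`, the helper names, and the expectation of a 2xx response are assumptions; only the `/_mock/reset` path and the port range come from the comment above.

```go
package main

import (
	"fmt"
	"net/http"
)

// resetURLs returns one /_mock/reset URL per port in [firstPort, lastPort].
func resetURLs(host string, firstPort, lastPort int) []string {
	urls := make([]string, 0, lastPort-firstPort+1)
	for port := firstPort; port <= lastPort; port++ {
		urls = append(urls, fmt.Sprintf("http://%s:%d/_mock/reset", host, port))
	}
	return urls
}

// resetAll posts to every URL and stops at the first failure.
// Treating any non-2xx status as a failure is an assumption.
func resetAll(client *http.Client, urls []string) error {
	for _, u := range urls {
		resp, err := client.Post(u, "application/json", nil)
		if err != nil {
			return fmt.Errorf("reset %s: %w", u, err)
		}
		resp.Body.Close()
		if resp.StatusCode/100 != 2 {
			return fmt.Errorf("reset %s: unexpected status %d", u, resp.StatusCode)
		}
	}
	return nil
}

func main() {
	// Print the targets only; calling resetAll needs a running mock fleet.
	for _, u := range resetURLs("localhost", 19431, 19437) {
		fmt.Println(u)
	}
}
```

Deriving the list from the port range, rather than hard-coding a count, is one way to keep prose such as "four mocks" from drifting out of step with the topology.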

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fa233c9a-e858-43ca-97c8-30aa4102245a

📥 Commits

Reviewing files that changed from the base of the PR and between 455e501 and a94ce1c.

📒 Files selected for processing (19)

.claude/skills/olla-validate/SKILL.md
.claude/skills/olla-validate/areas/anthropic.md
.claude/skills/olla-validate/areas/core-routing.md
.claude/skills/olla-validate/areas/limits-failures.md
.claude/skills/olla-validate/areas/observability.md
.claude/skills/olla-validate/areas/openai-api.md
.claude/skills/olla-validate/areas/resilience.md
CLAUDE.md
docs/content/development/testing.md
docs/content/development/validation.md
docs/mkdocs.yml
test/cmd/ollamock/README.md
test/cmd/ollamock/behaviour.go
test/cmd/ollamock/handlers.go
test/cmd/ollamock/main.go
test/cmd/ollamock/ollamock_test.go
test/cmd/ollamock/streaming.go
test/validate/config.validate.limits.yaml
test/validate/config.validate.yaml

coderabbitai

🧹 Nitpick comments (1)

.claude/skills/olla-validate/SKILL.md (1)

188-190: 💤 Low value

Awkward phrasing: "return to healthy".

"return to healthy" is grammatically awkward. Consider "return to health" or "become healthy again".

✏️ Suggested rewrite

-After wave 2: reset all mock behaviours, re-confirm all endpoints return to
-healthy within 60s (this is itself a recovery assertion - record it; health
+After wave 2: reset all mock behaviours, re-confirm all endpoints return to
+health within 60s (this is itself a recovery assertion - record it; health
 probes tick globally every 30s regardless of per-endpoint check_interval).
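
The recovery assertion described in this passage (all endpoints healthy again within 60s) is essentially a bounded polling loop. A minimal sketch follows, with short durations and a fake health check so it runs quickly; in a real run the deadline would be 60 seconds and `check` would query Olla's status endpoint. The function names are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitHealthy calls check immediately and then once per interval until it
// returns true or ctx expires, in which case ctx.Err() is returned.
func waitHealthy(ctx context.Context, interval time.Duration, check func() bool) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if check() {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// demo exercises both outcomes with a fake check.
func demo() (polls int, recovered, timedOut bool) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err := waitHealthy(ctx, 5*time.Millisecond, func() bool {
		polls++
		return polls >= 3 // becomes healthy on the third poll
	})
	recovered = err == nil

	short, cancelShort := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancelShort()
	err = waitHealthy(short, 5*time.Millisecond, func() bool { return false })
	timedOut = errors.Is(err, context.DeadlineExceeded)
	return polls, recovered, timedOut
}

func main() {
	polls, recovered, timedOut := demo()
	fmt.Println("polls:", polls, "recovered:", recovered, "timed out:", timedOut)
}
```

Because the text says health probes tick every 30s, a 60s window leaves room for roughly two probe cycles, so a failure here more likely indicates a real recovery problem than unlucky timing.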

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.claude/skills/olla-validate/SKILL.md around lines 188 - 190, The phrase
"return to healthy" in the sentence starting with "After wave 2: reset all mock
behaviours, re-confirm all endpoints return to healthy within 60s..." is
awkward; change it to a clearer phrasing such as "return to health", "become
healthy again", or "are healthy again" and update that sentence in SKILL.md (the
line beginning "After wave 2: reset all mock behaviours...") so it reads
smoothly while preserving the timing/health-probe details.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 71acfc22-8d7a-4ec5-b1ea-b787ba13b64b

📥 Commits

Reviewing files that changed from the base of the PR and between a94ce1c and a3fc426.

📒 Files selected for processing (14)

.claude/skills/olla-validate/SKILL.md
.claude/skills/olla-validate/areas/anthropic.md
.claude/skills/olla-validate/areas/core-routing.md
.claude/skills/olla-validate/areas/limits-failures.md
.claude/skills/olla-validate/areas/observability.md
.claude/skills/olla-validate/areas/openai-api.md
.claude/skills/olla-validate/areas/resilience.md
AGENTS.md
CLAUDE.md
test/cmd/ollamock/README.md
test/cmd/ollamock/behaviour.go
test/cmd/ollamock/handlers.go
test/cmd/ollamock/ollamock_test.go
test/cmd/ollamock/streaming.go

✅ Files skipped from review due to trivial changes (6)

AGENTS.md
.claude/skills/olla-validate/areas/anthropic.md
.claude/skills/olla-validate/areas/observability.md
.claude/skills/olla-validate/areas/core-routing.md
test/cmd/ollamock/README.md
.claude/skills/olla-validate/areas/resilience.md

🚧 Files skipped from review as they are similar to previous changes (4)

.claude/skills/olla-validate/areas/openai-api.md
test/cmd/ollamock/ollamock_test.go
test/cmd/ollamock/behaviour.go
test/cmd/ollamock/streaming.go

thushan added 5 commits June 11, 2026 20:15

add ollamock, a multi-protocol mock LLM backend with runtime fault in…

f54668d

…jection

wire a seven-endpoint ollamock fleet into the validation harness configs

0c42046

add the olla-validate skill for agent-driven quick and nightly releas…

78bb1e9

…e gating

document the validation harness in the developer docs and CLAUDE.md

7dcda26

pin olla-validate to sonnet and split area agents between sonnet and …

a94ce1c

…haiku for token efficiency

coderabbitai Bot reviewed Jun 11, 2026

Comment thread .claude/skills/olla-validate/areas/observability.md Outdated

Comment thread .claude/skills/olla-validate/areas/resilience.md Outdated

Comment thread .claude/skills/olla-validate/SKILL.md Outdated

thushan added 2 commits June 11, 2026 20:59

fix alignment

3479915

coderabbit fixes.

a3fc426

coderabbitai Bot reviewed Jun 11, 2026

thushan merged commit 1959286 into main Jun 11, 2026
8 checks passed

thushan deleted the feature/olla-mock branch June 11, 2026 11:32

coderabbitai Bot mentioned this pull request Jun 13, 2026

rollup: June 2026 #176

Merged

chenrui333 mentioned this pull request Jun 19, 2026

olla 0.0.28 chenrui333/homebrew-tap#8236

Open

thushan merged 7 commits into main from feature/olla-mock
