This is a finding from https://github.com/krokoko/cairn (action item CA-04).
Component
Tooling / CI
Describe the feature
Add mutation testing on the high-criticality modules: Stryker for TypeScript (validation, orchestrator) and mutmut for Python (policy.py). Run it as a nightly (slow-tier) job. This doubles as the AI005/AI009 gate and as the oracle-strength measurement (the "Generator + Mutant" generator-evaluator variant).
Use case
Coverage is collected but tells us only that code ran, not that the tests would catch a regression. With no mutation testing, AI005 (tests mirroring implementation) and AI009 (happy-path-only) are essentially invisible to automation — the single highest-risk gap for a platform whose product is agents writing code. Mutation testing measures suite adequacy directly: a low mutation-kill score means green-but-meaningless tests on exactly the modules where correctness matters most (authorization, input validation, task orchestration).
Proposed solution
- Add Stryker to
cdk, scoped to handlers/shared/validation.ts and the orchestrator modules.
- Add mutmut to
agent/, scoped to policy.py.
- Run both in a scheduled nightly workflow (not on the PR critical path — it's slow).
- Surface the mutation-kill score; set an initial target and ratchet up.
Acceptance criteria
Other information
Source reports: verification-report.md (oracle-rot watch — "Mutation testing (absent) is the standard defense"), verification-strategy.md (Phase 2 #9; Generator+Mutant), ai-smells-gates-report.md (AI005). Effort: M. Depends on CA-01 (coverage floor) and CA-02 (property tests) landing first. Per ADR-003 this issue needs the approved label before work begins.
Component
Tooling / CI
Describe the feature
Add mutation testing on the high-criticality modules: Stryker for TypeScript (
validation,orchestrator) and mutmut for Python (policy.py). Run it as a nightly (slow-tier) job. This doubles as the AI005/AI009 gate and as the oracle-strength measurement (the "Generator + Mutant" generator-evaluator variant).Use case
Coverage is collected but tells us only that code ran, not that the tests would catch a regression. With no mutation testing, AI005 (tests mirroring implementation) and AI009 (happy-path-only) are essentially invisible to automation — the single highest-risk gap for a platform whose product is agents writing code. Mutation testing measures suite adequacy directly: a low mutation-kill score means green-but-meaningless tests on exactly the modules where correctness matters most (authorization, input validation, task orchestration).
Proposed solution
cdk, scoped tohandlers/shared/validation.tsand the orchestrator modules.agent/, scoped topolicy.py.Acceptance criteria
validation+orchestrator(TS); mutmut configured forpolicy.py(Py).Other information
Source reports:
verification-report.md(oracle-rot watch — "Mutation testing (absent) is the standard defense"),verification-strategy.md(Phase 2 #9; Generator+Mutant),ai-smells-gates-report.md(AI005). Effort: M. Depends on CA-01 (coverage floor) and CA-02 (property tests) landing first. Per ADR-003 this issue needs theapprovedlabel before work begins.