feat(ci): mutation testing on policy, validation, and orchestrator (Stryker + mutmut) (CA-04)

> **This is a finding from https://github.com/krokoko/cairn** (action item **CA-04**).

### Component

Tooling / CI

### Describe the feature

Add **mutation testing** on the high-criticality modules: **Stryker** for TypeScript (`validation`, `orchestrator`) and **mutmut** for Python (`policy.py`). Run it as a **nightly** (slow-tier) job. This doubles as the AI005/AI009 gate and as the oracle-strength measurement (the "Generator + Mutant" generator-evaluator variant).

### Use case

Coverage is collected but tells us only that code *ran*, not that the tests would *catch a regression*. With no mutation testing, AI005 (tests mirroring implementation) and AI009 (happy-path-only) are essentially invisible to automation — the single highest-risk gap for a platform whose product is agents writing code. Mutation testing measures suite *adequacy* directly: a low mutation-kill score means green-but-meaningless tests on exactly the modules where correctness matters most (authorization, input validation, task orchestration).

### Proposed solution

1. Add Stryker to `cdk`, scoped to `handlers/shared/validation.ts` and the orchestrator modules.
2. Add mutmut to `agent/`, scoped to `policy.py`.
3. Run both in a scheduled nightly workflow (not on the PR critical path — it's slow).
4. Surface the mutation-kill score; set an initial target and ratchet up.

### Acceptance criteria

- [ ] Stryker configured for `validation` + `orchestrator` (TS); mutmut configured for `policy.py` (Py).
- [ ] A nightly GitHub Actions workflow runs both and reports mutation-kill scores.
- [ ] An initial mutation-kill target is recorded for each module (to be ratcheted).
- [ ] Results are captured as artifacts so the score is trackable as a per-PR/over-time delta.

### Other information

Source reports: `verification-report.md` (oracle-rot watch — "Mutation testing (absent) is the standard defense"), `verification-strategy.md` (Phase 2 #9; Generator+Mutant), `ai-smells-gates-report.md` (AI005). Effort: **M**. Depends on CA-01 (coverage floor) and CA-02 (property tests) landing first. Per ADR-003 this issue needs the `approved` label before work begins.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ci): mutation testing on policy, validation, and orchestrator (Stryker + mutmut) (CA-04) #255

Component

Describe the feature

Use case

Proposed solution

Acceptance criteria

Other information

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

feat(ci): mutation testing on policy, validation, and orchestrator (Stryker + mutmut) (CA-04) #255

Description

Component

Describe the feature

Use case

Proposed solution

Acceptance criteria

Other information

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions