
feat(ci): mutation testing on policy, validation, and orchestrator (Stryker + mutmut) (CA-04) #255

@krokoko

Description

This is a finding from https://github.com/krokoko/cairn (action item CA-04).

Component

Tooling / CI

Describe the feature

Add mutation testing to the high-criticality modules: Stryker for TypeScript (validation, orchestrator) and mutmut for Python (policy.py). Run it as a nightly (slow-tier) job. The job doubles as the AI005/AI009 gate and as the oracle-strength measurement (the "Generator + Mutant" generator-evaluator variant).

Use case

Coverage is collected but tells us only that code ran, not that the tests would catch a regression. With no mutation testing, AI005 (tests mirroring implementation) and AI009 (happy-path-only) are essentially invisible to automation — the single highest-risk gap for a platform whose product is agents writing code. Mutation testing measures suite adequacy directly: a low mutation-kill score means green-but-meaningless tests on exactly the modules where correctness matters most (authorization, input validation, task orchestration).
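To make the difference between "the code ran" and "a regression would be caught" concrete, here is a toy, stdlib-only mutation tester. It is an illustration of the technique only, not how Stryker or mutmut work internally: it flips one comparison operator at a time (<= to <, and so on), re-runs a test suite against each mutant, and reports the fraction of mutants the suite kills. The is_valid_length rule and both suites are hypothetical stand-ins, not code from validation.ts or policy.py.

```python
import ast
from typing import Callable

# Hypothetical stand-in for a validation rule (not the real validation.ts logic).
SOURCE = '''
def is_valid_length(name):
    return 1 <= len(name) <= 64
'''

# "Near miss" operator swaps, the classic boundary mutation.
SWAPS = {ast.LtE: ast.Lt, ast.Lt: ast.LtE, ast.GtE: ast.Gt, ast.Gt: ast.GtE}


def count_sites(tree: ast.AST) -> int:
    """Number of comparison operators that can be mutated."""
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Compare)
        for op in node.ops
        if type(op) in SWAPS
    )


class CompareMutator(ast.NodeTransformer):
    """Swaps exactly one mutable comparison operator, chosen by index."""

    def __init__(self, target: int):
        self.target = target
        self.seen = 0

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        for i, op in enumerate(node.ops):
            if type(op) in SWAPS:
                if self.seen == self.target:
                    node.ops[i] = SWAPS[type(op)]()
                self.seen += 1
        return node


def load(tree: ast.AST) -> Callable[[str], bool]:
    """Compile a (possibly mutated) module and return the rule function."""
    namespace: dict = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace["is_valid_length"]


def happy_path_suite(fn: Callable[[str], bool]) -> bool:
    # AI009-style suite: one typical valid input, one obviously invalid one.
    return fn("my-repo") and not fn("")


def boundary_suite(fn: Callable[[str], bool]) -> bool:
    # Exercises both edges of the allowed range.
    return fn("a") and fn("a" * 64) and not fn("") and not fn("a" * 65)


def mutation_score(suite: Callable[[Callable[[str], bool]], bool]) -> float:
    """Fraction of mutants for which the suite fails (i.e. the mutant is killed)."""
    sites = count_sites(ast.parse(SOURCE))
    killed = 0
    for target in range(sites):
        mutant = CompareMutator(target).visit(ast.parse(SOURCE))
        if not suite(load(mutant)):
            killed += 1
    return killed / sites
```

The happy-path suite scores 0.0 because both boundary mutants survive, while the boundary suite scores 1.0. Both suites execute every line of the rule, so line coverage reports them as equivalent; that gap is exactly the AI009 failure mode coverage cannot see.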

Proposed solution

  1. Add Stryker to cdk/, scoped to handlers/shared/validation.ts and the orchestrator modules.
  2. Add mutmut to agent/, scoped to policy.py.
  3. Run both in a scheduled nightly workflow (not on the PR critical path — it's slow).
  4. Surface the mutation-kill score, set an initial target, and ratchet it up over time.
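For step 4 on the TypeScript side, one option is to enable Stryker's json reporter and compute the score from its report. The sketch below assumes the report follows the mutation-testing-elements schema (a files map whose entries each carry a mutants list with a status field) and is written to reports/mutation/mutation.json; both should be checked against the Stryker version in use. It mirrors Stryker's score definition as we understand it: detected (Killed + Timeout) divided by valid (detected + Survived + NoCoverage), so compile errors, runtime errors, and ignored mutants stay out of the denominator.

```python
import json
from collections import Counter
from typing import Any, Dict

# Status names as used by the mutation-testing-elements report schema (assumed).
DETECTED = ("Killed", "Timeout")
UNDETECTED = ("Survived", "NoCoverage")


def status_counts(report: Dict[str, Any], path_prefix: str = "") -> Counter:
    """Tally mutant statuses, optionally only for files under path_prefix."""
    counts: Counter = Counter()
    for path, file_result in report.get("files", {}).items():
        if not path.startswith(path_prefix):
            continue
        for mutant in file_result.get("mutants", []):
            counts[mutant["status"]] += 1
    return counts


def mutation_score(report: Dict[str, Any], path_prefix: str = "") -> float:
    """Detected / valid * 100; NaN when there are no valid mutants."""
    counts = status_counts(report, path_prefix)
    detected = sum(counts[s] for s in DETECTED)
    valid = detected + sum(counts[s] for s in UNDETECTED)
    return 100.0 * detected / valid if valid else float("nan")


def load_report(path: str = "reports/mutation/mutation.json") -> Dict[str, Any]:
    """Read the JSON report; the default path is an assumption."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Passing a path prefix (for example, the directory holding the orchestrator modules) yields a separate score per module, which is what per-module targets need.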

Acceptance criteria

  • Stryker configured for validation + orchestrator (TS); mutmut configured for policy.py (Py).
  • A nightly GitHub Actions workflow runs both and reports mutation-kill scores.
  • An initial mutation-kill target is recorded for each module (to be ratcheted).
  • Results are captured as artifacts so the score is trackable as a per-PR/over-time delta.
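For the recorded targets and the over-time delta, the nightly job could compare each module's score against a checked-in floor and against the previous run's archived artifact. A minimal sketch, assuming per-module scores on a 0 to 100 scale have already been extracted (from Stryker's JSON report, or from mutmut's killed and survived counts); ModuleResult and ratchet are hypothetical names, and where the floors and previous scores are stored is left open.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ModuleResult:
    module: str                       # module label, e.g. "policy.py" (illustrative)
    score: float                      # tonight's mutation-kill score, 0-100
    target: float                     # recorded floor for this module
    previous: Optional[float] = None  # last archived nightly score, if any

    def delta(self) -> Optional[float]:
        """Change since the previous run; None on the first run."""
        return None if self.previous is None else self.score - self.previous


def ratchet(results: List[ModuleResult]) -> Tuple[List[str], Dict[str, float]]:
    """Return (modules failing their floor, proposed new floors).

    A NaN score (no valid mutants) is treated as a failure here, since it
    usually means the mutation scope is misconfigured. Floors only move up:
    a proposal is made when tonight's score, rounded down to a whole
    percent, exceeds the recorded target.
    """
    failures = [
        r.module for r in results if math.isnan(r.score) or r.score < r.target
    ]
    proposals: Dict[str, float] = {}
    for r in results:
        if math.isnan(r.score):
            continue
        candidate = float(math.floor(r.score))
        if candidate > r.target:
            proposals[r.module] = candidate
    return failures, proposals
```

Keeping proposals separate from failures means the nightly run only suggests a higher floor; raising the recorded target remains a deliberate, reviewed change.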

Other information

Source reports: verification-report.md (oracle-rot watch — "Mutation testing (absent) is the standard defense"), verification-strategy.md (Phase 2 #9; Generator+Mutant), ai-smells-gates-report.md (AI005). Effort: M. Depends on CA-01 (coverage floor) and CA-02 (property tests) landing first. Per ADR-003 this issue needs the approved label before work begins.

Metadata

Assignees

No one assigned

Labels

validation-loop: Tasks related to improving the validation loop for ABCA's codebase

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
