Merged
1 change: 1 addition & 0 deletions README.md
@@ -428,6 +428,7 @@ For a guided walkthrough of the canonical demo flow, see [docs/demo/fog-harbor-w
- Post-Phase-67 outcome/report/eval generalization audit keeps the queue paused; `docs/plans/post-phase-67-outcome-report-eval-generalization-audit-2026-06-08.md` records that selected-world proof is current for `fog-harbor-east-gate`, `museum-night`, and `library-rain`, does not claim future-world readiness, and does not prove a Phase 68 blocker. Do not open Phase 68 from this audit.
- Post-Phase-67 runtime-created world eval proof keeps the queue paused; `docs/plans/post-phase-67-runtime-created-world-eval-proof-2026-06-08.md` records that a runtime-created bounded world can pass `eval-world`, does not claim future-world readiness, and does not prove a Phase 68 blocker. Do not open Phase 68 from this proof.
- Post-Phase-67 decision-trace replay audit keeps the queue paused; `docs/plans/post-phase-67-decision-trace-replay-audit-2026-06-08.md` records current evidence for kernel-level replay and runner-level same-run-directory replay. Do not open Phase 68 from this audit.
- Post-Phase-67 decision-trace eval-ownership boundary keeps the queue paused; `docs/plans/post-phase-67-decision-trace-eval-ownership-boundary-2026-06-08.md` records that eval-owned decision-trace replay metrics remain unclaimed and that opening implementation without a source-backed blocker would be blueprint drift. Do not open Phase 68 from this boundary note.
- Phase 48 is closed after PR `#382`, issue `#375`, and milestone `Phase 48 - Successor Intake and Boundary Contract Triage`.
- Phase 49 is closed after PR `#395`, issue `#383`, and milestone `Phase 49 - Kernel, Perturbation, and Runtime Contract Hardening`; completed work items were `#384`, `#386`, `#388`, `#390`, `#392`, and `#394`.
- Phase 50 is closed after PR `#402`, issue `#396`, and milestone `Phase 50 - Runtime Orchestration Measurement and Product Boundary`; completed work items were `#397`, `#398`, and `#401`. Phase 50 measured before any `task_id` or worker contract is introduced and kept the private-beta launch hub planning-only for now.
@@ -0,0 +1,122 @@
from __future__ import annotations

from pathlib import Path


BOUNDARY_PATH = Path("docs/plans/post-phase-67-decision-trace-eval-ownership-boundary-2026-06-08.md")
REPLAY_AUDIT_PATH = Path("docs/plans/post-phase-67-decision-trace-replay-audit-2026-06-08.md")
MINIMUM_LOOP = (
    "corpus -> chunks -> graph -> personas -> scenarios -> deterministic runs -> "
    "report/claims -> eval"
)


def _read(path: Path) -> str:
    assert path.exists(), path
    return path.read_text(encoding="utf-8")


def test_decision_trace_eval_ownership_boundary_exists_with_required_sections() -> None:
    boundary = _read(BOUNDARY_PATH)
    required_sections = [
        "# Post-Phase-67 Decision Trace Eval Ownership Boundary",
        "## Scope",
        "## Current-Code Finding",
        "## Source Evidence",
        "## Blueprint Alignment Decision",
        "## Future Trigger Conditions",
        "## Boundaries",
        "## Validation Commands",
    ]
    for section in required_sections:
        assert section in boundary


def test_eval_ownership_boundary_keeps_metrics_unclaimed_without_phase68() -> None:
    boundary = _read(BOUNDARY_PATH)
    replay_audit = _read(REPLAY_AUDIT_PATH)
    contracts = _read(Path("docs/architecture/contracts.md"))
    eval_service = _read(Path("backend/app/evals/service.py"))
    decision_kernel = _read(Path("backend/app/decision_kernel/service.py"))
    pipeline_tests = _read(Path("backend/tests/test_pipeline.py"))

    required_boundary_phrases = [
        f"`{MINIMUM_LOOP}`",
        "Current-code audit result: eval-owned decision-trace replay metrics remain unclaimed today.",
        "This is not a source-backed Phase 68 blocker.",
        "Opening an implementation queue for eval-owned decision-trace replay metrics without a new source-backed blocker would be blueprint drift.",
        "Do not open Phase 68 from this boundary note.",
        "The current queue remains in the formal paused stop-state.",
        "`backend/app/evals/service.py` currently has no `decision_trace`, `replay_cache`, or `accepted_from_replay` ownership.",
        "`docs/architecture/contracts.md` does not require transfer eval summaries to include decision-trace replay metrics.",
        "kernel-level replay and runner-level same-run-directory replay remain the current proven boundary",
        "No ADR or `docs/architecture/contracts.md` update is made by this boundary note because this diff does not change a protected-core contract.",
        "`status:needs-adr` and unresolved `risk:safety` findings remain merge blockers.",
        "Do not present Mirror as a real-world prediction machine.",
        "Do not build real-person personas or digital doubles.",
        "Do not build political persuasion, hidden surveillance, law-enforcement scoring, hiring, credit, medical, or judicial decision systems.",
    ]
    forbidden_boundary_phrases = [
        "Phase 68 is active",
        "Phase 68 execution queue is open",
        "open a Phase 68 milestone now",
        "eval-owned decision-trace replay metrics are implemented",
        "changes decision_trace.jsonl shape",
        "claims provider-backed replay readiness",
        "claims future-world readiness",
    ]
    for phrase in required_boundary_phrases:
        assert phrase in boundary, phrase
    for phrase in forbidden_boundary_phrases:
        assert phrase not in boundary, phrase

    assert "Future eval ownership for decision-trace replay metrics remains unclaimed by this audit." not in replay_audit
    assert "`docs/plans/post-phase-67-decision-trace-eval-ownership-boundary-2026-06-08.md`" in replay_audit

    for phrase in [
        "`decision_trace.jsonl` is the durable v1 decision audit artifact",
        "`replay_cache` when the selection is copied from an existing trace entry for the same",
        "`accepted_from_replay` for replay-cache reuse.",
    ]:
        assert phrase in contracts

    for phrase in ["decision_trace", "replay_cache", "accepted_from_replay"]:
        assert phrase not in eval_service

    for phrase in [
        "replay_entry = self.replay_cache.get(input_hash)",
        'provider_mode="replay_cache"',
        'validation_status="accepted_from_replay"',
    ]:
        assert phrase in decision_kernel
        assert phrase in boundary

    assert "test_simulation_replays_from_existing_decision_trace" in pipeline_tests
    assert "test_simulation_replays_from_existing_decision_trace" in boundary


def test_current_docs_point_to_eval_ownership_boundary_without_opening_phase68() -> None:
    docs = [
        Path("README.md"),
        Path("docs/plans/current-state-baseline.md"),
        Path("docs/plans/phase-execution-queue.md"),
        Path("docs/plans/automation-roadmap.md"),
    ]
    required_phrases = [
        "`docs/plans/post-phase-67-decision-trace-eval-ownership-boundary-2026-06-08.md`",
        "Post-Phase-67 decision-trace eval-ownership boundary keeps the queue paused",
        "eval-owned decision-trace replay metrics remain unclaimed",
        "Do not open Phase 68 from this boundary note.",
    ]
    forbidden_phrases = [
        "Phase 68 is active",
        "Phase 68 execution queue is open",
        "`audit-github-queue` reports `ready` for Phase 68",
        "milestone `Phase 68",
    ]
    for path in docs:
        text = _read(path)
        for phrase in required_phrases:
            assert phrase in text, f"{path} is missing eval-ownership boundary pointer: {phrase}"
        for phrase in forbidden_phrases:
            assert phrase not in text, f"{path} opens Phase 68 prematurely: {phrase}"
@@ -45,7 +45,7 @@ def test_decision_trace_replay_audit_is_source_backed_without_phase68() -> None:
         "The current queue remains in the formal paused stop-state.",
         "does not change `decision_trace.jsonl` shape",
         "No ADR or `docs/architecture/contracts.md` update is made by this audit because this diff does not change a protected-core contract.",
-        "TODO[verify]: Future eval ownership for decision-trace replay metrics remains unclaimed by this audit.",
+        "`docs/plans/post-phase-67-decision-trace-eval-ownership-boundary-2026-06-08.md` records that future eval ownership for decision-trace replay metrics remains unclaimed and is not a current Phase 68 blocker.",
         "`status:needs-adr` and unresolved `risk:safety` findings remain merge blockers.",
         "Do not present Mirror as a real-world prediction machine.",
         "Do not build real-person personas or digital doubles.",
1 change: 1 addition & 0 deletions docs/plans/automation-roadmap.md
@@ -31,6 +31,7 @@ Day 0 bootstrap is complete, Phase 5 closeout is complete, Phase 6 closeout is c
- Post-Phase-67 outcome/report/eval generalization audit keeps the queue paused; `docs/plans/post-phase-67-outcome-report-eval-generalization-audit-2026-06-08.md` records that selected-world proof is current for `fog-harbor-east-gate`, `museum-night`, and `library-rain`, does not claim future-world readiness, and does not prove a Phase 68 blocker. Do not open Phase 68 from this audit.
- Post-Phase-67 runtime-created world eval proof keeps the queue paused; `docs/plans/post-phase-67-runtime-created-world-eval-proof-2026-06-08.md` records that a runtime-created bounded world can pass `eval-world`, does not claim future-world readiness, and does not prove a Phase 68 blocker. Do not open Phase 68 from this proof.
- Post-Phase-67 decision-trace replay audit keeps the queue paused; `docs/plans/post-phase-67-decision-trace-replay-audit-2026-06-08.md` records current evidence for kernel-level replay and runner-level same-run-directory replay. Do not open Phase 68 from this audit.
- Post-Phase-67 decision-trace eval-ownership boundary keeps the queue paused; `docs/plans/post-phase-67-decision-trace-eval-ownership-boundary-2026-06-08.md` records that eval-owned decision-trace replay metrics remain unclaimed and that opening implementation without a source-backed blocker would be blueprint drift. Do not open Phase 68 from this boundary note.
- Phase 1 and Phase 2 gates are closed.
- Phase 3 is closed locally and in GitHub.
- Phase 3 exit issue `#4` is closed and milestone `Phase 3 - Eval/UI/Demo` is closed.
2 changes: 2 additions & 0 deletions docs/plans/current-state-baseline.md
@@ -32,6 +32,8 @@ Post-Phase-67 runtime-created world eval proof keeps the queue paused; `docs/pla

Post-Phase-67 decision-trace replay audit keeps the queue paused; `docs/plans/post-phase-67-decision-trace-replay-audit-2026-06-08.md` records current evidence for kernel-level replay and runner-level same-run-directory replay. Do not open Phase 68 from this audit.

Post-Phase-67 decision-trace eval-ownership boundary keeps the queue paused; `docs/plans/post-phase-67-decision-trace-eval-ownership-boundary-2026-06-08.md` records that eval-owned decision-trace replay metrics remain unclaimed and that opening implementation without a source-backed blocker would be blueprint drift. Do not open Phase 68 from this boundary note.

## Snapshot

- Local quality baseline:
1 change: 1 addition & 0 deletions docs/plans/phase-execution-queue.md
@@ -120,6 +120,7 @@ Phase 67 - Blueprint Calibration and Minimum-Loop Value Gate
- Post-Phase-67 outcome/report/eval generalization audit keeps the queue paused; `docs/plans/post-phase-67-outcome-report-eval-generalization-audit-2026-06-08.md` records that selected-world proof is current for `fog-harbor-east-gate`, `museum-night`, and `library-rain`, does not claim future-world readiness, and does not prove a Phase 68 blocker. Do not open Phase 68 from this audit.
- Post-Phase-67 runtime-created world eval proof keeps the queue paused; `docs/plans/post-phase-67-runtime-created-world-eval-proof-2026-06-08.md` records that a runtime-created bounded world can pass `eval-world`, does not claim future-world readiness, and does not prove a Phase 68 blocker. Do not open Phase 68 from this proof.
- Post-Phase-67 decision-trace replay audit keeps the queue paused; `docs/plans/post-phase-67-decision-trace-replay-audit-2026-06-08.md` records current evidence for kernel-level replay and runner-level same-run-directory replay. Do not open Phase 68 from this audit.
- Post-Phase-67 decision-trace eval-ownership boundary keeps the queue paused; `docs/plans/post-phase-67-decision-trace-eval-ownership-boundary-2026-06-08.md` records that eval-owned decision-trace replay metrics remain unclaimed and that opening implementation without a source-backed blocker would be blueprint drift. Do not open Phase 68 from this boundary note.
- If a future minimum-loop audit finds a contract gap, split a separate protected-core contract issue before changing schema, scenario DSL, claim labels, run trace shape, or artifact layout.

## Phase 66 Closed Queue
@@ -0,0 +1,77 @@
# Post-Phase-67 Decision Trace Eval Ownership Boundary

Date: 2026-06-08

## Scope

This boundary note closes the eval-ownership follow-up left by the Post-Phase-67 decision-trace replay audit.

It asks only whether future eval ownership for decision-trace replay metrics is a current source-backed blocker. It does not implement new eval metrics.

The minimum loop remains:

```text
corpus -> chunks -> graph -> personas -> scenarios -> deterministic runs -> report/claims -> eval
```

The queue shorthand is `corpus -> chunks -> graph -> personas -> scenarios -> deterministic runs -> report/claims -> eval`.

## Current-Code Finding

Current-code audit result: eval-owned decision-trace replay metrics remain unclaimed today.

This is not a source-backed Phase 68 blocker.

The current queue remains in the formal paused stop-state.

`backend/app/evals/service.py` currently has no `decision_trace`, `replay_cache`, or `accepted_from_replay` ownership.

`docs/architecture/contracts.md` does not require transfer eval summaries to include decision-trace replay metrics.

## Source Evidence

- `docs/architecture/contracts.md` states that `decision_trace.jsonl` is the durable v1 decision audit artifact.
- `docs/architecture/contracts.md` defines `replay_cache` when the selection is copied from an existing trace entry for the same `input_hash`.
- `docs/architecture/contracts.md` defines `accepted_from_replay` for replay-cache reuse.
- `backend/app/decision_kernel/service.py` looks up `replay_entry = self.replay_cache.get(input_hash)`.
- `backend/app/decision_kernel/service.py` finalizes cache hits with `provider_mode="replay_cache"` and `validation_status="accepted_from_replay"`.
- `backend/tests/test_pipeline.py` includes `test_simulation_replays_from_existing_decision_trace`.
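
The kernel-side evidence above can be pictured with a minimal sketch. It is an illustration only, not the code in `backend/app/decision_kernel/service.py`: the `ReplayingKernel` class, the `Decision` dataclass, the `hash_input` helper, and the `"live"` and `"accepted"` values are hypothetical names. Only `replay_cache`, `input_hash`, `provider_mode="replay_cache"`, and `validation_status="accepted_from_replay"` come from the cited sources.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Decision:
    # Hypothetical record shape; the real trace entry schema is defined elsewhere.
    input_hash: str
    selection: str
    provider_mode: str
    validation_status: str


def hash_input(payload: dict) -> str:
    # Hypothetical helper: stable hash over a canonical JSON encoding.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class ReplayingKernel:
    def __init__(
        self,
        provider: Callable[[dict], str],
        replay_cache: dict[str, Decision] | None = None,
    ) -> None:
        self.provider = provider
        self.replay_cache = dict(replay_cache or {})

    def decide(self, payload: dict) -> Decision:
        input_hash = hash_input(payload)
        replay_entry = self.replay_cache.get(input_hash)
        if replay_entry is not None:
            # Cache hit: reuse the earlier selection and mark it as replayed.
            return replace(
                replay_entry,
                provider_mode="replay_cache",
                validation_status="accepted_from_replay",
            )
        # Cache miss: call the provider; "live" and "accepted" are placeholders.
        decision = Decision(input_hash, self.provider(payload), "live", "accepted")
        self.replay_cache[input_hash] = decision
        return decision
```

In this sketch a second call with the same payload never reaches the provider, which is the property the replay evidence is about. Whether the real kernel stores entries the same way is not claimed here.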

## Blueprint Alignment Decision

Do not open Phase 68 from this boundary note.

Opening an implementation queue for eval-owned decision-trace replay metrics without a new source-backed blocker would be blueprint drift.

The correct current boundary is narrower: kernel-level replay and runner-level same-run-directory replay remain the current proven boundary, while eval-owned decision-trace replay metrics remain future contract work only if a later audit proves a minimum-loop gap.

No successor queue is opened by this boundary note.

## Future Trigger Conditions

Before eval-owned decision-trace replay metrics can become implementation work, a future reviewed audit must identify:

- the exact metric eval would own, such as trace existence, replay consistency, cache-hit rate, provider-call avoidance, privacy redaction, or runtime-node coverage
- the eval command and artifact scope, such as `eval-world`, `eval-transfer`, generated runtime sessions, or private-beta-only validation
- the source-backed minimum-loop gap or protected-core contract blocker that makes the metric necessary now
- whether `eval/summary.json` becomes a stable metric contract and therefore needs `docs/architecture/contracts.md` and possibly an ADR
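
To make the first trigger condition concrete, here is a hedged sketch of what one candidate metric, cache-hit rate, could look like if a later audit ever justified it. It is outside the scope of this boundary note and does not exist in `backend/app/evals/service.py`. The `replay_cache_hit_rate` function is hypothetical, and it assumes each `decision_trace.jsonl` line is a JSON object with a `provider_mode` field, which a future audit would need to confirm against the contract.

```python
from __future__ import annotations

import json
from typing import Iterable


def replay_cache_hit_rate(lines: Iterable[str]) -> float | None:
    """Share of trace entries whose provider_mode is "replay_cache".

    Returns None for an empty trace so callers cannot mistake "no data" for 0.0.
    """
    total = 0
    hits = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the JSONL input
        entry = json.loads(line)
        total += 1
        if entry.get("provider_mode") == "replay_cache":
            hits += 1
    return hits / total if total else None
```

Even a metric this small raises the questions listed above: which eval command would own it, and whether its output would become a stable contract.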

## Boundaries

- This boundary note does not change `decision_trace.jsonl` shape.
- This boundary note does not change schema, scenario DSL, perturbation payload schema, decision schema, claim labels, report claim `evidence_ids`, run trace shape, compare artifact shape, session/node manifest shape, public demo artifact layout, plugin MCP contract, route ownership, or artifact layout.
- This boundary note does not assert provider-backed replay readiness.
- This boundary note does not assert future-world readiness.
- No ADR or `docs/architecture/contracts.md` update is made by this boundary note because this diff does not change a protected-core contract.
- `status:needs-adr` and unresolved `risk:safety` findings remain merge blockers.
- Do not present Mirror as a real-world prediction machine.
- Do not build real-person personas or digital doubles.
- Do not build political persuasion, hidden surveillance, law-enforcement scoring, hiring, credit, medical, or judicial decision systems.

## Validation Commands

- `python -m pytest backend/tests/test_post_phase67_decision_trace_eval_ownership_boundary.py -q`
- `python -m pytest backend/tests/test_post_phase67_decision_trace_replay_audit.py backend/tests/test_post_phase67_decision_trace_eval_ownership_boundary.py -q`
- `python scripts/check_no_secrets.py`
- `python -m backend.app.cli audit-github-queue --repo YSCJRH/mirror-sim`
- `git diff --check`
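
The pytest commands above run phrase-presence checks against this note and the docs that point to it. A minimal sketch of that pattern follows. `check_phrases` is a hypothetical helper written for illustration (the tests themselves use inline `assert` loops), and the sample strings are placeholders rather than the real phrase lists.

```python
from __future__ import annotations


def check_phrases(text: str, required: list[str], forbidden: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the text passes."""
    problems = [f"missing: {phrase}" for phrase in required if phrase not in text]
    problems += [f"forbidden: {phrase}" for phrase in forbidden if phrase in text]
    return problems


sample = "Do not open Phase 68 from this boundary note."
print(
    check_phrases(
        sample,
        required=["Do not open Phase 68"],
        forbidden=["placeholder forbidden phrase"],
    )
)
```

Collecting problems into a list, rather than failing on the first mismatch, reports every missing or forbidden phrase in one run.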
@@ -47,7 +47,7 @@ This audit does not prove a protected-core blocker. It proves that the current d

No successor queue is opened by this audit.

-TODO[verify]: Future eval ownership for decision-trace replay metrics remains unclaimed by this audit.
+Follow-up boundary: `docs/plans/post-phase-67-decision-trace-eval-ownership-boundary-2026-06-08.md` records that future eval ownership for decision-trace replay metrics remains unclaimed and is not a current Phase 68 blocker.

## Boundaries
