Skip to content

feat(persona): auto-migrate legacy persona-home layout instead of refusing boot (e9f50a36 A2)#1551

Open
joelteply wants to merge 1 commit into
feat/boot-status-honest-startupfrom
feat/persona-home-auto-migrate
Open

feat(persona): auto-migrate legacy persona-home layout instead of refusing boot (e9f50a36 A2)#1551
joelteply wants to merge 1 commit into
feat/boot-status-honest-startupfrom
feat/persona-home-auto-migrate

Conversation

@joelteply

Copy link
Copy Markdown
Contributor

Summary

Card e9f50a36 slice A2. Stacked on #1550 (boot_status seam — review/merge that first).

The pre-Slice-4 persona-home layout <root>/personas/<name>/airc/ was hard-error-on-detect. The error message included the exact mv command, but the operator still had to copy-paste and re-run. Joel hit this earlier today migrating Paige. This PR makes the substrate do the migration itself, then tell the operator via the boot_status seam.

Concrete case this fixes

[ before this PR ]
$ npm start
…
Error: legacy persona home detected at /Users/joel/.continuum/personas/Paige/airc —
Slice 4 of #142 moved personas under `citizens/personas/<name>/airc/`. To migrate
this persona's identity AND keep its peer_id stable, run:

  mkdir -p /Users/joel/.continuum/citizens/personas/Paige && \
  mv /Users/joel/.continuum/personas/Paige/airc /Users/joel/.continuum/citizens/personas/Paige/airc

Then re-run.
^C
$ mkdir -p ... && mv ...   # human in the loop
$ npm start                # finally boots
[ after this PR ]
$ npm start
[continuum-core-server] persona-home: ⚠ migrated Paige: /Users/joel/.continuum/personas/Paige/airc → /Users/joel/.continuum/citizens/personas/Paige/airc
…
[continuum-core-server] boot-mode: ✓ full-citizen (...)

Next boot, with the migration already done, no persona-home: line prints at all (Ok-by-omission via the no-legacy-detected branch).

Why this isn't a [[no-fallbacks-ever]] violation

The doctrine forbids silent capability degradation. Auto-migration is:

  • Explicit: surfaces via a boot_status("persona-home", Degraded, ...) line at boot AND fires a tracing::info!(target="boot.status", ...) event the JSONL probe sink captures.
  • Idempotent: a re-run after migration finds nothing to do (no legacy path → no action).
  • 1:1 unambiguous: only one legacy → new mapping exists; no judgment call.
  • Atomic: tokio::fs::rename on the same filesystem; the persona's identity.key, admission_state.sqlite, and every other piece of airc state move together.

The Degraded kind (not Ok) is deliberate — it signals "this boot fixed something" so an operator scanning the boot summary sees that an action was taken.

Safety branch preserved

If BOTH legacy AND new paths exist with content (partial manual migration / backup restore conflict), the substrate refuses with a typed LegacyMigrationConflict carrying exact recovery commands. The empty-destination case proceeds (the substrate removes the empty placeholder and renames in) — that's idempotent for the "operator created the parent skeleton then ran the substrate" case.

Best-effort cleanup of empty parent dirs (<root>/personas/<name>/, then <root>/personas/ itself) uses tokio::fs::remove_dir's built-in refusal to remove non-empty directories as a "did anyone else have content here?" race-free check.

What changed

  • crates/airc-cli/src/persona/airc_runtime.rs:
    • Replaced return Err(LegacyLayoutDetected { ... }) with migrate_legacy_persona_home(legacy, home, agent_name).
    • New helper performs the migration (with empty-destination handling + parent-dir cleanup) and reports via boot_status.
    • New error variants: LegacyMigrationConflict (both paths non-empty), LegacyMigrationIoError (rename / mkdir failed).
    • Removed LegacyLayoutDetected variant — replaced by the conflict variant for the genuinely-ambiguous case.

Test plan

  • cargo test -p continuum-core --lib persona::airc_runtime::tests — 4/4 green (existing layout test + 3 new migration tests):
    • migrate_moves_legacy_persona_home_into_citizens_layout — happy path; legacy gone, new exists, seed file preserved.
    • migrate_refuses_when_both_paths_have_content — conflict branch; both files untouched, typed error returned.
    • migrate_proceeds_when_new_path_exists_but_is_empty — idempotency; empty destination removed, rename proceeds.
  • cargo build -p continuum-core clean.

Composition

…using boot

Card e9f50a36 (slice A2) — follows the `boot_status` seam from PR #1550.

The pre-Slice-4 persona-home layout `<root>/personas/<name>/airc/`
was hard-error-on-detect. The error message named the exact `mv`
command, but the operator still had to copy-paste it and re-run.
Joel hit this earlier today migrating Paige, and the same paper cut
exists for every persona under the legacy layout on every fresh
checkout.

Auto-migration is safe because:
- The mapping is 1:1 unambiguous (only one legacy → new shape).
- The operation is `tokio::fs::rename` — atomic on the same
  filesystem, preserves the persona's `identity.key` +
  `admission_state.sqlite` + every other piece of airc state.
- The substrate now REPORTS the action via
  `boot_status("persona-home", Degraded, "migrated <name>: legacy
  → citizens/")` so the operator sees it happened (no silent
  fallback per `[[no-fallbacks-ever]]`).

The Degraded kind (not Ok) is intentional: it tells the operator
"this boot fixed something the prior layout left behind." Next
boot, with the migration already done, the bootstrap path
short-circuits at the `try_exists(legacy) → false` check and
prints no boot_status line at all.

Safety branch preserved: if BOTH `legacy` AND `new` exist with
content (partial manual migration / backup restore conflict), the
substrate refuses with a typed `LegacyMigrationConflict` carrying
exact recovery commands. Empty-destination case proceeds (the
substrate removes the empty placeholder and renames in).

Best-effort cleanup of empty parent dirs (`<root>/personas/<name>/`,
then `<root>/personas/` itself) leans on `tokio::fs::remove_dir`'s
built-in refusal to remove non-empty directories — perfect "did
anyone else have content here?" check without a stat race.

Removed: `LegacyLayoutDetected` enum variant + the corresponding
hard-error test. Added: `LegacyMigrationConflict` +
`LegacyMigrationIoError` variants for the genuinely-ambiguous case,
plus three tokio tests covering the success path, the conflict
branch, and the empty-destination idempotency case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@joelteply joelteply left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial review per [[agent-review-as-acceptable-approval]]. Default posture: BLOCK.

1. Correctness. tokio::fs::rename delegates to libc rename(2), which is atomic on the SAME filesystem and returns EXDEV cross-volume — it does NOT silently copy-then-delete. So the worst case is a clean error surfaced via LegacyMigrationIoError, not silent corruption. Acceptable.

2. Architecture. The "not a fallback" argument holds: a fallback hides degradation; this announces it via boot_status("persona-home", Degraded, ...). It's an explicit, logged, idempotent schema migration — the same shape as a database migration on boot. Consistent with [[reliable-startup-substrate-refuses-to-lie]].

3. Traits / API. String allocation for agent_name in error variants is correct — once-per-boot path, thiserror Display impl needs owned data, and &str would force lifetime annotations through the error type into every caller. Right tradeoff.

4. Modularity. A PersonaHomePath newtype with .migrate_from_legacy() would be cleaner, but doesn't yet exist; introducing it here would expand scope. Acceptable as a follow-up.

5. Speed. N/A — boot path.

6. Intel-Mac viability. APFS rename(2) works cross-directory within the same volume. Same-volume invariant holds since both legacy and new are under <continuum_root>. Confirmed.

7. Elegance. ~120 LoC for helper + 3 tests is proportionate. Degraded (not Ok) for successful migration is correct — operator MUST see "we fixed something" so they can audit.

Concerns verified:

  • Cross-volume: EXDEV → typed error, not corruption. ✓
  • Conflict message recovery commands: names exact rm -rf commands for both branches. ✓
  • Test coverage replacement: new happy-path test asserts legacy gone + new exists + seed file content preserved — stronger than the old "error returned" assertion. ✓
  • Multi-persona parent cleanup: remove_dir refuses non-empty, so <root>/personas/ survives if other personas remain. The tests don't explicitly cover this multi-persona case, but the kernel semantics make it impossible to regress without changing the syscall. Acceptable, worth a follow-up test.

Concerns I cannot fully refute (non-blocking):

  • Symlink-as-empty-destination race: if new is a symlink to an empty dir, read_dir follows the symlink, sees empty, remove_dir(new) removes the SYMLINK (not its target), then rename lands real state where a symlink used to be. Not corruption, but surprising. Substrate-controlled <continuum_root> means an operator would have to deliberately plant the symlink. Document as known edge in a follow-up.
  • TOCTOU between empty-check and rename: a concurrent process could land in the destination between the empty-check and the rename. rename(2) on Linux to a non-empty dir errors ENOTEMPTY; on macOS APFS it errors EEXIST for non-empty. Either way: typed error, not corruption. Acceptable for boot path with no concurrent persona init.

Verdict: APPROVE.

The PR replaces a hostile operator UX (refuse-with-mv-command) with substrate-handled migration that maintains the [[no-fallbacks-ever]] discipline at the genuinely-ambiguous case (both paths populated). Error messages are operator-actionable. Tests pin the correct property surface. Stacks cleanly on #1550's boot_status seam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant