Skip to content

feat(runtime): boot_status — honest one-line-per-subsystem startup contract (e9f50a36)#1550

Open
joelteply wants to merge 2 commits into
canaryfrom
feat/boot-status-honest-startup
Open

feat(runtime): boot_status — honest one-line-per-subsystem startup contract (e9f50a36)#1550
joelteply wants to merge 2 commits into
canaryfrom
feat/boot-status-honest-startup

Conversation

@joelteply

Copy link
Copy Markdown
Contributor

Summary

Card e9f50a36 ("Slice A — reliable startup, substrate refuses to lie"). First slice: establish the SHAPE so each load-bearing subsystem can report itself at boot in a canonical, grep-friendly form. Future subsystem PRs (airc daemon, adapter selection, model availability, persona home) plug into the same seam without re-litigating the format.

The contract

[continuum-core-server] <subsystem>: <icon> <detail>
  • icon = / / for Ok / Degraded / Failed.
  • subsystem = kebab-case ident matching URI / module name.
  • detail = one operator-actionable line (path, version, count, exact remediation command).

Lines go to stderr unconditionally so they survive RUST_LOG=warn quieting. They ALSO fire tracing::info! with target = "boot.status" and fields subsystem / kind / detail, so the substrate's JsonlProbeFileSink captures the structured record when CONTINUUM_PROBE_CLASSES includes boot.status. Same call, two consumers — the human-facing console line and the structured probe sink, per [[observability-is-half-the-architecture]].

What this PR ships

  1. runtime/boot_status.rs — new module:

    • boot_status(subsystem, kind, detail) function
    • BootStatusKind { Ok, Degraded, Failed } enum (total-ordered so sentinels can compute "worst kind across subsystems" with .max())
    • Pure formatter format_boot_status_line for unit testing
    • 6 unit tests pin the format, icons, tags, ordering
  2. Three call sites in main.rs converted to the new contract:

    • probes when CONTINUUM_PROBE_FILE is set, when it isn't (so an operator who thought they had probes wired up sees the off-state immediately).
    • logs when log_dir resolved, when fmt falls back to stderr (containerized envs without HOME).
    • boot-mode with the parsed mode + description. Replaces the previous info!(" Boot mode: ...") line that hid under RUST_LOG.

Live-smoke output

[continuum-core-server] probes: ✓ landing at /tmp/test-probes.jsonl
[continuum-core-server] logs: ✓ /Users/joel/.continuum/logs/continuum-core-server.YYYY-MM-DD.log (rolling daily, retention 7)
[continuum-core-server] boot-mode: ✓ full-citizen (hosts personas via AIRC; requires AIRC Healthy)

What this PR is NOT

Not a fix for any one specific subsystem's silent boot. Persona home migration, airc daemon discovery, adapter selection, model availability — each becomes a follow-up PR that adds its own boot_status call. This PR is the seam, not the audit.

Test plan

  • cargo test -p continuum-core --lib runtime::boot_status — 6/6 green.
  • cargo build -p continuum-core --bin continuum-core-server clean.
  • Live-tested the binary with CONTINUUM_PROBE_FILE=/tmp/probes.jsonl: all three boot lines render correctly with the right icons + detail.

Composition with prior PRs

…ntract

Card e9f50a36 ("Slice A — reliable startup, substrate refuses to lie").
First slice: establish the shape so each load-bearing subsystem can
report itself at boot in a canonical, grep-friendly form. Future
subsystem PRs (airc, adapter, model, persona home, etc) plug into
the same seam without re-litigating the format.

## The contract

```
[continuum-core-server] <subsystem>: <icon> <detail>
```

- icon = ✓ / ⚠ / ✗ for Ok / Degraded / Failed
- subsystem = kebab-case ident matching URI / module name
- detail = one operator-actionable line (path / version / count /
  exact remediation command)

Goes to stderr unconditionally so it survives RUST_LOG=warn quieting
during load tests. ALSO fires `tracing::info!` with
`target = "boot.status"` and fields `subsystem` / `kind` / `detail`,
so the substrate's JsonlProbeFileSink captures the structured record
when `CONTINUUM_PROBE_CLASSES` includes `boot.status`. Same call,
two consumers — the human-facing console line and the structured
probe sink, per [[observability-is-half-the-architecture]].

## What this PR ships

1. `runtime/boot_status.rs` — `boot_status(subsystem, kind, detail)`
   function + `BootStatusKind` enum (Ok / Degraded / Failed, total-
   ordered so sentinels can compute "worst kind across subsystems"
   with `.max()`). Pure formatter `format_boot_status_line` for
   unit testing. 6 unit tests pin the format, icons, tags, ordering.

2. Three call sites in main.rs converted to the new contract:
   - `probes` — Ok when CONTINUUM_PROBE_FILE set, Degraded when
     not (operator who *thought* they had probes wired up sees it).
   - `logs` — Ok when log_dir resolved, Degraded when fmt falls
     back to stderr (containerized envs without HOME).
   - `boot-mode` — Ok with the parsed mode + description. Replaces
     the previous `info!("   Boot mode:  ...")` line that hid under
     RUST_LOG.

Live-smoke-tested:

```
[continuum-core-server] probes: ✓ landing at /tmp/test-probes.jsonl
[continuum-core-server] logs: ✓ /Users/joel/.continuum/logs/continuum-core-server.YYYY-MM-DD.log (rolling daily, retention 7)
[continuum-core-server] boot-mode: ✓ full-citizen (hosts personas via AIRC; requires AIRC Healthy)
```

## What this PR is NOT

Not a fix for any one specific subsystem's silent boot. Persona
home migration, airc daemon discovery, adapter selection, model
availability — each is a follow-up PR that adds its own
boot_status call. This PR is the SEAM, not the audit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@joelteply joelteply left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VERDICT: REQUEST_CHANGES

Adversarial review per [[reviewer-mandate-elegance-and-substrate-viability]].

7 dimensions

  1. Correctnesseprintln! bypasses EnvFilter so RUST_LOG=warn doesn't quiet it ✓. Ord derive on BootStatusKind { Ok, Degraded, Failed } gives Ok<Degraded<Failed; .max() returns worst ✓. Format helper matches tests ✓. BUT doc claim "JsonlProbeFileSink captures these under the boot.status class" is FALSE. Probe sink filters on a probe_class FIELD (set by probe!), NOT tracing target. probe_file_sink.rs:264-267: visitor.probe_class.None => return. With target="boot.status" and no probe_class field, JSONL sink early-outs. Rolling-log fmt capture works; probe JSONL does NOT.

  2. Architectureruntime/boot_status.rs paired with boot_mode.rs ✓. Function-not-macro fine at 3 callsites.

  3. Traits/API&str for subsystem correct for "subsystems opt in over time"; enum would centralize a registry (anti-pattern).

  4. Modularity — Pure format_boot_status_line clean. Tests pin every variant + ordering + Display.

5/6. Speed / Intel-Mac — n/a.

  1. Elegance — 227 LoC, doc long but load-bearing. Unicode icons fine (UTF-8, less -R/grep handle them). No double-write: eprintln→console only, tracing→rolling log only.

Required change

Fix the probe-sink claim. Either:

  • (a) Replace tracing::info!(target: "boot.status", ...) with probe!(class = "boot.status", subsystem=..., kind=..., detail=...) so JSONL capture works — substrate-coherent, sentinels on debug/probes/boot.status/stream is the load-bearing case, OR
  • (b) Strike "JsonlProbeFileSink captures these" from the module doc + inline comment; rolling-log is sufficient. Don't promise a no-op pathway.

(a) preferred. Everything else clean — ordering, format pin, stderr-survives-RUST_LOG, no double-write, boot-mode promotion from hidden info! to first-class line is a real correctness win.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant