Skip to content

feat(boot): adapter selection reports via boot_status (e9f50a36 A4)#1553

Open
joelteply wants to merge 2 commits into
feat/airc-boot-linefrom
feat/adapter-boot-line
Open

feat(boot): adapter selection reports via boot_status (e9f50a36 A4)#1553
joelteply wants to merge 2 commits into
feat/airc-boot-linefrom
feat/adapter-boot-line

Conversation

@joelteply

Copy link
Copy Markdown
Contributor

Summary

Card e9f50a36 slice A4. Stacked on #1552#1551#1550.

After AIProviderModule::register_adapters walks every cloud API key + every local llamacpp-* GGUF, it now emits a single boot_status("adapter", ...) line so the operator sees what inference capabilities the substrate actually has — not what should be available given the keys / models it thinks it configured.

What an operator sees

# Empty: substrate has zero inference path
[continuum-core-server] adapter: ✗ no inference adapter registered — add API keys to ~/.continuum/config.env or pull a local GGUF

# Cloud only: substrate works but one network blip kills inference
[continuum-core-server] adapter: ⚠ cloud only: anthropic, openai, groq (no local fallback if cloud unreachable)

# At least one local: offline path exists, mixed or local-only both Ok
[continuum-core-server] adapter: ✓ anthropic, llamacpp-qwen3-coder, openai (3 providers)

Why Degraded for cloud-only

Today's substrate works "fine" in cloud-only mode — until it doesn't. A single DNS hiccup, an expired API key, or a provider outage takes down inference for every persona at once. The substrate KNOWS this is a fragile state at boot time (it walked the registry, it sees no llamacpp-* provider). Surfacing it as instead of is the kind of honest reporting the e9f50a36 slice is for — the operator can decide whether to pull a GGUF or accept the risk, but they shouldn't have to guess the state from secondary symptoms two weeks later when the WAN goes down.

What changed

  • crates/continuum-core/src/modules/ai_provider.rs:
    • New pure helper render_adapter_boot_status(available: &[&str]) -> (BootStatusKind, String). Empty → Failed. No llamacpp-* → Degraded. At least one local → Ok. The string "llamacpp-" prefix is the marker — matches register_adapters's LLAMACPP_PROVIDER_ID-driven registration loop.
    • Single boot_status("adapter", kind, detail) call at the end of register_adapters, right after the existing "AIProviderModule initialized with N providers" log.
  • 4 unit tests pin the rendering for each case + the operator-facing detail strings.

Test plan

  • cargo test -p continuum-core --lib modules::ai_provider::boot_status_tests — 4/4 green.
  • cargo build -p continuum-core clean.

Composition

Depends on #1552#1551#1550. Merge order: #1550#1551#1552#1553.

After this slice the operator's boot view is:

[continuum-core-server] probes: ✓ landing at /tmp/probes.jsonl
[continuum-core-server] logs: ✓ /Users/joel/.continuum/logs/...log (rolling daily, retention 7)
[continuum-core-server] boot-mode: ✓ full-citizen (...)
[continuum-core-server] airc: ✓ socket=/Users/joel/.airc/runtime/airc.sock peer=12345678 room=general
[continuum-core-server] adapter: ✓ anthropic, llamacpp-qwen3-coder, openai (3 providers)
[continuum-core-server] persona-home: ⚠ migrated Paige: …   ← only after legacy layout detected

Last remaining e9f50a36 slice: model (GGUF availability with exact download URL on miss).

Card e9f50a36 slice A4. Stacked on #1552#1551#1550.

After AIProviderModule's `register_adapters` walks every cloud key
+ every local llamacpp-* GGUF, it now emits a single
`boot_status("adapter", ...)` line so the operator sees what
inference capabilities the substrate ACTUALLY has — not what
*should* be available given the keys / models it thinks it
configured.

Mapping (pure `render_adapter_boot_status` for testability):

- **Empty registry** → ✗ Failed
  detail: "no inference adapter registered — add API keys to
  ~/.continuum/config.env or pull a local GGUF"

- **All cloud, no `llamacpp-*`** → ⚠ Degraded
  detail: "cloud only: anthropic, openai (no local fallback if
  cloud unreachable)"

  The substrate technically works, but every persona is one
  network blip away from total cognitive silence. Operator must
  see this — silent cloud-only is the worst kind of cognitive
  unreliability.

- **At least one `llamacpp-*`** → ✓ Ok
  detail: "anthropic, llamacpp-qwen3-coder, openai (3 providers)"

  Mixed cloud + local OR local-only both land here; substrate
  has at least one offline path. The detail names the providers
  so an operator can spot a wrong-model-loaded at a glance.

The kind ordering composes with the rest of the slice A boot
lines so a sentinel computing "worst kind across boot" with
`.max()` reports the right verdict regardless of which subsystem
is the weakest link.

4 unit tests pin each rendering case + the operator-facing detail
strings — adapter set as &[&str] (matches AdapterRegistry::available)
so the test doesn't need a registry fixture.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@joelteply joelteply left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VERDICT: REQUEST_CHANGES

Adversarial review — PR #1553 (adapter boot_status)

Default posture: BLOCK. Investigated against substrate ground truth.

7 dimensions

  1. CorrectnessAdapterRegistry::available() returns Vec<&str> of registration_key(provider_id) values (adapter.rs:676). The llama.cpp adapter's provider_id is the fixed const LLAMACPP_PROVIDER_ID = "llamacpp-local" (inference/llamacpp_adapter.rs:55,659) — singular, not per-model. Prefix llamacpp- matches it for the real case, but the tests on lines 1231/1250 use "llamacpp-qwen3-coder", a string that never appears in production. Tests document a fiction; coincidentally the prefix still matches. The || *name == "llamacpp-local" branch in the helper is dead code (the prefix already matches).
  2. Architecturestrong argue: the "is this an offline path" check belongs on the adapter trait (fn is_offline_path(&self) -> bool defaulting to false; LlamaCppAdapter and DmrAdapter both override to true). String-prefix matching on a registry id couples the boot-status helper to a naming convention that nothing else enforces. Add a forge backend tomorrow named metal-direct and the boot line silently lies. Trait method = compile-time guarantee.
  3. Traits/API&[&str] is the registry's natural shape. Fine as-is, if the predicate were trait-driven (#2).
  4. Modularity — pure helper + tests is correct shape. If the "do we have an offline path" question needs to be asked elsewhere (cognition's FailFast mode, the DMR watchdog), it should live next to the registry, not in ai_provider.rs.
  5. Speed — boot path. N/A.
  6. Intel-Mac viability — platform-agnostic. ✓
  7. Elegance — Degraded-for-cloud-only is defensible per [[no-fallbacks-ever]] doctrine. ✓

Specific concerns — verified

  • DMR misclassification (real bug): DMR_PROVIDER_ID = "docker-model-runner" (line 43). DMR runs on localhost:12434 — host-local, network-independent. The helper treats it as cloud. A host with only DMR registered (model_registry not initialized, Docker Desktop up) will boot with Degraded: "cloud only: docker-model-runner (no local fallback if cloud unreachable)". That's factually false — DMR survives WAN outage. The boot line lies on a real configuration. No test covers this edge case (prompt called this out as a reject criterion).
  • Hardcoded prefix drift: prefix "llamacpp-" is hardcoded twice in the helper, not pulled from LLAMACPP_PROVIDER_ID. If the const ever changes (e.g. llama-local), the boot line silently flips to Degraded everywhere. Use name == LLAMACPP_PROVIDER_ID or the trait approach.
  • Test fiction: tests use "llamacpp-qwen3-coder". Real registry only ever holds "llamacpp-local" (one entry, multiplexing models internally via models_for_provider). Tests pass but document a non-existent shape.
  • Ordering claim verified: Ok < Degraded < Failed (boot_status.rs:80). .max() picks worst. ✓

Verdict

REQUEST_CHANGES. DMR-as-cloud is a doctrinal lie on a real config and there's no test for it. Either:
(a) extend the prefix check to include DMR_PROVIDER_ID (cheap, ugly), or
(b) move the offline-path predicate onto the adapter trait (clean, drift-proof — preferred).

Also: replace test fixtures with LLAMACPP_PROVIDER_ID to match reality; drop dead || *name == "llamacpp-local" branch; add a "DMR-only" test case.

— adversarial reviewer (Opus 4.7)

…offline (reviewer #1)

PR #1553 round 1 reviewer caught two real defects:

1. **DMR misclassification.** `docker-model-runner` runs against
   `localhost:12434` via Docker Desktop's model runner socket —
   network-independent. The prior prefix match `llamacpp-` didn't
   include it, so a DMR-only host got `Degraded: cloud only ...`,
   a load-bearing observability lie.

2. **Test fiction.** Prior tests used `"llamacpp-qwen3-coder"` —
   a string production never emits. `register_adapters` uses the
   single `LLAMACPP_PROVIDER_ID = "llamacpp-local"` constant
   and multiplexes models within the adapter. Tests passed by
   coincidence (prefix match) while pinning a non-existent shape.

Replaces the prefix predicate with an `OFFLINE_PROVIDER_IDS` set
sourced from the actual constants. New `dmr_only_reports_ok_not_
cloud_only` test pins the DMR case directly. Existing tests now
reference `crate::inference::LLAMACPP_PROVIDER_ID` instead of
fictitious model ids — a future rename fails compilation rather
than silently changing the boot-status classification.

Architecture follow-up noted: the `is_offline_path` predicate
ultimately belongs on the `AIProviderAdapter` trait so the
substrate gets compile-time exhaustiveness against drift. Deferred
because the registry's `available()` returns `Vec<&str>` and
the trait isn't reachable without holding the lock.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant