Skip to content

fix(probes): isolate global-subscriber tests — kill #203 pollution flake#1549

Open
joelteply wants to merge 1 commit into
canaryfrom
fix/uri-layer-test-pollution
Open

fix(probes): isolate global-subscriber tests — kill #203 pollution flake#1549
joelteply wants to merge 1 commit into
canaryfrom
fix/uri-layer-test-pollution

Conversation

@joelteply

Copy link
Copy Markdown
Contributor

Summary

Close task #203: move install_probe_tracing tests out of the lib-test binary into their own integration-test binary so the global tracing-subscriber install they trigger can't pollute other lib-tests.

Why

install_probe_tracing calls tracing_subscriber::registry().try_init(), which permanently attaches the substrate's UriCaptureLayer + ProbeRouterLayer + fmt layer stack to the global tracing default for the rest of the process. Correct for production (one process, one boot, one install) — wrong for cargo's shared lib-test binary, where the install leaked across tests:

  • routing::tracing_init::tests::install_is_idempotent_with_no_disk_capture → installs the global subscriber.
  • routing::uri_layer::tests::no_subscriber_returns_empty_chain → asserts an empty chain because "no installed Layer means no captured frames" — panics when the install ran first in the same binary.

Task #203 captured the symptom. PR #1547 hit it deterministically today when running cargo test -p continuum-core --lib routing::.

What changed

  • Created continuum-core/tests/tracing_install_global.rs (new integration-test file = its own cargo test binary = its own process).
  • Moved install_is_idempotent_with_no_disk_capture and install_surfaces_open_failed_for_unwritable_path there verbatim.
  • Replaced their inline removal with a comment in tracing_init.rs::tests explaining where they live + why.
  • from_env_reads_documented_env_vars stays in the lib-test module — it only mutates env vars (already serial within itself), never calls install_probe_tracing.

Test plan

  • cargo test --features metal,accelerate -p continuum-core --lib routing::252/252 passed (was 252/253 with the flake — no_subscriber_returns_empty_chain is now consistently green).
  • cargo test --features metal,accelerate -p continuum-core --test tracing_install_global2/2 passed in the isolated binary.

Closes #203.

@joelteply

Copy link
Copy Markdown
Contributor Author

APPROVE (adversarial reviewer) — fix is sound, minimal, and empirically verified

Per [[agent-review-as-acceptable-approval]]: GitHub blocks self-approval, so this verdict lands as a comment with explicit APPROVE.

Walked the 7 dimensions:

  1. Correctness — Cargo's test runner builds each tests/*.rs as a separate binary, so process isolation is real. Ran both halves locally with --features metal,accelerate on the branch:

    • cargo test --test tracing_install_global2 passed, 0 failed
    • cargo test --lib routing::252 passed, 0 failed (was 253-with-1-flake on canary; the two install tests live in their own binary now). The previously-flaky routing::uri_layer::tests::no_subscriber_returns_empty_chain passes deterministically.
    • The two moved tests are semantically identical to canary (verbatim bodies modulo doc-comment polish).
  2. Architecture — A Dispatch-returning refactor of install_probe_tracing would be cleaner in the abstract, but the function IS the global-install API by design (it's what main.rs boots). Inflating its surface for one test concern would be the wrong trade. Integration-test-binary isolation matches what the rest of continuum-core/tests/ already does and is the right scope.

  3. Traits/APIrouting::tracing_init and routing::probe_file_sink are pub mod; ProbeTracingConfig, install_probe_tracing, ProbeFileSinkError::OpenFailed all reachable from the integration crate. Imports compile and run.

  4. Modularity — The comment block left in the lib-test module pointing readers to tests/tracing_install_global.rs and explaining the global-try_init reason is load-bearing — without it the next person diagnosing Implementation Plan for ai/route/simulate Command (Related to #202) #203 would re-add the tests. Keep.

  5. Speed — One extra integration-test binary link. Acceptable cost for deterministic isolation; the alternative is the flake re-firing in CI under different scheduling.

  6. Intel-Mac viability — N/A; test infrastructure.

  7. Elegance — Minimal viable change. Three-field ProbeTracingConfig { probe_file, probe_classes, default_filter } shape matches canary exactly. No accidental third test removed — from_env_reads_documented_env_vars correctly stays in lib-tests (it doesn't touch the global subscriber).

Merge-order note (not blocking): If PR #1547 lands first, it adds a log_dir field to ProbeTracingConfig and removes derive(Debug) from ProbeInstall — this PR will need a trivial rebase (add log_dir: None, drop the {:?} formatter on result). Worth a glance before the merger fires; not worth blocking on.

Per [[agent-review-as-acceptable-approval]] + [[jtag-probes-are-rtos-debugger]] — flake on the probe substrate's own boot path is a substrate-credibility bug; this fix lands the deterministic cure. APPROVE.

…lake

`install_probe_tracing` calls `tracing_subscriber::registry().try_init()`,
which permanently attaches the substrate's UriCaptureLayer +
ProbeRouterLayer + fmt layer to the GLOBAL tracing default for the
rest of the process. That's correct for production (one process, one
boot, one install) but in cargo's lib-test binary it leaked across
tests:

  routing::tracing_init::tests::install_is_idempotent_with_no_disk_capture
    → installs the global subscriber

  routing::uri_layer::tests::no_subscriber_returns_empty_chain
    → asserts an empty chain "no installed Layer means no captured frames"
    → PANICS when the install ran first in the same binary

Task #203 captured the symptom. Today's PR #1547 hit it deterministically
when running `cargo test -p continuum-core --lib routing::`.

Fix: move both `install_*` tests out of the lib-test binary and into
their own integration-test file (`tests/tracing_install_global.rs`).
Each `tests/*.rs` file becomes a separate binary in cargo's runner —
process isolation contains the global install per binary.

`from_env_reads_documented_env_vars` stays in the lib-test module
because it only touches env vars (already serial within itself) and
never calls `install_probe_tracing`.

Verification: `cargo test -p continuum-core --lib routing::` now
passes 252/252 (was 252/253 with the flake), and the new integration
binary passes 2/2.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant