Skip to content

hardening: clarify host phase authority and execution seams#79

Merged
sf19-97 merged 1 commit into
mainfrom
codex/host-phase-authority-hardening
Apr 18, 2026
Merged

hardening: clarify host phase authority and execution seams#79
sf19-97 merged 1 commit into
mainfrom
codex/host-phase-authority-hardening

Conversation

@sf19-97
Copy link
Copy Markdown
Collaborator

@sf19-97 sf19-97 commented Apr 18, 2026

Summary

Clarifies ownership boundaries and phase authority across the host crate's execution pipeline.

Changes

  • Separate warning emission from validation via new diagnostics.rs module — validate_hosted_runner_configuration() is now pure, warning emission uses a writer-based seam
  • Move live-run setup DTO authority into live_prep.rs — ValidatedLiveRunnerSetup and PreparedLiveRunnerSetup owned where they are constructed, fields private with into_parts() accessor
  • Introduce ValidatedDriverInput for explicit driver-input phase boundary between preparation and execution
  • Centralize process-driver host-stop ownership via explicit check_host_stop(...)
  • Replace generic capture-finalization transitions with domain-shaped methods — on_step_success(), on_dispatch_failure(), on_fatal_error() replace open transition(to)
  • Tighten import authority and phase boundaries across host usecases

Verification

  • cargo fmt --check ✅
  • cargo test -p ergo-host: 208 passed, 0 failed ✅
  • bash tools/verify_layer_boundaries.sh ✅

Closes

Resolves #69, #72, #76, #77, #78

- separate warning emission from validation via diagnostics.rs
- move live-run setup DTO authority into live_prep.rs
- introduce ValidatedDriverInput for explicit driver-input phase
- clarify process-driver host-stop and terminal failure ownership
- replace generic capture-finalization transitions with domain-shaped methods
- tighten import authority and phase boundaries across host usecases
@augmentcode
Copy link
Copy Markdown

augmentcode Bot commented Apr 18, 2026

🤖 Augment PR Summary

Summary: This PR hardens the ergo-host execution pipeline by clarifying phase ownership boundaries (prep vs. run vs. runner internals) and making warning/diagnostic emission explicit.

Changes:

  • Introduces diagnostics.rs to centralize host-owned warning formatting/emission with a writer-based seam.
  • Makes validate_hosted_runner_configuration() pure by returning egress validation warnings as data (instead of emitting directly).
  • Updates HostedRunner::new() to preserve existing behavior by emitting warnings to stderr after validation.
  • Moves live-run setup DTO authority into live_prep.rs and tightens field visibility behind accessors.
  • Adds ValidatedDriverInput so driver input is validated/prepared once and then consumed by execution.
  • Centralizes process-driver host-stop handling via a single check function used at all loop call sites.
  • Replaces generic capture-finalization transitions with domain-shaped methods on CaptureFinalizationState, with contract tests.
  • Tightens import authority by introducing a shared prelude for usecase siblings and making process-driver imports explicit.

Technical Notes: Behavior is preserved for direct runner construction while orchestration paths now own when/how warnings are emitted.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Ok(warnings)
}

pub struct HostedRunner {
Copy link
Copy Markdown

@augmentcode augmentcode Bot Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crates/prod/core/host/src/runner.rs:367: Now that validate_hosted_runner_configuration() returns warnings only on Ok, warnings found before a later validation error (e.g. MissingEgressProvenance) will be dropped and never emitted. Previously those warnings would have been printed before the function returned an error—was that behavior change intentional?

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

/// `emit_warnings_to_stderr`.
pub(crate) fn emit_warnings<W: Write>(writer: &mut W, warnings: &[impl Display]) {
for warning in warnings {
let _ = writeln!(writer, "warning: {warning}");
Copy link
Copy Markdown

@augmentcode augmentcode Bot Apr 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crates/prod/core/host/src/diagnostics.rs:32: Swallowing the writeln! result means a failing writer can silently drop warnings, which could make diagnostics unreliable if a future sink errors. Do we want warning emission to be explicitly best-effort everywhere, or should write failures be observable by the caller?

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 18, 2026

Greptile Summary

This PR is a well-scoped hardening refactor of the host crate's execution pipeline, focusing on ownership boundaries, phase authority, and testability. It introduces a diagnostics.rs module to decouple warning emission from validation (making both testable independently), moves PreparedLiveRunnerSetup/ValidatedLiveRunnerSetup into live_prep.rs with private fields behind accessor methods, introduces a ValidatedDriverInput enum as an explicit phase boundary between preparation and execution (eliminating double fixture parsing), centralizes the host-stop predicate/dispatch in process_driver.rs via a unified check_host_stop function, and replaces generic transition(to) calls with domain-shaped on_step_success(), on_dispatch_failure(), on_fatal_error() methods on CaptureFinalizationState. All 208 existing tests pass.

  • diagnostics.rs: New module separating warning emission (emit_warnings<W: Write>) from stderr production, with tests for format and empty-slice behavior
  • runner.rs: validate_hosted_runner_configuration now returns Vec<EgressValidationWarning> (pure); domain transition methods added to CaptureFinalizationState with comprehensive test coverage
  • live_prep.rs: Now owns PreparedLiveRunnerSetup and ValidatedLiveRunnerSetup with private fields; explicit imports replacing use super::*
  • live_run.rs: ValidatedDriverInput eliminates double fixture preparation; run_prepared_fixture_driver consumes pre-prepared input; explicit imports
  • process_driver.rs: check_host_stop unified across all 6 call sites with cleaner ControlFlow<Result<…>, …> signature; into_terminal_failure helper eliminates repeated commit-phase match arms
  • shared.rs: Narrowed to multi-consumer external/stdlib imports; process-driver-specific imports moved to process_driver.rs directly

Confidence Score: 5/5

Safe to merge; all 208 tests pass and no correctness or logic issues found — only two non-blocking P2 style observations.

The refactoring is well-scoped, thoroughly tested, and the design decisions are clearly documented. Both findings are P2 style issues: redundant command validation in run_process_driver (harmless, idempotent, immutable command) and asymmetric host-stop function signatures between the fixture and process drivers (both semantically correct). No logic, security, or data-integrity concerns were found. The PR resolves five tracked issues and the verification checklist (fmt, tests, layer boundaries) is complete.

crates/prod/core/host/src/usecases/process_driver.rs — minor double-validation pattern; crates/prod/core/host/src/usecases/live_run.rs — host-stop signature asymmetry.

Important Files Changed

Filename Overview
crates/prod/core/host/src/diagnostics.rs New module cleanly separates warning emission from validation; writer-based seam makes it testable; two well-targeted tests cover format contract and empty-slice case.
crates/prod/core/host/src/runner.rs validate_hosted_runner_configuration correctly made pure (returns warnings as data); CaptureFinalizationState domain methods added with comprehensive transition tests; emit_egress_validation_warnings_to_stderr removed in favour of diagnostics.rs.
crates/prod/core/host/src/runner/tests.rs 163 new lines of tests covering warning-as-data contract, HostedRunner construction with warnings, and all CaptureFinalizationState domain transitions including idempotent and terminal-preserving cases.
crates/prod/core/host/src/usecases/live_prep.rs PreparedLiveRunnerSetup/ValidatedLiveRunnerSetup moved here from usecases.rs; private fields with accessor methods (adapter_bound(), dependency_summary(), into_runner()); explicit imports replace use super::*; warnings now emitted at this layer for orchestration path.
crates/prod/core/host/src/usecases/live_run.rs ValidatedDriverInput enum introduces explicit phase boundary; validate_driver_input now returns prepared fixture state for single-pass execution; run_prepared_fixture_driver replaces run_fixture_driver/run_fixture_items_driver; validate_fixture_items_input helper removed; explicit imports.
crates/prod/core/host/src/usecases/process_driver.rs check_host_stop unifies 6 call sites with cleaner ControlFlow signature; into_terminal_failure eliminates repeated commit-phase match arms; V0_SINGLE_EPISODE_LABEL constant extracted; process driver still re-validates command internally after validate_driver_input already validated it.
crates/prod/core/host/src/usecases/shared.rs Narrowed to genuine multi-consumer imports; process-driver-only imports (BufRead, BufReader, mpsc, thread, process types, serde) removed; doc comment updated to reflect narrowed scope.
crates/prod/core/host/src/usecases/tests/mod.rs Minor: adds explicit use std::process::Command import now that shared.rs no longer re-exports process types.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant live_prep
    participant runner
    participant diagnostics
    participant live_run
    participant process_driver

    Note over Caller,process_driver: Preparation phase (live_prep.rs)
    Caller->>live_prep: prepare_live_runner_setup_from_assets()
    live_prep->>runner: validate_hosted_runner_configuration()
    runner-->>live_prep: Ok(Vec of EgressValidationWarning)
    live_prep->>diagnostics: emit_warnings_to_stderr(warnings)
    live_prep-->>Caller: Ok(PreparedLiveRunnerSetup)

    Note over Caller,process_driver: Execution phase (live_run.rs)
    Caller->>live_run: execute_run_graph_with_policy()
    live_run->>live_run: validate_driver_input() returns ValidatedDriverInput
    live_run->>runner: start_egress_channels()

    alt Fixture driver
        live_run->>live_run: run_prepared_fixture_driver(PreparedFixtureInput)
        loop each event
            live_run->>live_run: maybe_fixture_host_stop()
            live_run->>runner: step(event)
            runner-->>live_run: on_step_success / on_dispatch_failure / on_fatal_error
        end
    else Process driver
        live_run->>process_driver: run_process_driver(command)
        loop protocol loop
            process_driver->>process_driver: check_host_stop()
            process_driver->>runner: step(event)
            runner-->>process_driver: domain transition method
        end
    end

    Note over Caller,process_driver: Finalization phase (live_prep.rs)
    live_run->>live_prep: finalize_hosted_runner_capture_with_stage()
    live_prep->>runner: ensure_capture_finalizable()
    live_prep->>runner: ensure_no_pending_egress_acks()
    live_prep->>runner: stop_egress_channels()
    live_prep->>runner: into_capture_bundle()
    live_prep-->>live_run: Ok(CaptureBundle)
Loading

Comments Outside Diff (2)

  1. crates/prod/core/host/src/usecases/process_driver.rs, line 392-399 (link)

    P2 Redundant command validation after validate_driver_input

    run_process_driver calls validate_process_driver_command again at entry (lines 398–399), but the caller — execute_run_graph_with_policy — already validated the command through validate_driver_input before reaching this call site. The PR comment on validate_driver_input explicitly states "Fixture items are parsed and prepared once here; the run path no longer needs to repeat that work," but the analogous principle wasn't applied to process-driver command validation.

    The double validation is harmless (the command is immutable between the two calls), but it's inconsistent with the stated design goal. Since run_process_driver is pub(super) it can be called from tests directly with an unvalidated command, so removing the internal guard here would require callers to uphold the invariant themselves. One option is to keep internal validation but document the redundancy:

    pub(super) fn run_process_driver(
        command: Vec<String>,
        runner: HostedRunner,
        process_policy: ProcessDriverPolicy,
        lifecycle: &RunLifecycleState,
    ) -> Result<DriverExecution, HostRunError> {
        // Callers via execute_run_graph_with_policy have already validated through
        // validate_driver_input; this re-check is a defensive guard for direct callers
        // (e.g. tests) that invoke run_process_driver without going through the
        // validated-input path.
        validate_process_driver_command(&command)
            .map_err(|err| HostRunError::Driver(HostDriverError::Input(err)))?;

    Fix in Claude Code Fix in Codex

  2. crates/prod/core/host/src/usecases/live_run.rs, line 338-347 (link)

    P2 maybe_fixture_host_stop retains the old Result<ControlFlow<…>, …> signature

    The process driver's check_host_stop was updated to ControlFlow<Result<DriverExecution, HostRunError>, ProcessDriverLoopState> — which the module doc describes as the unified host-stop authority pattern. The fixture driver's maybe_fixture_host_stop still uses the inverted Result<ControlFlow<DriverExecution, FixtureDriverState>, HostRunError> signature, leading to asymmetric call-site idioms:

    // fixture — old-style: uses ? to propagate, explicit Ok(...) on Break
    state = match maybe_fixture_host_stop(state, lifecycle)? {
        ControlFlow::Break(execution) => return Ok(execution),
        ControlFlow::Continue(state) => state,
    };
    
    // process — new-style: Result inside Break, no outer ?
    state = match check_host_stop(state, &mut child, &mut stderr_handle, lifecycle) {
        ControlFlow::Continue(s) => s,
        ControlFlow::Break(result) => return result,
    };

    Both are correct, but readers switching between the two drivers must mentally translate two different conventions. Consider aligning maybe_fixture_host_stop to the newer pattern in a follow-up — or add a note to the module doc acknowledging the intentional asymmetry if keeping it is preferred.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

    Fix in Claude Code Fix in Codex

Fix All in Claude Code Fix All in Codex

Reviews (1): Last reviewed commit: "hardening: clarify host phase authority ..." | Re-trigger Greptile

@sf19-97 sf19-97 merged commit 7784f46 into main Apr 18, 2026
3 checks passed
@sf19-97 sf19-97 deleted the codex/host-phase-authority-hardening branch April 18, 2026 04:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

host: introduce explicit process-driver lifecycle state machine

1 participant