Skip to content

feat(tbtc/signer): FROST/ROAST Rust signer#4005

Open
mswilkison wants to merge 188 commits into
mainfrom
extraction/frost-signer-mirror-2026-05-26
Open

feat(tbtc/signer): FROST/ROAST Rust signer#4005
mswilkison wants to merge 188 commits into
mainfrom
extraction/frost-signer-mirror-2026-05-26

Conversation

@mswilkison

@mswilkison mswilkison commented May 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Lands the Rust FROST/ROAST signer at pkg/tbtc/signer/ alongside the existing Go DKG coordinator. The initial import was extracted from tlabs-xyz/tbtc:feat/frost-schnorr-migration at frozen signed tag frost-extraction-source-v1, then this PR applied targeted canonical-path, CI, and reviewer hardening follow-ups in the canonical keep-core repo.

This is no longer a pure byte-for-byte mirror PR. The post-freeze changes are intentional canonical keep-core changes (canonical-path normalization, CI wiring, and reviewer hardening). After this PR lands, threshold-network/keep-core is the source of truth for the signer; the extracted monorepo tag is provenance for the initial import only.

Source-PR Provenance

The monorepo provenance above is used only to audit the initial import. Ongoing signer development, fixes, CI, and reviews happen in threshold-network/keep-core.

Why This Lives In keep-core

Per extraction plan v38 section 3.1, the signer co-locates with keep-core because:

  • HSM enforcement is external to signer code via standard PKCS#11/KMIP and does not force a separate audit lifecycle.
  • Coordinator coupling dominates; the B-2 Go DKG coordinator already lives in this repo via PR FROST/ROAST readiness branch #3866.
  • TEE adoption later is reversible without forcing a repo split.

Layout Introduced

pkg/tbtc/signer/
├── Cargo.toml + Cargo.lock + build.sh
├── src/
├── tests/
├── benches/
├── include/
├── docs/
│   └── formal/models/
├── scripts/formal/
└── test/vectors/

CI Coverage

This PR now runs:

  • cargo fmt --manifest-path pkg/tbtc/signer/Cargo.toml -- --check
  • cargo clippy --manifest-path pkg/tbtc/signer/Cargo.toml --all-targets -- -D warnings
  • cargo test --manifest-path pkg/tbtc/signer/Cargo.toml
  • cargo test --manifest-path pkg/tbtc/signer/Cargo.toml formal_verification_
  • pkg/tbtc/signer/scripts/formal/run_tla_models.sh

Verification

Latest local verification after review fixes:

  • cargo fmt --manifest-path pkg/tbtc/signer/Cargo.toml -- --check
  • cargo clippy --manifest-path pkg/tbtc/signer/Cargo.toml --all-targets -- -D warnings
  • TBTC_SIGNER_STATE_PATH=/tmp/tbtc-signer-ci-state-local-4005.json cargo test --manifest-path pkg/tbtc/signer/Cargo.toml
  • cargo test --manifest-path pkg/tbtc/signer/Cargo.toml formal_verification_

Latest GitHub CI on this branch is green, including Signer Rust checks, Signer formal invariants, and TLA model checks.

Relationship To PR #3866

PR #3866 lands the Go-side DKG coordinator. This PR lands the Rust-side signer. They are complementary protocol implementations:

  • DKG: keep-core Go side coordinates; signer participates; DKG result digest is computed on both sides and checked for parity.
  • Signing: signer produces FROST signature shares; coordinator aggregates.

PR #3866 and this PR can land independently in either order.

Plan Context

This is one of the mergeable mirror PRs comprising the FROST extraction, alongside tbtc-v2, tbtc-subgraph, and tbtc-v3-indexer mirror PRs.

Lands the Rust signer at pkg/tbtc/signer/ alongside the existing Go DKG
coordinator. Mirrors the signer slice of tlabs-xyz/tbtc:feat/frost-schnorr-migration
(PR #10) at frozen tag frost-extraction-source-v1.

Per extraction plan v38 §3.1, the signer co-locates with keep-core
because: (a) HSM enforcement is external to signer code (standard
PKCS#11/KMIP client), doesn't force a separate audit lifecycle;
(b) coordinator coupling dominates (B-2 DKG coordinator already lives
in this repo via PR #3866); (c) TEE adoption later is reversible
without forcing a repo split.

Layout
- pkg/tbtc/signer/ — Rust crate with own Cargo.toml
- pkg/tbtc/signer/docs/ — signer + ROAST + TEE specs
- pkg/tbtc/signer/docs/formal/models/ — ROAST + TEE TLA+ models
- pkg/tbtc/signer/scripts/formal/ — ROAST vector + TLA runner
- pkg/tbtc/signer/test/vectors/ — roast-attempt-context-v1.json

Files (49 total)
- 47 mirror status (Rust source, signer docs, ROAST docs, TLA models,
  test vector, etc.)
- 2 allowlisted-divergence:
  - pkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjs
    (path normalization to signer-repo paths)
  - pkg/tbtc/signer/scripts/formal/run_tla_models.sh (MODELS_PATH env
    var refactor; default pkg/tbtc/signer/docs/formal/models/)

Provenance
- Source repository: tlabs-xyz/tbtc
- Source branch: feat/frost-schnorr-migration
- Source tag (frozen): frost-extraction-source-v1
- Source commit (H): 52389bd5cccb5daeef195671feb7ca46be6e2f37
- Source manifest: extraction/frost-extraction-source-manifest.json
  (manifestSha256: f7295fb738104501eb6c0c2447a42122ceb5f684c7a7c5dfb50ecb0bde3a0ea0)
- Source PR(s): #425 (tbtc-signer error codes) + the full FROST migration
  series; complete list per source manifest's commit range over
  tools/tbtc-signer/ and signer-adjacent docs/scripts.

Build wiring (follow-up)
This PR lands the signer source code, docs, vectors, and scripts. Rust
toolchain CI integration (cargo build/test/clippy/fmt as a separate CI
job alongside the existing Go jobs) is a follow-up — Go workspace
ignores non-Go subdirectories automatically; the cargo crate at
pkg/tbtc/signer/Cargo.toml is independently buildable.

Known TBD (resolved pre-merge)
- 2 allowlisted-divergence files have expectedTargetSha256 = <TBD> in
  the source manifest. Path normalization transformations need to be
  applied to make the scripts run in pkg/tbtc/signer/ context; this PR
  ships the source verbatim, follow-up commits on this branch apply
  the transformations before merge.
- Dual signoff required on each allowlisted-divergence entry per plan
  v38 §4.2 (extraction lead + canonical repo maintainer).

Verification (pre-merge per plan v38 §7.2)
- 47 mirror files: sha256 equality between
  git show frost-extraction-source-v1:<sourcePath> and this PR's content
  at pkg/tbtc/signer/<targetPath>.
- 2 allowlisted-divergence files: sha256 == expectedTargetSha256
  (recorded post-transformation in source manifest).
- PR-scoped rogue-file check: git diff --name-only main...HEAD only
  contains files in manifest's fileMap with targetKey "signer".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 26, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces pkg/tbtc/signer, a Rust tbtc-signer bootstrap crate with a C ABI JSON-over-buffer interface, FFI translation helpers, API types, error semantics, deterministic coordinator selection, a comprehensive lib.rs FFI surface with tests, an admission-checker CLI, Criterion benchmarks, TLA+ formal models, conformance vectors/checkers, and extensive design/runbook documentation.

Changes

Core Signer Implementation

Layer / File(s) Summary
FFI & API contract definition
include/frost_tbtc.h, src/api.rs
C header declares TbtcBuffer and TbtcSignerResult structs plus ~18 frost_tbtc_* function signatures; Rust API types define serde-serialized request/response payloads for DKG, sign-round lifecycle, transcript audit, quarantine, refresh, emergency rekey, differential fuzzing, and canary rollout.
Error classification and recovery semantics
src/errors.rs
EngineError enum with thiserror integration; code() maps variants to stable string identifiers; recovery_class() classifies errors as "recoverable" or "terminal" with explicit handling for consumed attempt/round replays.
FFI translation layer
src/ffi.rs
C-compatible buffer structs, JSON serialization/deserialization helpers, ffi_entry for panic containment via catch_unwind, and safe free_buffer memory management for the FFI boundary.
Deterministic coordinator selection
src/go_math_rand.rs
Rust port of Go's math/rand with fixed cooked constants table, seedrand, bounded sampling, and shuffle; select_coordinator_identifier returns deterministic coordinators matching keep-core outputs per test vectors.
Lib.rs FFI endpoints & bootstrap gating
src/lib.rs
Module declarations, signer version, bootstrap-mode environment/profile parsing with strict production gating, test-only override, ~18 #[no_mangle] extern "C" endpoints (DKG, sign-round, transcript audit, emergency rekey, canary rollout, refresh-shares), and comprehensive test suite covering idempotency, lifecycle transitions, Taproot validation, and emergency/rollout semantics.

Admission Checking & Policy Enforcement

Layer / File(s) Summary
Admission checker binary
src/bin/admission_checker.rs
JSON-configured operator admission evaluator with policy constraints (per-provider/region caps, custody/attestation/SLA requirements), optional DAO override verification (Schnorr signature validation over SHA-256 payload digest), and replay registry persistence to prevent re-use of override decisions.

Benchmarking & Formal Test Vectors

Layer / File(s) Summary
Phase 5 ROAST benchmark suite
benches/phase5_roast.rs
Criterion benchmarks exercising DKG, start/finalize sign-round, state reload/recovery, and attempt transitions (timeout/invalid-share/stale-replay); includes FFI helpers, hashing/fingerprinting utilities, deterministic coordinator probing, and error validation.
Taproot signature fraud formal verification
tests/p2tr_signature_fraud_vectors.rs
Integration test loading p2tr-signature-fraud-v0.json vectors, computing BIP-341 key-path sighashes, verifying Schnorr signatures, deriving dual challenge identities, and asserting mutation-based identity uniqueness.
ROAST attempt-context test vectors
test/vectors/roast-attempt-context-v1.json
JSON schema and fixtures for deterministic fingerprint/attempt-id computation across multiple participant sets and edge cases, used by formal verification checkers.

TLA+ Formal Verification Models

Layer / File(s) Summary
ROAST attempt state machine model
docs/formal/models/RoastAttemptStateMachine.tla, .cfg
TLA+ module defining attempt numbering, active participants, deterministic coordinator, consumed-attempt registries (transient and durable), transitions (advance/restart/reload), and safety invariants for monotonicity and replay safety.
ROAST rollout policy state machine
docs/formal/models/RoastRolloutPolicy.tla, .cfg
TLA+ module specifying five-stage rollout progression (bootstrap→canary→broad→rollback↔halted) with control signals and temporal properties governing promotion/hold/rollback and emergency stop.
State-key provider policy & production gating model
docs/formal/models/StateKeyProviderPolicy.tla, .cfg, .production.cfg
TLA+ module modeling provider/key selection lifecycle and load outcomes; production-gate enforcement; properties asserting fail-closed behavior and exact binding enforcement.
TEE enforcement modes state machine
docs/formal/models/TeeEnforcementModes.tla, .cfg
TLA+ module for TE admission control with three enforcement modes, attestation states, break-glass override, and properties enforcing valid attestation and admission stability.
Formal models documentation & runner
docs/formal/models/README.md, scripts/formal/run_tla_models.sh
README documenting model coverage and bounds; Bash runner downloading/verifying TLA+ toolchain and executing TLC model checks.

Design Documentation & Specifications

Layer / File(s) Summary
ROAST protocol phase specifications
docs/roast-phase-{0-spec-freeze,1.5-consumed-registry,2-coordinator,3-transcript,4-liveness,5-baseline,5-runbook,5-security-gates}.md
Phase 0 spec defining attempt-context, coordinator validation, strict-mode gating, and error taxonomy; Phases 1.5–5 covering consumed-registry integration, coordinator enforcement, transcript replay hardening, liveness policy with recovery classification, baseline calibration, rollout runbook, and security gates.
ROAST implementation roadmap
docs/roast-implementation-plan.md
Complete ROAST migration plan covering baseline, goals/non-goals, design principles, threat model, target semantics (deterministic attempt/coordinator, authenticated retry, replay-safety), phased deliverables (Phase 0–5), cross-repo dependencies, and Definition-of-Done.
Hardening, security & operational plans
docs/permissioned-signer-hardening-rfc.md, docs/tbtc-signer-secret-material-hardening-plan.md, docs/tee-whitelisted-signer-enforcement-plan.md
RFC with P0–P2 hardening phases; secret-material plan (encrypted-at-rest XChaCha20-Poly1305 envelope); TEE operator enforcement covering admission tokens, runtime behavior, break-glass, and failure modes.
API contract and design decisions
docs/signer-api-contract-decision-brief.md, docs/rust-rewrite-bootstrap.md
Decision brief comparing round-level vs coarse session API (recommending session); bootstrap status documenting implemented features and production gates.
Future considerations documentation
docs/true-late-t_of_n_finalize_considerations.md
Discussion draft on true late t-of-n finalize tradeoffs and current recommendation to defer as future consideration.

Build Configuration & Scripts

Layer / File(s) Summary
Crate metadata and build setup
pkg/tbtc/signer/Cargo.toml, pkg/tbtc/signer/build.sh, pkg/tbtc/signer/.gitignore
Cargo.toml defines tbtc-signer crate (edition 2021, MIT), library config (cdylib+rlib), bench-restart-hook feature, frost-secp256k1-tr dependency; build.sh with strict bash and release cargo build; .gitignore ignores target/.
Comprehensive README documentation
pkg/tbtc/signer/README.md
Full README covering exposed C ABI endpoints, not-yet-implemented items, build/test commands, admission checker CLI, encrypted state provider contracts, benchmark/chaos suite invocation, FFI JSON contract, error codes, and buffer management.
Admission checker sample configurations
pkg/tbtc/signer/scripts/admission-*.sample.json
JSON samples for policy (constraints, DAO overrides), candidates, existing operators, override signatures, and replay registries.
Formal verification & testing scripts
pkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjs, pkg/tbtc/signer/scripts/run_phase5_chaos_suite.sh, pkg/tbtc/signer/scripts/formal/run_tla_models.sh
Node.js CLI for ROAST vector conformance; Bash runners for chaos suite scenarios and TLA+ model checking.

🎯 4 (Complex) | ⏱️ ~75 minutes

🐰 A bootstrap springs forth, with FFI calls so bright,
Formal models carved in TLA+ light,
ROAST coordinators dancing in Go-lang's stride,
Secrets encrypted with cryptographic pride,
From signer to harness, one system unified!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 47.28% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title clearly summarizes the main change: adding the FROST/ROAST Rust signer under tbtc/signer.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch extraction/frost-signer-mirror-2026-05-26

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

…ripts

Per extraction plan v38 §4.4, allowlisted-divergence files require
content normalization for the canonical context. This commit applies
the transformations declared in the source manifest for the 2 signer-
side allowlisted-divergence entries.

Transformations
- pkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjs:
  Rewrite vector path from
    `docs/frost-migration/test-vectors/roast-attempt-context-v1.json`
  to canonical signer layout
    `test/vectors/roast-attempt-context-v1.json`
  (both relative to rootDir which is two levels up from the script
  location; in canonical context that's pkg/tbtc/signer/)

- pkg/tbtc/signer/scripts/formal/run_tla_models.sh:
  Rewrite MODEL_DIR default from
    `$ROOT_DIR/docs/frost-migration/formal-verification/models`
  to canonical signer layout
    `$ROOT_DIR/docs/formal/models`
  Plus MODELS_PATH env-var override for alternate environments (CI
  matrices, local dev trees). ROOT_DIR is unchanged
  (`$(dirname $BASH_SOURCE)/../..` resolves to pkg/tbtc/signer/ here).

Verification
- Both files retain identical behavior to their monorepo counterparts
  when invoked from canonical signer layout
- Comments added documenting the path normalization with reference back
  to the source manifest's allowlisted-divergence status

Recompute expectedTargetSha256 for both entries in the source manifest
and collect dual signoff before merge per plan v38 §4.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🧹 Nitpick comments (3)
pkg/tbtc/signer/src/api.rs (1)

196-202: 💤 Low value

Missing #[serde(default, skip_serializing_if)] on script_tree_hex.

All other Option<T> fields in this file use #[serde(default, skip_serializing_if = "Option::is_none")], but script_tree_hex does not. This inconsistency means the field will serialize as null when absent, rather than being omitted.

Suggested fix
 pub struct BuildTaprootTxRequest {
     pub session_id: String,
     pub inputs: Vec<TxInput>,
     pub outputs: Vec<TxOutput>,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
     pub script_tree_hex: Option<String>,
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/tbtc/signer/src/api.rs` around lines 196 - 202, The BuildTaprootTxRequest
struct's script_tree_hex Option field lacks the serde attributes used elsewhere;
update the declaration of BuildTaprootTxRequest so the script_tree_hex field is
annotated with #[serde(default, skip_serializing_if = "Option::is_none")] to
match other Option<T> fields (preserving Clone/Debug/Deserialize/Serialize
behavior) so it is omitted from serialized output when None instead of
serializing as null.
pkg/tbtc/signer/src/lib.rs (1)

391-421: 💤 Low value

std::env::set_var and std::env::remove_var are not thread-safe.

These functions are unsound in multi-threaded contexts and deprecated since Rust 1.66. While the tests appear to serialize access via lock_test_state(), this guard must be held across all env mutations and checks within a test to prevent races with parallel test threads.

Current usage appears safe given the locking pattern, but this is fragile. Consider using a dedicated test configuration mechanism that doesn't rely on process-wide environment mutation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/tbtc/signer/src/lib.rs` around lines 391 - 421, EnvVarGuard's methods
(EnvVarGuard::set, EnvVarGuard::unset) and its Drop rely on
std::env::set_var/remove_var which are process-wide and not thread-safe; replace
this pattern with a test-scoped, non-global solution such as using a crate that
provides scoped environment variables (e.g., temp_env or similar) or refactor
tests to accept an injected configuration object instead of mutating process
env; update usages to acquire and hold the new scoped guard for the entire
duration of tests that need env changes (or pass a Config struct into functions
under test) and remove direct calls to std::env::set_var/remove_var and the
EnvVarGuard Drop behavior to avoid races.
pkg/tbtc/signer/src/bin/admission_checker.rs (1)

257-276: 💤 Low value

Consider cleaning up the temp file if rename fails.

If fs::rename fails (e.g., cross-filesystem move or permissions issue), the temp file remains on disk. Adding a cleanup attempt in the error path would improve robustness.

♻️ Proposed cleanup on error
     fs::write(&tmp_path, serialized).map_err(|error| {
         format!(
             "failed to write override replay registry temp file [{}]: {error}",
             tmp_path.display()
         )
     })?;
-    fs::rename(&tmp_path, path).map_err(|error| {
-        format!(
-            "failed to persist override replay registry [{}]: {error}",
-            path.display()
-        )
-    })
+    fs::rename(&tmp_path, path).map_err(|error| {
+        let _ = fs::remove_file(&tmp_path); // Best-effort cleanup
+        format!(
+            "failed to persist override replay registry [{}]: {error}",
+            path.display()
+        )
+    })
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/tbtc/signer/src/bin/admission_checker.rs` around lines 257 - 276,
persist_override_replay_registry currently leaves the temporary file (tmp_path)
if fs::rename fails; modify the rename error path to attempt cleanup of tmp_path
before returning the error. Specifically, call fs::remove_file(&tmp_path)
(ignoring or logging its result) inside the Err branch that handles the rename
failure so the function still returns the original formatted error for
fs::rename but also tries to remove the leftover tmp file; reference
persist_override_replay_registry, path, tmp_path, and fs::rename when making the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/tbtc/signer/docs/formal/models/README.md`:
- Around line 32-51: Update the incorrect repository paths in the traceability
matrix of pkg/tbtc/signer/docs/formal/models/README.md so links point to the
actual implementation and docs in this repo: change references to
tools/tbtc-signer/src/engine.rs to pkg/tbtc/signer/src/engine.rs for the entries
mentioning RoastAttemptStateMachine.tla (validate_attempt_context, replay
guards) and StateKeyProviderPolicy.tla (decode_encrypted_state_envelope,
encode_encrypted_state_envelope); change
docs/frost-migration/tee-whitelisted-signer-enforcement-plan.md to
pkg/tbtc/signer/docs/tee-whitelisted-signer-enforcement-plan.md for
TeeEnforcementModes.tla; and change
docs/frost-migration/roast-phase-5-security-rollout-gates.md to
pkg/tbtc/signer/docs/roast-phase-5-security-rollout-gates.md for
RoastRolloutPolicy.tla so readers can locate the referenced code and policy
docs.

In `@pkg/tbtc/signer/docs/roast-implementation-plan.md`:
- Line 93: Update the broken doc links in
pkg/tbtc/signer/docs/roast-implementation-plan.md by replacing repo-external
paths like `docs/frost-migration/roast-phase-0-spec-freeze.md` and any
`tools/tbtc-signer/...` references with the correct repo-local paths under
pkg/tbtc/signer/docs (or use correct relative paths from this markdown file);
search the file for all occurrences of `docs/frost-migration/...` and
`tools/tbtc-signer/...` (including the instances similar to the shown
`docs/frost-migration/roast-phase-0-spec-freeze.md`) and normalize each link so
it points to the new location in this package, preserving anchor fragments and
updating any link text as needed.

In `@pkg/tbtc/signer/docs/roast-phase-5-rollout-runbook.md`:
- Around line 26-29: Update the incorrect crate path used in the runbook
commands: replace occurrences of "cd tools/tbtc-signer" with "cd
pkg/tbtc/signer" for the benchmark command (`cargo bench --features
bench-restart-hook --bench phase5_roast`) and the chaos suite script invocation
(`./scripts/run_phase5_chaos_suite.sh`) so the commands run from the correct
crate directory.

In `@pkg/tbtc/signer/docs/roast-phase-5-security-rollout-gates.md`:
- Around line 92-99: Update stale repository paths in
roast-phase-5-security-rollout-gates.md: replace any occurrences of the old
tooling path "tools/tbtc-signer" with the new location "pkg/tbtc/signer" (e.g.,
update the run command `cd tools/tbtc-signer && cargo bench --features
bench-restart-hook --bench phase5_roast` to `cd pkg/tbtc/signer ...`), and
update any links referencing
`docs/frost-migration/roast-phase-5-baseline-calibration.md` to the document’s
new location in the repo (search for the exact link text and swap to the correct
path). Ensure all instances at the reported locations (around the run command
and the listed links) are changed consistently so operators following the
runbook hit the correct files and commands.

In `@pkg/tbtc/signer/docs/rust-rewrite-bootstrap.md`:
- Around line 10-13: The documentation still references the old crate path
"tools/tbtc-signer" and the validation command using that path; update every
occurrence to "pkg/tbtc/signer" in rust-rewrite-bootstrap.md (including the
header lines that list the crate, the C ABI include path `include/frost_tbtc.h`,
and any validation/build commands) so links and commands point to the colocated
pkg/tbtc/signer location; search for "tools/tbtc-signer" and replace with
"pkg/tbtc/signer" and verify the validation command and any examples reference
the new path.

In `@pkg/tbtc/signer/docs/signer-api-contract-decision-brief.md`:
- Around line 43-47: Update the two referenced paths in
signer-api-contract-decision-brief.md so they point to the mirrored crate
locations: replace `docs/frost-migration/rust-rewrite-bootstrap.md` with
`pkg/tbtc/signer/docs/frost-migration/rust-rewrite-bootstrap.md` and replace
`tools/tbtc-signer/src/lib.rs` with `pkg/tbtc/signer/src/lib.rs` (look for the
occurrences shown around the paragraph mentioning the bootstrap Rust crate and
file: `tools/tbtc-signer/src/lib.rs` and update those strings accordingly).

In `@pkg/tbtc/signer/docs/tbtc-signer-secret-material-hardening-plan.md`:
- Around line 6-7: The doc still uses the old crate path string
`tools/tbtc-signer`; update that scope reference to `pkg/tbtc/signer` throughout
the file (tbtc-signer-secret-material-hardening-plan.md) so the plan points to
the mirrored crate location, and scan for any other occurrences of
`tools/tbtc-signer` in this document to replace with `pkg/tbtc/signer`.

In `@pkg/tbtc/signer/docs/tee-whitelisted-signer-enforcement-plan.md`:
- Around line 296-298: Update the two broken cross-doc links in
pkg/tbtc/signer/docs/tee-whitelisted-signer-enforcement-plan.md (currently
referencing docs/frost-migration/roast-phase-5-security-rollout-gates.md and
docs/frost-migration/roast-phase-5-rollout-runbook.md on lines ~296–297) to
point to their correct locations inside this PR:
pkg/tbtc/signer/docs/roast-phase-5-security-rollout-gates.md and
pkg/tbtc/signer/docs/roast-phase-5-rollout-runbook.md so cross-document
navigation resolves correctly.

In `@pkg/tbtc/signer/README.md`:
- Around line 53-54: Update the README.md occurrence(s) that reference the old
path string "tools/tbtc-signer" to the new crate location "pkg/tbtc/signer":
search for and replace that path in all command snippets, code blocks, and file
references (e.g., cargo build/cd commands and any path bullets) so every
instance uses "pkg/tbtc/signer" consistently; ensure both shell commands and
prose file paths are updated, and run a quick grep for "tools/tbtc-signer" to
confirm no remaining references.

In `@pkg/tbtc/signer/scripts/admission-policy-v1.sample.json`:
- Line 8: Replace the non-hex placeholder value for the JSON key
dao_override_trust_root_pubkey_hex with a syntactically valid 32-byte hex string
(64 lowercase hex characters) so sample files and copy/paste validation don't
break; update the sample value to something like a 64-character hex placeholder
and leave replacement guidance in the README or nearby comment explaining it
must be replaced with the real x-only pubkey hex.

In `@pkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjs`:
- Around line 15-18: vectorsPath currently resolves to
"docs/frost-migration/test-vectors/roast-attempt-context-v1.json" and will fail
because the vectors live under "test/vectors"; update the path.join call that
constructs vectorsPath (using rootDir and the filename) to point to
"test/vectors/roast-attempt-context-v1.json" so the script loads the correct
file; ensure the change is made where vectorsPath is declared and used in this
module.

In `@pkg/tbtc/signer/scripts/formal/run_tla_models.sh`:
- Around line 4-5: The MODEL_DIR assignment in run_tla_models.sh is pointing to
the wrong path; update the MODEL_DIR variable (currently computed relative to
ROOT_DIR) to "$ROOT_DIR/docs/formal/models" so the directory existence check in
the script (around the directory existence test near line 20) succeeds; modify
the MODEL_DIR definition in the script (look for the MODEL_DIR variable
assignment and references) to use the corrected path and ensure any subsequent
uses of MODEL_DIR still reference this updated variable.

In `@pkg/tbtc/signer/tests/p2tr_signature_fraud_vectors.rs`:
- Around line 375-376: The test constructs vectors_path using the vectors_path
variable in pkg/tbtc/signer/tests/p2tr_signature_fraud_vectors.rs which
currently joins
"../../docs/frost-migration/test-vectors/p2tr-signature-fraud-v0.json" relative
to CARGO_MANIFEST_DIR and points to a non-existent file; fix by either adding
the missing p2tr-signature-fraud-v0.json to the repo at that path or update the
vectors_path join to the correct relative path where the JSON actually lives
(adjust the "../" segments or point to the canonical test-vectors location),
ensuring the variable name vectors_path and its usage remain unchanged.

---

Nitpick comments:
In `@pkg/tbtc/signer/src/api.rs`:
- Around line 196-202: The BuildTaprootTxRequest struct's script_tree_hex Option
field lacks the serde attributes used elsewhere; update the declaration of
BuildTaprootTxRequest so the script_tree_hex field is annotated with
#[serde(default, skip_serializing_if = "Option::is_none")] to match other
Option<T> fields (preserving Clone/Debug/Deserialize/Serialize behavior) so it
is omitted from serialized output when None instead of serializing as null.

In `@pkg/tbtc/signer/src/bin/admission_checker.rs`:
- Around line 257-276: persist_override_replay_registry currently leaves the
temporary file (tmp_path) if fs::rename fails; modify the rename error path to
attempt cleanup of tmp_path before returning the error. Specifically, call
fs::remove_file(&tmp_path) (ignoring or logging its result) inside the Err
branch that handles the rename failure so the function still returns the
original formatted error for fs::rename but also tries to remove the leftover
tmp file; reference persist_override_replay_registry, path, tmp_path, and
fs::rename when making the change.

In `@pkg/tbtc/signer/src/lib.rs`:
- Around line 391-421: EnvVarGuard's methods (EnvVarGuard::set,
EnvVarGuard::unset) and its Drop rely on std::env::set_var/remove_var which are
process-wide and not thread-safe; replace this pattern with a test-scoped,
non-global solution such as using a crate that provides scoped environment
variables (e.g., temp_env or similar) or refactor tests to accept an injected
configuration object instead of mutating process env; update usages to acquire
and hold the new scoped guard for the entire duration of tests that need env
changes (or pass a Config struct into functions under test) and remove direct
calls to std::env::set_var/remove_var and the EnvVarGuard Drop behavior to avoid
races.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c0224eef-499a-42a2-a346-f06ef353278b

📥 Commits

Reviewing files that changed from the base of the PR and between 980587b and 335ce60.

⛔ Files ignored due to path filters (1)
  • pkg/tbtc/signer/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (48)
  • pkg/tbtc/signer/.gitignore
  • pkg/tbtc/signer/Cargo.toml
  • pkg/tbtc/signer/README.md
  • pkg/tbtc/signer/benches/phase5_roast.rs
  • pkg/tbtc/signer/build.sh
  • pkg/tbtc/signer/docs/formal/models/README.md
  • pkg/tbtc/signer/docs/formal/models/RoastAttemptStateMachine.cfg
  • pkg/tbtc/signer/docs/formal/models/RoastAttemptStateMachine.tla
  • pkg/tbtc/signer/docs/formal/models/RoastRolloutPolicy.cfg
  • pkg/tbtc/signer/docs/formal/models/RoastRolloutPolicy.tla
  • pkg/tbtc/signer/docs/formal/models/StateKeyProviderPolicy.cfg
  • pkg/tbtc/signer/docs/formal/models/StateKeyProviderPolicy.production.cfg
  • pkg/tbtc/signer/docs/formal/models/StateKeyProviderPolicy.tla
  • pkg/tbtc/signer/docs/formal/models/TeeEnforcementModes.cfg
  • pkg/tbtc/signer/docs/formal/models/TeeEnforcementModes.tla
  • pkg/tbtc/signer/docs/permissioned-signer-hardening-rfc.md
  • pkg/tbtc/signer/docs/roast-implementation-plan.md
  • pkg/tbtc/signer/docs/roast-phase-0-spec-freeze.md
  • pkg/tbtc/signer/docs/roast-phase-1.5-consumed-registry-integration.md
  • pkg/tbtc/signer/docs/roast-phase-2-coordinator-policy-enforcement.md
  • pkg/tbtc/signer/docs/roast-phase-3-attempt-transcript-replay-hardening.md
  • pkg/tbtc/signer/docs/roast-phase-4-liveness-policy-recovery.md
  • pkg/tbtc/signer/docs/roast-phase-5-baseline-calibration.md
  • pkg/tbtc/signer/docs/roast-phase-5-rollout-runbook.md
  • pkg/tbtc/signer/docs/roast-phase-5-security-rollout-gates.md
  • pkg/tbtc/signer/docs/rust-rewrite-bootstrap.md
  • pkg/tbtc/signer/docs/signer-api-contract-decision-brief.md
  • pkg/tbtc/signer/docs/tbtc-signer-secret-material-hardening-plan.md
  • pkg/tbtc/signer/docs/tee-whitelisted-signer-enforcement-plan.md
  • pkg/tbtc/signer/docs/true-late-t-of-n-finalize-considerations.md
  • pkg/tbtc/signer/include/frost_tbtc.h
  • pkg/tbtc/signer/scripts/admission-candidate.sample.json
  • pkg/tbtc/signer/scripts/admission-existing.sample.json
  • pkg/tbtc/signer/scripts/admission-override-registry.sample.json
  • pkg/tbtc/signer/scripts/admission-override.sample.json
  • pkg/tbtc/signer/scripts/admission-policy-v1.sample.json
  • pkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjs
  • pkg/tbtc/signer/scripts/formal/run_tla_models.sh
  • pkg/tbtc/signer/scripts/run_phase5_chaos_suite.sh
  • pkg/tbtc/signer/src/api.rs
  • pkg/tbtc/signer/src/bin/admission_checker.rs
  • pkg/tbtc/signer/src/engine.rs
  • pkg/tbtc/signer/src/errors.rs
  • pkg/tbtc/signer/src/ffi.rs
  • pkg/tbtc/signer/src/go_math_rand.rs
  • pkg/tbtc/signer/src/lib.rs
  • pkg/tbtc/signer/test/vectors/roast-attempt-context-v1.json
  • pkg/tbtc/signer/tests/p2tr_signature_fraud_vectors.rs

Comment thread pkg/tbtc/signer/docs/formal/models/README.md
Comment thread pkg/tbtc/signer/docs/roast-implementation-plan.md Outdated
Comment thread pkg/tbtc/signer/docs/roast-phase-5-rollout-runbook.md Outdated
Comment thread pkg/tbtc/signer/docs/roast-phase-5-security-rollout-gates.md Outdated
Comment thread pkg/tbtc/signer/docs/rust-rewrite-bootstrap.md Outdated
Comment thread pkg/tbtc/signer/README.md Outdated
Comment thread pkg/tbtc/signer/scripts/admission-policy-v1.sample.json Outdated
Comment thread pkg/tbtc/signer/scripts/formal/run_tla_models.sh Outdated
Comment thread pkg/tbtc/signer/tests/p2tr_signature_fraud_vectors.rs Outdated
Adds a focused workflow that runs the Rust signer's formal-invariant
test suite + TLA model checks. Moved from
threshold-network/tbtc-v2/.github/workflows/ci-formal-verification.yml
(jobs `signer-formal-invariants` + `tla-model-checks`) per extraction
plan v38 §3.1 — the signer code lives here at pkg/tbtc/signer/, not
in tbtc-v2, so the CI jobs that exercise it belong here too.

Jobs
- signer-formal-invariants: cargo test --manifest-path pkg/tbtc/signer/
  Cargo.toml formal_verification_ (filter to formal-only cases)
- tla-model-checks: pkg/tbtc/signer/scripts/formal/run_tla_models.sh
  (iterates over .cfg files in pkg/tbtc/signer/docs/formal/models/ and
  runs TLC against each; MODELS_PATH env var allows override per the
  path-normalization commit b84b574c on this branch)

Triggers
- pull_request on pkg/tbtc/signer/** changes + this workflow file
- schedule nightly at 05:23 UTC (mirrors monorepo's pattern of running
  formal invariants both on PRs and nightly)
- workflow_dispatch for manual runs

Related changes in companion PR threshold-network/tbtc-v2#971:
- Removed these jobs from canonical tbtc-v2's ci-formal-verification.yml
- Added a comment in that file pointing here

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/tbtc-signer-formal.yml:
- Around line 25-26: The checkout steps using actions/checkout@v4 persist git
credentials by default; update each Checkout step (the uses: actions/checkout@v4
entries) to add with: persist-credentials: false so credentials are not stored
in the runner after checkout. Ensure both occurrences of actions/checkout@v4 in
the workflow are modified accordingly.
- Line 26: Update the GitHub Actions workflow to pin action versions and harden
checkout credentials: replace occurrences of actions/checkout@v4 with
actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 and add with:
persist-credentials: false to both checkout steps that use checkout, replace
dtolnay/rust-toolchain@stable with
dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8, and replace
actions/setup-java@v4 with
actions/setup-java@c1e323688fd81a25caa38c78aa6df2d33d3e20d9 so the workflow pins
SHA-based commits and disables persisting credentials on checkout.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 88553e79-6db4-4e95-98fe-00c56e1bc640

📥 Commits

Reviewing files that changed from the base of the PR and between bfd7658 and 551bd42.

📒 Files selected for processing (1)
  • .github/workflows/tbtc-signer-formal.yml

Comment thread .github/workflows/tbtc-signer-formal.yml Outdated
Comment thread .github/workflows/tbtc-signer-formal.yml Outdated
Resolves CI failures on PR #4005 (signer mirror):

1. TLA model checks: run_tla_models.sh lacked executable bit at
   canonical HEAD. CI ran the script directly (no `bash` prefix),
   which fails with `Permission denied`. Fixed via
   `git update-index --chmod=+x`.

2. Signer formal invariants: engine.rs's
   formal_verification_roast_attempt_context_shared_vectors_match_
   expected_values test referenced vectors at a path stale from
   the umbrella's docs/frost-migration/test-vectors/ layout. The
   manifest places the vector at the canonical-signer test/vectors/
   subdir (pkg/tbtc/signer/test/vectors/roast-attempt-context-v1.json
   per the source-to-target map). Updated the
   `PathBuf::from(env!("CARGO_MANIFEST_DIR")).join(...)` argument from
   `../../docs/frost-migration/test-vectors/roast-attempt-context-v1.json`
   (umbrella-relative) to `test/vectors/roast-attempt-context-v1.json`
   (signer-CARGO_MANIFEST_DIR-relative, where the vector actually
   lives at canonical HEAD).

Verified locally:
- ls -l shows executable bit set on run_tla_models.sh
- engine.rs path now resolves to the correct mirror location
- Vector exists at pkg/tbtc/signer/test/vectors/roast-attempt-context-v1.json

Same fix needs to be applied to PR #4007 (stacked on #4005)
in a follow-up commit on its branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mswilkison added a commit that referenced this pull request May 26, 2026
PR #4005 signer formal invariants test
formal_verification_p2tr_signature_fraud_vectors_match_bitcoin_crate
was failing because:

1. The umbrella source manifest only mapped p2tr-signature-fraud-v0.json
   to the tbtc-v2 target (docs/test-vectors/). It was not mirrored to
   the keep-core (signer) target.

2. tests/p2tr_signature_fraud_vectors.rs referenced the vector at the
   umbrella-relative path
   `../../docs/frost-migration/test-vectors/p2tr-signature-fraud-v0.json`
   (CARGO_MANIFEST_DIR-relative -> repo-root + docs/frost-migration/...).
   That directory does not exist on canonical keep-core.

This is a structural omission in the manifest, not a divergence: the
cross-language vector test exists in both Solidity (tbtc-v2 side) and
Rust (signer side), and the vector is genuinely needed in both places
to verify cross-implementation consistency.

Fix
- Mirror p2tr-signature-fraud-v0.json (598 lines, byte-identical
  content from tbtc-v2 mirror at docs/test-vectors/) to
  pkg/tbtc/signer/test/vectors/p2tr-signature-fraud-v0.json.
- Update the test path in tests/p2tr_signature_fraud_vectors.rs from
  `../../docs/frost-migration/test-vectors/p2tr-signature-fraud-v0.json`
  to `test/vectors/p2tr-signature-fraud-v0.json` (CARGO_MANIFEST_DIR =
  pkg/tbtc/signer/, so test/vectors/... resolves correctly).

Two stale comment references remain in
- pkg/tbtc/signer/docs/roast-implementation-plan.md:265
- pkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjs:19

Both are comment-only doc pointers to the source layout; they do not
affect runtime. Left as-is to preserve the umbrella -> canonical
provenance trail.

Manifest update follows in a stacked PR on tlabs-xyz/tbtc#10:
- Add p2tr-signature-fraud-v0.json -> signer target mapping
  (test/vectors/p2tr-signature-fraud-v0.json).
- Reclassify tests/p2tr_signature_fraud_vectors.rs from mirror to
  allowlisted-divergence (path differs from umbrella).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on extraction/frost-signer-mirror-2026-05-26 / PR #4005. Adds
optional taproot_merkle_root_hex to start/finalize signing rounds, binds
it into request fingerprints and round IDs, signs and aggregates with
frost-secp256k1-tr Taproot tweaks, and verifies tweaked aggregates in
tests. Verification: cargo test in pkg/tbtc/signer.
Fixes two Codex-flagged P2 state-consistency holes in start_sign_round.

1. Persist failure left in-memory state diverged from durable state. After the
   fresh-round path mutated the session (consumed-replay markers, round state), a
   failed persist returned an error but the canonical idempotent serve then
   returned signature shares WITHOUT persisting -- so a restart could replay the
   round with no durable consumed marker. The idempotent cached serve now
   persists before serving when the round is not yet durable, tracked by a
   process-local SIGN_ROUND_PERSIST_PENDING marker; when the original persist
   already succeeded it still serves cached without persisting, preserving the
   'idempotent replay survives a state-key-provider outage' property
   build_taproot_tx relies on (rollback is impossible -- the transition clear
   zeroizes the prior round material).

2. Active attempt cleared before later validation could fail. On an authorized
   ROAST attempt advance, clear_active_sign_round_for_attempt_transition ran
   before the fresh-path checks (participant resolution, included-set equality,
   quarantine, consumed-replay, share construction). A malformed advance that
   passed authorization but failed a later check destroyed the in-memory active
   round with no validated/persisted replacement, so the next StartSignRound
   could start a fresh attempt without transition evidence until a restart. The
   clear is now deferred until every fallible check has passed, just before the
   replacement round is installed and persisted.

Adds two regression tests, both verified to fail against the pre-fix code. Design
validated with Codex.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mswilkison and others added 4 commits June 28, 2026 10:05
Codex re-review: the SIGN_ROUND_PERSIST_PENDING marker was process-global but
cleared only at start_sign_round's own persist sites. If a start_sign_round
persist failed (marker set) and a later UNRELATED successful persist (e.g. a DKG
for another session) then wrote the whole engine state -- making that round
durable -- the marker stayed stale-true. A subsequent idempotent replay of the
now-durable round during a state-key-provider outage would re-enter the persist
branch, try to persist again, and fail instead of serving the cached round.

Move the marker into the persistence module and clear it inside
persist_engine_state_to_storage_with_key on any successful write, so any
operation's successful persist clears it. start_sign_round sets it on a
fresh-round mutation (mark_sign_round_persist_pending) and reads it in the
idempotent serve (sign_round_persist_pending); the explicit clears at the
start_sign_round persist sites are removed (the persist clears it now).

Adds a regression test (verified to fail against the pre-fix code) covering the
unrelated-persist-then-outage replay.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex re-review: the process-global SIGN_ROUND_PERSIST_PENDING bit conflated all
sessions. After one session's StartSignRound established a round and failed to
persist, the bit stayed set process-wide, so the idempotent cached-serve branch
-- which consulted it for EVERY session -- would force an UNRELATED, already-
durable session's replay to re-persist and fail during the same state-key outage
instead of serving its durable cached shares. An availability regression.

Replace the global AtomicBool with a per-session set
(SIGN_ROUND_PERSIST_PENDING_SESSIONS: OnceLock<Mutex<BTreeSet<String>>>) keyed by
session_id: mark the specific session on a fresh-round mutation, consult that
session in the cached-serve branch, and clear the WHOLE set on any successful
persist (a persist writes the entire engine state, so every in-memory round
becomes durable at once -- preserving the cross-operation durability the prior
commit established). reset_for_tests clears the set explicitly.

Both invariants stay intact: a non-durable round's replay still re-persists
before serving (durability); a durable round's replay still serves without
persisting (availability); a different session's failed persist no longer drags
down an unrelated durable session.

Adds a regression test (verified to fail against the global bool) and corrects
the marker doc comment (mutual exclusion is the inner mutex, not the ENGINE_STATE
guard, since clear also runs off-guard at startup and in test reset).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#4123)

## Summary

Fixes two Codex-flagged P2 state-consistency holes in `start_sign_round`
(`pkg/tbtc/signer/src/engine/signing.rs`), where a failed persist or a
later validation error could leave in-memory state diverged from durable
state and bypass the ROAST replay/transition protections.

### 1. Persist before serving the idempotent cached round (bug #1)

The fresh-round path mutates the session (consumed-replay markers +
round state) and then persists. If that persist failed
(state-key-provider or disk error) it returned an error and served no
shares — but the **canonical idempotent serve** then returned signature
shares **without persisting**, so a restart could replay the round with
no durable consumed marker.

Fix: the idempotent cached serve now persists before serving **when the
round is not yet durable**, tracked by a process-local
`SIGN_ROUND_PERSIST_PENDING` marker (set on fresh-round mutation,
cleared after a successful persist). When the original persist already
succeeded — the common case — it still serves the cached round
**without** persisting, preserving the "idempotent replay survives a
state-key-provider outage" property that `build_taproot_tx` relies on.
(Rollback is impossible here: the transition clear zeroizes the prior
round material.)

### 2. Defer clearing the active attempt until validation finishes (bug
#2)

On an authorized ROAST attempt advance,
`clear_active_sign_round_for_attempt_transition` ran **before** the
fresh-path checks (participant resolution, included-set equality,
quarantine, consumed-replay, share construction). A malformed advance
that passed authorization + the RFC-21 coordinator check but failed a
**later** check destroyed the in-memory active round with no
validated/persisted replacement — so `active_attempt_context` was
dropped and the next `StartSignRound` could start a fresh attempt
**without transition evidence** until a restart reloaded durable state.

Fix: the advance is authorized but the clear is **deferred** until every
fallible fresh-path check has passed, immediately before the replacement
round is installed and persisted (the idempotent/conflict branch is
skipped for an authorized advance via a flag).

## Tests

Two regression tests, both **verified to fail against the pre-fix
code**:
- `authorized_advance_failing_later_check_preserves_active_sign_round` —
an authorized advance that fails the included-set check leaves attempt 1
idempotently signable (pre-fix: `ConsumedAttemptReplay`).
-
`idempotent_sign_round_replay_persists_before_serving_after_persist_outage`
— a sign round whose persist fails does not serve shares on idempotent
replay until the state-key provider recovers (pre-fix: served shares
with the key down).

Full lib suite: **308 passing** (`cargo test --lib`), `cargo fmt
--check` clean. Design validated with Codex.

_Found during the Codex review of #4005._

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…riodic reshare

Three findings from the #4005 review (workflow review + Codex):

1. [HIGH] persistence: the legacy unencrypted plaintext state fallback was
   accepted unconditionally, bypassing the AEAD envelope -- anyone who could
   write the state file could forge it (cleared replay markers / attacker key
   material) without the state-encryption key. Per the secret-material hardening
   plan the plaintext path is now emergency-rollback-only: compile-time disabled
   in release builds, rejected in production profiles, and gated behind an
   explicit TBTC_SIGNER_PERMIT_PLAINTEXT_STATE_ROLLBACK opt-in. The two tests
   that load plaintext now opt in; a negative assertion pins the fail-closed
   default.

2. [Codex P1] ffi: free_buffer now zeroizes the buffer before deallocation, so
   secret material returned by nonce/DKG/key-package endpoints is wiped rather
   than left in allocator memory when a caller forgets to clear it.

3. [Codex P2] lifecycle: a subsequent same-session refresh with updated shares
   produced a different fingerprint and was rejected as SessionConflict, so a
   long-lived session could never advance refresh_history past the first refresh.
   A different fingerprint is now treated as a new periodic refresh; idempotent
   replay of the same fingerprint is unchanged.

Deferred (flagged, not in this change):
- Signing-policy firewall production force-on: force-enabling it makes the entire
  firewall policy config mandatory in production (a deployment-contract change);
  the verifier flagged it as needing product confirmation, so left as-is.
- State-at-rest anti-rollback freshness counter and quarantine-reset marker loss:
  design-level / inherent to the opt-in policy.

Verified: cargo test (297 passed, 0 failed) and cargo clippy -D warnings clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… built-in defaults

Closes the fail-open default flagged in the #4005 review (and confirmed by Codex):
signing_policy_firewall_enforced() defaulted OFF with no production force-on --
unlike provenance_gate_enforced() -- so a production signer would sign any sighash
reaching the sign path with no check it corresponds to a policy-checked
build_taproot_tx.

Implements option (b) (default-on with conservative baked-in defaults), chosen over
a blunt force-on, which made the full policy config mandatory and would brick a
production signer that did not ship it:

- signing_policy_firewall_enforced() now force-enables in production (mirrors the
  provenance gate), so the firewall's primary control --
  enforce_signing_message_binding_to_policy_checked_build_tx -- runs in production.
- load_signing_policy_firewall_config() resolves missing policy env to conservative
  built-in defaults instead of refusing to boot:
  - allowed_script_classes defaults to the standard tBTC output forms
    {p2pkh, p2sh, p2wpkh, p2wsh, p2tr}; the classifier's "other" (non-standard)
    bucket is not in the set, so the firewall fails closed on unknown forms.
  - numeric caps default permissive (output count high-bounded, value caps to
    BITCOIN_MAX_MONEY_SATS) and are operator-tunable; a too-tight static cap would
    false-reject legitimate large redemptions/sweeps (the stale-policy risk).

Operators can still narrow any of these via the existing TBTC_SIGNER_POLICY_* env.
Non-production behavior is unchanged (opt-in via the enforce flag). Removed the now
unused parse_{usize,u64}_from_env_required helpers.

Tests: production force-on + built-in defaults; the rollback-on-policy-failure test
now triggers on an explicitly-invalid value (UTC window mismatch) since absent config
no longer fails. cargo test (299 passed) + clippy -D warnings + fmt all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mswilkison and others added 16 commits June 30, 2026 13:28
Follow-up to the #4124 review (multi-agent + Codex), which found regressions in
the periodic-refresh change and a gap in the FFI wipe:

1. Refresh stale-retry (CONFIRMED, Codex P2): removing SessionConflict made
   refresh re-execute on a replayed/older request -- re-deriving stale shares and
   bumping the epoch. Now a fingerprint already in refresh_history (accepted but
   no longer current) is rejected as a stale retry; a genuinely-new fingerprint
   still proceeds (repeatable periodic reshares). RefreshHistoryRecord gains an
   optional request_fingerprint (serde-default, backward compatible).

2. Unbounded refresh_history (CONFIRMED): the removed guard was the only bound.
   Cap per-session history at MAX_REFRESH_HISTORY (256), dropping oldest;
   strictly-increasing epochs preserved so continuity still holds.

3. FFI incomplete wipe (PLAUSIBLE): to_ffi_buffer's into_boxed_slice() shrinks and
   reallocates when capacity > len (serde_json over-allocates), freeing the
   original secret buffer un-wiped. Copy into an exact boxed slice and zeroize the
   source Vec so no un-wiped copy survives.

4. Quarantine schema-mismatch test exercised plaintext-refusal, not schema
   validation, after the plaintext gate landed: opt into the rollback flag like
   its sibling so it tests the intended path.

5. Plaintext-gate doc comment overclaimed "compile-time disabled in release";
   clarify the load-bearing guard is the runtime non-production check.

Verified: cargo test (298 passed) + clippy -D warnings + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…efaults

Self-review found the README stale after the firewall change: the policy caps were
documented as "required when firewall is enabled" (now resolved to built-in
defaults), the production force-enable wasn't documented, and "Policy gates default
to disabled" contradicted it. Corrected all three.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… built-in defaults (#4125)

## Summary

Closes the **signing-policy firewall fail-open default** flagged in the
#4005 review (and confirmed by a Codex consult).
`signing_policy_firewall_enforced()` defaulted OFF with no production
force-on — unlike `provenance_gate_enforced()` — so a production signer
would sign any sighash reaching the sign path with **no check it
corresponds to a policy-checked `build_taproot_tx`**. With no
slashing/staking in the model, that on-signer check is the enforcement,
not a backstop.

## Approach: option (b) — default-on with conservative built-in defaults

I prototyped the blunt force-on first; it made the *entire* firewall
policy config mandatory in production (the signer would refuse to boot
without it) — a deployment-contract cliff. Codex's recommendation (and
mine) was **default-on with conservative baked-in defaults** instead, so
this PR does that:

- **`signing_policy_firewall_enforced()`** force-enables in production
(mirrors the provenance gate). The firewall's primary control —
`enforce_signing_message_binding_to_policy_checked_build_tx` (the signed
digest must match a policy-checked tx) — now runs in production.
- **`load_signing_policy_firewall_config()`** resolves missing policy
env to conservative built-in defaults instead of erroring:
- `allowed_script_classes` → the standard tBTC output forms `{p2pkh,
p2sh, p2wpkh, p2wsh, p2tr}`. The classifier's `"other"` (non-standard)
bucket is **not** in the set, so the firewall **fails closed** on
unknown output forms.
- numeric caps → **permissive** (output count high-bounded, value caps
to `BITCOIN_MAX_MONEY_SATS`) and operator-tunable. A too-tight static
cap would false-reject legitimate large redemptions/sweeps — Codex's
"stale-policy" risk — and the message-binding is the real control, so
the defaults bound damage without breaking liveness.

Operators still narrow any knob via the existing `TBTC_SIGNER_POLICY_*`
env. **Non-production behavior is unchanged** (opt-in via the enforce
flag).

## Notes / follow-ups
- The script-class default is the meaningful, zero-false-reject security
default. The value/count defaults are intentionally permissive;
deployments should set tighter caps per wallet sizing (Codex's two risks
— stale policy and false confidence — argue for *versioned* policy +
keeping host/on-chain validation authoritative).
- Still worth confirming separately: whether the keep-core Go host
independently validates the signed message vs Bridge state before
driving the round (defense-in-depth vs sole-enforcement). That doesn't
block this change — it only raises the priority.

## Verification
- `cargo test` — **299 passed, 0 failed** (incl. new tests: production
force-on, built-in defaults; the rollback-on-policy-failure test now
triggers on an explicitly-invalid value since absent config no longer
fails).
- `cargo clippy --all-targets -- -D warnings` and `cargo fmt --check` —
**clean**.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Codex re-review: the rollback-path tests opt into the plaintext path, which
legacy_plaintext_state_permitted() compiles out in release (cfg!(debug_assertions)
is false), so under `cargo test --release` they fail at the post-acceptance
assertions. cfg-gate the three plaintext-acceptance tests to debug_assertions;
release always refuses plaintext before those assertions, so there is nothing for
them to cover there.

Verified: cargo test (298 passed, debug) + cargo test --release (295 passed; the 3
tests correctly compiled out) + clippy --release -D warnings clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex re-review: a host that installs an init-time config has signer_env_var read
the installed config, not the process environment, so the documented
TBTC_SIGNER_PERMIT_PLAINTEXT_STATE_ROLLBACK env flag had no effect for it -- the
emergency plaintext migration was impossible to enable via the configured-host
path. Add permit_plaintext_state_rollback to InitSignerConfigRequest and map it in
config_values_from_request, mirroring the other enforcement flags.

Verified: cargo test (298 passed) + clippy -D warnings + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex re-review: a session loaded from state written before
RefreshHistoryRecord.request_fingerprint deserializes its history records with
None, so the last accepted refresh fingerprint survives only in
session.refresh_request_fingerprint. The first post-upgrade periodic refresh
overwrote that value, so a delayed retry of the pre-upgrade refresh no longer
matched the history scan and was accepted as a new epoch instead of
SessionConflict.

Before overwriting refresh_request_fingerprint, backfill the previously-accepted
fingerprint onto the most-recent history record when it is not already tracked
(the legacy None case). Adds a regression test that simulates pre-upgrade state
(strip history fingerprints) and verifies a retry of the old refresh is rejected
after a new one.

Verified: cargo test (299 debug / 296 release) + clippy -D warnings + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex re-review: refresh_cadence_status reported refresh_count as
session.refresh_history.len(), which the MAX_REFRESH_HISTORY cap bounds -- so a
long-lived session that ran more than 256 refreshes showed a stuck count of 256
even as later refreshes succeeded.

Add a monotonic SessionState.refresh_count (and its PersistedSessionState mirror,
serde-default for back-compat, plus both conversions) incremented on each accepted
refresh and backfilled from the retained history length for legacy state; report
it from refresh_cadence_status instead of the pruned history length. Adds a
regression test that prunes history and asserts the reported count is the true
total.

Verified: cargo test (300 passed) + clippy -D warnings + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… load

Codex re-review: refresh_count is #[serde(default)], so state written before the
field existed deserializes it to 0 even when refresh_history holds accepted
refreshes -- and since refresh_cadence_status now reports refresh_count, an
upgraded node reported 0 until the next refresh. Backfill it from
refresh_history.len() in the Persisted->Session conversion (evaluated before
refresh_history is moved), with a regression test.

Verified: cargo test + clippy -D warnings + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntine recheck)

Codex re-review of the signer FFI surface:

1. [P1] Secret request fields not zeroized (api.rs): SignShareRequest.nonces_hex
   and key_package_hex deserialized into plain Strings dropped without
   zeroization, retaining the one-time nonce / private key package in freed heap.
   Wrap both in Zeroizing<String> (wiped on drop) + a redacting Debug so the
   secrets no longer leak via {:?} either.

2. [P1] Panic hook defeats payload redaction (ffi.rs): catch_unwind does not
   suppress Rust's default panic hook, so a panic payload (path/config/secret)
   prints to stderr before being converted to the redacted ErrorResponse. Install
   a panic hook at the FFI boundary that redacts the payload outside development.

3. [P2] Quarantine not rechecked on cached-round reuse (signing.rs): the matched-
   fingerprint reuse branch returned a fresh share without the new-round path's
   enforce_not_quarantined_identifiers check, so a participant quarantined after
   the round opened could still obtain a share. Apply the check on reuse too.

Verified: cargo test (301 passed) + clippy -D warnings + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pty history

Codex re-review: legacy state written before refresh_history existed can carry
refresh_request_fingerprint/refresh_result with an EMPTY refresh_history. The
fingerprint backfill used refresh_history.last_mut(), which is None for empty
history, so the previous fingerprint was not recorded and a delayed retry of that
pre-upgrade refresh was re-executed as a new refresh.

When there is no record to backfill onto, synthesize one from the cached
refresh_result (prior epoch, keeping history strictly increasing) carrying the
previous fingerprint, so stale retries stay rejected. Adds a regression test for
the empty-history legacy shape.

Verified: cargo test (302 passed) + clippy -D warnings + fmt clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…h env

Two Codex-flagged issues in the non-interactive signing path:

1. start_sign_round fingerprinted and derived the round id from the raw
   request.message_hex string. hex::decode accepts mixed casing, so two calls
   carrying identical message bytes but different hex casing derived different
   fingerprints/round ids and a semantically identical retry was rejected as a
   SessionConflict. Lowercase message_hex right after it validates as hex,
   mirroring the interactive path (interactive.rs), so the fingerprint and
   derive_round_id are casing-independent.

2. The README-documented `cargo bench --features bench-restart-hook --bench
   phase5_roast` failed in a clean shell: ensure_benchmark_environment never set
   TBTC_SIGNER_PROFILE (missing -> production) or TBTC_SIGNER_STATE_ENCRYPTION_KEY_HEX
   (required by the default `env` state-key provider), so the first RunDkg
   persist aborted. Seed profile=development, provider=env, and a fixed dev key.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The prior commit lowercases message_hex before computing every canonical and
legacy fingerprint. That regressed the upgrade path: when a node upgrades with
an active sign round that a previous build started using mixed/uppercase
message_hex, the persisted sign_request_fingerprint was computed over that
original casing. The same retry then matched none of the fingerprints and fell
through to SessionConflict instead of reusing the cached round.

Capture the pre-canonicalization casing and, only when it differs from the
lowercased form, compute the four legacy fingerprint shapes over the original
casing and accept them in the cached-round match. A match migrates the stored
fingerprint to the canonical lowercase form, so the compatibility cost is paid
once per in-flight round. Adds a regression test that persists a mixed-case
fingerprint and asserts the retry reuses the round and migrates it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d result

The empty-history synthesize branch was guarded by `else if let Some(previous_result)`,
so a legacy/degraded state that kept refresh_request_fingerprint but whose
refresh_result deserialized to None (and whose refresh_history is empty) fell
through: neither the last_mut() backfill nor the synthesize branch ran. The next
refresh then overwrote the only copy of the old fingerprint, so a delayed retry
of the pre-upgrade request was treated as fresh and advanced the epoch — a replay.

Make the branch unconditional: synthesize a history record carrying the previous
fingerprint whether or not a cached result is present. Prefer the result's epoch
(when non-zero and below the new refresh) and share count; otherwise fall back to
refresh_epoch-1 (min 1) and a zero share count. refresh_epoch_counter is
persisted, so a prior accepted refresh implies refresh_epoch >= 2 and the fallback
stays non-zero and strictly below the new record, keeping history monotonic.

Adds a regression test (empty history + refresh_result=None) verified to fail
against the pre-fix code (the stale retry was re-executed instead of rejected).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…riodic reshare (#4124)

## Summary

Three findings from the #4005 review (the multi-agent custody review +
Codex's review), fixed and verified.

### 1. [HIGH] Plaintext state fallback was accepted unconditionally —
`persistence.rs`
`decode_persisted_state_storage_format` fell back to parsing an
**unencrypted, unauthenticated** `PersistedEngineState` whenever the
bytes weren't a valid AEAD envelope — with no gate at all. The AEAD
envelope is the only integrity control for state-at-rest, so anyone who
could write the state file could forge it (cleared `consumed_*` replay
markers → resume-into-double-sign, or an attacker `dkg_key_package` →
admission bypass) **without holding the state-encryption key**.

Per the secret-material hardening plan, the plaintext path is now
**emergency-rollback-only**: compile-time disabled in release builds
(`cfg!(debug_assertions)`), rejected in production profiles, and gated
behind an explicit `TBTC_SIGNER_PERMIT_PLAINTEXT_STATE_ROLLBACK` opt-in.
The two tests that load plaintext (migration + schema-mismatch) opt in,
and a negative assertion pins the fail-closed default.

### 2. [Codex P1] Secret FFI buffers left unwiped on free — `ffi.rs`
`free_buffer` now `zeroize`s the buffer before deallocation, so secret
material returned by nonce/DKG/key-package endpoints (e.g.
`frost_tbtc_generate_nonces_and_commitments`) is wiped rather than left
in allocator memory when a caller forgets to clear it. (`zeroize` is a
volatile wipe the optimizer can't elide.)

### 3. [Codex P2] Periodic reshare was a one-shot — `lifecycle.rs`
After the first `refresh_shares`, any later reshare with updated
`current_shares` had a different fingerprint and was rejected as
`SessionConflict`, so a long-lived session could never advance
`refresh_history`/`refresh_count`/cadence past the first refresh. A
different fingerprint is now treated as a new periodic refresh;
idempotent replay of the *same* fingerprint is unchanged.

## Deferred (flagged for the team)
- **Signing-policy firewall production force-on** (my MEDIUM finding):
mirroring the provenance gate is correct in principle, but
force-enabling the firewall makes its *entire* policy config (allowed
script classes, output caps, …) mandatory in production — a
deployment-contract change. The verifier flagged it as needing product
confirmation, so it's left as-is pending that decision.
- **State-at-rest anti-rollback freshness counter** and
**quarantine-reset replay-marker loss**: design-level (needs an external
monotonic anchor) / inherent to that opt-in policy (default is
fail-closed).

## Verification
- `cargo test` — **297 passed, 0 failed**.
- `cargo clippy --all-targets -- -D warnings` — **clean (exit 0)**.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…t credential persistence

Resolves the zizmor/CodeRabbit findings on the signer formal-verification
workflow: unpinned mutable action tags and persisted git credentials in
checkout steps. All `uses:` references now pin full commit SHAs with
human-readable version comments (actions/checkout v4.3.1,
dtolnay/rust-toolchain stable head, EmbarkStudios/cargo-deny-action
v2.0.20, actions/setup-java v4.8.0), and every checkout step sets
persist-credentials: false so the ephemeral token is not written to the
local git config.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
mswilkison and others added 4 commits July 1, 2026 16:33
…ction

The stateless FROST FFI primitives `generate_nonces_and_commitments` and
`sign_share` in `frost_ops.rs` hand a one-time signing nonce pair out to
the host and later accept it back as opaque `nonces_hex`. That leaves
nonce custody -- and the single-use invariant -- entirely with the caller
(the `SignShareRequest` contract even states the caller "is
cryptographically responsible for single use"). A host bug or compromise
that replays one `nonces_hex` across two distinct signing packages
produces two Schnorr shares under the same nonce, which algebraically
solves for the long-term secret share.

The deterministic StartSignRound/FinalizeSignRound path is already fenced
off in production by `enforce_transitional_signing_disabled_in_production`,
but these stateless primitives gated only on the provenance attestation
and had no production-profile guard -- a fail-closed asymmetry. A
production signer could therefore be driven through the host-custody nonce
path and inherit that catastrophic blast radius.

Add `enforce_stateless_nonce_primitives_disabled_in_production`, mirroring
the deterministic guard's style (same `signer_profile_is_production()`
predicate, same `EngineError::LifecyclePolicyRejected` variant and message
shape), and invoke it in both primitives immediately after the provenance
gate -- before any secret key package or nonce is deserialized. Under the
production profile both refuse with reason_code
`stateless_nonce_primitives_disabled_in_production`; production signing
must use the interactive FROST path (`interactive.rs`), where the engine
keeps nonce custody and enforces durable single-use consumption markers.
Non-production/test profiles retain current behavior.

No current production caller reaches these paths (defense-in-depth before
activation). Adds a focused test proving both primitives fail closed under
the production profile with valid provenance -- feeding the same secret
inputs that succeed under development -- and still work under the
development profile.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Codex P2 review on #4127 flagged that SHA-pinning dtolnay/rust-toolchain could
make it install the SHA as a toolchain, since older versions of the action
derive the default toolchain from the action ref. Verified against the pinned
version (4be7066): its action.yml already defaults `toolchain` to `stable`, so
the Setup Rust / formal jobs were not actually broken (they pass on this
branch). Still, name the toolchain explicitly on both Setup Rust steps: it is
self-documenting and independent of the pinned action's default, which is the
safe form for a SHA-pinned toolchain action.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…t credential persistence (#4127)

## Summary

Hardens `.github/workflows/tbtc-signer-formal.yml` (introduced in #4005)
against supply-chain and credential-exposure risks:

- Every `actions/checkout` step now sets `persist-credentials: false`,
so the ephemeral `GITHUB_TOKEN` is not persisted into the local git
config after checkout. None of the jobs perform authenticated git
operations after checkout, so nothing relies on the persisted
credential.
- Every `uses:` reference is pinned to a full 40-hex commit SHA with a
trailing comment recording the human-readable version, replacing mutable
tag/branch refs. No action major versions were changed — each pin is the
current commit behind the ref that was already in use.

This addresses the two open CodeRabbit review threads on #4005
(persisted checkout credentials + unpinned action refs on
`.github/workflows/tbtc-signer-formal.yml`).

## Pinned actions

| Action | Old ref | Pinned SHA (version) |
| --- | --- | --- |
| `actions/checkout` | `v4` | `34e114876b0b11c390a56381ad16ebd13914f8d5`
(v4.3.1) |
| `dtolnay/rust-toolchain` | `stable` |
`4be7066ada62dd38de10e7b70166bc74ed198c30` (stable branch head) |
| `EmbarkStudios/cargo-deny-action` | `v2` |
`bb137d7af7e4fb67e5f82a49c4fce4fad40782fe` (v2.0.20) |
| `actions/setup-java` | `v4` |
`c1e323688fd81a25caa38c78aa6df2d33d3e20d9` (v4.8.0) |

## Notes

- Stacked on `extraction/frost-signer-mirror-2026-05-26` (PR #4005);
only the workflow file is touched.
- YAML validated locally after the edit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…e production profile (#4129)

## Finding (MEDIUM, fail-closed asymmetry — defense-in-depth)

The stateless FROST FFI primitives `generate_nonces_and_commitments` and
`sign_share` in `pkg/tbtc/signer/src/engine/frost_ops.rs` hand a
one-time signing nonce pair out to the host and later accept it back as
opaque `nonces_hex`. Nonce custody — and the single-use invariant — is
left entirely with the caller. The `SignShareRequest` contract even
documents this: the caller *"is cryptographically responsible for single
use."*

A host bug or compromise that replays one `nonces_hex` across two
distinct signing packages yields two Schnorr shares under the same
nonce, which algebraically solves for the long-term secret share.
Catastrophic blast radius if reached.

The **deterministic** StartSignRound/FinalizeSignRound path is already
fenced off in production by
`enforce_transitional_signing_disabled_in_production`. The **stateless**
primitives, by contrast, gated only on the provenance attestation
(`enforce_provenance_gate`) and had **no** production-profile guard — a
fail-closed asymmetry. A production signer could therefore be driven
through the host-custody nonce path and inherit that blast radius.

No current production caller reaches these paths; this is
defense-in-depth before activation.

## Guard added

New
`enforce_stateless_nonce_primitives_disabled_in_production(operation)`
in `frost_ops.rs`, mirroring the deterministic guard's style:
- same predicate: `signer_profile_is_production()`
- same error type/shape: `EngineError::LifecyclePolicyRejected`
- reason_code: `stateless_nonce_primitives_disabled_in_production`

Under the production profile it fails closed (returns the error);
non-production/test profiles keep current behavior.

## The two call sites

Invoked immediately after `enforce_provenance_gate()?` — before any
secret key package or nonce is deserialized — in:
- `generate_nonces_and_commitments`
(`enforce_stateless_nonce_primitives_disabled_in_production("GenerateNoncesAndCommitments")`)
- `sign_share`
(`enforce_stateless_nonce_primitives_disabled_in_production("SignShare")`)

The deterministic path is unchanged.

## Test


`engine::tests::stateless_nonce_primitives_reject_under_production_profile`:
- Under the development profile, builds real key packages + nonces +
signing package and confirms both primitives succeed.
- Flips to the production profile (with valid provenance so the new gate
— not the provenance gate — is what fires) and feeds the **same** secret
inputs: both primitives return `LifecyclePolicyRejected` with
reason_code `stateless_nonce_primitives_disabled_in_production`, before
any secret is deserialized (the gate is the second statement of each
primitive).

## Validation (from `pkg/tbtc/signer`)

- `cargo fmt -- --check` — clean
- `cargo clippy --all-targets -- -D warnings` — clean
- `cargo test` — 319 passed (lib), 25 passed (admission_checker bin), 1
passed (p2tr integration), 0 failed

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants