feat(tbtc/signer): FROST/ROAST Rust signer#4005
Conversation
Lands the Rust signer at pkg/tbtc/signer/ alongside the existing Go DKG coordinator. Mirrors the signer slice of tlabs-xyz/tbtc:feat/frost-schnorr-migration (PR #10) at frozen tag frost-extraction-source-v1. Per extraction plan v38 §3.1, the signer co-locates with keep-core because: (a) HSM enforcement is external to signer code (standard PKCS#11/KMIP client), doesn't force a separate audit lifecycle; (b) coordinator coupling dominates (B-2 DKG coordinator already lives in this repo via PR #3866); (c) TEE adoption later is reversible without forcing a repo split. Layout - pkg/tbtc/signer/ — Rust crate with own Cargo.toml - pkg/tbtc/signer/docs/ — signer + ROAST + TEE specs - pkg/tbtc/signer/docs/formal/models/ — ROAST + TEE TLA+ models - pkg/tbtc/signer/scripts/formal/ — ROAST vector + TLA runner - pkg/tbtc/signer/test/vectors/ — roast-attempt-context-v1.json Files (49 total) - 47 mirror status (Rust source, signer docs, ROAST docs, TLA models, test vector, etc.) - 2 allowlisted-divergence: - pkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjs (path normalization to signer-repo paths) - pkg/tbtc/signer/scripts/formal/run_tla_models.sh (MODELS_PATH env var refactor; default pkg/tbtc/signer/docs/formal/models/) Provenance - Source repository: tlabs-xyz/tbtc - Source branch: feat/frost-schnorr-migration - Source tag (frozen): frost-extraction-source-v1 - Source commit (H): 52389bd5cccb5daeef195671feb7ca46be6e2f37 - Source manifest: extraction/frost-extraction-source-manifest.json (manifestSha256: f7295fb738104501eb6c0c2447a42122ceb5f684c7a7c5dfb50ecb0bde3a0ea0) - Source PR(s): #425 (tbtc-signer error codes) + the full FROST migration series; complete list per source manifest's commit range over tools/tbtc-signer/ and signer-adjacent docs/scripts. Build wiring (follow-up) This PR lands the signer source code, docs, vectors, and scripts. Rust toolchain CI integration (cargo build/test/clippy/fmt as a separate CI job alongside the existing Go jobs) is a follow-up — Go workspace ignores non-Go subdirectories automatically; the cargo crate at pkg/tbtc/signer/Cargo.toml is independently buildable. Known TBD (resolved pre-merge) - 2 allowlisted-divergence files have expectedTargetSha256 = <TBD> in the source manifest. Path normalization transformations need to be applied to make the scripts run in pkg/tbtc/signer/ context; this PR ships the source verbatim, follow-up commits on this branch apply the transformations before merge. - Dual signoff required on each allowlisted-divergence entry per plan v38 §4.2 (extraction lead + canonical repo maintainer). Verification (pre-merge per plan v38 §7.2) - 47 mirror files: sha256 equality between git show frost-extraction-source-v1:<sourcePath> and this PR's content at pkg/tbtc/signer/<targetPath>. - 2 allowlisted-divergence files: sha256 == expectedTargetSha256 (recorded post-transformation in source manifest). - PR-scoped rogue-file check: git diff --name-only main...HEAD only contains files in manifest's fileMap with targetKey "signer". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR introduces ChangesCore Signer Implementation
Admission Checking & Policy Enforcement
Benchmarking & Formal Test Vectors
TLA+ Formal Verification Models
Design Documentation & Specifications
Build Configuration & Scripts
🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…ripts
Per extraction plan v38 §4.4, allowlisted-divergence files require
content normalization for the canonical context. This commit applies
the transformations declared in the source manifest for the 2 signer-
side allowlisted-divergence entries.
Transformations
- pkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjs:
Rewrite vector path from
`docs/frost-migration/test-vectors/roast-attempt-context-v1.json`
to canonical signer layout
`test/vectors/roast-attempt-context-v1.json`
(both relative to rootDir which is two levels up from the script
location; in canonical context that's pkg/tbtc/signer/)
- pkg/tbtc/signer/scripts/formal/run_tla_models.sh:
Rewrite MODEL_DIR default from
`$ROOT_DIR/docs/frost-migration/formal-verification/models`
to canonical signer layout
`$ROOT_DIR/docs/formal/models`
Plus MODELS_PATH env-var override for alternate environments (CI
matrices, local dev trees). ROOT_DIR is unchanged
(`$(dirname $BASH_SOURCE)/../..` resolves to pkg/tbtc/signer/ here).
Verification
- Both files retain identical behavior to their monorepo counterparts
when invoked from canonical signer layout
- Comments added documenting the path normalization with reference back
to the source manifest's allowlisted-divergence status
Recompute expectedTargetSha256 for both entries in the source manifest
and collect dual signoff before merge per plan v38 §4.2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 13
🧹 Nitpick comments (3)
pkg/tbtc/signer/src/api.rs (1)
196-202: 💤 Low valueMissing
#[serde(default, skip_serializing_if)]onscript_tree_hex.All other
Option<T>fields in this file use#[serde(default, skip_serializing_if = "Option::is_none")], butscript_tree_hexdoes not. This inconsistency means the field will serialize asnullwhen absent, rather than being omitted.Suggested fix
pub struct BuildTaprootTxRequest { pub session_id: String, pub inputs: Vec<TxInput>, pub outputs: Vec<TxOutput>, + #[serde(default, skip_serializing_if = "Option::is_none")] pub script_tree_hex: Option<String>, }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/tbtc/signer/src/api.rs` around lines 196 - 202, The BuildTaprootTxRequest struct's script_tree_hex Option field lacks the serde attributes used elsewhere; update the declaration of BuildTaprootTxRequest so the script_tree_hex field is annotated with #[serde(default, skip_serializing_if = "Option::is_none")] to match other Option<T> fields (preserving Clone/Debug/Deserialize/Serialize behavior) so it is omitted from serialized output when None instead of serializing as null.pkg/tbtc/signer/src/lib.rs (1)
391-421: 💤 Low value
std::env::set_varandstd::env::remove_varare not thread-safe.These functions are unsound in multi-threaded contexts and deprecated since Rust 1.66. While the tests appear to serialize access via
lock_test_state(), this guard must be held across all env mutations and checks within a test to prevent races with parallel test threads.Current usage appears safe given the locking pattern, but this is fragile. Consider using a dedicated test configuration mechanism that doesn't rely on process-wide environment mutation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/tbtc/signer/src/lib.rs` around lines 391 - 421, EnvVarGuard's methods (EnvVarGuard::set, EnvVarGuard::unset) and its Drop rely on std::env::set_var/remove_var which are process-wide and not thread-safe; replace this pattern with a test-scoped, non-global solution such as using a crate that provides scoped environment variables (e.g., temp_env or similar) or refactor tests to accept an injected configuration object instead of mutating process env; update usages to acquire and hold the new scoped guard for the entire duration of tests that need env changes (or pass a Config struct into functions under test) and remove direct calls to std::env::set_var/remove_var and the EnvVarGuard Drop behavior to avoid races.pkg/tbtc/signer/src/bin/admission_checker.rs (1)
257-276: 💤 Low valueConsider cleaning up the temp file if rename fails.
If
fs::renamefails (e.g., cross-filesystem move or permissions issue), the temp file remains on disk. Adding a cleanup attempt in the error path would improve robustness.♻️ Proposed cleanup on error
fs::write(&tmp_path, serialized).map_err(|error| { format!( "failed to write override replay registry temp file [{}]: {error}", tmp_path.display() ) })?; - fs::rename(&tmp_path, path).map_err(|error| { - format!( - "failed to persist override replay registry [{}]: {error}", - path.display() - ) - }) + fs::rename(&tmp_path, path).map_err(|error| { + let _ = fs::remove_file(&tmp_path); // Best-effort cleanup + format!( + "failed to persist override replay registry [{}]: {error}", + path.display() + ) + }) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/tbtc/signer/src/bin/admission_checker.rs` around lines 257 - 276, persist_override_replay_registry currently leaves the temporary file (tmp_path) if fs::rename fails; modify the rename error path to attempt cleanup of tmp_path before returning the error. Specifically, call fs::remove_file(&tmp_path) (ignoring or logging its result) inside the Err branch that handles the rename failure so the function still returns the original formatted error for fs::rename but also tries to remove the leftover tmp file; reference persist_override_replay_registry, path, tmp_path, and fs::rename when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/tbtc/signer/docs/formal/models/README.md`:
- Around line 32-51: Update the incorrect repository paths in the traceability
matrix of pkg/tbtc/signer/docs/formal/models/README.md so links point to the
actual implementation and docs in this repo: change references to
tools/tbtc-signer/src/engine.rs to pkg/tbtc/signer/src/engine.rs for the entries
mentioning RoastAttemptStateMachine.tla (validate_attempt_context, replay
guards) and StateKeyProviderPolicy.tla (decode_encrypted_state_envelope,
encode_encrypted_state_envelope); change
docs/frost-migration/tee-whitelisted-signer-enforcement-plan.md to
pkg/tbtc/signer/docs/tee-whitelisted-signer-enforcement-plan.md for
TeeEnforcementModes.tla; and change
docs/frost-migration/roast-phase-5-security-rollout-gates.md to
pkg/tbtc/signer/docs/roast-phase-5-security-rollout-gates.md for
RoastRolloutPolicy.tla so readers can locate the referenced code and policy
docs.
In `@pkg/tbtc/signer/docs/roast-implementation-plan.md`:
- Line 93: Update the broken doc links in
pkg/tbtc/signer/docs/roast-implementation-plan.md by replacing repo-external
paths like `docs/frost-migration/roast-phase-0-spec-freeze.md` and any
`tools/tbtc-signer/...` references with the correct repo-local paths under
pkg/tbtc/signer/docs (or use correct relative paths from this markdown file);
search the file for all occurrences of `docs/frost-migration/...` and
`tools/tbtc-signer/...` (including the instances similar to the shown
`docs/frost-migration/roast-phase-0-spec-freeze.md`) and normalize each link so
it points to the new location in this package, preserving anchor fragments and
updating any link text as needed.
In `@pkg/tbtc/signer/docs/roast-phase-5-rollout-runbook.md`:
- Around line 26-29: Update the incorrect crate path used in the runbook
commands: replace occurrences of "cd tools/tbtc-signer" with "cd
pkg/tbtc/signer" for the benchmark command (`cargo bench --features
bench-restart-hook --bench phase5_roast`) and the chaos suite script invocation
(`./scripts/run_phase5_chaos_suite.sh`) so the commands run from the correct
crate directory.
In `@pkg/tbtc/signer/docs/roast-phase-5-security-rollout-gates.md`:
- Around line 92-99: Update stale repository paths in
roast-phase-5-security-rollout-gates.md: replace any occurrences of the old
tooling path "tools/tbtc-signer" with the new location "pkg/tbtc/signer" (e.g.,
update the run command `cd tools/tbtc-signer && cargo bench --features
bench-restart-hook --bench phase5_roast` to `cd pkg/tbtc/signer ...`), and
update any links referencing
`docs/frost-migration/roast-phase-5-baseline-calibration.md` to the document’s
new location in the repo (search for the exact link text and swap to the correct
path). Ensure all instances at the reported locations (around the run command
and the listed links) are changed consistently so operators following the
runbook hit the correct files and commands.
In `@pkg/tbtc/signer/docs/rust-rewrite-bootstrap.md`:
- Around line 10-13: The documentation still references the old crate path
"tools/tbtc-signer" and the validation command using that path; update every
occurrence to "pkg/tbtc/signer" in rust-rewrite-bootstrap.md (including the
header lines that list the crate, the C ABI include path `include/frost_tbtc.h`,
and any validation/build commands) so links and commands point to the colocated
pkg/tbtc/signer location; search for "tools/tbtc-signer" and replace with
"pkg/tbtc/signer" and verify the validation command and any examples reference
the new path.
In `@pkg/tbtc/signer/docs/signer-api-contract-decision-brief.md`:
- Around line 43-47: Update the two referenced paths in
signer-api-contract-decision-brief.md so they point to the mirrored crate
locations: replace `docs/frost-migration/rust-rewrite-bootstrap.md` with
`pkg/tbtc/signer/docs/frost-migration/rust-rewrite-bootstrap.md` and replace
`tools/tbtc-signer/src/lib.rs` with `pkg/tbtc/signer/src/lib.rs` (look for the
occurrences shown around the paragraph mentioning the bootstrap Rust crate and
file: `tools/tbtc-signer/src/lib.rs` and update those strings accordingly).
In `@pkg/tbtc/signer/docs/tbtc-signer-secret-material-hardening-plan.md`:
- Around line 6-7: The doc still uses the old crate path string
`tools/tbtc-signer`; update that scope reference to `pkg/tbtc/signer` throughout
the file (tbtc-signer-secret-material-hardening-plan.md) so the plan points to
the mirrored crate location, and scan for any other occurrences of
`tools/tbtc-signer` in this document to replace with `pkg/tbtc/signer`.
In `@pkg/tbtc/signer/docs/tee-whitelisted-signer-enforcement-plan.md`:
- Around line 296-298: Update the two broken cross-doc links in
pkg/tbtc/signer/docs/tee-whitelisted-signer-enforcement-plan.md (currently
referencing docs/frost-migration/roast-phase-5-security-rollout-gates.md and
docs/frost-migration/roast-phase-5-rollout-runbook.md on lines ~296–297) to
point to their correct locations inside this PR:
pkg/tbtc/signer/docs/roast-phase-5-security-rollout-gates.md and
pkg/tbtc/signer/docs/roast-phase-5-rollout-runbook.md so cross-document
navigation resolves correctly.
In `@pkg/tbtc/signer/README.md`:
- Around line 53-54: Update the README.md occurrence(s) that reference the old
path string "tools/tbtc-signer" to the new crate location "pkg/tbtc/signer":
search for and replace that path in all command snippets, code blocks, and file
references (e.g., cargo build/cd commands and any path bullets) so every
instance uses "pkg/tbtc/signer" consistently; ensure both shell commands and
prose file paths are updated, and run a quick grep for "tools/tbtc-signer" to
confirm no remaining references.
In `@pkg/tbtc/signer/scripts/admission-policy-v1.sample.json`:
- Line 8: Replace the non-hex placeholder value for the JSON key
dao_override_trust_root_pubkey_hex with a syntactically valid 32-byte hex string
(64 lowercase hex characters) so sample files and copy/paste validation don't
break; update the sample value to something like a 64-character hex placeholder
and leave replacement guidance in the README or nearby comment explaining it
must be replaced with the real x-only pubkey hex.
In `@pkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjs`:
- Around line 15-18: vectorsPath currently resolves to
"docs/frost-migration/test-vectors/roast-attempt-context-v1.json" and will fail
because the vectors live under "test/vectors"; update the path.join call that
constructs vectorsPath (using rootDir and the filename) to point to
"test/vectors/roast-attempt-context-v1.json" so the script loads the correct
file; ensure the change is made where vectorsPath is declared and used in this
module.
In `@pkg/tbtc/signer/scripts/formal/run_tla_models.sh`:
- Around line 4-5: The MODEL_DIR assignment in run_tla_models.sh is pointing to
the wrong path; update the MODEL_DIR variable (currently computed relative to
ROOT_DIR) to "$ROOT_DIR/docs/formal/models" so the directory existence check in
the script (around the directory existence test near line 20) succeeds; modify
the MODEL_DIR definition in the script (look for the MODEL_DIR variable
assignment and references) to use the corrected path and ensure any subsequent
uses of MODEL_DIR still reference this updated variable.
In `@pkg/tbtc/signer/tests/p2tr_signature_fraud_vectors.rs`:
- Around line 375-376: The test constructs vectors_path using the vectors_path
variable in pkg/tbtc/signer/tests/p2tr_signature_fraud_vectors.rs which
currently joins
"../../docs/frost-migration/test-vectors/p2tr-signature-fraud-v0.json" relative
to CARGO_MANIFEST_DIR and points to a non-existent file; fix by either adding
the missing p2tr-signature-fraud-v0.json to the repo at that path or update the
vectors_path join to the correct relative path where the JSON actually lives
(adjust the "../" segments or point to the canonical test-vectors location),
ensuring the variable name vectors_path and its usage remain unchanged.
---
Nitpick comments:
In `@pkg/tbtc/signer/src/api.rs`:
- Around line 196-202: The BuildTaprootTxRequest struct's script_tree_hex Option
field lacks the serde attributes used elsewhere; update the declaration of
BuildTaprootTxRequest so the script_tree_hex field is annotated with
#[serde(default, skip_serializing_if = "Option::is_none")] to match other
Option<T> fields (preserving Clone/Debug/Deserialize/Serialize behavior) so it
is omitted from serialized output when None instead of serializing as null.
In `@pkg/tbtc/signer/src/bin/admission_checker.rs`:
- Around line 257-276: persist_override_replay_registry currently leaves the
temporary file (tmp_path) if fs::rename fails; modify the rename error path to
attempt cleanup of tmp_path before returning the error. Specifically, call
fs::remove_file(&tmp_path) (ignoring or logging its result) inside the Err
branch that handles the rename failure so the function still returns the
original formatted error for fs::rename but also tries to remove the leftover
tmp file; reference persist_override_replay_registry, path, tmp_path, and
fs::rename when making the change.
In `@pkg/tbtc/signer/src/lib.rs`:
- Around line 391-421: EnvVarGuard's methods (EnvVarGuard::set,
EnvVarGuard::unset) and its Drop rely on std::env::set_var/remove_var which are
process-wide and not thread-safe; replace this pattern with a test-scoped,
non-global solution such as using a crate that provides scoped environment
variables (e.g., temp_env or similar) or refactor tests to accept an injected
configuration object instead of mutating process env; update usages to acquire
and hold the new scoped guard for the entire duration of tests that need env
changes (or pass a Config struct into functions under test) and remove direct
calls to std::env::set_var/remove_var and the EnvVarGuard Drop behavior to avoid
races.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c0224eef-499a-42a2-a346-f06ef353278b
⛔ Files ignored due to path filters (1)
pkg/tbtc/signer/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (48)
pkg/tbtc/signer/.gitignorepkg/tbtc/signer/Cargo.tomlpkg/tbtc/signer/README.mdpkg/tbtc/signer/benches/phase5_roast.rspkg/tbtc/signer/build.shpkg/tbtc/signer/docs/formal/models/README.mdpkg/tbtc/signer/docs/formal/models/RoastAttemptStateMachine.cfgpkg/tbtc/signer/docs/formal/models/RoastAttemptStateMachine.tlapkg/tbtc/signer/docs/formal/models/RoastRolloutPolicy.cfgpkg/tbtc/signer/docs/formal/models/RoastRolloutPolicy.tlapkg/tbtc/signer/docs/formal/models/StateKeyProviderPolicy.cfgpkg/tbtc/signer/docs/formal/models/StateKeyProviderPolicy.production.cfgpkg/tbtc/signer/docs/formal/models/StateKeyProviderPolicy.tlapkg/tbtc/signer/docs/formal/models/TeeEnforcementModes.cfgpkg/tbtc/signer/docs/formal/models/TeeEnforcementModes.tlapkg/tbtc/signer/docs/permissioned-signer-hardening-rfc.mdpkg/tbtc/signer/docs/roast-implementation-plan.mdpkg/tbtc/signer/docs/roast-phase-0-spec-freeze.mdpkg/tbtc/signer/docs/roast-phase-1.5-consumed-registry-integration.mdpkg/tbtc/signer/docs/roast-phase-2-coordinator-policy-enforcement.mdpkg/tbtc/signer/docs/roast-phase-3-attempt-transcript-replay-hardening.mdpkg/tbtc/signer/docs/roast-phase-4-liveness-policy-recovery.mdpkg/tbtc/signer/docs/roast-phase-5-baseline-calibration.mdpkg/tbtc/signer/docs/roast-phase-5-rollout-runbook.mdpkg/tbtc/signer/docs/roast-phase-5-security-rollout-gates.mdpkg/tbtc/signer/docs/rust-rewrite-bootstrap.mdpkg/tbtc/signer/docs/signer-api-contract-decision-brief.mdpkg/tbtc/signer/docs/tbtc-signer-secret-material-hardening-plan.mdpkg/tbtc/signer/docs/tee-whitelisted-signer-enforcement-plan.mdpkg/tbtc/signer/docs/true-late-t-of-n-finalize-considerations.mdpkg/tbtc/signer/include/frost_tbtc.hpkg/tbtc/signer/scripts/admission-candidate.sample.jsonpkg/tbtc/signer/scripts/admission-existing.sample.jsonpkg/tbtc/signer/scripts/admission-override-registry.sample.jsonpkg/tbtc/signer/scripts/admission-override.sample.jsonpkg/tbtc/signer/scripts/admission-policy-v1.sample.jsonpkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjspkg/tbtc/signer/scripts/formal/run_tla_models.shpkg/tbtc/signer/scripts/run_phase5_chaos_suite.shpkg/tbtc/signer/src/api.rspkg/tbtc/signer/src/bin/admission_checker.rspkg/tbtc/signer/src/engine.rspkg/tbtc/signer/src/errors.rspkg/tbtc/signer/src/ffi.rspkg/tbtc/signer/src/go_math_rand.rspkg/tbtc/signer/src/lib.rspkg/tbtc/signer/test/vectors/roast-attempt-context-v1.jsonpkg/tbtc/signer/tests/p2tr_signature_fraud_vectors.rs
Adds a focused workflow that runs the Rust signer's formal-invariant test suite + TLA model checks. Moved from threshold-network/tbtc-v2/.github/workflows/ci-formal-verification.yml (jobs `signer-formal-invariants` + `tla-model-checks`) per extraction plan v38 §3.1 — the signer code lives here at pkg/tbtc/signer/, not in tbtc-v2, so the CI jobs that exercise it belong here too. Jobs - signer-formal-invariants: cargo test --manifest-path pkg/tbtc/signer/ Cargo.toml formal_verification_ (filter to formal-only cases) - tla-model-checks: pkg/tbtc/signer/scripts/formal/run_tla_models.sh (iterates over .cfg files in pkg/tbtc/signer/docs/formal/models/ and runs TLC against each; MODELS_PATH env var allows override per the path-normalization commit b84b574c on this branch) Triggers - pull_request on pkg/tbtc/signer/** changes + this workflow file - schedule nightly at 05:23 UTC (mirrors monorepo's pattern of running formal invariants both on PRs and nightly) - workflow_dispatch for manual runs Related changes in companion PR threshold-network/tbtc-v2#971: - Removed these jobs from canonical tbtc-v2's ci-formal-verification.yml - Added a comment in that file pointing here Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/tbtc-signer-formal.yml:
- Around line 25-26: The checkout steps using actions/checkout@v4 persist git
credentials by default; update each Checkout step (the uses: actions/checkout@v4
entries) to add with: persist-credentials: false so credentials are not stored
in the runner after checkout. Ensure both occurrences of actions/checkout@v4 in
the workflow are modified accordingly.
- Line 26: Update the GitHub Actions workflow to pin action versions and harden
checkout credentials: replace occurrences of actions/checkout@v4 with
actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 and add with:
persist-credentials: false to both checkout steps that use checkout, replace
dtolnay/rust-toolchain@stable with
dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8, and replace
actions/setup-java@v4 with
actions/setup-java@c1e323688fd81a25caa38c78aa6df2d33d3e20d9 so the workflow pins
SHA-based commits and disables persisting credentials on checkout.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 88553e79-6db4-4e95-98fe-00c56e1bc640
📒 Files selected for processing (1)
.github/workflows/tbtc-signer-formal.yml
Resolves CI failures on PR #4005 (signer mirror): 1. TLA model checks: run_tla_models.sh lacked executable bit at canonical HEAD. CI ran the script directly (no `bash` prefix), which fails with `Permission denied`. Fixed via `git update-index --chmod=+x`. 2. Signer formal invariants: engine.rs's formal_verification_roast_attempt_context_shared_vectors_match_ expected_values test referenced vectors at a path stale from the umbrella's docs/frost-migration/test-vectors/ layout. The manifest places the vector at the canonical-signer test/vectors/ subdir (pkg/tbtc/signer/test/vectors/roast-attempt-context-v1.json per the source-to-target map). Updated the `PathBuf::from(env!("CARGO_MANIFEST_DIR")).join(...)` argument from `../../docs/frost-migration/test-vectors/roast-attempt-context-v1.json` (umbrella-relative) to `test/vectors/roast-attempt-context-v1.json` (signer-CARGO_MANIFEST_DIR-relative, where the vector actually lives at canonical HEAD). Verified locally: - ls -l shows executable bit set on run_tla_models.sh - engine.rs path now resolves to the correct mirror location - Vector exists at pkg/tbtc/signer/test/vectors/roast-attempt-context-v1.json Same fix needs to be applied to PR #4007 (stacked on #4005) in a follow-up commit on its branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #4005 signer formal invariants test formal_verification_p2tr_signature_fraud_vectors_match_bitcoin_crate was failing because: 1. The umbrella source manifest only mapped p2tr-signature-fraud-v0.json to the tbtc-v2 target (docs/test-vectors/). It was not mirrored to the keep-core (signer) target. 2. tests/p2tr_signature_fraud_vectors.rs referenced the vector at the umbrella-relative path `../../docs/frost-migration/test-vectors/p2tr-signature-fraud-v0.json` (CARGO_MANIFEST_DIR-relative -> repo-root + docs/frost-migration/...). That directory does not exist on canonical keep-core. This is a structural omission in the manifest, not a divergence: the cross-language vector test exists in both Solidity (tbtc-v2 side) and Rust (signer side), and the vector is genuinely needed in both places to verify cross-implementation consistency. Fix - Mirror p2tr-signature-fraud-v0.json (598 lines, byte-identical content from tbtc-v2 mirror at docs/test-vectors/) to pkg/tbtc/signer/test/vectors/p2tr-signature-fraud-v0.json. - Update the test path in tests/p2tr_signature_fraud_vectors.rs from `../../docs/frost-migration/test-vectors/p2tr-signature-fraud-v0.json` to `test/vectors/p2tr-signature-fraud-v0.json` (CARGO_MANIFEST_DIR = pkg/tbtc/signer/, so test/vectors/... resolves correctly). Two stale comment references remain in - pkg/tbtc/signer/docs/roast-implementation-plan.md:265 - pkg/tbtc/signer/scripts/formal/check_roast_attempt_context_vectors.mjs:19 Both are comment-only doc pointers to the source layout; they do not affect runtime. Left as-is to preserve the umbrella -> canonical provenance trail. Manifest update follows in a stacked PR on tlabs-xyz/tbtc#10: - Add p2tr-signature-fraud-v0.json -> signer target mapping (test/vectors/p2tr-signature-fraud-v0.json). - Reclassify tests/p2tr_signature_fraud_vectors.rs from mirror to allowlisted-divergence (path differs from umbrella). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stacked on extraction/frost-signer-mirror-2026-05-26 / PR #4005. Adds optional taproot_merkle_root_hex to start/finalize signing rounds, binds it into request fingerprints and round IDs, signs and aggregates with frost-secp256k1-tr Taproot tweaks, and verifies tweaked aggregates in tests. Verification: cargo test in pkg/tbtc/signer.
Fixes two Codex-flagged P2 state-consistency holes in start_sign_round. 1. Persist failure left in-memory state diverged from durable state. After the fresh-round path mutated the session (consumed-replay markers, round state), a failed persist returned an error but the canonical idempotent serve then returned signature shares WITHOUT persisting -- so a restart could replay the round with no durable consumed marker. The idempotent cached serve now persists before serving when the round is not yet durable, tracked by a process-local SIGN_ROUND_PERSIST_PENDING marker; when the original persist already succeeded it still serves cached without persisting, preserving the 'idempotent replay survives a state-key-provider outage' property build_taproot_tx relies on (rollback is impossible -- the transition clear zeroizes the prior round material). 2. Active attempt cleared before later validation could fail. On an authorized ROAST attempt advance, clear_active_sign_round_for_attempt_transition ran before the fresh-path checks (participant resolution, included-set equality, quarantine, consumed-replay, share construction). A malformed advance that passed authorization but failed a later check destroyed the in-memory active round with no validated/persisted replacement, so the next StartSignRound could start a fresh attempt without transition evidence until a restart. The clear is now deferred until every fallible check has passed, just before the replacement round is installed and persisted. Adds two regression tests, both verified to fail against the pre-fix code. Design validated with Codex. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex re-review: the SIGN_ROUND_PERSIST_PENDING marker was process-global but cleared only at start_sign_round's own persist sites. If a start_sign_round persist failed (marker set) and a later UNRELATED successful persist (e.g. a DKG for another session) then wrote the whole engine state -- making that round durable -- the marker stayed stale-true. A subsequent idempotent replay of the now-durable round during a state-key-provider outage would re-enter the persist branch, try to persist again, and fail instead of serving the cached round. Move the marker into the persistence module and clear it inside persist_engine_state_to_storage_with_key on any successful write, so any operation's successful persist clears it. start_sign_round sets it on a fresh-round mutation (mark_sign_round_persist_pending) and reads it in the idempotent serve (sign_round_persist_pending); the explicit clears at the start_sign_round persist sites are removed (the persist clears it now). Adds a regression test (verified to fail against the pre-fix code) covering the unrelated-persist-then-outage replay. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex re-review: the process-global SIGN_ROUND_PERSIST_PENDING bit conflated all sessions. After one session's StartSignRound established a round and failed to persist, the bit stayed set process-wide, so the idempotent cached-serve branch -- which consulted it for EVERY session -- would force an UNRELATED, already- durable session's replay to re-persist and fail during the same state-key outage instead of serving its durable cached shares. An availability regression. Replace the global AtomicBool with a per-session set (SIGN_ROUND_PERSIST_PENDING_SESSIONS: OnceLock<Mutex<BTreeSet<String>>>) keyed by session_id: mark the specific session on a fresh-round mutation, consult that session in the cached-serve branch, and clear the WHOLE set on any successful persist (a persist writes the entire engine state, so every in-memory round becomes durable at once -- preserving the cross-operation durability the prior commit established). reset_for_tests clears the set explicitly. Both invariants stay intact: a non-durable round's replay still re-persists before serving (durability); a durable round's replay still serves without persisting (availability); a different session's failed persist no longer drags down an unrelated durable session. Adds a regression test (verified to fail against the global bool) and corrects the marker doc comment (mutual exclusion is the inner mutex, not the ENGINE_STATE guard, since clear also runs off-guard at startup and in test reset). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#4123) ## Summary Fixes two Codex-flagged P2 state-consistency holes in `start_sign_round` (`pkg/tbtc/signer/src/engine/signing.rs`), where a failed persist or a later validation error could leave in-memory state diverged from durable state and bypass the ROAST replay/transition protections. ### 1. Persist before serving the idempotent cached round (bug #1) The fresh-round path mutates the session (consumed-replay markers + round state) and then persists. If that persist failed (state-key-provider or disk error) it returned an error and served no shares — but the **canonical idempotent serve** then returned signature shares **without persisting**, so a restart could replay the round with no durable consumed marker. Fix: the idempotent cached serve now persists before serving **when the round is not yet durable**, tracked by a process-local `SIGN_ROUND_PERSIST_PENDING` marker (set on fresh-round mutation, cleared after a successful persist). When the original persist already succeeded — the common case — it still serves the cached round **without** persisting, preserving the "idempotent replay survives a state-key-provider outage" property that `build_taproot_tx` relies on. (Rollback is impossible here: the transition clear zeroizes the prior round material.) ### 2. Defer clearing the active attempt until validation finishes (bug #2) On an authorized ROAST attempt advance, `clear_active_sign_round_for_attempt_transition` ran **before** the fresh-path checks (participant resolution, included-set equality, quarantine, consumed-replay, share construction). A malformed advance that passed authorization + the RFC-21 coordinator check but failed a **later** check destroyed the in-memory active round with no validated/persisted replacement — so `active_attempt_context` was dropped and the next `StartSignRound` could start a fresh attempt **without transition evidence** until a restart reloaded durable state. Fix: the advance is authorized but the clear is **deferred** until every fallible fresh-path check has passed, immediately before the replacement round is installed and persisted (the idempotent/conflict branch is skipped for an authorized advance via a flag). ## Tests Two regression tests, both **verified to fail against the pre-fix code**: - `authorized_advance_failing_later_check_preserves_active_sign_round` — an authorized advance that fails the included-set check leaves attempt 1 idempotently signable (pre-fix: `ConsumedAttemptReplay`). - `idempotent_sign_round_replay_persists_before_serving_after_persist_outage` — a sign round whose persist fails does not serve shares on idempotent replay until the state-key provider recovers (pre-fix: served shares with the key down). Full lib suite: **308 passing** (`cargo test --lib`), `cargo fmt --check` clean. Design validated with Codex. _Found during the Codex review of #4005._ Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> 🤖 Generated with [Claude Code](https://claude.com/claude-code)
…riodic reshare Three findings from the #4005 review (workflow review + Codex): 1. [HIGH] persistence: the legacy unencrypted plaintext state fallback was accepted unconditionally, bypassing the AEAD envelope -- anyone who could write the state file could forge it (cleared replay markers / attacker key material) without the state-encryption key. Per the secret-material hardening plan the plaintext path is now emergency-rollback-only: compile-time disabled in release builds, rejected in production profiles, and gated behind an explicit TBTC_SIGNER_PERMIT_PLAINTEXT_STATE_ROLLBACK opt-in. The two tests that load plaintext now opt in; a negative assertion pins the fail-closed default. 2. [Codex P1] ffi: free_buffer now zeroizes the buffer before deallocation, so secret material returned by nonce/DKG/key-package endpoints is wiped rather than left in allocator memory when a caller forgets to clear it. 3. [Codex P2] lifecycle: a subsequent same-session refresh with updated shares produced a different fingerprint and was rejected as SessionConflict, so a long-lived session could never advance refresh_history past the first refresh. A different fingerprint is now treated as a new periodic refresh; idempotent replay of the same fingerprint is unchanged. Deferred (flagged, not in this change): - Signing-policy firewall production force-on: force-enabling it makes the entire firewall policy config mandatory in production (a deployment-contract change); the verifier flagged it as needing product confirmation, so left as-is. - State-at-rest anti-rollback freshness counter and quarantine-reset marker loss: design-level / inherent to the opt-in policy. Verified: cargo test (297 passed, 0 failed) and cargo clippy -D warnings clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… built-in defaults Closes the fail-open default flagged in the #4005 review (and confirmed by Codex): signing_policy_firewall_enforced() defaulted OFF with no production force-on -- unlike provenance_gate_enforced() -- so a production signer would sign any sighash reaching the sign path with no check it corresponds to a policy-checked build_taproot_tx. Implements option (b) (default-on with conservative baked-in defaults), chosen over a blunt force-on, which made the full policy config mandatory and would brick a production signer that did not ship it: - signing_policy_firewall_enforced() now force-enables in production (mirrors the provenance gate), so the firewall's primary control -- enforce_signing_message_binding_to_policy_checked_build_tx -- runs in production. - load_signing_policy_firewall_config() resolves missing policy env to conservative built-in defaults instead of refusing to boot: - allowed_script_classes defaults to the standard tBTC output forms {p2pkh, p2sh, p2wpkh, p2wsh, p2tr}; the classifier's "other" (non-standard) bucket is not in the set, so the firewall fails closed on unknown forms. - numeric caps default permissive (output count high-bounded, value caps to BITCOIN_MAX_MONEY_SATS) and are operator-tunable; a too-tight static cap would false-reject legitimate large redemptions/sweeps (the stale-policy risk). Operators can still narrow any of these via the existing TBTC_SIGNER_POLICY_* env. Non-production behavior is unchanged (opt-in via the enforce flag). Removed the now unused parse_{usize,u64}_from_env_required helpers. Tests: production force-on + built-in defaults; the rollback-on-policy-failure test now triggers on an explicitly-invalid value (UTC window mismatch) since absent config no longer fails. cargo test (299 passed) + clippy -D warnings + fmt all clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the #4124 review (multi-agent + Codex), which found regressions in the periodic-refresh change and a gap in the FFI wipe: 1. Refresh stale-retry (CONFIRMED, Codex P2): removing SessionConflict made refresh re-execute on a replayed/older request -- re-deriving stale shares and bumping the epoch. Now a fingerprint already in refresh_history (accepted but no longer current) is rejected as a stale retry; a genuinely-new fingerprint still proceeds (repeatable periodic reshares). RefreshHistoryRecord gains an optional request_fingerprint (serde-default, backward compatible). 2. Unbounded refresh_history (CONFIRMED): the removed guard was the only bound. Cap per-session history at MAX_REFRESH_HISTORY (256), dropping oldest; strictly-increasing epochs preserved so continuity still holds. 3. FFI incomplete wipe (PLAUSIBLE): to_ffi_buffer's into_boxed_slice() shrinks and reallocates when capacity > len (serde_json over-allocates), freeing the original secret buffer un-wiped. Copy into an exact boxed slice and zeroize the source Vec so no un-wiped copy survives. 4. Quarantine schema-mismatch test exercised plaintext-refusal, not schema validation, after the plaintext gate landed: opt into the rollback flag like its sibling so it tests the intended path. 5. Plaintext-gate doc comment overclaimed "compile-time disabled in release"; clarify the load-bearing guard is the runtime non-production check. Verified: cargo test (298 passed) + clippy -D warnings + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…efaults Self-review found the README stale after the firewall change: the policy caps were documented as "required when firewall is enabled" (now resolved to built-in defaults), the production force-enable wasn't documented, and "Policy gates default to disabled" contradicted it. Corrected all three. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… built-in defaults (#4125) ## Summary Closes the **signing-policy firewall fail-open default** flagged in the #4005 review (and confirmed by a Codex consult). `signing_policy_firewall_enforced()` defaulted OFF with no production force-on — unlike `provenance_gate_enforced()` — so a production signer would sign any sighash reaching the sign path with **no check it corresponds to a policy-checked `build_taproot_tx`**. With no slashing/staking in the model, that on-signer check is the enforcement, not a backstop. ## Approach: option (b) — default-on with conservative built-in defaults I prototyped the blunt force-on first; it made the *entire* firewall policy config mandatory in production (the signer would refuse to boot without it) — a deployment-contract cliff. Codex's recommendation (and mine) was **default-on with conservative baked-in defaults** instead, so this PR does that: - **`signing_policy_firewall_enforced()`** force-enables in production (mirrors the provenance gate). The firewall's primary control — `enforce_signing_message_binding_to_policy_checked_build_tx` (the signed digest must match a policy-checked tx) — now runs in production. - **`load_signing_policy_firewall_config()`** resolves missing policy env to conservative built-in defaults instead of erroring: - `allowed_script_classes` → the standard tBTC output forms `{p2pkh, p2sh, p2wpkh, p2wsh, p2tr}`. The classifier's `"other"` (non-standard) bucket is **not** in the set, so the firewall **fails closed** on unknown output forms. - numeric caps → **permissive** (output count high-bounded, value caps to `BITCOIN_MAX_MONEY_SATS`) and operator-tunable. A too-tight static cap would false-reject legitimate large redemptions/sweeps — Codex's "stale-policy" risk — and the message-binding is the real control, so the defaults bound damage without breaking liveness. Operators still narrow any knob via the existing `TBTC_SIGNER_POLICY_*` env. **Non-production behavior is unchanged** (opt-in via the enforce flag). ## Notes / follow-ups - The script-class default is the meaningful, zero-false-reject security default. The value/count defaults are intentionally permissive; deployments should set tighter caps per wallet sizing (Codex's two risks — stale policy and false confidence — argue for *versioned* policy + keeping host/on-chain validation authoritative). - Still worth confirming separately: whether the keep-core Go host independently validates the signed message vs Bridge state before driving the round (defense-in-depth vs sole-enforcement). That doesn't block this change — it only raises the priority. ## Verification - `cargo test` — **299 passed, 0 failed** (incl. new tests: production force-on, built-in defaults; the rollback-on-policy-failure test now triggers on an explicitly-invalid value since absent config no longer fails). - `cargo clippy --all-targets -- -D warnings` and `cargo fmt --check` — **clean**. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Codex re-review: the rollback-path tests opt into the plaintext path, which legacy_plaintext_state_permitted() compiles out in release (cfg!(debug_assertions) is false), so under `cargo test --release` they fail at the post-acceptance assertions. cfg-gate the three plaintext-acceptance tests to debug_assertions; release always refuses plaintext before those assertions, so there is nothing for them to cover there. Verified: cargo test (298 passed, debug) + cargo test --release (295 passed; the 3 tests correctly compiled out) + clippy --release -D warnings clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex re-review: a host that installs an init-time config has signer_env_var read the installed config, not the process environment, so the documented TBTC_SIGNER_PERMIT_PLAINTEXT_STATE_ROLLBACK env flag had no effect for it -- the emergency plaintext migration was impossible to enable via the configured-host path. Add permit_plaintext_state_rollback to InitSignerConfigRequest and map it in config_values_from_request, mirroring the other enforcement flags. Verified: cargo test (298 passed) + clippy -D warnings + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex re-review: a session loaded from state written before RefreshHistoryRecord.request_fingerprint deserializes its history records with None, so the last accepted refresh fingerprint survives only in session.refresh_request_fingerprint. The first post-upgrade periodic refresh overwrote that value, so a delayed retry of the pre-upgrade refresh no longer matched the history scan and was accepted as a new epoch instead of SessionConflict. Before overwriting refresh_request_fingerprint, backfill the previously-accepted fingerprint onto the most-recent history record when it is not already tracked (the legacy None case). Adds a regression test that simulates pre-upgrade state (strip history fingerprints) and verifies a retry of the old refresh is rejected after a new one. Verified: cargo test (299 debug / 296 release) + clippy -D warnings + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codex re-review: refresh_cadence_status reported refresh_count as session.refresh_history.len(), which the MAX_REFRESH_HISTORY cap bounds -- so a long-lived session that ran more than 256 refreshes showed a stuck count of 256 even as later refreshes succeeded. Add a monotonic SessionState.refresh_count (and its PersistedSessionState mirror, serde-default for back-compat, plus both conversions) incremented on each accepted refresh and backfilled from the retained history length for legacy state; report it from refresh_cadence_status instead of the pruned history length. Adds a regression test that prunes history and asserts the reported count is the true total. Verified: cargo test (300 passed) + clippy -D warnings + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… load Codex re-review: refresh_count is #[serde(default)], so state written before the field existed deserializes it to 0 even when refresh_history holds accepted refreshes -- and since refresh_cadence_status now reports refresh_count, an upgraded node reported 0 until the next refresh. Backfill it from refresh_history.len() in the Persisted->Session conversion (evaluated before refresh_history is moved), with a regression test. Verified: cargo test + clippy -D warnings + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntine recheck)
Codex re-review of the signer FFI surface:
1. [P1] Secret request fields not zeroized (api.rs): SignShareRequest.nonces_hex
and key_package_hex deserialized into plain Strings dropped without
zeroization, retaining the one-time nonce / private key package in freed heap.
Wrap both in Zeroizing<String> (wiped on drop) + a redacting Debug so the
secrets no longer leak via {:?} either.
2. [P1] Panic hook defeats payload redaction (ffi.rs): catch_unwind does not
suppress Rust's default panic hook, so a panic payload (path/config/secret)
prints to stderr before being converted to the redacted ErrorResponse. Install
a panic hook at the FFI boundary that redacts the payload outside development.
3. [P2] Quarantine not rechecked on cached-round reuse (signing.rs): the matched-
fingerprint reuse branch returned a fresh share without the new-round path's
enforce_not_quarantined_identifiers check, so a participant quarantined after
the round opened could still obtain a share. Apply the check on reuse too.
Verified: cargo test (301 passed) + clippy -D warnings + fmt clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pty history Codex re-review: legacy state written before refresh_history existed can carry refresh_request_fingerprint/refresh_result with an EMPTY refresh_history. The fingerprint backfill used refresh_history.last_mut(), which is None for empty history, so the previous fingerprint was not recorded and a delayed retry of that pre-upgrade refresh was re-executed as a new refresh. When there is no record to backfill onto, synthesize one from the cached refresh_result (prior epoch, keeping history strictly increasing) carrying the previous fingerprint, so stale retries stay rejected. Adds a regression test for the empty-history legacy shape. Verified: cargo test (302 passed) + clippy -D warnings + fmt clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…h env Two Codex-flagged issues in the non-interactive signing path: 1. start_sign_round fingerprinted and derived the round id from the raw request.message_hex string. hex::decode accepts mixed casing, so two calls carrying identical message bytes but different hex casing derived different fingerprints/round ids and a semantically identical retry was rejected as a SessionConflict. Lowercase message_hex right after it validates as hex, mirroring the interactive path (interactive.rs), so the fingerprint and derive_round_id are casing-independent. 2. The README-documented `cargo bench --features bench-restart-hook --bench phase5_roast` failed in a clean shell: ensure_benchmark_environment never set TBTC_SIGNER_PROFILE (missing -> production) or TBTC_SIGNER_STATE_ENCRYPTION_KEY_HEX (required by the default `env` state-key provider), so the first RunDkg persist aborted. Seed profile=development, provider=env, and a fixed dev key. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The prior commit lowercases message_hex before computing every canonical and legacy fingerprint. That regressed the upgrade path: when a node upgrades with an active sign round that a previous build started using mixed/uppercase message_hex, the persisted sign_request_fingerprint was computed over that original casing. The same retry then matched none of the fingerprints and fell through to SessionConflict instead of reusing the cached round. Capture the pre-canonicalization casing and, only when it differs from the lowercased form, compute the four legacy fingerprint shapes over the original casing and accept them in the cached-round match. A match migrates the stored fingerprint to the canonical lowercase form, so the compatibility cost is paid once per in-flight round. Adds a regression test that persists a mixed-case fingerprint and asserts the retry reuses the round and migrates it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d result The empty-history synthesize branch was guarded by `else if let Some(previous_result)`, so a legacy/degraded state that kept refresh_request_fingerprint but whose refresh_result deserialized to None (and whose refresh_history is empty) fell through: neither the last_mut() backfill nor the synthesize branch ran. The next refresh then overwrote the only copy of the old fingerprint, so a delayed retry of the pre-upgrade request was treated as fresh and advanced the epoch — a replay. Make the branch unconditional: synthesize a history record carrying the previous fingerprint whether or not a cached result is present. Prefer the result's epoch (when non-zero and below the new refresh) and share count; otherwise fall back to refresh_epoch-1 (min 1) and a zero share count. refresh_epoch_counter is persisted, so a prior accepted refresh implies refresh_epoch >= 2 and the fallback stays non-zero and strictly below the new record, keeping history monotonic. Adds a regression test (empty history + refresh_result=None) verified to fail against the pre-fix code (the stale retry was re-executed instead of rejected). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…riodic reshare (#4124) ## Summary Three findings from the #4005 review (the multi-agent custody review + Codex's review), fixed and verified. ### 1. [HIGH] Plaintext state fallback was accepted unconditionally — `persistence.rs` `decode_persisted_state_storage_format` fell back to parsing an **unencrypted, unauthenticated** `PersistedEngineState` whenever the bytes weren't a valid AEAD envelope — with no gate at all. The AEAD envelope is the only integrity control for state-at-rest, so anyone who could write the state file could forge it (cleared `consumed_*` replay markers → resume-into-double-sign, or an attacker `dkg_key_package` → admission bypass) **without holding the state-encryption key**. Per the secret-material hardening plan, the plaintext path is now **emergency-rollback-only**: compile-time disabled in release builds (`cfg!(debug_assertions)`), rejected in production profiles, and gated behind an explicit `TBTC_SIGNER_PERMIT_PLAINTEXT_STATE_ROLLBACK` opt-in. The two tests that load plaintext (migration + schema-mismatch) opt in, and a negative assertion pins the fail-closed default. ### 2. [Codex P1] Secret FFI buffers left unwiped on free — `ffi.rs` `free_buffer` now `zeroize`s the buffer before deallocation, so secret material returned by nonce/DKG/key-package endpoints (e.g. `frost_tbtc_generate_nonces_and_commitments`) is wiped rather than left in allocator memory when a caller forgets to clear it. (`zeroize` is a volatile wipe the optimizer can't elide.) ### 3. [Codex P2] Periodic reshare was a one-shot — `lifecycle.rs` After the first `refresh_shares`, any later reshare with updated `current_shares` had a different fingerprint and was rejected as `SessionConflict`, so a long-lived session could never advance `refresh_history`/`refresh_count`/cadence past the first refresh. A different fingerprint is now treated as a new periodic refresh; idempotent replay of the *same* fingerprint is unchanged. ## Deferred (flagged for the team) - **Signing-policy firewall production force-on** (my MEDIUM finding): mirroring the provenance gate is correct in principle, but force-enabling the firewall makes its *entire* policy config (allowed script classes, output caps, …) mandatory in production — a deployment-contract change. The verifier flagged it as needing product confirmation, so it's left as-is pending that decision. - **State-at-rest anti-rollback freshness counter** and **quarantine-reset replay-marker loss**: design-level (needs an external monotonic anchor) / inherent to that opt-in policy (default is fail-closed). ## Verification - `cargo test` — **297 passed, 0 failed**. - `cargo clippy --all-targets -- -D warnings` — **clean (exit 0)**. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> 🤖 Generated with [Claude Code](https://claude.com/claude-code)
…t credential persistence Resolves the zizmor/CodeRabbit findings on the signer formal-verification workflow: unpinned mutable action tags and persisted git credentials in checkout steps. All `uses:` references now pin full commit SHAs with human-readable version comments (actions/checkout v4.3.1, dtolnay/rust-toolchain stable head, EmbarkStudios/cargo-deny-action v2.0.20, actions/setup-java v4.8.0), and every checkout step sets persist-credentials: false so the ephemeral token is not written to the local git config. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ction The stateless FROST FFI primitives `generate_nonces_and_commitments` and `sign_share` in `frost_ops.rs` hand a one-time signing nonce pair out to the host and later accept it back as opaque `nonces_hex`. That leaves nonce custody -- and the single-use invariant -- entirely with the caller (the `SignShareRequest` contract even states the caller "is cryptographically responsible for single use"). A host bug or compromise that replays one `nonces_hex` across two distinct signing packages produces two Schnorr shares under the same nonce, which algebraically solves for the long-term secret share. The deterministic StartSignRound/FinalizeSignRound path is already fenced off in production by `enforce_transitional_signing_disabled_in_production`, but these stateless primitives gated only on the provenance attestation and had no production-profile guard -- a fail-closed asymmetry. A production signer could therefore be driven through the host-custody nonce path and inherit that catastrophic blast radius. Add `enforce_stateless_nonce_primitives_disabled_in_production`, mirroring the deterministic guard's style (same `signer_profile_is_production()` predicate, same `EngineError::LifecyclePolicyRejected` variant and message shape), and invoke it in both primitives immediately after the provenance gate -- before any secret key package or nonce is deserialized. Under the production profile both refuse with reason_code `stateless_nonce_primitives_disabled_in_production`; production signing must use the interactive FROST path (`interactive.rs`), where the engine keeps nonce custody and enforces durable single-use consumption markers. Non-production/test profiles retain current behavior. No current production caller reaches these paths (defense-in-depth before activation). Adds a focused test proving both primitives fail closed under the production profile with valid provenance -- feeding the same secret inputs that succeed under development -- and still work under the development profile. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Codex P2 review on #4127 flagged that SHA-pinning dtolnay/rust-toolchain could make it install the SHA as a toolchain, since older versions of the action derive the default toolchain from the action ref. Verified against the pinned version (4be7066): its action.yml already defaults `toolchain` to `stable`, so the Setup Rust / formal jobs were not actually broken (they pass on this branch). Still, name the toolchain explicitly on both Setup Rust steps: it is self-documenting and independent of the pinned action's default, which is the safe form for a SHA-pinned toolchain action. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…t credential persistence (#4127) ## Summary Hardens `.github/workflows/tbtc-signer-formal.yml` (introduced in #4005) against supply-chain and credential-exposure risks: - Every `actions/checkout` step now sets `persist-credentials: false`, so the ephemeral `GITHUB_TOKEN` is not persisted into the local git config after checkout. None of the jobs perform authenticated git operations after checkout, so nothing relies on the persisted credential. - Every `uses:` reference is pinned to a full 40-hex commit SHA with a trailing comment recording the human-readable version, replacing mutable tag/branch refs. No action major versions were changed — each pin is the current commit behind the ref that was already in use. This addresses the two open CodeRabbit review threads on #4005 (persisted checkout credentials + unpinned action refs on `.github/workflows/tbtc-signer-formal.yml`). ## Pinned actions | Action | Old ref | Pinned SHA (version) | | --- | --- | --- | | `actions/checkout` | `v4` | `34e114876b0b11c390a56381ad16ebd13914f8d5` (v4.3.1) | | `dtolnay/rust-toolchain` | `stable` | `4be7066ada62dd38de10e7b70166bc74ed198c30` (stable branch head) | | `EmbarkStudios/cargo-deny-action` | `v2` | `bb137d7af7e4fb67e5f82a49c4fce4fad40782fe` (v2.0.20) | | `actions/setup-java` | `v4` | `c1e323688fd81a25caa38c78aa6df2d33d3e20d9` (v4.8.0) | ## Notes - Stacked on `extraction/frost-signer-mirror-2026-05-26` (PR #4005); only the workflow file is touched. - YAML validated locally after the edit. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
…e production profile (#4129) ## Finding (MEDIUM, fail-closed asymmetry — defense-in-depth) The stateless FROST FFI primitives `generate_nonces_and_commitments` and `sign_share` in `pkg/tbtc/signer/src/engine/frost_ops.rs` hand a one-time signing nonce pair out to the host and later accept it back as opaque `nonces_hex`. Nonce custody — and the single-use invariant — is left entirely with the caller. The `SignShareRequest` contract even documents this: the caller *"is cryptographically responsible for single use."* A host bug or compromise that replays one `nonces_hex` across two distinct signing packages yields two Schnorr shares under the same nonce, which algebraically solves for the long-term secret share. Catastrophic blast radius if reached. The **deterministic** StartSignRound/FinalizeSignRound path is already fenced off in production by `enforce_transitional_signing_disabled_in_production`. The **stateless** primitives, by contrast, gated only on the provenance attestation (`enforce_provenance_gate`) and had **no** production-profile guard — a fail-closed asymmetry. A production signer could therefore be driven through the host-custody nonce path and inherit that blast radius. No current production caller reaches these paths; this is defense-in-depth before activation. ## Guard added New `enforce_stateless_nonce_primitives_disabled_in_production(operation)` in `frost_ops.rs`, mirroring the deterministic guard's style: - same predicate: `signer_profile_is_production()` - same error type/shape: `EngineError::LifecyclePolicyRejected` - reason_code: `stateless_nonce_primitives_disabled_in_production` Under the production profile it fails closed (returns the error); non-production/test profiles keep current behavior. ## The two call sites Invoked immediately after `enforce_provenance_gate()?` — before any secret key package or nonce is deserialized — in: - `generate_nonces_and_commitments` (`enforce_stateless_nonce_primitives_disabled_in_production("GenerateNoncesAndCommitments")`) - `sign_share` (`enforce_stateless_nonce_primitives_disabled_in_production("SignShare")`) The deterministic path is unchanged. ## Test `engine::tests::stateless_nonce_primitives_reject_under_production_profile`: - Under the development profile, builds real key packages + nonces + signing package and confirms both primitives succeed. - Flips to the production profile (with valid provenance so the new gate — not the provenance gate — is what fires) and feeds the **same** secret inputs: both primitives return `LifecyclePolicyRejected` with reason_code `stateless_nonce_primitives_disabled_in_production`, before any secret is deserialized (the gate is the second statement of each primitive). ## Validation (from `pkg/tbtc/signer`) - `cargo fmt -- --check` — clean - `cargo clippy --all-targets -- -D warnings` — clean - `cargo test` — 319 passed (lib), 25 passed (admission_checker bin), 1 passed (p2tr integration), 0 failed 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Summary
Lands the Rust FROST/ROAST signer at
pkg/tbtc/signer/alongside the existing Go DKG coordinator. The initial import was extracted fromtlabs-xyz/tbtc:feat/frost-schnorr-migrationat frozen signed tagfrost-extraction-source-v1, then this PR applied targeted canonical-path, CI, and reviewer hardening follow-ups in the canonical keep-core repo.This is no longer a pure byte-for-byte mirror PR. The post-freeze changes are intentional canonical keep-core changes (canonical-path normalization, CI wiring, and reviewer hardening). After this PR lands,
threshold-network/keep-coreis the source of truth for the signer; the extracted monorepo tag is provenance for the initial import only.Source-PR Provenance
tlabs-xyz/tbtcfeat/frost-schnorr-migrationfrost-extraction-source-v1(signed annotated tag by maclane)52389bd5cccb5daeef195671feb7ca46be6e2f37The monorepo provenance above is used only to audit the initial import. Ongoing signer development, fixes, CI, and reviews happen in
threshold-network/keep-core.Why This Lives In keep-core
Per extraction plan v38 section 3.1, the signer co-locates with keep-core because:
Layout Introduced
CI Coverage
This PR now runs:
cargo fmt --manifest-path pkg/tbtc/signer/Cargo.toml -- --checkcargo clippy --manifest-path pkg/tbtc/signer/Cargo.toml --all-targets -- -D warningscargo test --manifest-path pkg/tbtc/signer/Cargo.tomlcargo test --manifest-path pkg/tbtc/signer/Cargo.toml formal_verification_pkg/tbtc/signer/scripts/formal/run_tla_models.shVerification
Latest local verification after review fixes:
cargo fmt --manifest-path pkg/tbtc/signer/Cargo.toml -- --checkcargo clippy --manifest-path pkg/tbtc/signer/Cargo.toml --all-targets -- -D warningsTBTC_SIGNER_STATE_PATH=/tmp/tbtc-signer-ci-state-local-4005.json cargo test --manifest-path pkg/tbtc/signer/Cargo.tomlcargo test --manifest-path pkg/tbtc/signer/Cargo.toml formal_verification_Latest GitHub CI on this branch is green, including
Signer Rust checks,Signer formal invariants, andTLA model checks.Relationship To PR #3866
PR #3866 lands the Go-side DKG coordinator. This PR lands the Rust-side signer. They are complementary protocol implementations:
PR #3866 and this PR can land independently in either order.
Plan Context
This is one of the mergeable mirror PRs comprising the FROST extraction, alongside tbtc-v2, tbtc-subgraph, and tbtc-v3-indexer mirror PRs.