Skip to content

Trail of Bits tBTC v2 AC Extension — security remediation#4126

Open
lrsaturnino wants to merge 18 commits into
feat/psbt-covenant-final-project-prfrom
fix/tob-tbtcacext-remediation
Open

Trail of Bits tBTC v2 AC Extension — security remediation#4126
lrsaturnino wants to merge 18 commits into
feat/psbt-covenant-final-project-prfrom
fix/tob-tbtcacext-remediation

Conversation

@lrsaturnino

@lrsaturnino lrsaturnino commented Jul 1, 2026

Copy link
Copy Markdown
Member

Applies the remediation for the Trail of Bits "tBTC v2 AC Extension" security assessment. Every in-scope finding whose target code lives in this repository is fixed in its own commit, tagged with its public report ID (TOB-TBTCACEXT-<n>) so each fix maps 1:1 to a finding, with regression test(s) added alongside each fix.

Base / stacking

This branch was created off the frozen review commit 235e8fd9, an ancestor of feat/psbt-covenant-final-project-pr (the reviewed PR, #3882). The PR therefore targets that branch so the diff is exactly the remediation (17 commits) rather than PR #3882's own changes. Because the feature branch has since advanced past the frozen commit (it now carries the deep-audit fixes in #3938), expect to reconcile against those later commits when this lands. Retarget to main once #3882 lands.

Findings remediated (17)

  • High: 1 (fail closed on covenant signing until the tBTC Bridge covenant fraud-defense is deployed), 2 (derive the P2WSH output script for the migration destination so deposits stay revealable), 27 (accumulate the withdrawn counter on revoked escrow withdrawals to stop repeated draining)
  • Medium: 5 (guard the doneSigners map on every access in the signing-done check), 6 (synchronize BackoffStrategy tick counters against concurrent retransmission ticks), 22 (scan past address spam when discovering SPV proof targets), 36 (do not archive wallets still inside the DKG approval window), 56 (deduplicate DKG results only after handling completes), 57 (deduplicate wallet closures only after archival succeeds), 65 (use a flat job directory so covenant-signer jobs persist across restarts), 87 (deauthorize archived and non-live wallets in covenant signing)
  • Low: 10 (read TxMaxFee from its own field in past redemption events), 26 (reject signing-done messages from members not in the current attempt), 72 (deduplicate the DKGStarted callback only after its handler completes)
  • Informational: 11 (count the third operator's seats in the triplet-eligibility filter), 70 (reject non-canonical high-S signer approval certificates), 92 (fail production signer startup when approval roots are missing)

Finding 1's complete remediation is a bridge-side change in the tBTC Bridge (Fraud.defeatFraudChallenge) that lives in the tbtc-v2 repository; the keep-core side fails closed until an operator confirms that defense is deployed.

Verification

go build ./... passes. The affected-package tests are green (pkg/tecdsa/retry, pkg/net/retransmission, pkg/covenantsigner, pkg/maintainer/spv, and the covenant-signer, signing-done, and wallet-lifecycle regression tests in pkg/tbtc). Each finding commit carries its own regression test(s) — race-detector tests for the concurrency findings, handler-level transient-failure and redelivery tests for the deduplication findings, and value-level tests for the accounting and filtering fixes.

Opened as a draft for review.

…riplet eligibility filter

excludeOperatorTriplets assigned the third operator from operators[j] instead of operators[k], so the post-exclusion seat-count filter double-counted the middle operator and ignored the third. This admitted triples whose true remaining seat count is below retryParticipantsCount and dropped otherwise-valid triples, biasing the tECDSA DKG/signing retry candidate set toward under-quorum attempts that then fail at the protocol layer.

Assign rightOperator from operators[k] so the filter subtracts the seats of all three distinct operators in each triple. Add a regression test that pins the eligible-triplet count and the selected exclusion against the corrected arithmetic.
… past redemption events

PastRedemptionRequestedEvents populated the converted event's TxMaxFee field from event.TreasuryFee, so every historical RedemptionRequested event returned to off-chain consumers (monitoring, replay, analytics) reported the treasury fee in place of the per-transaction max fee. The live-state converter in the same file already reads TxMaxFee correctly, confirming this as an off-by-one-field copy-paste error.

Map TxMaxFee from event.TxMaxFee. Extract the per-event conversion into a pure convertPastRedemptionRequestedEvent helper so the field-by-field mapping is unit testable, and add a regression test asserting the treasury fee and tx max fee are populated from their own distinct sources.

TOB-TBTCACEXT-10
… signing done check

The listen goroutine wrote to signingDoneCheck.doneSigners under doneSignersMutex, but waitUntilAllDone read the map (length check and range iteration) and isValidDoneMessage read it (duplicate-signer check) without holding the mutex. Go maps are not safe for concurrent read/write, so normal signing completion — including the covenant signer submit paths — raced and could fail or crash the signer process.

Acquire doneSignersMutex around the length check and iteration in waitUntilAllDone (capturing the result into locals and releasing the lock before returning) and around the duplicate-signer read in isValidDoneMessage. Add a race-detector regression test that drives concurrent listen writes and waitUntilAllDone reads; it broadcasts done messages with the stateless standard retransmission strategy to isolate this race from the separate backoff-strategy synchronization.
…ck counters

BackoffStrategy stored its tickCounter, delay, and retransmitTick counters without synchronization, but ScheduleRetransmissions invokes strategy.Tick from a fresh goroutine on every ticker event. When one retransmission tick was still running as the next fired, two goroutines read and wrote the same BackoffStrategy instance concurrently, corrupting the retransmission schedule used by tECDSA signing, DKG, and signing-done messages.

Add a sync.Mutex to BackoffStrategy and guard the counter read/modify/write in Tick, capturing the retransmit decision into a local and invoking retransmitFn after releasing the lock so the message is not retransmitted while holding shared state. Add a race-detector regression test that calls Tick concurrently from many goroutines behind a start barrier.
…ttempt members

signingDoneCheck.listen discarded the attempt member list and kept only its length, and isValidDoneMessage validated wallet-group membership but not attempt inclusion. Any valid wallet group member — including one excluded from the current attempt — could therefore fill a completion slot, letting waitUntilAllDone reach its threshold before the selected signers finished and skewing the next batch item's start block from a non-attempt member's endBlock.

Record the selected attempt members in a set in listen and reject done messages whose sender is not in that set in isValidDoneMessage. Keep the set as goroutine-local state captured by the listener closure and passed to isValidDoneMessage, rather than on the shared signingDoneCheck struct, so concurrent listen calls on the same instance cannot race on it or trigger a "concurrent map writes" fatal error while populating it. Add a regression test that sends a done message from a valid group member excluded from the attempt and asserts the completion check does not count it.
…oval certificates

verifySignerApprovalCertificate parsed the DER-encoded threshold signature and called ecdsa.Verify without checking that the signature used the canonical low-S form. ECDSA signatures are malleable: for a valid secp256k1 signature (r, s), the signature (r, N - s) also verifies, so two distinct byte encodings were accepted for the same approval — a hazard if certificate signature bytes are ever used as an identifier, deduplication key, or audit-log key.

Add a low-S check after DER parsing and before ecdsa.Verify, mirroring the rule already applied by the covenant signer artifact approval verifier. Add a regression test that malleates a valid certificate's signature to its high-S counterpart (r, N - s) and asserts the verifier rejects it.
… destination

buildCovenantTransactionBuilder wrote MigrationDestination.DepositScript directly as the destination output's PublicKeyScript. In tBTC that field is the plain deposit script, and the Bridge's Deposit library rebuilds the funding output and requires its 32-byte hash to equal sha256(depositScript) for P2WSH. Signing the raw deposit script as the output script produced a funding output that revealDepositWithExtraData rejects, making the migration deposit unrevealable and leaving the migration debt unrepaid.

Derive the P2WSH scriptPubKey (OP_0 <sha256(depositScript)>) from the deposit script via bitcoin.PayToWitnessScriptHash(bitcoin.WitnessScriptHash(...)), matching the pattern already used for the active covenant UTXO, and reject empty deposit scripts. Update the self_v1 and qc_v1 submit tests to assert the destination output is the P2WSH of the deposit script and not the raw script.
…jobs persist across restarts

The covenant signer store persisted jobs under the nested directory name "covenant-signer/jobs". The disk persistence handle creates only one directory level (os.Mkdir, not MkdirAll) and enumerates only one directory level on load, reporting a nested file's directory as its first-level parent. As a result the first job save could fail when the parent directory did not exist, and even when it did, every persisted job was skipped on reload — breaking idempotency and recovery for accepted requests across restarts.

Use the single-level directory name "covenant-signer-jobs", skip non-.json files (such as the store lock file that now shares the jobs directory) during load, and migrate any job files left under the legacy nested path on startup. Add disk-backed store tests that persist a job through a real NewBasicDiskHandle, restart the store, and assert both the request ID and route request indexes are restored, plus a test covering the legacy-path migration.
…ts in covenant signing

When a tBTC wallet is closed or terminated, handleWalletClosure archives it and
removes it from the wallet registry, but two paths kept the archived wallet
signable through the covenant signer:

  1. getSigningExecutor consults the node's signingExecutors cache before the
     wallet registry, so an executor created while the wallet was live kept being
     returned after archival. The node could keep participating in threshold
     signing (including the covenant signer paths) for a wallet the closure was
     meant to deauthorize, until the process restarted.

  2. VerifySignerApproval resolved the wallet and checked that registry data was
     available, but never checked the wallet lifecycle state. The signer set hash
     embedded in an approval certificate binds only the wallet identity, members
     hash, and threshold, none of which change when a wallet is closed or
     terminated, so a certificate issued while the wallet was live kept verifying
     after closure.

Add node.invalidateSigningExecutor, which evicts the wallet's cached executor
under signingExecutorsMutex, and call it from handleWalletClosure after
walletRegistry.archiveWallet succeeds. Fail closed in VerifySignerApproval by
rejecting covenant signing for wallets that are not in the live state, the only
state covenant migrations are expected for; supporting another state later
requires adding it explicitly with justification.

Add regression tests that cache an executor, archive the wallet, confirm the
stale executor is still served pre-invalidation, and assert the wallet is no
longer signable after invalidation; and that VerifySignerApproval rejects an
otherwise-valid request once the wallet is no longer live.
…p without approval roots

The covenant signer service only enforced the presence of a signer approval verifier and the route trust roots when the requireApprovalTrustRoots flag was set; otherwise it merely logged warnings and started anyway. A production (non-loopback) deployment could therefore start with the multi-party approval model silently disabled, relying on operators noticing warnings and enabling strict settings.

Treat a non-loopback listen address as requiring the full approval model, mirroring the existing non-loopback authToken requirement: validateRequiredApprovalTrustRoots now also enforces the trust roots and the signer approval verifier when the listen address is non-loopback. Loopback deployments keep the warning-only behavior unless requireApprovalTrustRoots is set. Add startup tests asserting that a non-loopback address fails when trust roots or the verifier are missing.
… DKG approval

archiveClosedWallets used IsWalletRegistered(walletID) == false as the signal that a wallet was closed or terminated and could be archived. That signal also matches a freshly generated wallet whose DKG result has been submitted but not yet approved on-chain: during the approval window the wallet exists only on the node's disk and is not yet registered. A node restart in that window archived the fresh key share, and enough affected operators could leave the new wallet without a signing quorum.

Distinguish the two cases via GetWallet, which returns ErrWalletNotFound for a wallet the Bridge has never recorded (the pending case) and returns the wallet with its state for a registered wallet (which the Bridge retains permanently after closure). Archive only wallets recorded on-chain in a Closed or Terminated state, and leave not-yet-registered wallets in place. Add regression tests covering both the pending-wallet skip and the closed-wallet archival.
…evoked withdrawals

The internal withdrawRevoked overwrote deposits[operator].withdrawn with the current available amount instead of adding to it. After a prior partial withdrawal had already incremented withdrawn, calling withdrawRevoked reset the counter, making availableAmount report the same tokens as available again. Because the escrow holds KEEP for many operators in one shared balance, repeated withdrawRevoked calls could withdraw the same tokens over and over and drain tokens backing other operators' deposits.

Set deposits[operator].withdrawn = deposit.withdrawn.add(amount), matching the accounting already used by withdraw and migrate. Add a regression test that performs a partial withdraw, revokes the grant, calls withdrawRevoked, and asserts a second withdrawRevoked withdraws nothing.
…ling completes

The DKGResultSubmitted callback recorded the deduplication key before running
validation. Because validation can exit early on transient failures (RPC errors
in IsDKGResultValid/ChallengeDKGResult/GetDKGState/DKGParameters, or a failed
challenge-confirmation wait), a failed attempt left the key set. A later
redelivery of the same event was then dropped as already-processed, silently
removing the node from the challenger set for the rest of the challenge window.

Split the deduplicator into claim/confirm/abort semantics backed by an
in-progress set plus the existing completed TimeCache: an event is only recorded
as completed once its handler succeeds, while concurrent duplicate deliveries
are still ignored. executeDkgValidation/validateDKG now return an error so the
callback confirms the dedup entry on a terminal outcome and releases it on a
transient failure, allowing a redelivered event to retry.
…archival

The WalletClosed callback recorded the deduplication key before archiving the
wallet. handleWalletClosure can fail transiently (RPC error while reading the
block counter or waiting for closure confirmation, an unconfirmed result, or a
failed archive write) and return before removing the wallet from the local
registry. The dedup key stayed set, so a redelivered WalletClosed event was
dropped and the closed wallet remained in walletRegistry, keeping it signable
through fresh getSigningExecutor lookups until process restart.

Move the wallet closure to claim/confirm/abort deduplication: the callback now
confirms the dedup entry only after handleWalletClosure succeeds and releases it
on failure, so a redelivered event retries the archival.
…er handling

The DKGStarted callback recorded the deduplication key before the block-height
confirmation wait, the DKG state check, the past-event lookup, and the local
DKG join. If any of those steps returned early on a transient error, the dedup
key stayed set and a redelivered event was skipped, so the operator never joined
the new group.

Move the DKG started handling to claim/confirm/abort deduplication: the callback
now confirms the dedup entry only once the local DKG join has been dispatched
(or the event was authoritatively found unconfirmed) and releases it on any
transient early return, so a redelivered event retries the handling.
…f targets

Each SPV scanner fetched the latest TransactionLimit transactions for a wallet
address and only then filtered for unproven protocol transactions. Because the
limit was applied to the raw address history before protocol-level filtering,
an attacker could send more than TransactionLimit dust transactions to the
wallet address to push the real protocol transaction out of the bounded window,
so the auto-prover found nothing and never submitted the SPV proof.

Add a shared getUnprovenWalletTransactions helper that lists the wallet's
confirmed transaction hashes and scans them newest-to-oldest, fetching and
protocol-filtering each until enough unproven protocol transactions are found
or the history is exhausted. The limit now bounds the number of matching
protocol transactions rather than the raw address transactions examined, so
unrelated spam can no longer evict protocol transactions from discovery. All
four scanners (deposit sweep, redemption, moving funds, moved funds sweep) use
the helper.
…bridge fraud defense

The covenant signer produces a normal Bitcoin SIGHASH_ALL signature over a
covenant active UTXO using the tBTC wallet key. Because a covenant active UTXO
is neither a swept deposit, a spent main UTXO, nor a processed moved-funds
sweep, the tBTC Bridge cannot defeat a fraud challenge that replays this
signature, so a valid covenant migration signature can become a slashable
wallet signature once the transaction is broadcast.

The complete remediation is a bridge-side covenant fraud-defense path in the
tBTC Bridge (Fraud.defeatFraudChallenge), which lives outside this repository.
As the keep-core-side safeguard, gate covenant signing behind an explicit
operator confirmation that the bridge covenant fraud-defense is deployed:
covenantSigner.bridgeCovenantFraudDefenseConfirmed defaults to false and, until
set, OnSubmit fails closed for both self_v1 and qc_v1 routes so the node cannot
produce a signature that would expose a wallet to an undefeatable fraud
challenge. The signing choke point is documented with the same invariant.
@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 18e033c1-4fb9-48d4-952e-9353e48117ed

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/tob-tbtcacext-remediation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@lrsaturnino lrsaturnino changed the title fix: remediate Trail of Bits tBTC v2 AC Extension audit findings Trail of Bits tBTC v2 AC Extension — security remediation Jul 1, 2026
@lrsaturnino lrsaturnino changed the title Trail of Bits tBTC v2 AC Extension — security remediation Trail of Bits tBTC v2 AC Extension — security remediation (17 findings) Jul 1, 2026
@lrsaturnino lrsaturnino changed the title Trail of Bits tBTC v2 AC Extension — security remediation (17 findings) Trail of Bits tBTC v2 AC Extension — security remediation Jul 1, 2026
…diation

Reconcile the Trail of Bits AC Extension remediation with the base feature
branch, which has since landed its own independent deep-audit work. Three files
conflicted; both resolutions preserve the security intent:

- signer approval low-S (TOB-TBTCACEXT-70): the base already rejects non-canonical
  high-S signatures inline in verifySignerApprovalCertificate, so the duplicate
  isLowSSecp256k1 helper added here was dropped in favor of the base's version.

- covenant-signer job directory (TOB-TBTCACEXT-65): the base reworked store.go but
  still persisted jobs under the nested "covenant-signer/jobs" path, which the
  single-level disk handle skips on reload. Re-ported the flat "covenant-signer-jobs"
  directory, the legacy-path startup migration, and the non-.json load skip onto the
  base's version.

Build passes and the covenant-signer, signing-done, wallet-lifecycle, and
retransmission/retry/SPV package tests are green on the merged tree.
@lrsaturnino lrsaturnino marked this pull request as ready for review July 1, 2026 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant