apollo_l1_events: reuse running sync task on catch-up restart by raising its shared target height#14518
Conversation
PR SummaryMedium Risk Overview Catchupper now stores
A regression test asserts the same task handle and raised target when catch-up restarts with a higher block. Reviewed by Cursor Bugbot for commit f8fa0b1. Bugbot is set up for automated code reviews on this repo. Configure here. |
This stack of pull requests is managed by Graphite. Learn more about stacking. |
noamsp-starkware
left a comment
There was a problem hiding this comment.
Please update the commit message and description to reflect the actual changes introduced. something like 'reuse running sync task on catch-up restart by raising its shared target height'.
@noamsp-starkware reviewed 4 files and all commit messages, and made 5 comments.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on ShahakShama).
crates/apollo_l1_events/src/catchupper.rs line 109 at r1 (raw file):
pub fn target_height(&self) -> BlockNumber { BlockNumber(self.target_height.load(Ordering::SeqCst))
Pair this read with Acquire to match the Release write below. SeqCst adds a global total-order guarantee across all SeqCst ops in the program; there's no other atomic here that needs to be sequenced against this one, so it's wasted.
crates/apollo_l1_events/src/catchupper.rs line 115 at r1 (raw file):
/// Uses `fetch_max` so the target only moves forward; a lower height is ignored. pub fn update_target_height(&self, target_height: BlockNumber) { self.target_height.fetch_max(target_height.0, Ordering::SeqCst);
SeqCst is stronger than needed here. There's a single writer (the provider) and single reader (the sync task), so Release on the write side is sufficient — it guarantees all prior writes are visible before the new target value is published.
Code snippet:
self.target_height.fetch_max(target_height.0, Ordering::Release);crates/apollo_l1_events/src/catchupper.rs line 173 at r1 (raw file):
// (a higher block committed before catch-up finishes) extends this same task instead of // spawning a competing one. while current_height.0 <= target_height.load(Ordering::SeqCst) {
Acquire
crates/apollo_l1_events/src/catchupper.rs line 178 at r1 (raw file):
"Syncing L1EventsProvider with L2 height: {} to target height: {}", current_height, target_height.load(Ordering::SeqCst)
Acquire
ShahakShama
left a comment
There was a problem hiding this comment.
@ShahakShama made 4 comments.
Reviewable status: 3 of 4 files reviewed, 4 unresolved discussions (waiting on noamsp-starkware).
crates/apollo_l1_events/src/catchupper.rs line 109 at r1 (raw file):
Previously, noamsp-starkware wrote…
Pair this read with Acquire to match the Release write below. SeqCst adds a global total-order guarantee across all SeqCst ops in the program; there's no other atomic here that needs to be sequenced against this one, so it's wasted.
Done.
crates/apollo_l1_events/src/catchupper.rs line 115 at r1 (raw file):
Previously, noamsp-starkware wrote…
SeqCst is stronger than needed here. There's a single writer (the provider) and single reader (the sync task), so Release on the write side is sufficient — it guarantees all prior writes are visible before the new target value is published.
Done.
crates/apollo_l1_events/src/catchupper.rs line 173 at r1 (raw file):
Previously, noamsp-starkware wrote…
Acquire
Done.
crates/apollo_l1_events/src/catchupper.rs line 178 at r1 (raw file):
Previously, noamsp-starkware wrote…
Acquire
Done.
ShahakShama
left a comment
There was a problem hiding this comment.
@ShahakShama resolved 4 discussions and dismissed @noamsp-starkware from 4 discussions.
Reviewable status: 3 of 4 files reviewed, all discussions resolved (waiting on noamsp-starkware).
asaf-sw
left a comment
There was a problem hiding this comment.
Solid fix for the task leak — sharing an Arc<AtomicU64> target and reusing the in-flight task is the right call over abort+respawn, and the regression test pins the behavior well (ptr_eq on the handle, forever-failing sync). Confirmed the skipped reset_catchupper in the running branch is benign: the backlog is always drained via mem::take and the state set to Pending at catch-up completion, so it's empty whenever this branch is reached.
Two notes inline: one TOCTOU race worth flagging, one atomic-ordering nit.
asaf-sw
left a comment
There was a problem hiding this comment.
@asaf-sw reviewed 1 file and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on ShahakShama).
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit dff312e. Configure here.
ShahakShama
left a comment
There was a problem hiding this comment.
@ShahakShama resolved 1 discussion.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on ShahakShama).
asaf-sw
left a comment
There was a problem hiding this comment.
@asaf-sw made 2 comments and resolved 2 discussions.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on ShahakShama).
asaf-sw
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! all files reviewed, all discussions resolved (waiting on ShahakShama).
asaf-sw
left a comment
There was a problem hiding this comment.
@asaf-sw made 1 comment.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on ShahakShama).
…ing its shared target height Goal: Fix audit finding H-14 — unbounded task leak in the L1EventsProvider catch-up flow. Change summary: - On catch-up restart, reuse the in-flight l2_sync_task instead of recreating the Catchupper and spawning a second one. target_height is now an Arc<AtomicU64> shared with the running task; start_catching_up raises it via update_target_height when a task is already running, and the task re-reads it each iteration so it keeps syncing up to the new target. - Dropping a tokio JoinHandle does not cancel the task, so before this every catch-up restart leaked a still-running sync task that races to commit the same blocks and can spin forever in its retry loop (DoS). - Add regression test restarting_catch_up_raises_target_without_spawning_second_sync_task. Decision points: - Keep the running task and raise its shared target rather than abort + respawn: aborting a task parked on a client call is not cancel-safe (the commit may apply while its result is never observed). A single long-lived task that tracks the latest target avoids both the leak and the cancel-safety hazard. - Atomic orderings: Release on the writer (update_target_height), Acquire on the readers, since there is a single writer (provider) and single reader (sync task). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e, relax atomic ordering Goal: address asaf-sw's review notes on PR #14518 in a follow-up branch (the base commit is left untouched per request). Changes: - start_catching_up: after raising the target on a running sync task, re-check is_sync_task_running(); if the task finished in the TOCTOU window, fall through to respawn instead of leaving the provider stuck in CatchingUp with no sync task. - target_height atomic: Acquire/Release -> Relaxed on both load and fetch_max sides, since the atomic carries only its own value and publishes no other memory. Decision points: - TOCTOU: chose respawn-on-finish (double-check) over comment-only. sync_task_health_check remains the backstop for the residual narrow window where a task finishes even after the second check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e, relax atomic ordering Goal: address asaf-sw's review notes on PR #14518 in a follow-up branch (the base commit is left untouched per request). Changes: - start_catching_up: after raising the target on a running sync task, re-check is_sync_task_running(); if the task finished in the TOCTOU window, fall through to respawn instead of leaving the provider stuck in CatchingUp with no sync task. - target_height atomic: Acquire/Release -> Relaxed on both load and fetch_max sides, since the atomic carries only its own value and publishes no other memory. Decision points: - TOCTOU: chose respawn-on-finish (double-check) over comment-only. sync_task_health_check remains the backstop for the residual narrow window where a task finishes even after the second check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


Goal: Fix audit finding H-14 — unbounded task leak in the L1EventsProvider
catch-up flow.
Change summary:
Catchupper and spawning a second one. target_height is now an Arc
shared with the running task; start_catching_up raises it via
update_target_height when a task is already running, and the task re-reads it
each iteration so it keeps syncing up to the new target.
catch-up restart leaked a still-running sync task that races to commit the same
blocks and can spin forever in its retry loop (DoS).
Decision points:
aborting a task parked on a client call is not cancel-safe (the commit may apply
while its result is never observed). A single long-lived task that tracks the
latest target avoids both the leak and the cancel-safety hazard.
readers, since there is a single writer (provider) and single reader (sync task).
Co-Authored-By: Claude Opus 4.8 (1M context) noreply@anthropic.com