-
Notifications
You must be signed in to change notification settings - Fork 427
Defer MonitorUpdatingPersister writes to flush() #4317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
👋 Hi! I see this is a draft PR. |
79e9390 to
c8405e2
Compare
| use core::mem; | ||
| use core::ops::Deref; | ||
| use core::pin::{pin, Pin}; | ||
| use core::pin::pin; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be able to do this without touching persist.rs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would we do it then? Queue in ChainMonitor, in KVStore, or somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ChainMonitor would need to store the actual monitor data to defer writes, not just track update IDs as it does now. This means either cloning expensive ChannelMonitor objects or storing serialized bytes, which leaks persistence format details into ChainMonitor?
c8405e2 to
40e909a
Compare
2b85294 to
93ff6c9
Compare
Update MonitorUpdatingPersister and MonitorUpdatingPersisterAsync to queue persist operations in memory instead of writing immediately to disk. The Persist trait methods now return ChannelMonitorUpdateStatus:: InProgress and the actual writes happen when flush() is called. This fixes a race condition that could cause channel force closures: previously, if the node crashed after writing channel monitors but before writing the channel manager, the monitors would be ahead of the manager on restart. By deferring monitor writes until after the channel manager is persisted (via flush()), we ensure the manager is always at least as up-to-date as the monitors. The flush() method takes an optional count parameter to flush only a specific number of queued writes. The background processor captures the queue size before persisting the channel manager, then flushes exactly that many writes afterward. This prevents flushing monitor updates that arrived after the manager state was captured. Key changes: - Add PendingWrite enum to represent queued write/remove operations - Add pending_writes queue to MonitorUpdatingPersisterAsyncInner - Add pending_write_count() and flush(count) to Persist trait and ChainMonitor - ChainMonitor::flush() calls channel_monitor_updated for each completed write - Update Persist impl to queue writes and return InProgress - Call flush() in background processor after channel manager persistence - Remove unused event_notifier from AsyncPersister Co-Authored-By: Claude Opus 4.5 <[email protected]>
93ff6c9 to
181a6e0
Compare
Update MonitorUpdatingPersister and MonitorUpdatingPersisterAsync to queue persist operations in memory instead of writing immediately to disk. The Persist trait methods now return ChannelMonitorUpdateStatus:: InProgress and the actual writes happen when flush() is called.
This fixes a race condition that could cause channel force closures: previously, if the node crashed after writing channel monitors but before writing the channel manager, the monitors would be ahead of the manager on restart. By deferring monitor writes until after the channel manager is persisted (via flush()), we ensure the manager is always at least as up-to-date as the monitors.
Key changes: