fix: add thread-safety lock to shared ClassVar state#71
Conversation
4a11c34 to
34d9f3b
Compare
|
Note: We also had to disable uvicorn's default WebSocket ping timeout ( This change is in core LNbits ( |
|
@motorina0 Are you happy with the architectural changes? This PR comment notes that a change is required in core too - #71 (comment) I've not tested the changes yet, waiting on your OK/NOK before testing. |
|
Have received no NWC client errors in last 12hrs running this. |
|
Still running clean after 36hrs and heavy zap session last night 😂 |
|
Nice pr, changes are pretty minimal. Could you pr core as well and reference here? |
|
linting failed |
NostrRouter uses ClassVars (received_subscription_events, received_subscription_eosenotices, received_subscription_notices) that are written by a daemon thread (via subscribe_events callbacks) and read/deleted by asyncio tasks (via _handle_subscriptions). Without synchronization this causes lost events, dropped EOSE messages, and data corruption. Add a threading.Lock and wrap all shared state access with it. Only the actual data mutation is locked, not the await send_text calls.
34d9f3b to
07ffa3c
Compare
|
Linting fixed and WS Ping interval PR opened in Core. #4028 However, PR #70 ( PR #71's
If PR #70 is merged, PR #71 should be closed as superseded. Core PR still legit. |
Problem
NostrRouteruses threeClassVarcollections to shuttle events from relays to WebSocket clients:received_subscription_eventsreceived_subscription_eosenoticesreceived_subscription_noticesThese are written by a daemon thread (the
subscribe_eventsbackground task via callbacks intasks.py) and read/deleted by asyncio tasks (the_nostr_to_clientloop inrouter.py). No synchronization protects these shared data structures.This race condition causes non-deterministic loss of events and EOSE messages. Downstream consumers like the NWC Service Provider wait indefinitely for EOSE that never arrives, causing "all promises rejected" errors in client wallets.
Fix
threading.Lockas aClassVaronNostrRouterwith NostrRouter.lock:await send_textcalls), minimizing lock contentionThe lock is held only for the duration of the list/dict mutation (pop, del, read), not across the
awaitthat sends the event to the WebSocket client. This keeps critical sections minimal while ensuring thread safety.Testing