Skip to content

Handle Builtin Atomics Async Critical Section in the Main Thread#976

Merged
aapoalas merged 5 commits intotrynova:mainfrom
komyg:fix/built-atomics-async-implementation
Apr 19, 2026
Merged

Handle Builtin Atomics Async Critical Section in the Main Thread#976
aapoalas merged 5 commits intotrynova:mainfrom
komyg:fix/built-atomics-async-implementation

Conversation

@komyg
Copy link
Copy Markdown
Contributor

@komyg komyg commented Mar 26, 2026

Fixes this comment: #956 (review)

@komyg
Copy link
Copy Markdown
Contributor Author

komyg commented Apr 7, 2026

Hey @aapoalas, I fixed a timeout issue that we were having on Test262, and all tests are passing now.

Could you review this when you have the time?

Copy link
Copy Markdown
Member

@aapoalas aapoalas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this doesn't unfortunately really do what is advertised or requested by the specification.

The point is that we should be doing the following steps for the async case especially:

  1. Acquire WaiterList lock.
  2. Read value from buffer.
  3. Compare value to expected and return NotEqual if it is not equal.
  4. Produce a new waiter and add it into the WaiterList.
  5. Return a Promise tied to the new waiter.

Or something like that. Instead we do:

  1. Read value from buffer, compare and return NotEqual if not equal.
  2. Start a new thread and wait until the new thread has started up.
  3. On the new thread, acquire the WaiterList lock and tell the main thread we've started up.
  4. Still on the new thread, produce a new waiter and add it into the WaiterList.
  5. On the main thread return a Promise tied to the new thread.

The issue is that between steps 1 and 3 it's possible for some foreign thread to change the value to not be our expected value, acquite the WaiterList lock, and wake up all waiters. Our waitAsync call will not be woken up by this because the foreign thread got to the WaiterList first, we are not in it yet, but we also managed to read the old expected value from the buffer. This ends up with our waitAsync Promise never resolving.

The fundamental mistake that I made in the original implementation was using a native thread for the waiter: that makes it really hard (or slow) to have the read happen within the critical section. At the end of the day the correct choice should be to leave the threading and timeouting to the embedder.

Comment thread nova_vm/src/ecmascript/builtins/structured_data/atomics_object.rs Outdated
@komyg komyg force-pushed the fix/built-atomics-async-implementation branch from c776759 to 1736244 Compare April 15, 2026 23:18
@komyg
Copy link
Copy Markdown
Contributor Author

komyg commented Apr 17, 2026

Hey @aapoalas, thank you for the feedback and for the explanation.

I've updated the PR to try to handle the critical sections properly. Please let me know what you think.

Next, I am planning to finish the implementation so that we don't depend on threads, as you said.

@aapoalas
Copy link
Copy Markdown
Member

Hey @aapoalas, thank you for the feedback and for the explanation.

I've updated the PR to try to handle the critical sections properly. Please let me know what you think.

Next, I am planning to finish the implementation so that we don't depend on threads, as you said.

Unfortunately the critical section handling still isn't fully correct. The reading of the data is now within the critical section (I think - I should bring the code in locally and really think it through to be sure), but there is still a race condition. What's happening now is:

  1. Acquire the WaiterList lock on the main thread.
  2. Read and compare value to expected, return NotEqual if we don't find the expected value. This is correct and good.
  3. Create a new waiter and add it into the WaiterList.
  4. Start a new thread, release the WaiterList lock, queue up a Job, and return a Promise..
  5. On the new thread, acquire the WaiterList lock and go to sleep awaiting for a wake up signal.

The issue is that the steps 4 and 5 must happen in this order as the new thread cannot acquire the lock before the main thread has released it, and/but there is no guarantee that the new thread is actually going to be the thread that acquires the lock immediately after the new thread. Hence, the critical section actually ends after step 4, before the new thread is waiting for wakeup signals.

It is therefore possible that we reach the end of step 4, have determined that the value is the expected value, and that we should sleep, but then a foreign thread acquires the WaiterList lock before our waiter thread does, and changes the value and signals a wakeup. Then our waiter thread acquires the lock, goes to sleep and releases the lock again: but now it's entirely possible that no weakup call ever comes again; we missed the signal that was supposed to wake us up and we're now effectively at a deadlock.

I think it's fine to merge this work, but it's definitely still not what the spec requires. The extra thread keeps harassing us :)

@komyg
Copy link
Copy Markdown
Contributor Author

komyg commented Apr 19, 2026

Hey @aapoalas, thank you for the feedback and for the explanation.

I've updated the PR to try to handle the critical sections properly. Please let me know what you think.

Next, I am planning to finish the implementation so that we don't depend on threads, as you said.

Unfortunately the critical section handling still isn't fully correct. The reading of the data is now within the critical section (I think - I should bring the code in locally and really think it through to be sure), but there is still a race condition. What's happening now is:

  1. Acquire the WaiterList lock on the main thread.
  2. Read and compare value to expected, return NotEqual if we don't find the expected value. This is correct and good.
  3. Create a new waiter and add it into the WaiterList.
  4. Start a new thread, release the WaiterList lock, queue up a Job, and return a Promise..
  5. On the new thread, acquire the WaiterList lock and go to sleep awaiting for a wake up signal.

The issue is that the steps 4 and 5 must happen in this order as the new thread cannot acquire the lock before the main thread has released it, and/but there is no guarantee that the new thread is actually going to be the thread that acquires the lock immediately after the new thread. Hence, the critical section actually ends after step 4, before the new thread is waiting for wakeup signals.

It is therefore possible that we reach the end of step 4, have determined that the value is the expected value, and that we should sleep, but then a foreign thread acquires the WaiterList lock before our waiter thread does, and changes the value and signals a wakeup. Then our waiter thread acquires the lock, goes to sleep and releases the lock again: but now it's entirely possible that no weakup call ever comes again; we missed the signal that was supposed to wake us up and we're now effectively at a deadlock.

I think it's fine to merge this work, but it's definitely still not what the spec requires. The extra thread keeps harassing us :)

I see, yeah sorry I didn't think of this case.

If it is alright with you, I think we can merge this and I will make a follow up PR that fixes this and the thread issue. What do you think?

@aapoalas
Copy link
Copy Markdown
Member

Sure thing!

@aapoalas aapoalas merged commit 285a37e into trynova:main Apr 19, 2026
8 checks passed
@komyg komyg deleted the fix/built-atomics-async-implementation branch April 20, 2026 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants