worker: remove fd from epoll on the loop thread before recycling Conn#216
Open
kwsantiago wants to merge 1 commit into
Open
worker: remove fd from epoll on the loop thread before recycling Conn#216kwsantiago wants to merge 1 commit into
kwsantiago wants to merge 1 commit into
Conversation
This was referenced Jun 29, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
In non-blocking mode, an HTTP connection's fd is armed level-triggered (
monitorRead:EPOLL.IN | RDHUP, no ONESHOT) with the rawConnpointer as epoll userdata. When a request finishes with.close/.unknown,processHTTPDatacloses the socket on a worker thread to (implicitly) drop it from epoll, then signals the loop;processSignallater recycles theConnvia theMemoryPool(disown), with no explicitEPOLL_CTL_DEL.Closing the fd on the worker thread races
epoll_waiton the loop thread. Under connection churn the fd can remain armed past the point wheredisownreturns theConnto the pool, so a later batch delivers a.recvcarrying a pointer to freed memory.protocolis the first field ofConnand theMemoryPoolfree-list clobbers it, so the recycled node reads as{ .http = null }andhttp_conn.getState()dereferences null. We hit this in production as a recurring SIGSEGV ingetState(@atomicLoad(&self._state)) at the_stateoffset, during normal operation, every few hours.The deferred-signal change only orders signal-vs-recv within a single batch; it does not cover a
.recvdelivered in a later batch for an already-recycledConn.HTTPConn.disownalready does a synchronousEPOLL_CTL_DELfor theres.disown()path, for the same reason; the.close/.unknownpath just relied on the worker-threadclose()instead.Fix
Move teardown of
.close/.unknownconnections onto the loop thread: drop the worker-threadposix.close, and inprocessSignalremove the fd from the event loop (EPOLL_CTL_DEL/EV.DELETE) and close it beforedisownrecycles theConn. Since the fd is still open at that point, the DEL is valid, and no later batch can reference the freedConn. Aremovehelper is added to both the epoll and kqueue loops, mirroringHTTPConn.disown.The signal-vs-recv-in-one-batch case the worker-thread close used to guard is already handled by deferring signal processing to the end of the batch plus the
getState().handovercheck in the recv path.Builds clean on 0.16; full test suite (125 tests) passes.