Summary
Production SIGSEGV on 2026-06-27 09:02:54 (node on v0.5.5, normal operation, not shutdown):
Segmentation fault at address 0x2f8
httpz worker.zig:1693:40 in getState -> @atomicLoad(State, &self._state, .acquire)
0x2f8 is the offset of HTTPConn._state, so self (the http_conn pointer) is null/freed when the epoll loop dispatches a .recv event for it.
Caller (worker.zig ~591):
.recv => |conn| switch (conn.protocol) {
    .http => |http_conn| switch (http_conn.getState()) { ... }
}
conn.protocol is tagged .http but http_conn is null/garbage — a Conn was freed/reused while epoll still delivered a .recv event for it.
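Why a fault address of 0x2f8 points at a null self: a field load through a pointer touches base + field offset, so with base 0 the faulting address is the offset itself. A minimal C sketch of that arithmetic follows. The struct is hypothetical (its padding is sized only so that `state` lands at 0x2f8), it is not the real HTTPConn layout, and nothing is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for HTTPConn: the padding size is chosen only so
 * that `state` lands at offset 0x2f8, matching the reported fault address.
 * The real struct's fields are not reproduced here. */
struct http_conn_sketch {
    unsigned char other_fields[0x2f8];
    int state;
};

static_assert(offsetof(struct http_conn_sketch, state) == 0x2f8,
              "sketch layout must place state at 0x2f8");

/* Address that a load of `state` would touch for a given base pointer
 * value. Pure integer arithmetic: no memory is accessed. */
static uintptr_t state_load_address(uintptr_t base) {
    return base + offsetof(struct http_conn_sketch, state);
}
```

With these assumptions, `state_load_address(0)` equals 0x2f8. A dangling but non-null pointer would be expected to fault, if at all, at some larger address.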
Root cause
This is in vendored karlseguin/http.zig (pin 8dc6441, == upstream master == our v0.5.7). The worker.zig getState body and the .recv caller block are byte-identical between the v0.5.5 pin (5d1b4e2) and v0.5.7, so v0.5.5 -> v0.5.7 does not fix this crash.
- epoll stores raw @intFromPtr(conn) as userdata (monitorRead, worker.zig:1298/1303).
- Conn is recycled via std.heap.MemoryPool conn_mem_pool (decl :423); worker-level disown() does conn_mem_pool.destroy(conn) at :911.
- MemoryPool free-list stores a next-ptr in the first bytes of a freed node, which alias Conn.protocol (first field, :1513). For a node freed while the free list is empty (the last-freed node when every other Conn is in use), that ptr is null => protocol reads as {.http = null}.
- .recv handler (:595-599) then does conn.protocol.http -> http_conn.getState() (:1697) -> @atomicLoad(&null._state), which faults at offset 0x2f8. Exact match.
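The free-list aliasing in the list above can be illustrated with a small C sketch. This is not the std.heap.MemoryPool source. `conn_sketch`, `pool_sketch` and `pool_destroy` are hypothetical names, and the only thing modelled is an intrusive LIFO free list whose link overwrites the first pointer-sized bytes of the freed object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the layout described above: the first field of
 * the pooled object is the pointer that the .recv path later dereferences. */
struct conn_sketch {
    void *http_conn; /* stands in for the payload of Conn.protocol */
    int fd;
};

/* Intrusive free list: the head points at the most recently freed object. */
struct pool_sketch {
    void *free_head;
};

static_assert(offsetof(struct conn_sketch, http_conn) == 0,
              "the free-list link must alias the first field");

/* Return an object to the pool. The previous list head is copied over the
 * object's first bytes, clobbering http_conn; the object becomes the head. */
static void pool_destroy(struct pool_sketch *pool, struct conn_sketch *conn) {
    memcpy(conn, &pool->free_head, sizeof pool->free_head);
    pool->free_head = conn;
}
```

If `pool_destroy` runs while `free_head` is NULL, the freed object's `http_conn` reads back as NULL while its other fields are untouched. A stale reader therefore sees what looks like a live connection holding a null pointer, which is the shape of the crash reported here.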
Ordering bug: worker-level disown (:900-912) frees the Conn (http_conn_pool.release + conn_mem_pool.destroy) without issuing an EPOLL_CTL_DEL of its own; it relies on a prior close() having removed the fd from the epoll set as a side effect. HTTPConn.disown (:1721-1746) does call epoll_ctl DEL, which shows the maintainers know epoll needs synchronous removal, but the worker disown path omits it. processHTTPData .close/.unknown (:838) closes the socket on a worker thread and then signals; the actual free happens later in the event loop (processSignal :782). close() racing a concurrent epoll_wait leaves an inherent window under max_connections churn. The deferred-signal fix (:576-657) only orders signal-vs-recv within one batch and does not cover cross-batch or worker-close races.
Fix direction
- EPOLL_CTL_DEL before close, synchronously on the owning still-valid fd, centralized in Conn.close() (:1540) + the processHTTPData .close branch; or
- robust fix: generation guard / stop recycling Conn through a clobbering MemoryPool.
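A minimal C sketch of the first option, deregistering while the fd is still open and only then closing it. `register_read` and `deregister_then_close` are hypothetical helper names, not http.zig functions; only the epoll and close calls are real Linux APIs.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for readability with a pointer as userdata, in the same
 * spirit as storing @intFromPtr(conn). */
static int register_read(int epfd, int fd, void *userdata) {
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.ptr = userdata;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Remove fd from the epoll set while it is still open, then close it.
 * Returns 0 on success, -1 if either step fails. */
static int deregister_then_close(int epfd, int fd) {
    /* The event argument is ignored for EPOLL_CTL_DEL, but kernels before
     * 2.6.9 required it to be non-NULL, so a dummy is passed. */
    struct epoll_event unused = {0};
    int del_rc = epoll_ctl(epfd, EPOLL_CTL_DEL, fd, &unused);
    int close_rc = close(fd);
    return (del_rc == 0 && close_rc == 0) ? 0 : -1;
}
```

An explicit DEL does not depend on close() dropping the registration, which per epoll(7) only happens once every descriptor referring to the same open file description has been closed. It only prevents future deliveries: events that an epoll_wait call has already copied out still carry the old userdata, so this ordering narrows the window rather than closing it.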
Lands upstream in karlseguin/http.zig; carry via upstream PR or a temporary fork pin in build.zig.zon (archive url + hash).
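The generation-guard option could look roughly like the C sketch below. Everything in it is an assumption for illustration: `slot_table`, `SLOT_COUNT` and the token layout are hypothetical, and it assumes a single owning thread. Instead of a raw pointer, the 64-bit epoll userdata would carry (index, generation), and the event loop would drop any event whose token no longer resolves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 4 /* hypothetical; tiny for illustration */

struct slot {
    uint32_t generation; /* bumped on every release */
    int in_use;
    int fd;
};

struct slot_table {
    struct slot slots[SLOT_COUNT];
};

/* Pack (index, generation) into the 64-bit value that would be stored as
 * epoll userdata instead of a raw pointer. */
static uint64_t make_token(uint32_t index, uint32_t generation) {
    return ((uint64_t)generation << 32) | index;
}

/* Resolve a token. Returns NULL if the index is out of range, the slot is
 * free, or the slot has been released since the token was issued. */
static struct slot *slot_lookup(struct slot_table *t, uint64_t token) {
    uint32_t index = (uint32_t)(token & 0xffffffffu);
    uint32_t generation = (uint32_t)(token >> 32);
    if (index >= SLOT_COUNT) return NULL;
    struct slot *s = &t->slots[index];
    if (!s->in_use || s->generation != generation) return NULL;
    return s;
}

/* Claim a free slot. Returns its token, or UINT64_MAX if the table is full
 * (UINT64_MAX never resolves because its index is out of range). */
static uint64_t slot_acquire(struct slot_table *t, int fd) {
    for (uint32_t i = 0; i < SLOT_COUNT; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].in_use = 1;
            t->slots[i].fd = fd;
            return make_token(i, t->slots[i].generation);
        }
    }
    return UINT64_MAX;
}

/* Release a slot and bump its generation so outstanding tokens go stale.
 * Releasing with a stale token is a no-op. */
static void slot_release(struct slot_table *t, uint64_t token) {
    struct slot *s = slot_lookup(t, token);
    if (s == NULL) return;
    s->in_use = 0;
    s->generation++;
}
```

With this shape, a .recv event that arrives after its connection was released resolves to NULL and can be ignored, even if the slot has already been reused for a new connection. The 32-bit generation can wrap after enough reuse of one slot, which a real implementation would need to weigh.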
Repro
Likely under connection churn at/near max_connections (1000).
Not a duplicate of