Skip to content

Don't crash on client disconnect (Server and Application)#751

Merged
rafaelfranca merged 1 commit into
rails:mainfrom
Korri:fix-server-crash-on-client-epipe
May 1, 2026
Merged

Don't crash on client disconnect (Server and Application)#751
rafaelfranca merged 1 commit into
rails:mainfrom
Korri:fix-server-crash-on-client-epipe

Conversation

@Korri
Copy link
Copy Markdown
Contributor

@Korri Korri commented Apr 28, 2026

Fixes #724. Same bug as the closed #725, with the additional crash sites that PR missed.

When a client disconnects mid-handshake — typically Spring.connect_timeout (default 5s) firing on a slow preloader boot, reproduced reliably on Shopify's monorepo — every unrescued write to the client raises Errno::EPIPE and kills the process holding it. There are 5 such writes across 2 processes:

  • 1 in Spring::Server#serveworst case: the accept loop dies, the unix socket lingers, every subsequent client times out in verify_server_version until spring stop.
  • 4 in Spring::Application#serve: main-flow client.puts(0) + three writes nested inside the existing rescue Exception handler (a sibling rescue at the same level can't catch them).

Fixes adds a few rescue Errno::EPIPE to gracefully log or recover if the client is gone.

Open questions

  • The client.puts(1) if pid rescue covers the post-fork exit-status write; reproducing it in a unit test would require lots of stubbing, I left it out of tests.
  • I considered also rescuing Errno::ECONNRESET, but kept the surface to what Client timeout causes server crash #724 actually reports. Easy to broaden.
  • Pre-existing crash at application.rb:164 (manager.recv_io raising SocketError on manager close) is a different path and out of scope here — worth a follow-up.

@Korri Korri changed the title Don't crash the server when a client disconnects mid-handshake Don't crash on client disconnect (Server and Application) Apr 28, 2026
@Korri Korri force-pushed the fix-server-crash-on-client-epipe branch 2 times, most recently from 334c399 to f7cf66a Compare April 29, 2026 16:17
Comment thread lib/spring/application.rb Outdated
Comment thread test/unit/application_test.rb Outdated
Comment thread test/unit/server_test.rb Outdated
@Korri Korri force-pushed the fix-server-crash-on-client-epipe branch from f7cf66a to dfaef3d Compare April 29, 2026 18:53
Fixes rails#724.

When a Spring client disconnects before the server-side handshake
completes — most commonly because Spring.connect_timeout (default 5s)
fires on a large app whose preloader takes longer to boot — writes to
that client raise Errno::EPIPE. There are crash sites in two
processes, with multiple unrescued write call sites in the preloader.

Spring::Server#serve
--------------------
server.rb calls `client.puts env.version` early in serve. On a
disconnected client this raises EPIPE, propagates out of serve, up
through the `loop { serve server.accept }` in start_server, and kills
the entire server process. The OS socket file lingers, so every
subsequent client takes the warm-run path, connects, and times out in
verify_server_version until the user runs `spring stop`.

Spring::Application#serve
-------------------------
Same crash mode in the preloader, with multiple unrescued write
sites:

  * client.puts(0)         — main-flow preload-success write
  * client.puts(1)         — inner preload-failure write
  * print_exception(…)     — write to stderr inside the outer
                             rescue Exception handler
  * streams.each(&:close)  — IO#close performs a buffered-write
                             flush, so when print_exception buffers
                             content to a broken stderr, close-time
                             flush re-raises EPIPE
  * client.puts(1) if pid  — exit-status write inside the outer
                             rescue Exception handler

Each of these is now wrapped in `begin / rescue Errno::EPIPE / end`
so EPIPE during error reporting doesn't itself crash the process.
EPIPE in the main flow falls through to the existing
`rescue Exception` handler, which already calls
`manager.puts unless pid` — important, because without that no-pid
newline ApplicationManager#run would block indefinitely on
`child.gets.to_i` and deadlock the server's single-threaded accept
loop. A conditional log line preserves a friendlier "client
disconnected" diagnostic message for EPIPE distinct from the generic
"exception:" line used for other failures.

Tests
-----
test/unit/server_test.rb covers Spring::Server#serve.

test/unit/application_test.rb covers Spring::Application#serve at
all three EPIPE sites:

  * cached preload (client.puts(0) before fork)
  * fresh preload success (client.puts(0) after preload returns)
  * preload failure (client.puts(1) inside the inner rescue, then
    EPIPE on print_exception/streams/client.puts(1) inside the
    outer rescue Exception handler)

Each Application#serve test also asserts the manager handshake
protocol via a real UNIXSocket.pair: serve must write both the
early "got client" newline and a no-pid newline so that
ApplicationManager#run does not block on child.gets.to_i. The
assertion uses Timeout.timeout(2) + flunk so a hang fails loudly
instead of stalling the suite.
@Korri Korri force-pushed the fix-server-crash-on-client-epipe branch from dfaef3d to 3347006 Compare May 1, 2026 14:21
@rafaelfranca rafaelfranca merged commit 2760869 into rails:main May 1, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Client timeout causes server crash

2 participants