Don't crash on client disconnect (Server and Application)#751
Merged
Conversation
334c399 to
f7cf66a
Compare
f7cf66a to
dfaef3d
Compare
Fixes rails#724. When a Spring client disconnects before the server-side handshake completes — most commonly because Spring.connect_timeout (default 5s) fires on a large app whose preloader takes longer to boot — writes to that client raise Errno::EPIPE. There are crash sites in two processes, with multiple unrescued write call sites in the preloader. Spring::Server#serve -------------------- server.rb calls `client.puts env.version` early in serve. On a disconnected client this raises EPIPE, propagates out of serve, up through the `loop { serve server.accept }` in start_server, and kills the entire server process. The OS socket file lingers, so every subsequent client takes the warm-run path, connects, and times out in verify_server_version until the user runs `spring stop`. Spring::Application#serve ------------------------- Same crash mode in the preloader, with multiple unrescued write sites: * client.puts(0) — main-flow preload-success write * client.puts(1) — inner preload-failure write * print_exception(…) — write to stderr inside the outer rescue Exception handler * streams.each(&:close) — IO#close performs a buffered-write flush, so when print_exception buffers content to a broken stderr, close-time flush re-raises EPIPE * client.puts(1) if pid — exit-status write inside the outer rescue Exception handler Each of these is now wrapped in `begin / rescue Errno::EPIPE / end` so EPIPE during error reporting doesn't itself crash the process. EPIPE in the main flow falls through to the existing `rescue Exception` handler, which already calls `manager.puts unless pid` — important, because without that no-pid newline ApplicationManager#run would block indefinitely on `child.gets.to_i` and deadlock the server's single-threaded accept loop. A conditional log line preserves a friendlier "client disconnected" diagnostic message for EPIPE distinct from the generic "exception:" line used for other failures. Tests ----- test/unit/server_test.rb covers Spring::Server#serve. test/unit/application_test.rb covers Spring::Application#serve at all three EPIPE sites: * cached preload (client.puts(0) before fork) * fresh preload success (client.puts(0) after preload returns) * preload failure (client.puts(1) inside the inner rescue, then EPIPE on print_exception/streams/client.puts(1) inside the outer rescue Exception handler) Each Application#serve test also asserts the manager handshake protocol via a real UNIXSocket.pair: serve must write both the early "got client" newline and a no-pid newline so that ApplicationManager#run does not block on child.gets.to_i. The assertion uses Timeout.timeout(2) + flunk so a hang fails loudly instead of stalling the suite.
dfaef3d to
3347006
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #724. Same bug as the closed #725, with the additional crash sites that PR missed.
When a client disconnects mid-handshake — typically
Spring.connect_timeout(default 5s) firing on a slow preloader boot, reproduced reliably on Shopify's monorepo — every unrescued write to the client raisesErrno::EPIPEand kills the process holding it. There are 5 such writes across 2 processes:Spring::Server#serve— worst case: the accept loop dies, the unix socket lingers, every subsequent client times out inverify_server_versionuntilspring stop.Spring::Application#serve: main-flowclient.puts(0)+ three writes nested inside the existingrescue Exceptionhandler (a sibling rescue at the same level can't catch them).Fixes adds a few
rescue Errno::EPIPEto gracefully log or recover if the client is gone.Open questions
client.puts(1) if pidrescue covers the post-fork exit-status write; reproducing it in a unit test would require lots of stubbing, I left it out of tests.Errno::ECONNRESET, but kept the surface to what Client timeout causes server crash #724 actually reports. Easy to broaden.application.rb:164(manager.recv_ioraisingSocketErroron manager close) is a different path and out of scope here — worth a follow-up.