test(ci): widen timing windows for flaky SessionStore + socket.io tests#7647
test(ci): widen timing windows for flaky SessionStore + socket.io tests#7647JohnMcLear wants to merge 2 commits intodevelopfrom
Conversation
SessionStore.ts and socketio.ts dominate develop CI failures (~25–40% of
recent runs). Both fail due to race conditions on slow Windows / loaded
Linux runners — not logic bugs.
SessionStore tests configure session expiry windows of 100ms and then
sleep 110ms before asserting. On a slow runner, the wall-clock between
`set()` and the assertion can exceed 100ms, the timeout in
`SessionStore._updateExpirations()` then sees `exp.real <= now` and
calls `_destroy()`, deleting the DB record before the assertion runs.
The test then reads `null` / `undefined` instead of the expected JSON.
Tripling each affected window (100→300, 110→330, 200→600) keeps the
relative timing semantics identical while leaving enough headroom for
a slow CI runner. Local run is +3s on this spec; cheap insurance for
the global flake rate.
`waitForSocketEvent` in tests/backend/common.ts uses a 1s timeout for
socket.io message round-trips. The socket.io handshake + auth +
CLIENT_READY can exceed 1s on a slow runner; bumped to 5s.
The most-failing tests this addresses:
- SessionStore: get of record from previous run (not yet expired)
- SessionStore: external expiration update is picked up
- SessionStore: shutdown cancels timeouts
- socketio: !authn anonymous cookie /p/pad -> 200, ok
- socketio: authn user /p/pad -> 200, ok
- clientvar_rev_consistency: CLIENT_VARS stays consistent under
concurrent edits during handshake
All 28 SessionStore + 33 socketio tests pass locally.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
Review Summary by QodoWiden timing windows for flaky SessionStore and socket.io tests
WalkthroughsDescription• Increase SessionStore test timing windows from 100/110ms to 300/330ms • Extend socket.io event timeout from 1s to 5s • Eliminate race conditions on slow CI runners • Preserve test semantics while adding headroom Diagramflowchart LR
A["Slow CI Runners"] -->|"Race Conditions"| B["Flaky Tests"]
B -->|"SessionStore: 100→300ms expiry"| C["Wider Timing Windows"]
B -->|"Wait: 110→330ms"| C
B -->|"Socket.io: 1s→5s timeout"| C
C -->|"Eliminates Flake"| D["Stable CI Runs"]
File Changes1. src/tests/backend/common.ts
|
Code Review by Qodo
1.
|
… for cleanup Both items raised in Qodo's review of #7647. 1) Hardcoded 5s socket wait waitForSocketEvent() now takes an optional timeoutMs (default 1000ms, matching pre-PR behaviour). Only the known-slow connect() and handshake() paths pass 5000ms — they're the ones blowing the 1s budget on loaded CI runners. Per-message waits (waitForAcceptCommit and ad-hoc callers in messages.ts etc.) keep the 1s default so failures surface fast with the descriptive helper error rather than the generic Mocha timeout. 2) SessionStore waits still tight Replaced fixed sleeps with a small `eventually()` poller for the three "record should be gone after expiry" assertions: - set of session that expires - switch from non-expiring to expiring - get of record from previous run (not yet expired) These now poll every 25ms up to 2000ms so they pass immediately when cleanup completes on a fast runner, and tolerate hundreds of ms of event-loop delay on a slow one. No fixed coupling between sleep duration and expiry duration. For the inverse "record should still be there" case in `shutdown cancels timeouts`, polling doesn't apply (we're verifying that a cancelled timer did NOT fire, which requires a real wait past the original expiry). Bumped expires 300→500ms and wait 330→700ms so setup (set+get+shutdown) has 500ms before the timer would fire (vs. 30ms previously) and the 700ms wait still passes the original expiry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed f339991 addressing both Qodo points: 1. Hardcoded 5s socket wait — 2. SessionStore waits still tight — for the three "record should be gone after expiry" assertions ( For the inverse case All 28 SessionStore + 33 socketio tests pass locally. |
Summary
Two flaky test groups dominate develop CI failures (~30% of recent runs over the past week). Both are timing fragility, not logic bugs.
tests/backend/specs/SessionStore.ts—set(sess)schedules a timeout based on the cookie'sexpires. Tests use 100ms expiries + 110ms waits. On slow runners (Windows / loaded Linux), the wall-clock betweenset()and the assertion can exceed 100ms;_updateExpirationsthen seesexp.real <= nowand calls_destroy(), deleting the DB record. The next assertion readsnull/undefinedand fails. Tripled affected windows (100→300, 110→330, 200→600) — semantics preserved, ~+3s headroom.tests/backend/common.ts:waitForSocketEvent— 1s timeout. Socket.io handshake + auth + CLIENT_READY round-trip can exceed 1s on a slow runner. Bumped to 5s.This addresses the specific test cases consistently appearing in failed runs:
get of record from previous run (not yet expired),external expiration update is picked up,shutdown cancels timeouts!authn anonymous cookie /p/pad -> 200, ok,authn user /p/pad -> 200, okCLIENT_VARS stays consistent under concurrent edits during handshakeThe latest develop commit happened to be green, but ~25–40% of recent develop runs failed on at least one of these. This PR removes that flake.
Test plan
mocha tests/backend/specs/SessionStore.ts— 28 pass locally (5s)mocha tests/backend/specs/socketio.ts— 33 pass locally (2s)🤖 Generated with Claude Code