fix(pam-launch): raise WebRTC connect timeout default from 24s to 60s for JIT environments #2034
Open
msawczynk wants to merge 2 commits into
…tart loop

`check_workflow_for_launch` calls `start_workflow_for_record` and then immediately re-validates via `validator.validate()`. Because the router processes `start_workflow` asynchronously, the validate call can return `WS_READY_TO_START` before the `WS_STARTED` transition is visible. At that point `handled_ready_to_start` is already `True`, so the outer loop falls to the "already-handled" guard and returns `WorkflowGate(allowed=False)`. The user sees "Workflow approved but not yet checked out" a second time and is prompted to check out again, creating an infinite loop until the session is restarted.

Fix: after a successful `start_workflow_for_record`, poll with exponential back-off (0.5 → 1 → 2 → 2 s, ≤5.5 s total) until the router confirms `WS_STARTED` or retries are exhausted. On success the outer loop breaks immediately; on failure the outer loop continues to its normal handling.

Confirmed in lab: `pam launch <record>` with auto-checkout no longer loops; gateway reaches "Establishing secure session" without error.

Co-authored-by: Cursor <cursoragent@cursor.com>
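The polling step described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `wait_for_started`, `get_state`, and the injectable `sleep` parameter are hypothetical names, and sleeping before each poll (rather than after) is an assumption. Only the delay schedule and the state names come from the commit message.

```python
import time
from typing import Callable, Sequence

# Delay schedule from the commit message: 0.5 -> 1 -> 2 -> 2 s (5.5 s total).
BACKOFF_DELAYS_SEC: Sequence[float] = (0.5, 1.0, 2.0, 2.0)


def wait_for_started(
    get_state: Callable[[], str],
    delays: Sequence[float] = BACKOFF_DELAYS_SEC,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: poll until get_state() reports "WS_STARTED".

    Returns True as soon as the started state is observed, or False once
    every delay in the schedule has been used up.
    """
    for delay in delays:
        sleep(delay)  # assumption: wait first, then re-validate
        if get_state() == "WS_STARTED":
            return True
    return False
```

A caller would break out of its outer loop when this returns `True` and fall through to its normal handling otherwise. Passing `sleep` as a parameter keeps the sketch testable without real waiting.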
The 24s default causes `pam launch` to abort with "WebRTC connection not established within timeout" on JIT-enabled environments, even though the connection succeeds ~10 s later. The symptom is visible as the "Connection established successfully" ghost message that appears in the terminal after the error.

Root cause: JIT (ephemeral) account provisioning on the gateway adds work on top of the standard TURN-relay ICE path, pushing total wall-clock time to 30-45s. The old default was sized for non-JIT TURN relay (~15-20s observed), so JIT connections reliably hit the ceiling.

Fix: raise the default to 60s, which is above the worst observed JIT+TURN path (~45s) and still bounds genuinely stuck sessions (ICE state permanently Connecting / tube_status connecting). `PAM_WEBRTC_CONNECT_TIMEOUT_SEC` is unchanged and still overrides the default for diagnostics or unusually slow environments.

Co-authored-by: Cursor <cursoragent@cursor.com>
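The relationship between the default and the environment override might look roughly like the sketch below. The constant name and the variable name `PAM_WEBRTC_CONNECT_TIMEOUT_SEC` appear in this PR; the function `connect_timeout_sec`, its `env` parameter, and the choice to fall back to the default on empty, non-numeric, or non-positive values are assumptions made for illustration and may differ from `connect_timing.py`.

```python
import os
from typing import Mapping, Optional

# Default named in this PR; raised from 24.0 to 60.0 seconds.
_PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT = 60.0


def connect_timeout_sec(env: Optional[Mapping[str, str]] = None) -> float:
    """Hypothetical resolver: env override if usable, else the default."""
    env = os.environ if env is None else env
    raw = env.get("PAM_WEBRTC_CONNECT_TIMEOUT_SEC", "").strip()
    if not raw:
        return _PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT
    try:
        value = float(raw)
    except ValueError:
        # Assumption: ignore unparsable overrides instead of raising.
        return _PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT
    # Assumption: a zero or negative timeout is treated as "not set".
    return value if value > 0 else _PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT
```

With this shape, an unusually slow environment could set `PAM_WEBRTC_CONNECT_TIMEOUT_SEC=90` without a code change, while unset environments pick up the new 60s default.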
idimov-keeper
approved these changes
May 8, 2026
Problem
On environments with JIT (ephemeral user) provisioning enabled, `pam launch` consistently fails with "WebRTC connection not established within timeout".

The tell-tale sign: "Connection established successfully" then appears in the terminal ~10s after the error. The ICE negotiation was not stalled; it was still in progress and completed successfully, just after the 24s timeout fired.
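The "ghost message" symptom can be reproduced in miniature with a deadline-bounded wait on a background event. The sketch below is a toy model, not Commander's connection code: `wait_for_connection` and `simulate` are hypothetical names, a `threading.Timer` stands in for the ICE negotiation, and the durations are scaled down to fractions of a second.

```python
import threading
from typing import Tuple


def wait_for_connection(connected: threading.Event, timeout_sec: float) -> bool:
    """Return True only if the event is set before the timeout expires."""
    return connected.wait(timeout=timeout_sec)


def simulate(connect_after_sec: float, timeout_sec: float) -> Tuple[bool, bool]:
    """Toy model: (connected within timeout, connected eventually)."""
    connected = threading.Event()
    # Stand-in for the negotiation finishing in the background.
    timer = threading.Timer(connect_after_sec, connected.set)
    timer.start()
    in_time = wait_for_connection(connected, timeout_sec)
    timer.join()  # let the background "negotiation" run to completion
    return in_time, connected.is_set()
```

When the timeout is shorter than the time the negotiation needs, the first value is `False` (the caller reports an error) while the second is `True` (the connection still completes afterwards), which mirrors the late "Connection established successfully" message.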
Root cause
The 24s default was sized for non-JIT TURN-relay paths (~15–20s observed). JIT ephemeral account provisioning adds gateway-side work (create OS user, set up SSH keys) on top of the ICE round-trip, pushing total wall-clock time to 30–45s in observed lab runs.
Fix
Raise `_PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT` from `24.0` to `60.0` seconds.
- Genuinely stuck sessions (ICE state permanently `Connecting`) are still bounded and fail fast on retry.
- The `PAM_WEBRTC_CONNECT_TIMEOUT_SEC` env override is unchanged.
Evidence
Lab session (gateway v1.8.0.111, JIT enabled, TURN relay path):
- `PAM_WEBRTC_CONNECT_TIMEOUT_SEC=60`: session connects cleanly, ephemeral user `keeper_r6vi5wh6y` authenticated, `whoami`/`id`/`uptime` run successfully.
Files changed
- `keepercommander/commands/pam_launch/connect_timing.py`: `24.0` → `60.0`; updated docstring with JIT rationale
Made with Cursor