fix(pam-launch): raise WebRTC connect timeout default from 24s to 60s for JIT environments #2034

Open
msawczynk wants to merge 2 commits into Keeper-Security:master from msawczynk:fix/webrtc-timeout-jit-provisioning


@msawczynk
Contributor

Problem

On environments with JIT (ephemeral user) provisioning enabled, pam launch consistently fails with:

pam launch: WebRTC connection not established within 24.0s
(last connection_state='Connecting', tube_status='connecting').
ICE negotiation stalled — this is usually transient; please re-run the command.

The tell-tale sign: "Connection established successfully" then appears in the terminal ~10s after the error. The ICE negotiation was not stalled — it was still in progress and completed successfully, just after the 24s timeout fired.

Root cause

The 24s default was sized for non-JIT TURN-relay paths (~15–20s observed). JIT ephemeral account provisioning adds gateway-side work (create OS user, set up SSH keys) on top of the ICE round-trip, pushing total wall-clock time to 30–45s in observed lab runs.

Fix

Raise _PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT from 24.0 to 60.0 seconds.

  • 60s is well above the worst observed JIT+TURN path (~45s).
  • Genuinely stuck sessions (ICE permanently at Connecting) are still bounded: the command gives up after 60s instead of waiting indefinitely, and the user can retry.
  • PAM_WEBRTC_CONNECT_TIMEOUT_SEC env override is unchanged.
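The interaction between the new default and the environment override can be sketched as below. This is a minimal illustration, not the code in connect_timing.py: only `_PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT` and `PAM_WEBRTC_CONNECT_TIMEOUT_SEC` come from this PR, while the helper name `resolve_connect_timeout` and the handling of invalid values are assumptions.

```python
import math
import os

# Name and value taken from this PR; everything else is illustrative.
_PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT = 60.0


def resolve_connect_timeout(environ=None):
    """Hypothetical helper: return the connect timeout in seconds."""
    env = os.environ if environ is None else environ
    raw = env.get("PAM_WEBRTC_CONNECT_TIMEOUT_SEC", "").strip()
    if not raw:
        # No override set: use the built-in default.
        return _PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT
    try:
        value = float(raw)
    except ValueError:
        # Assumption: an unparsable override falls back to the default.
        return _PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT
    # Assumption: non-positive or non-finite values are treated as invalid.
    if math.isfinite(value) and value > 0:
        return value
    return _PAM_WEBRTC_CONNECT_TIMEOUT_DEFAULT
```

Under these assumptions, an unset variable yields 60.0 and `PAM_WEBRTC_CONNECT_TIMEOUT_SEC=90` yields 90.0, so slower environments can still raise the ceiling without a code change.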

Evidence

Lab session (gateway v1.8.0.111, JIT enabled, TURN relay path):

  • Old timeout (24s): error fires, "Connection established successfully" appears ~10s later as ghost output.
  • With PAM_WEBRTC_CONNECT_TIMEOUT_SEC=60: session connects cleanly, ephemeral user keeper_r6vi5wh6y authenticated, whoami/id/uptime run successfully.

Files changed

| File | Change |
| --- | --- |
| keepercommander/commands/pam_launch/connect_timing.py | 24.0 → 60.0; updated docstring with JIT rationale |

Made with Cursor

Martin Sawczynski and others added 2 commits May 8, 2026 17:00
…tart loop

`check_workflow_for_launch` calls `start_workflow_for_record` and then
immediately re-validates via `validator.validate()`.  Because the router
processes `start_workflow` asynchronously, the validate call can return
`WS_READY_TO_START` before the `WS_STARTED` transition is visible.

At that point `handled_ready_to_start` is already `True`, so the outer
loop falls to the "already-handled" guard and returns
`WorkflowGate(allowed=False)`.  The user sees "Workflow approved but not
yet checked out" a second time and is prompted to check out again,
creating an infinite loop until the session is restarted.

Fix: after a successful `start_workflow_for_record`, poll with
exponential back-off (0.5 → 1 → 2 → 2 s, ≤5.5 s total) until the
router confirms `WS_STARTED` or retries are exhausted.  On success the
outer loop breaks immediately; on failure the outer loop continues to its
normal handling.

Confirmed in lab: `pam launch <record>` with auto-checkout no longer
loops; gateway reaches "Establishing secure session" without error.

Co-authored-by: Cursor <cursoragent@cursor.com>
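
The back-off polling described in the commit message above (0.5 → 1 → 2 → 2 s, at most 5.5 s in total) could look roughly like this. It is a sketch under stated assumptions, not the committed code: `wait_for_started` and `get_state` are hypothetical names, workflow states are modeled as plain strings, and sleeping before each poll is a guess at the ordering.

```python
import time

# Delays from the commit message: 0.5 -> 1 -> 2 -> 2 s (5.5 s in total).
_BACKOFF_DELAYS_SEC = (0.5, 1.0, 2.0, 2.0)


def wait_for_started(get_state, delays=_BACKOFF_DELAYS_SEC, sleep=time.sleep):
    """Hypothetical helper: poll get_state() until it reports WS_STARTED.

    Returns True as soon as WS_STARTED is seen, or False once every
    delay has been used without seeing it.
    """
    for delay in delays:
        # Assumption: wait first so the asynchronous router has time
        # to process start_workflow before the next validation.
        sleep(delay)
        if get_state() == "WS_STARTED":
            return True
    return False
```

A caller would break out of the outer loop when this returns True and fall through to its normal handling otherwise. Passing `sleep` as a parameter keeps the helper testable without real waiting.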
The 24s default causes pam launch to abort with "WebRTC connection not
established within timeout" on JIT-enabled environments, even though the
connection succeeds ~10 s later.  The symptom is visible as the
"Connection established successfully" ghost message that appears in the
terminal after the error.

Root cause: JIT (ephemeral) account provisioning on the gateway adds
work on top of the standard TURN-relay ICE path, pushing total
wall-clock time to 30-45s. The old default was sized for non-JIT TURN
relay (~15-20s observed), so JIT connections reliably hit the ceiling.

Fix: raise the default to 60s, which is well above the worst observed
JIT+TURN path (~45s) and still bounds genuinely stuck sessions (ICE
state permanently Connecting / tube_status connecting).

PAM_WEBRTC_CONNECT_TIMEOUT_SEC is unchanged and still overrides the
default for diagnostics or unusually slow environments.

Co-authored-by: Cursor <cursoragent@cursor.com>