fix(workflow): poll for WS_STARTED after checkout to break ready_to_start loop #2033

Open
msawczynk wants to merge 1 commit into Keeper-Security:master from msawczynk:fix/workflow-checkout-propagation-delay

@msawczynk
Contributor

Problem

pam launch (and pam tunnel) can enter an infinite prompt loop when the
workflow ready_to_start → checkout → re-validate cycle races the router:

  1. User or pam launch --auto-checkout hits WS_READY_TO_START → confirms
    checkout → start_workflow_for_record is called.
  2. The outer loop in check_workflow_for_launch immediately calls
    validator.validate() again.
  3. Because start_workflow is processed asynchronously by the router,
    the validate call can still return WS_READY_TO_START before
    WS_STARTED is visible.
  4. handled_ready_to_start is already True, so the "already-handled"
    guard fires and the function returns WorkflowGate(allowed=False).
  5. The user sees "Workflow approved but not yet checked out" a second
    time and is prompted to check out again → loop.
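
The race in steps 1 to 5 can be reproduced with a small simulation. This is only a sketch: LaggingRouter and check_gate_without_poll are hypothetical stand-ins, not the real keepercommander classes, and the string states and the WorkflowGate dataclass only mirror the names used above.

```python
from dataclasses import dataclass

WS_READY_TO_START = "WS_READY_TO_START"
WS_STARTED = "WS_STARTED"


@dataclass
class WorkflowGate:
    # Mirrors only the `allowed` field mentioned in the description.
    allowed: bool


class LaggingRouter:
    """Hypothetical router: WS_STARTED becomes visible only after
    `lag` validate calls have been made following start()."""

    def __init__(self, lag):
        self.lag = lag
        self.started = False
        self.calls_since_start = 0

    def start(self):
        self.started = True
        self.calls_since_start = 0

    def validate(self):
        if not self.started:
            return WS_READY_TO_START
        self.calls_since_start += 1
        if self.calls_since_start > self.lag:
            return WS_STARTED
        return WS_READY_TO_START


def check_gate_without_poll(router):
    """Simplified pre-fix loop: checkout, then re-validate immediately."""
    handled_ready_to_start = False
    while True:
        state = router.validate()
        if state == WS_STARTED:
            return WorkflowGate(allowed=True)
        if handled_ready_to_start:
            # "Already-handled" guard: the stale state blocks the launch.
            return WorkflowGate(allowed=False)
        router.start()  # stands in for start_workflow_for_record
        handled_ready_to_start = True
        # Bare continue: the next iteration validates with no delay.


print(check_gate_without_poll(LaggingRouter(lag=0)))
print(check_gate_without_poll(LaggingRouter(lag=1)))
```

With lag=0 the gate opens. With lag=1 the second validate still sees WS_READY_TO_START and the guard rejects the launch, which is the state that sends the user back to the checkout prompt.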

Fix

After a successful start_workflow_for_record, poll with capped
exponential back-off (0.5 → 1 → 2 → 2 s, at most 5.5 s of waiting in
total) until the router reports WS_STARTED or the retries are exhausted.

  • On success: break out of the outer loop immediately, so no further
    validate round-trip is made once the router catches up (typically
    within the first 0.5 s poll).
  • On failure: fall through to the outer loop's normal handling (existing
    behaviour, no regression).
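
A minimal sketch of the poll described above, not the code in the diff. wait_for_started is a hypothetical helper, validate is assumed to be a zero-argument callable that returns the state as a string, and sleeping before each check is also an assumption. The sleep function is injectable so the schedule can be exercised without real delays.

```python
import time

WS_STARTED = "WS_STARTED"

# Delay schedule from the description: 0.5 -> 1 -> 2 -> 2 s (5.5 s total).
POLL_DELAYS = (0.5, 1.0, 2.0, 2.0)


def wait_for_started(validate, delays=POLL_DELAYS, sleep=time.sleep):
    """Return True as soon as validate() reports WS_STARTED.

    Returns False once every delay has been used, so the caller can
    fall through to its existing handling.
    """
    for delay in delays:
        sleep(delay)  # give the router time to process the checkout
        if validate() == WS_STARTED:
            return True
    return False
```

In the outer loop this would be called right after the checkout succeeds: break on True, continue as before on False.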

Testing

Verified in a live Keeper lab environment:

  • pam launch <record> --auto-checkout no longer loops; reaches
    "Establishing secure session" without repeating the checkout prompt.
  • Manual flow (pam workflow start before pam launch) is unaffected —
    the router already returns WS_STARTED by the time pam launch runs.
  • --wait polling path (_poll_until_state_change) is unchanged.

Files changed

| File | Change |
| --- | --- |
| keepercommander/commands/workflow/mfa.py | Add import time; replace bare continue after checkout with back-off poll |

Made with Cursor

…tart loop

`check_workflow_for_launch` calls `start_workflow_for_record` and then
immediately re-validates via `validator.validate()`.  Because the router
processes `start_workflow` asynchronously, the validate call can return
`WS_READY_TO_START` before the `WS_STARTED` transition is visible.

At that point `handled_ready_to_start` is already `True`, so the outer
loop falls to the "already-handled" guard and returns
`WorkflowGate(allowed=False)`.  The user sees "Workflow approved but not
yet checked out" a second time and is prompted to check out again,
creating an infinite loop until the session is restarted.

Fix: after a successful `start_workflow_for_record`, poll with
exponential back-off (0.5 → 1 → 2 → 2 s, ≤5.5 s total) until the
router confirms `WS_STARTED` or retries are exhausted.  On success the
outer loop breaks immediately; on failure the outer loop continues to its
normal handling.
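
The 0.5 → 1 → 2 → 2 s schedule is what doubling from 0.5 s with a 2 s cap produces over four attempts. The sketch below derives it; backoff_delays and its parameter names are hypothetical, and the actual change may simply hard-code the values.

```python
def backoff_delays(base=0.5, factor=2.0, cap=2.0, attempts=4):
    """Yield capped exponential delays: base, base*factor, ..., at most cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


delays = list(backoff_delays())
print(delays, sum(delays))
```

The cap is what keeps the fourth wait at 2 s rather than 4 s, which bounds the worst-case wait at 5.5 s.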

Confirmed in lab: `pam launch <record>` with auto-checkout no longer
loops; gateway reaches "Establishing secure session" without error.

Co-authored-by: Cursor <cursoragent@cursor.com>