feat(session): switch a session's agent mid-flight by Pulkit7070 · Pull Request #2412 · AgentWrapper/agent-orchestrator

Pulkit7070 · 2026-07-04T09:40:27Z

What & why

A session's agent (harness) is currently fixed at spawn (SessionRecord.Harness, resolved once by effectiveHarness in session_manager) and can never change. Mixing agents on a single task — e.g. codex to test, claude-code to write, a cheaper model for cost — is impossible without throwing the work away and respawning.

This PR lets you change a session's agent in place, keeping the same git worktree (all code + uncommitted work preserved). The new agent launches fresh — there is no native resume, since a different harness cannot read the outgoing agent's session. An optional model override rides the same path.

Behaviour

Two cases are handled:

Live session → swap in place. The old agent is torn down only after the new launch command validates, so a bad/unknown harness never disrupts the running session. A BeginSwitch/EndSwitch guard on the lifecycle manager makes the reaper ignore the brief window where the old runtime is gone and the new one isn't up yet — otherwise a "dead" probe would wrongly terminate the session mid-switch.
Terminated session → relaunch-as. When an agent exits (e.g. codex ends its process on completion) the session is marked terminated. Switching such a session restores its worktree and launches the chosen agent fresh under it — the way to bring a finished task back under a different agent (plain restore keeps the original harness). Only fully merged sessions stay locked.
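
The validate-before-destroy ordering for the live path can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `runtime`, `known`, and `switchLive` are hypothetical stand-ins, and only the `ErrUnknownHarness` name is taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnknownHarness mirrors the error name used in this PR; everything else is hypothetical.
var ErrUnknownHarness = errors.New("unknown harness")

// runtime is a stand-in for the real runtime handle manager.
type runtime struct {
	running string // harness currently running, "" if none
}

// known is an illustrative set of harness names.
var known = map[string]bool{"codex": true, "claude-code": true}

// switchLive validates the target first and only then tears down the old agent.
func switchLive(rt *runtime, target string) error {
	if !known[target] {
		// Rejected before any teardown: the old agent keeps running.
		return fmt.Errorf("%w: %q", ErrUnknownHarness, target)
	}
	rt.running = ""     // destroy the old handle
	rt.running = target // create the new handle
	return nil
}

func main() {
	rt := &runtime{running: "codex"}
	fmt.Println(switchLive(rt, "nope"), rt.running)
	fmt.Println(switchLive(rt, "claude-code"), rt.running)
}
```

The point of the ordering is that every failure that can be detected up front happens while the old runtime is still intact.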

lifecycle.MarkSwitched atomically changes the persisted harness, points at the new runtime handle, and clears the harness-specific AgentSessionID (which MarkSpawned's metadata merge cannot, since it only sets non-empty fields), resetting activity/first-signal so the new agent re-proves its hook pipeline.
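
To see why a non-empty-only merge cannot clear AgentSessionID, consider this small sketch. The `meta` struct and both functions are simplified assumptions for illustration; the real MarkSpawned and MarkSwitched signatures are not shown in this PR description.

```go
package main

import "fmt"

// meta is a trimmed, hypothetical stand-in for the session metadata.
type meta struct {
	Harness        string
	AgentSessionID string
}

// mergeNonEmpty copies only non-empty fields, so it can never clear a value.
func mergeNonEmpty(dst *meta, src meta) {
	if src.Harness != "" {
		dst.Harness = src.Harness
	}
	if src.AgentSessionID != "" {
		dst.AgentSessionID = src.AgentSessionID
	}
}

// markSwitched overwrites the harness and explicitly clears the stale ID.
func markSwitched(dst *meta, harness string) {
	dst.Harness = harness
	dst.AgentSessionID = ""
}

func main() {
	a := meta{Harness: "codex", AgentSessionID: "abc"}
	mergeNonEmpty(&a, meta{Harness: "claude-code"})
	fmt.Printf("%+v\n", a) // the stale ID survives the merge

	b := meta{Harness: "codex", AgentSessionID: "abc"}
	markSwitched(&b, "claude-code")
	fmt.Printf("%+v\n", b) // the ID is cleared
}
```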

Surfaces

Backend: sessionmanager.SwitchHarness (live + terminated paths) and lifecycle guard + MarkSwitched.
API: POST /sessions/{id}/switch {harness, model?} — 400 unknown harness, 404 unknown session, 409 switch-in-progress. OpenAPI spec + frontend/src/api/schema.ts regenerated (pinned openapi-typescript@7.4.4).
CLI: ao session switch <id> --harness <agent> [--model ...].
UI: the session inspector's Overview → Agent row is now a dropdown that switches the agent in place (or relaunches a terminated one), built from the existing dropdown-menu primitive and AGENT_OPTIONS; merged sessions stay read-only.
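
The status mapping for the API surface could look something like the sketch below. The sentinel error names follow the ones used in this PR's flow diagram; the `statusFor` helper, the 200 success code, and the 500 fallback are assumptions, not the actual handler.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Sentinel errors named after those in this PR; their messages are illustrative.
var (
	ErrUnknownHarness   = errors.New("unknown harness")
	ErrNotFound         = errors.New("session not found")
	ErrSwitchInProgress = errors.New("switch in progress")
)

// statusFor maps a switch error to an HTTP status (hypothetical helper).
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK // assumed success code
	case errors.Is(err, ErrUnknownHarness):
		return http.StatusBadRequest
	case errors.Is(err, ErrNotFound):
		return http.StatusNotFound
	case errors.Is(err, ErrSwitchInProgress):
		return http.StatusConflict
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	// errors.Is sees through wrapping, so callers may add context freely.
	wrapped := fmt.Errorf("switch s1: %w", ErrSwitchInProgress)
	fmt.Println(statusFor(wrapped))
}
```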

Tests

live swap clears AgentSessionID
unknown harness leaves the running agent untouched (validate-before-destroy)
terminated session relaunches under the new agent (restore-as)
create-failure terminates cleanly (no live-session-with-dead-handle)
reaper guard suppresses termination during a switch
MarkSwitched changes harness, clears AgentSessionID, resets first-signal

go build ./..., go vet ./..., the full backend go test ./..., and frontend tsc all pass.

Notes / follow-ups

No migration: the harness column and UpdateSession already exist.
The target agent's CLI must be installed on PATH — the existing pre-flight binary check rejects otherwise, surfaced inline in the UI.
Context handoff (carrying the outgoing agent's transcript into the new one) is intentionally out of scope; this leaves a clean seam (the fresh launch's prompt) for it.

🤖 Generated with Claude Code

Add the ability to change a session's agent (harness) without losing the worktree. Previously a session's harness was fixed at spawn and could never change; mixing agents on one task (e.g. codex to test, claude-code to write, a cheaper model for cost) was impossible.

A switch keeps the same git worktree (all code + uncommitted work preserved) and launches the new agent fresh. There is no native resume, since a different harness cannot read the outgoing agent's session. An optional model override rides the same path.

Two cases are handled:
- Live session: swap in place. The old agent is torn down only AFTER the new launch command validates, so a bad/unknown harness never disrupts the running session. A BeginSwitch/EndSwitch guard on the lifecycle manager makes the reaper ignore the brief runtime gap so it is not mistaken for a crash.
- Terminated session (e.g. the agent exited): relaunch-as. The worktree is restored and the new agent launched fresh under it. This is the way to bring a finished task back under a different agent (plain restore keeps the harness).

lifecycle.MarkSwitched atomically changes the persisted harness, points at the new runtime handle, and clears the harness-specific AgentSessionID (which MarkSpawned's merge cannot), resetting activity/first-signal so the new agent re-proves its hook pipeline.

Surfaces:
- Backend: sessionmanager.SwitchHarness + lifecycle guard/MarkSwitched.
- API: POST /sessions/{id}/switch {harness, model?} (400/404/409 mapped); OpenAPI spec + frontend schema.ts regenerated.
- CLI: `ao session switch <id> --harness <agent> [--model ...]`.
- UI: the session inspector's Overview "Agent" row is now a dropdown that switches the agent in place (or relaunches a terminated one); merged sessions stay read-only.

Tests cover: live swap clears AgentSessionID, unknown harness leaves the agent running, terminated relaunch-as, create-failure terminates cleanly, the reaper guard suppresses termination during a switch, and MarkSwitched semantics.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

nikhilachale

P1: switch-in-progress guard is not atomic, so duplicate switches can still race
internal/session_manager/manager.go:591 checks IsSwitching, but internal/session_manager/manager.go:656 sets the guard later and BeginSwitch is idempotent with no failure return. Two concurrent requests can both pass the check, both enter BeginSwitch, and then both destroy/create runtimes against the same worktree. The terminated path is worse because internal/session_manager/manager.go:691 never begins the guard at all, so two relaunch-as switches can create two runtimes. This needs an atomic TryBeginSwitch/compare-and-set style API used by both live and terminated switch paths.
P2: terminated switch bypasses the existing not-resumable guard
Restore refuses promptless, unresumable worker sessions via restoreArgv at internal/session_manager/manager.go:1433, but relaunchTerminatedWithHarness always calls fresh GetLaunchCommand with meta.Prompt at internal/session_manager/manager.go:707. That means ao session switch can resurrect a terminated worker with no prompt/native resume context into a blank agent session, which the existing restore path deliberately prevents. If switch-relaunch is meant to preserve restore semantics except for harness selection, it should reject the same promptless worker case.

…ability

P1: the switch-in-progress guard was a check-then-act (IsSwitching + later BeginSwitch) and the terminated path never claimed it, so two concurrent switches could both proceed and race two teardown/relaunch cycles over one worktree. Replace with an atomic lifecycle.TryBeginSwitch (single-critical-section compare-and-set) claimed once at the top of SwitchHarness for both the live and terminated paths, released via defer.

P2: relaunchTerminatedWithHarness always fresh-launched with meta.Prompt, bypassing restoreArgv's guard. Since the new harness cannot native-resume the old agent's session, a terminated worker with no saved prompt would blank-relaunch, which Restore deliberately refuses. Reject the same promptless-worker case with ErrNotResumable (orchestrators stay promptless by design).

Tests: reject concurrent switch, terminated promptless worker rejected; lifecycle TryBeginSwitch compare-and-set.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Pulkit7070 · 2026-07-04T18:33:46Z

Thanks @nikhilachale — both good catches, fixed in 551742c.

P1 (atomic switch guard): Replaced the check-then-act (IsSwitching + later BeginSwitch) with an atomic lifecycle.TryBeginSwitch — a single-critical-section compare-and-set. It's now claimed once at the top of SwitchHarness for both the live and terminated paths and released via defer, so two concurrent requests can't both pass, and the terminated path is no longer unguarded. Added TestSwitchHarness_RejectsConcurrentSwitch and a compare-and-set test on the lifecycle side.
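
A guard of this shape can be sketched with a mutex-protected set, where the check and the claim happen inside one critical section. This is an assumed minimal form, not the code from 551742c: the `switchGuard` type and its constructor are hypothetical, and only the TryBeginSwitch/EndSwitch names come from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// switchGuard tracks which sessions currently have a switch in flight.
type switchGuard struct {
	mu        sync.Mutex
	switching map[string]bool
}

func newSwitchGuard() *switchGuard {
	return &switchGuard{switching: map[string]bool{}}
}

// TryBeginSwitch claims the guard for id. The check and the set share one
// critical section, so exactly one concurrent caller can win.
func (g *switchGuard) TryBeginSwitch(id string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.switching[id] {
		return false
	}
	g.switching[id] = true
	return true
}

// EndSwitch releases the guard; callers would typically defer it.
func (g *switchGuard) EndSwitch(id string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.switching, id)
}

func main() {
	g := newSwitchGuard()
	results := make(chan bool, 8)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- g.TryBeginSwitch("s1")
		}()
	}
	wg.Wait()
	close(results)

	claimed := 0
	for ok := range results {
		if ok {
			claimed++
		}
	}
	fmt.Println("claimed:", claimed) // only one goroutine wins the claim
}
```

Unlike a separate IsSwitching check followed by BeginSwitch, there is no window between the read and the write in which a second request can slip through.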

P2 (terminated resumability): relaunchTerminatedWithHarness now mirrors restoreArgv's guard — since the new harness can't native-resume the old agent's session, a terminated worker with no saved prompt would blank-relaunch, which Restore deliberately refuses. It now returns ErrNotResumable for that case (orchestrators stay promptless by design). Added TestSwitchHarness_TerminatedPromptlessWorkerRejected.

Full backend go test ./... + frontend tsc pass.


Pulkit7070 · 2026-07-04T18:45:57Z

Flow — `SwitchHarness`

One entry point, forking on whether the agent is still alive, converging on one atomic write. Validate-before-destroy + an atomic guard mean a bad agent never disrupts a running session and the reaper never mistakes the swap for a crash.

flowchart TD
  A["POST /sessions/{id}/switch { harness, model? }"] --> B{session found +<br/>worktree present?}
  B -- no --> E1["reject: ErrNotFound /<br/>ErrIncompleteHandle"]
  B -- yes --> C{"TryBeginSwitch(id)<br/>atomic compare-and-set"}
  C -- already switching --> E2["reject: ErrSwitchInProgress"]
  C -- claimed (defer EndSwitch) --> D{"validate agent:<br/>known + adapter + binary on PATH"}
  D -- invalid --> E3["reject: ErrUnknownHarness /<br/>binary not found"]
  D -- ok --> S{session terminated?}

  S -- "no · LIVE (swap in place)" --> L1["prepareWorkspace"]
  L1 --> L2["runtime.Destroy old handle"]
  L2 --> L3["runtime.Create new"]
  L3 -- fail --> LF["MarkTerminated<br/>(no dead-handle-live)"]

  S -- "yes · TERMINATED (relaunch-as)" --> T0{promptless worker?}
  T0 -- yes --> E4["reject: ErrNotResumable"]
  T0 -- no --> T1["workspace.Restore"]
  T1 --> T2["prepareWorkspace → runtime.Create new"]

  L3 --> M["lifecycle.MarkSwitched<br/>set harness · point at new handle ·<br/>clear AgentSessionID · reset activity"]
  T2 --> M
  M --> R["EndSwitch (defer) →<br/>updated session streamed over CDC<br/>(terminal re-attaches, UI reflects new agent)"]

While the guard is held, the lifecycle reducer ignores this session's reaper "dead" probe — that's what makes the brief runtime gap safe. The worktree (code + uncommitted work) is preserved throughout; only the runtime handle is disposable.
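
The reducer behaviour described here could be modelled as below. It is a deliberately tiny sketch: the `session` struct, its `Switching` flag, and `onDeadProbe` are hypothetical names, and the real lifecycle reducer presumably carries far more state.

```go
package main

import "fmt"

// session is a hypothetical, trimmed view of the lifecycle state.
type session struct {
	Terminated bool
	Switching  bool // true while the switch guard is held
}

// onDeadProbe applies a reaper "dead" probe; it is a no-op during a switch.
func onDeadProbe(s session) session {
	if s.Switching {
		return s // the runtime gap is expected, not a crash
	}
	s.Terminated = true
	return s
}

func main() {
	fmt.Println(onDeadProbe(session{Switching: true}).Terminated)
	fmt.Println(onDeadProbe(session{}).Terminated)
}
```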

Pulkit7070 · 2026-07-05T05:01:27Z

@nikhilachale re-requesting your review: both of your comments are addressed (atomic TryBeginSwitch guard for P1, promptless-worker ErrNotResumable for P2), the merge conflicts are resolved, and the flow diagram is posted above. Ready for another pass.

…-used agents

Two fixes on the agent-switch path.

Medium (review): terminated relaunch-as restored the worktree to ws.Path but MarkSwitched only updated the runtime handle, leaving the old Metadata.WorkspacePath/Branch. A changed session prefix or managed root could restore to a different path while the stored session still pointed at the old one, breaking later terminal/workspace/cleanup ops. MarkSwitched now takes full SessionMetadata and persists WorkspacePath/Branch from the launch.

Bug: switching to a harness that had already run the session relaunched it fresh, colliding with the agent's own prior native session. Claude Code pins a deterministic --session-id, so a fresh relaunch failed with "Session ID <uuid> is already in use". Sessions now track the set of harnesses they've launched (SessionMetadata.LaunchedHarnesses); a previously-used harness RESUMES (via the adapter's restore command) while a new one launches fresh. The promptless-worker guard now applies only to fresh launches (a resume needs no saved prompt).

Tests: MarkSwitched persists workspace path/branch + launched set; resume for a previously-used harness; fresh launch (and set update) for a new harness.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Pulkit7070 · 2026-07-05T14:17:32Z

Fixed in 1e6d9a3.

Medium (workspace path not persisted): MarkSwitched now takes the full domain.SessionMetadata and persists WorkspacePath/Branch from the launch, so the terminated relaunch-as records the restored worktree instead of leaving the stale one. Added a MarkSwitched test that passes a different path/branch and asserts the session metadata is updated.

Bonus — also fixed a related runtime bug I hit while testing: switching to a harness that had already run the session relaunched it fresh and collided with the agent's own prior native session — Claude Code pins a deterministic --session-id, so a fresh relaunch failed with Session ID <uuid> is already in use. Sessions now track the set of harnesses they've launched (SessionMetadata.LaunchedHarnesses); a previously-used harness resumes via the adapter's restore command, a new one launches fresh. The promptless-worker ErrNotResumable guard now applies only to fresh launches (a resume needs no saved prompt). Tests cover resume-vs-fresh both ways.
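
The resume-vs-fresh decision can be sketched as a single planning function. The names here are assumptions for illustration: `sessionMeta` is a trimmed stand-in for SessionMetadata, `IsOrchestrator` and `planLaunch` are hypothetical, and only LaunchedHarnesses and ErrNotResumable are names from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotResumable mirrors the error name used in this PR.
var ErrNotResumable = errors.New("session not resumable")

// sessionMeta is a hypothetical, trimmed stand-in for SessionMetadata.
type sessionMeta struct {
	Prompt            string
	IsOrchestrator    bool // assumed flag; orchestrators stay promptless by design
	LaunchedHarnesses map[string]bool
}

// planLaunch decides how the target harness should start: "resume" if it has
// already run this session, otherwise "fresh" (subject to the prompt guard).
func planLaunch(m *sessionMeta, target string) (string, error) {
	if m.LaunchedHarnesses[target] {
		return "resume", nil // a resume needs no saved prompt
	}
	if m.Prompt == "" && !m.IsOrchestrator {
		return "", ErrNotResumable // a fresh worker launch needs a prompt
	}
	if m.LaunchedHarnesses == nil {
		m.LaunchedHarnesses = map[string]bool{}
	}
	m.LaunchedHarnesses[target] = true
	return "fresh", nil
}

func main() {
	m := &sessionMeta{Prompt: "fix the flaky test"}
	fmt.Println(planLaunch(m, "claude-code")) // first use of this harness
	fmt.Println(planLaunch(m, "claude-code")) // second use of this harness
	fmt.Println(planLaunch(&sessionMeta{}, "codex"))
}
```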

Backend go test ./... + go vet pass.

…pdown to authed agents

A terminated agent's tmux session can outlive the agent process (the keep-alive shell keeps it open), so its deterministic session name stays taken. The terminated relaunch-as path skipped Destroy (it assumed no live runtime) and went straight to Create, which failed with "duplicate session <id>" (surfaced as a 500). It now tears down any leftover runtime handle before Create. Destroy is idempotent, so an already-gone session is a no-op. The live path already did this.

Test: terminated relaunch with a lingering handle destroys it before Create.

UI: the Agent-row switch dropdown now lists only agents whose local auth probe passed (the catalog's authorized set, same source the spawn dialogs use) instead of every known harness, so users don't pick an agent that just fails at launch. Empty/loading states render a disabled hint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
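
The destroy-before-create fix relies on Destroy being idempotent. A rough model of that, using a hypothetical in-memory `fakeRuntime` in place of the real tmux-backed runtime:

```go
package main

import "fmt"

// fakeRuntime is a hypothetical in-memory stand-in for the tmux-backed runtime.
type fakeRuntime struct {
	sessions map[string]bool
}

// Destroy is idempotent: removing an already-gone session is a no-op.
func (r *fakeRuntime) Destroy(name string) {
	delete(r.sessions, name)
}

// Create fails if the deterministic session name is still taken.
func (r *fakeRuntime) Create(name string) error {
	if r.sessions[name] {
		return fmt.Errorf("duplicate session %s", name)
	}
	r.sessions[name] = true
	return nil
}

// relaunch tears down any leftover handle before creating the new one.
func relaunch(r *fakeRuntime, name string) error {
	r.Destroy(name)
	return r.Create(name)
}

func main() {
	// "s1" lingers because a keep-alive shell kept the session open.
	r := &fakeRuntime{sessions: map[string]bool{"s1": true}}
	fmt.Println(r.Create("s1")) // what the old path ran into
	fmt.Println(relaunch(r, "s1"))
}
```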

…y triggers

The previous commit tracked LaunchedHarnesses on SessionMetadata to drive the resume-vs-fresh switch decision, but the SQLite store maps metadata to explicit columns and had no column for it, so it was silently dropped on write and read back empty. The resume branch never fired, and switching back to a previously-used deterministic-id agent (Claude Code) still fresh-launched and collided ("Session ID <uuid> is already in use").

Add a durable launched_harnesses column (migration 0022), thread it through the InsertSession/UpdateSession/Select queries (sqlc regenerated), and serialise the harness set as a comma-separated string in the store. The set now round-trips, so a previously-used harness resumes instead of colliding, surviving daemon restarts (the agent's on-disk session does too).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
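
A comma-separated round trip for the harness set might look like the sketch below. The helper names are hypothetical, the sorted output order is a choice made here for determinism, and the sketch assumes harness names never contain a comma.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// encodeHarnesses serialises the set as a sorted, comma-separated string.
func encodeHarnesses(set map[string]bool) string {
	names := make([]string, 0, len(set))
	for n := range set {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic column value
	return strings.Join(names, ",")
}

// decodeHarnesses parses the column value back into a set.
func decodeHarnesses(s string) map[string]bool {
	set := map[string]bool{}
	for _, n := range strings.Split(s, ",") {
		if n != "" { // an empty column yields an empty set
			set[n] = true
		}
	}
	return set
}

func main() {
	col := encodeHarnesses(map[string]bool{"codex": true, "claude-code": true})
	fmt.Println(col)
	fmt.Println(len(decodeHarnesses(col)), len(decodeHarnesses("")))
}
```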

Pulkit7070 and others added 2 commits July 4, 2026 15:07

chore: format with prettier [skip ci]

088b356

nikhilachale self-requested a review July 4, 2026 14:36

nikhilachale reviewed Jul 4, 2026

View reviewed changes

nikhilachale requested review from illegalcall and neversettle17-101 July 4, 2026 14:48

Pulkit7070 and others added 2 commits July 4, 2026 23:56

chore: format with prettier [skip ci]

3da871c

Pulkit7070 added 2 commits July 5, 2026 00:13

Merge remote-tracking branch 'origin/main' into feat/switch-agent

9bb4093

# Conflicts: # backend/internal/httpd/apispec/openapi.yaml # frontend/src/api/schema.ts # frontend/src/renderer/components/SessionInspector.tsx

Merge remote-tracking branch 'fork/feat/switch-agent' into feat/switc…

438a5bd

…h-agent

Pulkit7070 requested a review from nikhilachale July 5, 2026 05:04

Pulkit7070 and others added 2 commits July 5, 2026 20:02

Pulkit7070 wants to merge 9 commits into AgentWrapper:main from Pulkit7070:feat/switch-agent
