Extend unmanaged OAuth flow to drive code exchange in-process#2896
Conversation
1612def to
7bd8d5f
Compare
Adds a new sub-behavior to the unmanaged OAuth flow that lets the
runtime own PKCE / state / DCR / token exchange while the connected
client acts as a thin courier (open browser, receive deeplink,
forward {code, state}).
Activated by a new server-level flag --mcp-oauth-redirect-uri (also
exposed on the runtime as WithUnmanagedOAuthRedirectURI and on the
RuntimeConfig as MCPOAuthRedirectURI). When set, the runtime:
- generates state + PKCE in-process
- runs DCR if needed (or uses per-toolset explicit credentials)
- builds the full authorize URL with the configured redirect_uri
- emits an elicitation whose Meta carries cagent/authorize_url +
cagent/state alongside the existing auth_server_metadata
- accepts {code, state} as a ResumeElicitation payload, verifies
state in constant time, and exchanges the code at the token
endpoint using the same redirect_uri (RFC 6749 §4.1.3 binding)
- stores the resulting token in the keychain with client_id /
client_secret stamped on for later silent refresh
When the flag is empty, the existing client-driven contract is
preserved verbatim: the elicitation carries only metadata and the
client is expected to reply with {access_token, refresh_token, …}.
The {access_token, …} reply shape is also accepted on the new path
so a client that prefers to do the exchange itself is still free to.
A per-toolset RemoteOAuthConfig.CallbackRedirectURL overrides the
runtime-wide flag for that toolset.
Docker-agent never learns anything about the URL it advertises: it
is opaque, and what happens at the URL (HTML bouncer, OS deeplink,
universal-link claim, …) is the host's concern.
Plumbing:
- cmd/root/flags.go: add --mcp-oauth-redirect-uri
- pkg/config/runtime.go: MCPOAuthRedirectURI field
- pkg/runtime/runtime.go: unmanagedOAuthRedirectURI field +
WithUnmanagedOAuthRedirectURI Opt
- pkg/runtime/loop.go: pass through to ConfigureHandlers
- pkg/server/session_manager.go: forward from runtime config
- pkg/tools/capabilities.go: extend OAuthCapable interface +
ConfigureHandlers
- pkg/tools/mcp/mcp.go, remote.go, session_client.go,
builtin/mcpcatalog: implement the new SetUnmanagedOAuthRedirectURI
- pkg/tools/mcp/oauth.go: rework handleUnmanagedOAuthFlow,
factor out resolveClientCredentials helper and
consumeUnmanagedElicitationReply
- docs/features/remote-mcp/index.md: new 'Unmanaged OAuth flow'
section documenting the meta keys and behavior
Tests: six new oauth_test.go tests covering the drive-flow happy
path (incl. asserting RFC 6749 §4.1.3 redirect_uri parity at /token),
state-mismatch rejection, legacy access-token reply on the new path,
legacy mode shape (no authorize_url emitted), legacy mode rejection
of {code, state}, and the per-toolset > runtime-wide precedence.
Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
7bd8d5f to
c97124d
Compare
Builds on the unmanaged-OAuth drive-flow added in this branch by
giving embedders a second, session-less way to deliver the deeplink
payload to the runtime.
Adds a new HTTP route:
POST /api/mcp-oauth/callback?state=<opaque>
[&code=<opaque>]
[&error=<rfc6749 code>]
[&error_description=<text>]
The parameters mirror what an authorization server sends to a
redirect URI per RFC 6749 §4.1.2. The body is empty; everything
travels in the query string.
A process-wide pending-OAuth registry maps state -> waiting
unmanaged flow. handleUnmanagedOAuthFlow registers its waiter on
entry (drive-flow only) and unregisters on exit. The new route
looks up the waiter by state and hands it the payload through a
single-shot buffered channel. The waiting goroutine then performs
the token exchange and persists the token exactly as before.
The lookup-by-state IS the authentication: only states the runtime
has just generated are present in the registry. Unknown or replayed
states return 404. State values continue to come from GenerateState
(>=128 bits, opaque, base64).
Inside the flow, the elicitation result and the direct callback
race. Whichever arrives first wins; the other is cancelled cleanly
with no further side effects on the connected client.
Strictly additive to the existing session-keyed
POST /api/sessions/:id/elicitation path: both reply shapes
({access_token, ...} legacy and {code, state} drive-flow) still
work, and an embedder may use the elicitation path, the new
callback path, or both. The --mcp-oauth-redirect-uri flag, the
cagent/authorize_url and cagent/state elicitation meta keys, and
the runtime's redirect-URI parity check at /token are unchanged.
Why it matters for embedders:
- No session context required. The embedder forwards an opaque
{state, code} (or {state, error}) and is done. It never needs
to know which session is waiting, or even that the runtime has
sessions at all.
- No state tracking required. All state -> waiter accounting
lives where it belongs: inside the runtime.
- No UI-lifecycle dependency. As long as the runtime is alive
when the deeplink fires, the token is exchanged and stored. An
embedder UI that died between authorize and callback is no
longer fatal.
- Cross-process deeplink handlers can be thin couriers: a
system-wide URL-scheme handler or an OS launcher does not need
a connection to the runtime's elicitation stream at all.
Plumbing:
- pkg/tools/mcp/pending_oauth.go: new file, process-wide
state -> chan registry, exported DeliverPendingOAuthCallback
+ ErrPendingOAuthNoWaiter for the server handler
- pkg/tools/mcp/oauth.go: handleUnmanagedOAuthFlow registers a
waiter, runs the elicitation in a goroutine scoped to a
cancellable child context, and selects between elicitation
result, direct callback, and the outer ctx
- pkg/server/server.go: new mcpOAuthCallback handler at
POST /api/mcp-oauth/callback
Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
The OAuth elicitation Meta map keys still carry the legacy product name (cagent/server_url, cagent/authorize_url, cagent/state, cagent/type). The runtime was renamed to docker-agent some time ago; align the meta keys with it so the contract clients see matches the project name they're integrating against. Renamed: cagent/type -> docker-agent/type cagent/server_url -> docker-agent/server_url cagent/authorize_url -> docker-agent/authorize_url (docker#2896) cagent/state -> docker-agent/state (docker#2896) Both writer sites (handleManagedOAuthFlow, handleUnmanagedOAuthFlow) and every in-tree reader (CLI runner, RemoteRuntime OAuth mirror, TUI elicitation handling) are updated together. Out-of-tree consumers will need the same one-line rename. Other cagent/* identifiers that are NOT in the elicitation Meta map (user-config paths under ~/.cagent/, the User-Agent header, the cagent/title meta key used by the user_prompt builtin) are intentionally left alone -- the rename is scoped to OAuth meta. Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
clientConnector.Connect detaches the caller's ctx with
context.WithoutCancel before sending the MCP initialize handshake.
The rationale is correct: an MCP toolset's session must outlive any
single request -- otherwise the underlying SSE/stdio connection
tears down when the request that triggered toolset.Start returns,
and a successful OAuth flow gets killed the moment its triggering
request completes.
But that detachment also severs user-initiated cancellation, which
the OAuth flow inside Connect DOES need to observe. Concretely:
handleUnmanagedOAuthFlow blocks on a select waiting for an
elicitation reply or an out-of-band callback. If the embedder
aborts the in-progress agent run while the user is in the OAuth
dialog, the streamCtx the SessionManager derived from the request
ctx gets cancelled -- but every ctx in the chain BELOW Connect's
WithoutCancel boundary stays alive. The OAuth select then blocks
forever, the per-session streaming lock is never released, and the
next user message returns 409 Conflict from RunSession.ErrSessionBusy.
This commit preserves the connection-longevity invariant and
restores user-initiated cancellation by stashing the caller's
original ctx as a value on the detached ctx. The detached ctx
remains the primary driver of the connection's lifetime, and
operations that want to observe user-initiated cancellation can
opt in by reading the parent back out and selecting on its Done
channel alongside the local ctx.
Carrying ctx-as-value is unusual but appropriate here: the parent
is metadata about how the detached ctx was constructed, not data
the request is operating on. The helpers (withCancellableParent /
cancellableParentFromContext) live next to the conector that
creates the relationship so the rationale stays close to the code.
handleUnmanagedOAuthFlow now extends its select with a userCancelCh
case keyed on the parent's Done channel. When the parent is nil
(e.g. unit tests, CLI invocations that don't go through
clientConnector.Connect), userCancelCh stays nil and the select
silently never picks it -- preserving back-compat.
Plumbing:
- pkg/tools/mcp/cancellable_parent.go: new helper file with
withCancellableParent + cancellableParentFromContext,
extensively commented because ctx-as-value is unusual.
- pkg/tools/mcp/mcp.go: clientConnector.Connect captures the
caller's ctx before WithoutCancel and attaches it via
withCancellableParent.
- pkg/tools/mcp/oauth.go: handleUnmanagedOAuthFlow extracts the
parent and adds a fourth select case watching its Done channel;
returns parent.Err() so the agent loop sees a cancellable
error and the SessionManager goroutine's deferred Unlock fires.
Tests:
- cancellable_parent_test.go: round-trip, absent-returns-nil,
nil-parent-no-op, and the key invariant that the local ctx
stays independent of the parent's cancellation (only opt-in
callers see it).
- oauth_test.go: TestUnmanagedOAuthFlow_DriveFlow_AbortsOnParent-
CtxCancellation asserts the new branch fires under the exact
boundary clientConnector.Connect sets up, returns the parent's
ctx error, and skips the token exchange.
Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
- bodyclose: close response bodies in RoundTrip test goroutines - forbidigo: use t.Context() instead of context.Background() in tests - gci/gofmt: reflow var block in pending_oauth.go - testifylint require-error: switch terminal ErrorIs to require - testifylint error-is-as: use assert.ErrorIs instead of assert.True(errors.Is(...)) - unconvert: drop redundant tools.ElicitationAction(...) conversion - unparam: drop unused []byte return from httpDoStatus helper
CI flags pre-existing lint issues in code introduced by c97124d and a93ae4d (the drive-flow and callback-route commits this PR is stacked on). Fix them here so this PR's CI is green; these fixes are also present on extend-unmanaged-oauth-flow (docker#2896). When that PR merges, this commit will be dropped on rebase. - bodyclose: close response bodies in RoundTrip test goroutines - gci/gofmt: reflow var block in pending_oauth.go - testifylint error-is-as: use assert.ErrorIs instead of assert.True(errors.Is(...)) - unconvert: drop redundant tools.ElicitationAction(...) conversion - unparam: drop unused []byte return from httpDoStatus helper Signed-off-by: Trung Nguyen <trung.nguyen@docker.com>
…uth-flow # Conflicts: # pkg/tools/mcp/oauth.go
|
❌ PR Review Failed — The review agent encountered an error and could not complete the review. View logs. |
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟡 NEEDS ATTENTION
This PR correctly implements the new drive-flow OAuth mode with PKCE, DCR, and token exchange in-process, and the pending-OAuth registry with its channel-based waiter is race-safe. Three medium-severity issues were identified in the newly added code that are worth addressing before merge.
| slog.DebugContext(ctx, "OAuth callback received via deeplink, exchanging code for token") | ||
| // Synthesize the {code, state} payload that | ||
| // consumeUnmanagedElicitationReply expects. | ||
| content = map[string]any{"code": cb.Code, "state": expectedState} |
There was a problem hiding this comment.
[MEDIUM] State validation tautology on the deeplink callback path
When the out-of-band deeplink wins the select race, the code synthesises the payload that consumeUnmanagedElicitationReply expects by pinning state to the local expectedState:
content = map[string]any{"code": cb.Code, "state": expectedState}consumeUnmanagedElicitationReply then extracts state from that map and does:
if subtle.ConstantTimeCompare([]byte(state), []byte(expectedState)) != 1 {
return nil, errors.New("state mismatch in elicitation reply")
}Because state == expectedState by construction, this comparison is always true — the ConstantTimeCompare guard is a no-op for the deeplink path. The actual state that arrived from the HTTP callback is never independently validated inside the flow.
The registry provides the primary authentication (only a POST carrying the exact state the runtime registered can deliver to the channel), so this is not directly exploitable as-is. However:
- The per-flow check gives a false sense of security — reviewers and future maintainers may think state is re-validated from the callback, when it is not.
- If
PendingOAuthCallbackever grows aStatefield (e.g. for logging), the same pattern would silently skip validating it. - Defence-in-depth is weakened: any future change that loosens the registry's state-key strictness would remove the only real protection.
Suggestion: Either remove the ConstantTimeCompare from the deeplink branch entirely (making the code honest), or add a State field to PendingOAuthCallback, populate it from the HTTP query param in mcpOAuthCallback, and validate it here instead of synthesising the expected value.
|
|
||
| group.GET("/agents/:id/:agent_name/tools/count", s.getAgentToolCount) | ||
|
|
||
| group.POST("/mcp-oauth/callback", s.mcpOAuthCallback) |
There was a problem hiding this comment.
[MEDIUM] /api/mcp-oauth/callback is unauthenticated when no bearer token is configured
The endpoint is registered under the group that only has BearerTokenMiddleware when authToken != "":
if authToken != "" {
e.Use(BearerTokenMiddleware(authToken))
}When the server runs without --auth-token (a documented/valid configuration), any network-reachable client can POST to /api/mcp-oauth/callback?state=<s>&code=<c> with no credentials.
The pending-OAuth registry provides meaningful defence: only states the runtime has generated and is currently awaiting are accepted (unknown states → 404). However, if an attacker can observe a live state value (it appears in the elicitation meta sent to the client over the session stream, at debug-level logs, and in the authorize URL the client is asked to open), they can inject a malicious authorization code. The token exchange would then proceed with an attacker-controlled code, potentially yielding the attacker's own tokens bound to the victim's session.
Suggestions:
- This route is not a "deeplink receiver" in the same sense as the OAuth redirect URI on the authorization server — it is a programmatic API call made by the embedder. Requiring an auth token for it is reasonable even in deployments where other API calls are also authenticated.
- At minimum, add a comment explaining the threat model (what the registry guards against, what it does not) so operators know the risk of running without an auth token.
- Consider rate-limiting or adding an explicit warning in the docs that running without
--auth-tokenweakens the callback endpoint's security posture.
| userCancelCh = parentCtx.Done() | ||
| } | ||
|
|
||
| var content map[string]any |
There was a problem hiding this comment.
[MEDIUM] Indefinite block in handleUnmanagedOAuthFlow if the elicitation goroutine hangs and the parent context never cancels
ctx is the WithoutCancel-detached context set up in clientConnector.Connect — its Done() channel never fires on its own. The select therefore has only three live cancellation signals:
| Case | When it fires |
|---|---|
ctx.Done() |
Never (WithoutCancel) |
userCancelCh |
Only if the embedder actively cancels the parent ctx |
elicCh |
Only when requestElicitation returns |
If the MCP client disconnects silently (TCP RST, idle timeout, process kill) and requestElicitation has no internal deadline, the goroutine and the entire select will block indefinitely. The per-session streaming lock at the SessionManager level is then held forever: all subsequent messages from that session return 409 Conflict / ErrSessionBusy until a server restart.
callbackCh (the deeplink path) also provides no relief — it will never fire if the client that was supposed to open the browser is already gone.
Suggestion: Add a context with a reasonable deadline around the elicitation wait, e.g.:
elicCtx, elicCancel := context.WithTimeout(ctx, 10*time.Minute)
defer elicCancel()Or, if the elicitation itself must be open-ended, wrap the outer select with the same timeout so the streaming lock is released even if no reply arrives. The choice of timeout can be made configurable via RuntimeConfig.
|
Docs reference outdated metadata keys: |
|
@trungutt Let's merge this one and iterate |
Why
The current unmanaged OAuth flow (
WithManagedOAuth(false), added in #830) emits an elicitation carrying only authorization-server metadata and expects the connected client to:state{access_token, refresh_token, …}viaResumeElicitationThat contract works fine when the client is a Go process (the CLI mirror at
pkg/runtime/remote_runtime.go:handleOAuthElicitationreuses the same helpers). It is a poor fit for embedders that want to be thin couriers — for example a host that registers a custom URI scheme and forwards the deeplink payload to docker-agent — because every such host ends up re-implementing PKCE / DCR / token exchange in the host language.This branch extends the unmanaged flow with an opt-in sub-behavior where the runtime drives everything itself and the embedder just relays the deeplink payload back.
How the modes compare
Three OAuth flows are now available. The mode is picked per-toolset at flow-entry time:
flowchart TD start([handleOAuthFlow]) managed{WithManagedOAuth true?} redir{mcp-oauth-redirect-uri set?} M[Managed flow] UClient[Unmanaged - client-driven] UDrive[Unmanaged - drive-flow NEW] start --> managed managed -- yes --> M managed -- no --> redir redir -- no --> UClient redir -- yes --> UDriveManaged flow (unchanged)
The runtime owns the entire OAuth dance. A local callback server bound to
127.0.0.1:Nreceives the redirect.Unmanaged - client-driven (unchanged)
The runtime hands the client raw auth-server metadata and expects a finished token back.
Unmanaged - drive-flow (this PR)
The runtime owns PKCE, DCR, and token exchange and advertises an opaque
redirect_uri. The client (or any process that received the deeplink) delivers{ code, state }back. Two delivery paths are accepted; either or both work in a given deployment.sequenceDiagram participant agent as docker-agent participant client as MCP client participant browser as Browser participant authServer as Authorization Server participant relay as Deeplink relay agent->>agent: generate state and PKCE, run DCR if needed agent->>agent: build authorize URL with configured redirect_uri agent->>agent: register state-to-waiter in registry agent->>client: elicitation with authorize_url and state client->>browser: open authorize_url browser->>authServer: GET authorize authServer-->>browser: redirect to configured redirect_uri with code and state alt Path A - out-of-band callback (new) browser->>relay: deeplink or URI-scheme handler relay->>agent: POST /api/mcp-oauth/callback with state and code agent->>agent: lookup state in registry, hand payload to waiter else Path B - elicitation reply browser->>client: deliver code and state client->>agent: ResumeElicitation with code and state end agent->>agent: verify state in constant time agent->>authServer: POST token with code, code_verifier, redirect_uri authServer-->>agent: access_token and refresh_token agent->>agent: store token plus client_id and secret for silent refreshWhat changes
1. Runtime-driven OAuth flow (server mode)
A new server-level flag —
--mcp-oauth-redirect-uri=<URL>(alsoMCPOAuthRedirectURIonRuntimeConfigandWithUnmanagedOAuthRedirectURIon the runtime). When set,handleUnmanagedOAuthFlow:state+ PKCE in-processclientId/clientSecretfromRemoteOAuthConfig)redirect_uriMetaincludes:docker-agent/authorize_url— URL the client should open in the browserdocker-agent/state— state value the client must echo backdocker-agent/server_url,auth_server,auth_server_metadata,resource_metadata{code, state}as aResumeElicitationpayload, verifiesstatein constant time, exchanges the code at the token endpoint (sameredirect_uriper RFC 6749 §4.1.3), and stores the resulting token in the keychain with client credentials stamped on for silent refreshWhen the flag is not set, the elicitation shape stays identical to today and the runtime expects the client-driven
{access_token, …}reply. CLI-mirror clients continue to work unchanged.The new path also accepts the client-driven
{access_token, …}reply, so a client that prefers to do the exchange itself is still free to.A per-toolset
RemoteOAuthConfig.CallbackRedirectURLoverrides the runtime-wide flag for that toolset.2. Out-of-band callback route —
POST /api/mcp-oauth/callbackAdds a second, session-less way to deliver the deeplink payload to the runtime:
The parameters mirror what an authorization server sends to a redirect URI per RFC 6749 §4.1.2. The body is empty; everything travels in the query string.
A process-wide pending-OAuth registry maps
state → waiting unmanaged flow.handleUnmanagedOAuthFlowregisters its waiter on entry (drive-flow only) and unregisters on exit. The new route looks up the waiter bystateand hands it the payload through a single-shot buffered channel. The waiting goroutine then performs the token exchange and persists the token exactly as before.The lookup-by-state IS the authentication: only states the runtime has just generated are present in the registry. Unknown or replayed states return 404. State values continue to come from
GenerateState(≥128 bits, opaque, base64).This route is strictly additive. The existing session-keyed
POST /api/sessions/:id/elicitationpath still works; embedders may use the elicitation path, the new callback path, or both.3. Propagate user-initiated cancellation across the
WithoutCancelboundaryclientConnector.Connectdetaches the caller's ctx withcontext.WithoutCancelso that the MCP toolset's session can outlive any single request (otherwise the SSE/stdio connection tears down when the request that triggeredtoolset.Startreturns, and a successful OAuth flow gets killed at request end).The detachment also severs user-initiated cancellation, which the OAuth select inside
Connectdoes need to observe — without that signal, embedders that abort the in-progress run while the user is in the OAuth dialog would leave the OAuth select blocked indefinitely, the per-session streaming lock held, and the next message returning409 Conflict / ErrSessionBusy.The connection-longevity invariant is preserved by stashing the caller's original ctx as a value on the detached one. The detached ctx remains the primary driver of the connection's lifetime; operations that want to observe user-initiated cancellation opt in by reading the parent back out and selecting on its
Donechannel.Carrying ctx-as-value is unusual but appropriate here: the parent is metadata about how the detached ctx was constructed, not data the request is operating on. The helpers (
withCancellableParent/cancellableParentFromContext) live next to the connector that creates the relationship so the rationale stays close to the code.