fix(sandbox): rewrite credential placeholders in websocket text frames#1286
fix(sandbox): rewrite credential placeholders in websocket text frames#1286ericksoa wants to merge 14 commits intoNVIDIA:mainfrom
Conversation
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
All contributors have signed the DCO ✍️ ✅ |
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
WebSocket implementation alignment noteThis PR follows common hardening patterns used by established WebSocket clients, servers, and proxy implementations, while keeping the implementation tailored to OpenShell's credential-boundary model. The important alignment points are behavioral:
This matters because OpenShell is not trying to become a general WebSocket application framework. The purpose here is narrower: preserve the no-secret-in-sandbox invariant, then resolve placeholders only at the OpenShell relay boundary for opted-in client-to-server text frames. The safest shape is therefore a small, explicit protocol surface: if the relay can validate and process the negotiated WebSocket state, it proceeds; if not, it closes before forwarding an unsafe upgraded flow. The remaining confidence-building work is conformance and durability, not feature expansion. The follow-up path is to keep the in-tree RFC regression matrix, maintain the Docker-backed WebSocket conformance e2e lane, and consider an Autobahn-style relay harness or parser fuzzing after this PR lands. |
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Follow-up hardening passThis pass applies common WebSocket proxy hardening patterns to the OpenShell relay: parse negotiation as structured protocol data, validate server-selected options against the original client offer, canonicalize only the supported safe extension forms, and fail closed when the rewrite path sees parameters it does not explicitly support. Concretely, This matters for credential rewriting because the proxy becomes a protocol participant after it mutates an upgraded flow. Loose negotiation can let the upstream select compression state or subprotocol behavior the relay did not offer or understand, which creates room for state confusion, fail-open compression handling, or secret-bearing payloads moving through a path that no longer has the invariants the rewriter depends on. The new behavior keeps the supported surface narrow and deterministic: either the relay can validate and safely process the negotiated shape, or the handshake fails before forwarding the 101 to the client. Remaining conformance work is still useful, especially an Autobahn-style relay harness and broader fuzz/property coverage for handshake parsing and frame sequencing. Those should come after this focused spec pass so future parser or engine swaps have a stable regression matrix to preserve. |
|
I have read the DCO document and I hereby sign the DCO. |
|
I have read the Contributor Agreement including DCO and I hereby sign the Contributor Agreement and DCO |
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Focused conformance matrix addedI added
This is not a replacement for Autobahn. It is the closest low-cost regression layer for this PR because it covers the specific OpenShell behavior boundary: once the proxy becomes an active WebSocket participant for credential rewriting, the handshake assumptions, compression negotiation, and post-upgrade frame relay need to be tested together rather than only as isolated parser/frame unit tests. Full Autobahn remains useful as a follow-up CI or manual conformance job, but it would add Docker/runtime/tooling weight beyond this focused PR pass. |
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
Progress update: policy configuration now treats WebSocket as a first-class inspected protocol in the incremental update flow. What changed:
Why it matters: Validation:
|
…dential-rewrite Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
Progress update: WebSocket credential replacement is now configurable from the incremental policy workflow and covered through the route-selected relay path. What changed:
Why it matters: Validation:
|
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
Follow-up update after the policy CLI shape review:
openshell policy update demo \
--add-endpoint realtime.example.com:443:read-write:websocket:enforce:websocket-credential-rewrite \
--binary /usr/bin/nodeRationale: Message-content policy would buy a different class of control than this PR currently implements: it would let OpenShell authorize WebSocket messages by application intent inside the text payload, such as event type, channel, action, model/tool name, or service-specific JSON shape, instead of only by host/port, upgrade path, and Validation after this update:
All passed locally. |
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
Progress update: GraphQL-over-WebSocket semantic policy pass Implemented in
Validation run:
Maintainer decisions still explicit:
|
…dential-rewrite Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Summary
Fixes #872 by adding an opt-in, policy-controlled WebSocket credential placeholder rewrite path after an allowed RFC 6455 upgrade. Provider secrets stay out of sandbox env, argv, config files, logs, and sandbox-generated payloads until OpenShell reaches the relay boundary; opted-in endpoints then resolve
openshell:resolve:env:*placeholders in client-to-server UTF-8 WebSocket text frames before forwarding upstream.The implementation follows common WebSocket proxy hardening patterns while keeping the scope specific to OpenShell: validate the upgrade, parse frames strictly, support only a narrow safe compression shape, fail closed on ambiguous protocol states, preserve raw opt-out behavior, and use structured GraphQL operation policy for GraphQL-over-WebSocket instead of generic payload matching.
Related Issue
Fixes #872
Reviewer Map
proto/sandbox.proto,crates/openshell-policy,crates/openshell-cli,crates/openshell-providers,crates/openshell-server, and policy docs.crates/openshell-sandbox/src/l7/websocket.rs,relay.rs,rest.rs,proxy.rs, andsecrets.rs.crates/openshell-sandbox/src/l7/graphql.rs,websocket.rs,mod.rs,opa.rs, andsandbox-policy.rego.crates/openshell-sandboxtests,e2e/rust/tests/websocket_conformance.rs,tasks/test.toml, and.github/workflows/websocket-conformance.yml.Changes
websocket_credential_rewriteas an explicit endpoint policy field, defaulting tofalse.protocol: websocketendpoints and REST compatibility endpoints that perform a WebSocket upgrade.GET/WEBSOCKET_TEXTrules, and thewebsocket-credential-rewriteendpoint option.101 Switching Protocols, including status,Upgrade,Connection,Sec-WebSocket-Accept, and extension negotiation against the sanitized client offer.permessage-deflatesubset with no-context-takeover constraints; unsupported parameters or mismatched upstream responses fail closed.openshell:resolve:env:*placeholders. Binary frames and server-to-client frames are relayed without rewriting.graphql-transport-wssubscribemessages and legacygraphql-ws/ subscriptions-transport-wsstartmessages.connection_init,connection_terminate,ping,pong,complete, andstopout of payload logs.Security and Spec Notes
subscriptionis supported as an explicit GraphQL operation type. It is not silently added toread-onlyorread-write; that matches the existing GraphQL preset model whereread-only = query,read-write = query + mutation, andfull = *.Testing
mise run pre-commitpassesFocused validation run on the current pushed head:
cargo fmtcargo test -p openshell-sandbox websocketcargo test -p openshell-sandbox graphqlcargo test -p openshell-sandbox secretscargo test -p openshell-policycargo clippy -p openshell-sandbox -p openshell-policy --all-targets -- -D warningsmise run pre-commitAutobahn was not run because the repo does not currently include an Autobahn harness. This PR adds the closest practical focused regression coverage in-tree plus a dedicated WebSocket conformance e2e lane; the workflow is manual today with a note to add
schedule:after it burns in.Remaining Maintainer Decisions
.github/workflows/websocket-conformance.ymlinto scheduled nightly coverage after manual burn-in.access: fullis used.Checklist