Skip to content

feat: OIDC provider#132

Open
alukach wants to merge 52 commits into
mainfrom
feat/oidc-provider
Open

feat: OIDC provider#132
alukach wants to merge 52 commits into
mainfrom
feat/oidc-provider

Conversation

@alukach

@alukach alukach commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

What I'm changing

Turns the data proxy into an OIDC provider so it can authenticate to the Source Cooperative API on behalf of each user, and adds an STS token-exchange endpoint so clients can trade an OIDC token for temporary S3-style credentials.

Highlights

  • OIDC provider — the proxy now mints and signs RSA JWTs and publishes them at GET /.well-known/openid-configuration and GET /.well-known/jwks.json, so the Source API (and other relying parties) can verify them. Supports zero-downtime key rotation by serving both the active and previous key in the JWKS.
  • Per-user API auth — replaces the single static api_secret with a flexible ApiAuth that signs a short-lived JWT scoped to the calling principal. Upstream API calls are now made as that user instead of with a shared secret.
  • STS token exchange — new /.sts endpoint with a _default role that trusts the auth provider, letting an authenticated user obtain temporary credentials.

Also included

  • New config module parses keys/secrets once per isolate (RSA PEM parsed a single time).
  • Object writes now return an S3 405 MethodNotAllowed instead of a misleading 404 (the proxy is read-only).
  • Upstream auth failures (401/403) now surface as S3 403 AccessDenied rather than a 500.
  • Cache keys are identity-scoped and URL-safe (hex-encoded subject).
  • Bumps multistore to 0.4.0, refreshes dependencies / cargo audit allowlist, and updates CI + deploy workflows to forward the new secrets.

Warning

this adds required env vars/secrets — OIDC_PROVIDER_KEY, OIDC_PROVIDER_KID, OIDC_PROVIDER_ISSUER, SESSION_TOKEN_KEY, AUTH_ISSUER. The worker will not boot without them. See the README → OIDC Provider for setup and the key-rotation procedure.

How to test it

  1. Access the accompanying frontend PR (feat: OIDC auth source.coop#283) preview: https://source-cooperative-git-feat-oidc-auth-radiantearth.vercel.app/

  2. Authenticate & access Restricted product (e.g. https://source-cooperative-git-feat-oidc-auth-radiantearth.vercel.app/alukach/alukach-experimentation). Verify ****

  3. Set the env vars/secrets above (the README has a one-liner to generate an RSA key).

  4. Discovery: curl https://<host>/.well-known/openid-configuration and /.well-known/jwks.json return the issuer and the active (+ previous, during rotation) keys.

  5. Token exchange: exchange an OIDC token at /.sts and confirm temporary credentials are returned.

  6. Behavior: a signed object download still streams; a PUT/POST/DELETE returns 405; a request the API rejects returns 403 AccessDenied (not 500).

PR Checklist

  • This PR has no breaking changes.
  • I have updated or added new tests to cover the changes in this PR.
  • This PR affects the Source Cooperative Frontend & API,
    and I have opened issue/PR #XXX to track the change.

Related Issues

alukach and others added 5 commits April 1, 2026 23:40
Add multistore-oidc-provider dependency and wire up OIDC configuration
loading from environment variables (OIDC_PROVIDER_KEY, OIDC_PROVIDER_KID,
OIDC_PROVIDER_ISSUER) with optional key rotation support. Serve
/.well-known/openid-configuration and /.well-known/jwks.json endpoints
when OIDC is configured.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce ApiAuth enum that supports OIDC JWT bearer tokens, static
shared secrets (legacy), or no authentication. When OIDC config is
present, the proxy signs a fresh JWT for each API request instead of
using a static secret. This enables service-to-service auth via the
OIDC provider added in previous commits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ng modules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alukach alukach changed the title Feat/OIDC provider feat: OIDC provider Apr 2, 2026
@github-actions

github-actions Bot commented Apr 2, 2026

Copy link
Copy Markdown

🚀 Latest commit deployed to https://source-data-proxy-pr-132.source-coop.workers.dev

  • Date: 2026-06-10T21:05:54Z
  • Commit: 7c1ef84

The advisory covers a timing side-channel in RSA decryption. We only
use RSA for signing JWTs, never decryption, so the vulnerable code
path is never reached.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
alukach and others added 5 commits May 31, 2026 22:18
The gateway refactor dropped the write-method short-circuit, so writes
fell through to bucket resolution and returned 404. Restore the S3-style
405, exempting the special OIDC/STS endpoints which manage their own
methods. v0.4.0 removed the xml_response helper, so build it locally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The subject was interpolated raw into the cache-key URL. Principal names
containing spaces, '&', '#', or non-ASCII could corrupt the key or
collide distinct subjects. Hex-encode the subject: injective and always
URL-safe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Upstream API 401/403 responses were mapped to ProxyError::Internal and
surfaced to clients as HTTP 500 InternalError. Classify them as
ProxyError::AccessDenied so an auth rejection becomes a proper S3 403
AccessDenied, and render the AccountListHandler error arm via
ErrorResponse::from_proxy_error + ProxyError::status_code so the error
variant drives both the S3 code and HTTP status (no hardcoded strings).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the hand-rolled xml_response helper and render the read-only 405
response through GatewayResponse::Response(ProxyResult::xml(..)).into_web_sys(),
reusing multistore's single response-rendering path. The error body still
uses multistore's ErrorResponse type.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@tylere

tylere commented Jun 9, 2026

Copy link
Copy Markdown

Comments generated by Cursor / Fable 5 analysis:

I reviewed PR #132 "feat: OIDC provider" (alukach, +1,263/−1,085 across 20 files, all 13 checks green, no reviews yet). It's the proxy half of the pair with source.coop#283: the worker becomes its own OIDC issuer (discovery + JWKS, RSA-signed JWTs), replaces the shared SOURCE_API_SECRET with per-user short-lived JWTs, and adds a /.sts token-exchange endpoint.

Overall

The architecture is sound and well-documented, and the code quality is high — identity-scoped cache keys, proper S3 error mapping, key rotation with dual JWKS, and unusually thorough justifications in audit.toml. I'd want the items below addressed or answered before merge, though — #1 and #2 in particular.

Security findings

1. The _default STS role doesn't restrict token audience (high).

In src/sts.rs:

trusted_oidc_issuers: vec![oidc_issuer],
required_audience: None,
subject_conditions: vec![],
allowed_scopes: vec![], // unlimited
Any ID token issued by AUTH_ISSUER — for any OAuth client registered with Ory, not just the source.coop server flow — can be exchanged at /.sts for credentials acting as that subject. If a third-party app ever gets registered as an Ory client, an ID token a user grants to that app becomes exchangeable for the user's data-proxy credentials (classic audience-confusion). Recommend setting required_audience to the client_id used by the paired source.coop flow, or explicitly documenting why it's deferred.

2. Key-rotation runbook conflicts with the deploy workflow (medium).

deploy.yml now re-uploads OIDC_PROVIDER_KEY and SESSION_TOKEN_KEY from GitHub secrets via wrangler secret bulk on every deploy. The README rotation procedure says to rotate with wrangler secret put — if someone follows it without also updating the GitHub secret, the next deploy silently reverts to the old key. Also inconsistent: OIDC_PROVIDER_KEY_PREVIOUS is not in the bulk upload, so the previous key is managed out-of-band while the active key is CI-managed. The README should make GitHub secrets the source of truth.

3. CORS now advertises write methods to all origins.

access-control-allow-methods grew PUT, POST, DELETE with allow-origin: * and allow-headers: *. The proxy is read-only (writes 405), so presumably this is for POST to /.sts — but it's applied globally. Worth narrowing, or noting why.

4. SESSION_TOKEN_KEY rotation isn't documented.

Rotating it invalidates all outstanding sealed STS credentials (the frontend's sc_proxy_creds cookies). Fine operationally, but the README rotation section only covers the OIDC key.

Correctness / design

7. Restricted reads still hit the backend anonymously.

registry.rs keeps skip_signature = "true" for all requests (the TODO acknowledges it). So "restricted" is enforced only at the API-metadata layer; the backing bucket must still be anonymously readable by the proxy. That seems to be the intended interim state (follow-up exists in PR #147 / draft #332), but it's worth stating in the PR description since "restricted product support" is the headline of the paired PR.

8. /.sts path match is exact.

parts.path == "/.sts" — /.sts/ or /.sts? variants fall through to bucket mapping and would 404/405 confusingly. Minor; consider normalizing.

9. Analytics pollution from special paths.

Location broadcast skips parts.path.starts_with("/."), but log_analytics doesn't — .well-known and .sts requests get logged with account = ".well-known". Minor data-quality issue.

10. cache_key_with_subject appends ?subject= — breaks if api_url ever carries a query string.

True today that none do; a debug assert or comment would prevent a future foot-gun. The hex-encoding rationale itself is good.

11. 60s JWT TTL is tight.

JwtSigner::from_pem(&pem, kid, 60) — if there's clock skew between the Cloudflare edge and the source.coop API and the verifier enforces iat/nbf strictly, requests will intermittently 403. Worth confirming the API verifier allows leeway.

14. load_config panics on missing env.

Every expect turns a misconfigured worker into per-request panics/500s rather than a failed boot. The deploy-and-smoke-test workflow mitigates this, and the PR warns about it, so this is acceptable — just be aware preview/staging/prod all need the secrets before this merges (deploy will fail otherwise).

CI nits

  • sed -i 's/^command = .*/command = "echo skip"/' wrangler.toml rewrites all command = lines; there's only one today, but it's fragile.
  • The SECRETS='{"OIDC_PROVIDER_KEY": ${{ toJSON(...) }}...' single-quoted shell interpolation breaks if a secret ever contains a single quote. PEM/base64 values won't, but printf from a file or a heredoc would be more robust.
  • Nice improvement: reusing the build artifact for integration tests and replacing the poll loop with curl --retry --retry-connrefused.

What I'd merge as-is

The error-mapping change (401/403 → S3 AccessDenied instead of 500, writes → 405 instead of 404), the OnceLock config/JWKS caching (RSA PEM parsed once per isolate), the removal of the spawn_local+oneshot bridge in registry.rs, and the DataMode → Visibility rename are all clean improvements independent of the headline feature.

alukach and others added 8 commits June 9, 2026 13:58
Without an audience check, an ID token issued by AUTH_ISSUER for ANY
registered OAuth client — not just the paired source.coop flow — could
be exchanged at /.sts for credentials acting as that subject (classic
audience confusion). Add an optional AUTH_AUDIENCE var that pins the
_default role's required_audience to the expected client_id, and warn
at startup when it is unset.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
CORS previously announced PUT, POST, DELETE with allow-origin: * on
every response, but the proxy is read-only everywhere except the STS
endpoint. Advertise GET, HEAD, OPTIONS globally and add POST only on
/.sts.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The router matches /.sts exactly, so /.sts/ fell through to bucket
mapping and produced a confusing 404/405. Normalize the trailing-slash
variant before routing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The location broadcast already skipped paths starting with "/." but
analytics did not, so /.well-known and /.sts requests were recorded
with account = ".well-known", polluting the per-product dataset.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cache_key_with_subject appends "?subject=", which silently produces a
malformed key if the api_url ever grows its own query string. True for
none of the current call sites — guard against the future foot-gun.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The bulk secret upload managed the active OIDC key via GitHub secrets
but left the previous key out-of-band, so a rotation done with
`wrangler secret put` was silently reverted on the next deploy. Pass
OIDC_PROVIDER_KEY_PREVIOUS through as an optional secret, included in
the upload only while a rotation is in progress.

Also build the payload with jq from env vars instead of single-quoted
shell interpolation, which would break on a secret containing a quote.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The integration job's sed rewrote every `command =` line in
wrangler.toml; anchor it to the [build] block so future command keys
elsewhere in the config are untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The runbook said to rotate with `wrangler secret put`, but the deploy
workflow re-uploads OIDC_PROVIDER_KEY and SESSION_TOKEN_KEY from GitHub
secrets on every deploy, silently reverting any out-of-band rotation.
Make GitHub environment secrets the documented source of truth, and
document SESSION_TOKEN_KEY (including that rotating it invalidates all
outstanding sealed STS credentials).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
alukach and others added 3 commits June 10, 2026 13:50
Without an audience restriction, /.sts accepted an ID token minted for
ANY OAuth client of AUTH_ISSUER and exchanged it for that user's
credentials — a confused-deputy cross-client impersonation. AUTH_AUDIENCE
is currently configured in no environment, so the endpoint was live and
unrestricted.

Fail closed: only mount /.sts when an audience restriction is configured,
and short-circuit it with a 501 NotImplemented otherwise, so an
unrestricted exchanger is never registered.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
account/product path segments are percent-decoded before being split out
of the request path, then interpolated raw into the upstream API URL. A
decoded `?`/`#`/`&` therefore injected query parameters into the
secret-authenticated Source API call, and — combined with the
?subject=<hex> cache key — let an anonymous request forge a key colliding
with an authenticated user's, poisoning their product resolution.

Percent-encode account/product with a path-segment set (unreserved chars
pass through unchanged, so ordinary slugs are byte-identical). Also drop
the release-compiled-out debug_assert in cache_key_with_subject for a
real separator choice, so a query-bearing URL can never silently forge a
colliding key.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
An unset GitHub secret evaluated to empty, jq emitted an empty value,
`secret bulk` succeeded, and the job went green — while every request to
the worker panicked in load_config. Fail loudly if OIDC_PROVIDER_KEY or
SESSION_TOKEN_KEY is empty, and add a post-deploy smoke test that curls
/ and /.well-known/jwks.json so a worker that can't boot turns the job
red instead of shipping silently.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants