Skip to content

fix(sandbox): require bearer-token auth on sandbox API (fixes #78)#85

Open
pratyush618 wants to merge 6 commits intohuggingface:mainfrom
pratyush618:fix/issue-78-sandbox-auth
Open

fix(sandbox): require bearer-token auth on sandbox API (fixes #78)#85
pratyush618 wants to merge 6 commits intohuggingface:mainfrom
pratyush618:fix/issue-78-sandbox-auth

Conversation

@pratyush618
Copy link
Copy Markdown

@pratyush618 pratyush618 commented Apr 23, 2026

Summary

Fixes #78 — the sandbox's embedded FastAPI server exposed command-execution and file I/O endpoints (/api/bash, /api/read, /api/write, /api/edit, /api/kill, /api/exists, /api/health) without validating the Authorization header the client already sends. Combined with Sandbox.create(..., private=False) as the default, any unauthenticated caller who could reach https://<slug>.hf.space/ could execute arbitrary commands in a user's sandbox and printenv HF_TOKEN to pivot to the user's HF account.

Three small, reviewable commits:

  1. fix(sandbox): require bearer-token auth on embedded FastAPI routes — app-wide FastAPI dependency reads HF_TOKEN at startup and validates the Bearer token with hmac.compare_digest. Fails closed when HF_TOKEN is unset.
  2. test(sandbox): cover bearer-token auth on embedded server — 6 pytest cases against the _SANDBOX_SERVER source loaded via importlib. Covers missing / wrong / correct bearer, /api/bash never reaching subprocess.Popen without auth, /api/write spot-check, and the fail-closed path.
  3. fix(sandbox): default new sandboxes to private — defense in depth: make Sandbox.create(..., private=True) the default so the HF Space gateway also gates access. Client already sends the user's HF_TOKEN as Bearer, so existing flows keep working.

No client-side header change was needed: Sandbox.__post_init__ already sent Authorization: Bearer {self.token} where self.token is the user's HF_TOKEN, which is exactly what the new server dependency validates against the HF_TOKEN secret already injected by sandbox_tool.py.

Behavior change

Sandbox.create(...) now defaults to private=True. Callers that relied on the old public-by-default behavior must now pass private=False explicitly. The agent-facing sandbox_create tool is unaffected in shape: its private parameter already defaults to True via this change, and the tool only forwards private when the caller passes it. There is no CHANGELOG file in the repo; flagging here for release notes.

Test plan

  • uv run --with pytest --with pytest-asyncio pytest tests/ -q17 passed (6 new + 11 existing).
  • Static check: compile(_SANDBOX_SERVER, "sandbox_server.py", "exec") — the embedded server source is valid Python.
  • Manual e2e: sandbox_create via the agent, confirm the Space is private, then curl -X POST https://<slug>.hf.space/api/bash -d '{"command":"id"}' without a Bearer token → expect 401 (or HF gateway 401/redirect). With the correct Bearer → expect 200.
  • Regression: agent can still bash, read, write, edit inside the sandbox after the change.

The sandbox's embedded server exposed /api/bash, /api/read, /api/write,
/api/edit, /api/kill, /api/exists, /api/health without validating the
Authorization header the client already sends. Anyone who could reach the
public *.hf.space URL could execute arbitrary commands inside the user's
sandbox — and read HF_TOKEN from the environment.

Add an app-wide FastAPI dependency that reads HF_TOKEN at startup and
matches it against the Bearer token on every request using
hmac.compare_digest. Fail closed: if HF_TOKEN is unset, every request 401s
rather than silently becoming open.

Refs huggingface#78
Six cases against the _SANDBOX_SERVER source loaded via importlib:
- missing Authorization header → 401
- wrong Bearer token → 401
- correct Bearer token → 200
- /api/bash with wrong token must 401 before subprocess.Popen is called
- fail-closed when HF_TOKEN is unset → every request 401s
- /api/write spot-check to confirm the app-wide dependency covers POST
  routes and the target file is never written
Defense in depth alongside the server-side bearer-token check: make
Sandbox.create(..., private=True) the default so the HF Space platform
itself also gates access to the *.hf.space URL. The client already sends
the user's HF_TOKEN in the Bearer header, which is what HF's gateway
validates for private-space access, so existing flows stay working.

sandbox_create_handler only forwards 'private' when the caller explicitly
passes it, so this default propagates cleanly without any handler change.
@pratyush618
Copy link
Copy Markdown
Author

Thanks for the review @statikkkkk.

On the two concrete points:

  • Behavior change docs — no CHANGELOG file exists in the repo, so I've added a Behavior change section to the PR description calling out that Sandbox.create(...) now defaults to private=True. Happy to relocate the note if maintainers track user-visible changes somewhere I missed.
  • Manual e2e checks — those require a live (billed) HF Space and the agent's OAuth session, so I can't tick them from the fork. I've left them unchecked in the test plan so a maintainer (or I, with guidance) can verify against a real sandbox before merge. The automated unit tests exercise the auth paths end-to-end against the embedded server source, which should at least catch regressions on every CI run.

@akseljoonas
Copy link
Copy Markdown
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 2m 0s —— View job


PR Review: fix(sandbox): require bearer-token auth on sandbox API

  • Read CLAUDE.md and repo context
  • Diff against origin/main
  • Review sandbox_client.py changes
  • Review test_sandbox_server_auth.py
  • Check for LiteLLM/Bedrock routing impact
  • Security analysis
  • Post final review

Verdict: APPROVE with one important fix noted. The core security fix is correct and well-implemented. One issue could cause a silent hang on mis-auth; the rest are minor.


Security — ✅ Correct

sandbox_client.py lines 157–169 (_SANDBOX_SERVER embedded source):

  • hmac.compare_digest is the right tool — no timing oracle.
  • FastAPI(dependencies=[Depends(require_auth)]) correctly applies the guard app-wide. New routes added in the future cannot accidentally be left open.
  • Fail-closed when HF_TOKEN is unset (if not _AUTH_TOKEN: raise 401) is the right default. The empty-string check on _AUTH_TOKEN also handles the "" sentinel correctly.
  • Token injection order in Sandbox.create() is correct: secrets are pushed before _setup_server() uploads the server files (which triggers the image rebuild), so HF_TOKEN is in the environment when the server process first starts and evaluates _AUTH_TOKEN = os.environ.get("HF_TOKEN", "").

Bug: _wait_for_api silently loops on 401

sandbox_client.py lines 676–691:

while time.time() < deadline:
    try:
        resp = self._client.get("health", timeout=10)
        last_status = resp.status_code
        if resp.status_code == 200:
            return
    except Exception as e:
        last_err = e
    time.sleep(3)

Before this PR, a reachable-but-not-yet-ready server returned non-200 and the loop kept polling — sensible. After this PR, a correctly-running server with a wrong token also returns 401 (non-200), so the loop will silently spin for the full API_WAIT_TIMEOUT (180 s default) and then raise TimeoutError: Sandbox API … not responding — which is misleading when the real problem is auth.

Suggested fix — break early on 401/403:

if resp.status_code in (401, 403):
    raise RuntimeError(
        f"Sandbox API rejected auth (HTTP {resp.status_code}). "
        "Check that HF_TOKEN is set as a Space secret."
    )

This won't affect the common path (token is always correct when auto-created via _ensure_sandbox), but it makes diagnosing misconfigured sandboxes much faster. Fix this →


Minor: tool spec description doesn't reflect new default

sandbox_tool.py line 197:

"description": "If true, create a private Space",

Now that private=True is the default, this description is misleading — it implies false is the default. Should read something like "Whether the Space is private (default: true)". Low urgency, but a LLM consuming this schema will infer the wrong default. Fix this →


LiteLLM / Bedrock routing — ✅ No impact

These changes are entirely within the HF Space sandbox layer (sandbox_client.py + sandbox_server.py). The LiteLLM layer (agent/main.py, agent/core/llm_params.py) routes model calls and is completely separate. No routing, credential, or model-dispatch code is touched.


Tests — ✅ Good coverage

Six cases are well-chosen:

  • Missing header → 401
  • Wrong token → 401
  • Correct token → 200
  • /api/bash never reaches subprocess.Popen without valid auth
  • Fail-closed when HF_TOKEN="" (also catches the empty-Bearer-string edge case)
  • /api/write spot-check confirms the app-wide dependency covers POST routes

One gap worth noting: there's no positive test asserting that /api/bash actually executes and returns a result when the correct token is present. Not a blocker, but worth adding eventually.


Summary

Area Status
Auth logic (hmac.compare_digest, fail-closed) ✅ Correct
Scope (app-wide FastAPI dependency) ✅ Correct
Token injection timing ✅ Correct
private=True defense-in-depth ✅ Correct
_wait_for_api on 401 ⚠️ Silent hang — should short-circuit
Tool spec description 📝 Stale description, minor
LiteLLM/Bedrock impact ✅ None
Tests ✅ Good, one gap

Safe to merge after addressing the _wait_for_api 401 hang (or accepting it as a known rough edge).

Before this change, a reachable sandbox server that returned 401/403 on
/api/health (e.g. because HF_TOKEN was not set as a Space secret, or the
client and server tokens don't match) would cause the health-poll loop to
spin for the full API_WAIT_TIMEOUT and then raise "not responding" — a
misleading error when the real cause is auth.

Short-circuit on 401/403 with a RuntimeError that points at the likely
misconfiguration. TimeoutError still fires for genuine unreachability.
The 'private' parameter description ("If true, create a private Space")
implied false was the default. Since Sandbox.create now defaults to
private=True, the schema an LLM sees should match. Update the description
to state the true default and how to opt out.
The auth tests so far only cover the rejection paths. Add a matching
positive case that confirms /api/bash actually runs a command and returns
its stdout when the Bearer token is correct, so a regression that broke
the happy path (e.g. require_auth accidentally rejecting valid tokens)
would show up as a test failure instead of only manifesting at runtime.
@pratyush618
Copy link
Copy Markdown
Author

Addressed the review in three follow-up commits on the same branch:

  • fix(sandbox): fail fast in _wait_for_api when auth is rejected (909c450) — _wait_for_api now short-circuits on 401/403 with a RuntimeError that names the likely misconfiguration, instead of looping silently until API_WAIT_TIMEOUT fires a misleading "not responding".
  • docs(sandbox): reflect private=true default in sandbox_create tool spec (24c5650) — updates the private parameter description in SANDBOX_CREATE_TOOL_SPEC so an LLM consuming the schema sees the new default.
  • test(sandbox): add positive-path bash-with-valid-auth case (2f3bcaa) — closes the test gap: confirms /api/bash actually runs and returns stdout when the Bearer token is correct. Full suite: 18 passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security issue: public sandbox API appears to allow unauthenticated command execution

2 participants