fix(sandbox): require bearer-token auth on sandbox API (fixes #78) by pratyush618 · Pull Request #85 · huggingface/ml-intern

pratyush618 · 2026-04-23T18:25:39Z

Summary

Fixes #78 — the sandbox's embedded FastAPI server exposed command-execution and file I/O endpoints (/api/bash, /api/read, /api/write, /api/edit, /api/kill, /api/exists, /api/health) without validating the Authorization header the client already sends. Combined with Sandbox.create(..., private=False) as the default, any unauthenticated caller who could reach https://<slug>.hf.space/ could execute arbitrary commands in a user's sandbox and printenv HF_TOKEN to pivot to the user's HF account.

Three small, reviewable commits:

fix(sandbox): require bearer-token auth on embedded FastAPI routes — app-wide FastAPI dependency reads HF_TOKEN at startup and validates the Bearer token with hmac.compare_digest. Fails closed when HF_TOKEN is unset.
test(sandbox): cover bearer-token auth on embedded server — 6 pytest cases against the _SANDBOX_SERVER source loaded via importlib. Covers missing / wrong / correct bearer, /api/bash never reaching subprocess.Popen without auth, /api/write spot-check, and the fail-closed path.
fix(sandbox): default new sandboxes to private — defense in depth: make Sandbox.create(..., private=True) the default so the HF Space gateway also gates access. Client already sends the user's HF_TOKEN as Bearer, so existing flows keep working.

No client-side header change was needed: Sandbox.__post_init__ already sent Authorization: Bearer {self.token} where self.token is the user's HF_TOKEN, which is exactly what the new server dependency validates against the HF_TOKEN secret already injected by sandbox_tool.py.

Behavior change

Sandbox.create(...) now defaults to private=True. Callers that relied on the old public-by-default behavior must now pass private=False explicitly. The agent-facing sandbox_create tool is unaffected in shape: its private parameter already defaults to True via this change, and the tool only forwards private when the caller passes it. There is no CHANGELOG file in the repo; flagging here for release notes.

Test plan

uv run --with pytest --with pytest-asyncio pytest tests/ -q → 17 passed (6 new + 11 existing).
Static check: compile(_SANDBOX_SERVER, "sandbox_server.py", "exec") — the embedded server source is valid Python.
Manual e2e: sandbox_create via the agent, confirm the Space is private, then curl -X POST https://<slug>.hf.space/api/bash -d '{"command":"id"}' without a Bearer token → expect 401 (or HF gateway 401/redirect). With the correct Bearer → expect 200.
Regression: agent can still bash, read, write, edit inside the sandbox after the change.

The sandbox's embedded server exposed /api/bash, /api/read, /api/write, /api/edit, /api/kill, /api/exists, /api/health without validating the Authorization header the client already sends. Anyone who could reach the public *.hf.space URL could execute arbitrary commands inside the user's sandbox — and read HF_TOKEN from the environment. Add an app-wide FastAPI dependency that reads HF_TOKEN at startup and matches it against the Bearer token on every request using hmac.compare_digest. Fail closed: if HF_TOKEN is unset, every request 401s rather than silently becoming open. Refs huggingface#78

Six cases against the _SANDBOX_SERVER source loaded via importlib: - missing Authorization header → 401 - wrong Bearer token → 401 - correct Bearer token → 200 - /api/bash with wrong token must 401 before subprocess.Popen is called - fail-closed when HF_TOKEN is unset → every request 401s - /api/write spot-check to confirm the app-wide dependency covers POST routes and the target file is never written

Defense in depth alongside the server-side bearer-token check: make Sandbox.create(..., private=True) the default so the HF Space platform itself also gates access to the *.hf.space URL. The client already sends the user's HF_TOKEN in the Bearer header, which is what HF's gateway validates for private-space access, so existing flows stay working. sandbox_create_handler only forwards 'private' when the caller explicitly passes it, so this default propagates cleanly without any handler change.

pratyush618 · 2026-04-24T05:23:43Z

Thanks for the review @statikkkkk.

On the two concrete points:

Behavior change docs — no CHANGELOG file exists in the repo, so I've added a Behavior change section to the PR description calling out that Sandbox.create(...) now defaults to private=True. Happy to relocate the note if maintainers track user-visible changes somewhere I missed.
Manual e2e checks — those require a live (billed) HF Space and the agent's OAuth session, so I can't tick them from the fork. I've left them unchecked in the test plan so a maintainer (or I, with guidance) can verify against a real sandbox before merge. The automated unit tests exercise the auth paths end-to-end against the embedded server source, which should at least catch regressions on every CI run.

akseljoonas · 2026-04-24T14:08:05Z

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

claude · 2026-04-24T14:08:23Z

Claude finished @akseljoonas's task in 2m 0s —— View job

PR Review: fix(sandbox): require bearer-token auth on sandbox API

Read CLAUDE.md and repo context
Diff against origin/main
Review sandbox_client.py changes
Review test_sandbox_server_auth.py
Check for LiteLLM/Bedrock routing impact
Security analysis
Post final review

Verdict: APPROVE with one important fix noted. The core security fix is correct and well-implemented. One issue could cause a silent hang on mis-auth; the rest are minor.

Security — ✅ Correct

sandbox_client.py lines 157–169 (_SANDBOX_SERVER embedded source):

hmac.compare_digest is the right tool — no timing oracle.
FastAPI(dependencies=[Depends(require_auth)]) correctly applies the guard app-wide. New routes added in the future cannot accidentally be left open.
Fail-closed when HF_TOKEN is unset (if not _AUTH_TOKEN: raise 401) is the right default. The empty-string check on _AUTH_TOKEN also handles the "" sentinel correctly.
Token injection order in Sandbox.create() is correct: secrets are pushed before _setup_server() uploads the server files (which triggers the image rebuild), so HF_TOKEN is in the environment when the server process first starts and evaluates _AUTH_TOKEN = os.environ.get("HF_TOKEN", "").

Bug: `_wait_for_api` silently loops on 401

sandbox_client.py lines 676–691:

while time.time() < deadline:
    try:
        resp = self._client.get("health", timeout=10)
        last_status = resp.status_code
        if resp.status_code == 200:
            return
    except Exception as e:
        last_err = e
    time.sleep(3)

Before this PR, a reachable-but-not-yet-ready server returned non-200 and the loop kept polling — sensible. After this PR, a correctly-running server with a wrong token also returns 401 (non-200), so the loop will silently spin for the full API_WAIT_TIMEOUT (180 s default) and then raise TimeoutError: Sandbox API … not responding — which is misleading when the real problem is auth.

Suggested fix — break early on 401/403:

if resp.status_code in (401, 403):
    raise RuntimeError(
        f"Sandbox API rejected auth (HTTP {resp.status_code}). "
        "Check that HF_TOKEN is set as a Space secret."
    )

This won't affect the common path (token is always correct when auto-created via _ensure_sandbox), but it makes diagnosing misconfigured sandboxes much faster. Fix this →

Minor: tool spec description doesn't reflect new default

sandbox_tool.py line 197:

"description": "If true, create a private Space",

Now that private=True is the default, this description is misleading — it implies false is the default. Should read something like "Whether the Space is private (default: true)". Low urgency, but a LLM consuming this schema will infer the wrong default. Fix this →

LiteLLM / Bedrock routing — ✅ No impact

These changes are entirely within the HF Space sandbox layer (sandbox_client.py + sandbox_server.py). The LiteLLM layer (agent/main.py, agent/core/llm_params.py) routes model calls and is completely separate. No routing, credential, or model-dispatch code is touched.

Tests — ✅ Good coverage

Six cases are well-chosen:

Missing header → 401
Wrong token → 401
Correct token → 200
/api/bash never reaches subprocess.Popen without valid auth
Fail-closed when HF_TOKEN="" (also catches the empty-Bearer-string edge case)
/api/write spot-check confirms the app-wide dependency covers POST routes

One gap worth noting: there's no positive test asserting that /api/bash actually executes and returns a result when the correct token is present. Not a blocker, but worth adding eventually.

Summary

Area	Status
Auth logic (`hmac.compare_digest`, fail-closed)	✅ Correct
Scope (app-wide FastAPI dependency)	✅ Correct
Token injection timing	✅ Correct
`private=True` defense-in-depth	✅ Correct
`_wait_for_api` on 401	⚠️ Silent hang — should short-circuit
Tool spec description	📝 Stale description, minor
LiteLLM/Bedrock impact	✅ None
Tests	✅ Good, one gap

Safe to merge after addressing the _wait_for_api 401 hang (or accepting it as a known rough edge).

Before this change, a reachable sandbox server that returned 401/403 on /api/health (e.g. because HF_TOKEN was not set as a Space secret, or the client and server tokens don't match) would cause the health-poll loop to spin for the full API_WAIT_TIMEOUT and then raise "not responding" — a misleading error when the real cause is auth. Short-circuit on 401/403 with a RuntimeError that points at the likely misconfiguration. TimeoutError still fires for genuine unreachability.

The 'private' parameter description ("If true, create a private Space") implied false was the default. Since Sandbox.create now defaults to private=True, the schema an LLM sees should match. Update the description to state the true default and how to opt out.

The auth tests so far only cover the rejection paths. Add a matching positive case that confirms /api/bash actually runs a command and returns its stdout when the Bearer token is correct, so a regression that broke the happy path (e.g. require_auth accidentally rejecting valid tokens) would show up as a test failure instead of only manifesting at runtime.

pratyush618 · 2026-04-24T14:18:01Z

Addressed the review in three follow-up commits on the same branch:

fix(sandbox): fail fast in _wait_for_api when auth is rejected (909c450) — _wait_for_api now short-circuits on 401/403 with a RuntimeError that names the likely misconfiguration, instead of looping silently until API_WAIT_TIMEOUT fires a misleading "not responding".
docs(sandbox): reflect private=true default in sandbox_create tool spec (24c5650) — updates the private parameter description in SANDBOX_CREATE_TOOL_SPEC so an LLM consuming the schema sees the new default.
test(sandbox): add positive-path bash-with-valid-auth case (2f3bcaa) — closes the test gap: confirms /api/bash actually runs and returns stdout when the Bearer token is correct. Full suite: 18 passed.

pratyush618 added 3 commits April 23, 2026 23:54

pratyush618 added 3 commits April 24, 2026 19:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sandbox): require bearer-token auth on sandbox API (fixes #78)#85

fix(sandbox): require bearer-token auth on sandbox API (fixes #78)#85
pratyush618 wants to merge 6 commits intohuggingface:mainfrom
pratyush618:fix/issue-78-sandbox-auth

pratyush618 commented Apr 23, 2026 •

edited

Loading

Uh oh!

pratyush618 commented Apr 24, 2026

Uh oh!

akseljoonas commented Apr 24, 2026

Uh oh!

claude Bot commented Apr 24, 2026 •

edited

Loading

Uh oh!

pratyush618 commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pratyush618 commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Behavior change

Test plan

Uh oh!

pratyush618 commented Apr 24, 2026

Uh oh!

akseljoonas commented Apr 24, 2026

Uh oh!

claude Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: fix(sandbox): require bearer-token auth on sandbox API

Security — ✅ Correct

Bug: _wait_for_api silently loops on 401

Minor: tool spec description doesn't reflect new default

LiteLLM / Bedrock routing — ✅ No impact

Tests — ✅ Good coverage

Summary

Uh oh!

pratyush618 commented Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pratyush618 commented Apr 23, 2026 •

edited

Loading

claude Bot commented Apr 24, 2026 •

edited

Loading

Bug: `_wait_for_api` silently loops on 401