
Draft: Add new providers via adapters #60

Draft
andrejvysny wants to merge 19 commits into huggingface:main from andrejvysny:provider-adapters-phase-2

Conversation

@andrejvysny andrejvysny commented Apr 22, 2026

Summary

Verification

  • python3 -m py_compile agent/core/provider_adapters.py agent/core/llm_params.py
  • uv run --with pytest pytest tests/test_provider_adapters.py

andrejvysny and others added 2 commits April 23, 2026 10:29
Resolve conflicts by taking upstream for llm_params.py (will be
rewritten as thin adapter dispatcher) and main.py (model_switcher
extraction supersedes our changes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split NativeAdapter into AnthropicAdapter (thinking config +
output_config.effort) and OpenAIAdapter (reasoning_effort top-level).
Each adapter owns its accepted effort set and raises
UnsupportedEffortError in strict mode, preserving the effort_probe
cascade with zero changes to effort_probe.py or agent_loop.py.

llm_params.py becomes a thin dispatcher delegating to
resolve_adapter().build_params() while keeping the litellm
effort-validation patch and re-exporting UnsupportedEffortError.

model_switcher.py reads suggested models from the adapter registry
instead of maintaining a separate SUGGESTED_MODELS list.

backend/routes/agent.py replaces AVAILABLE_MODELS with
build_model_catalog().

OpenCodeGoAdapter deferred to PR huggingface#60.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@andrejvysny andrejvysny changed the title Add configurable provider registry and new LLM provider adapters Draft: Add new LLM provider adapters Apr 23, 2026
@andrejvysny andrejvysny force-pushed the provider-adapters-phase-2 branch from fb54391 to 5d357ba Compare April 23, 2026 10:46
@andrejvysny andrejvysny reopened this Apr 23, 2026
@andrejvysny andrejvysny changed the title Draft: Add new LLM provider adapters Refactor existing LLM resolution into provider adapters Apr 23, 2026
@andrejvysny andrejvysny changed the title Refactor existing LLM resolution into provider adapters Draft: Add new providers via adapters Apr 23, 2026
Restore the existing web model behavior so PR huggingface#55 stays a behavior-preserving refactor while keeping shared runtime and CLI validation logic.
Classify auth, credits, and missing-model failures once so the CLI, model switcher, and health checks show clean user-facing errors instead of raw provider traces.
@andrejvysny
Author

Heads up: this PR builds on #55 and expands the adapter-based provider layer with additional providers / routing support.

Overlap notes:

If the final version of this PR includes Ollama/local model support directly, it may also supersede the earlier overlapping attempts in #44 and #47.

@statikkkkk

Drive-by triage from a community contributor (no commit access). Sharing one-pass review notes to help maintainers prioritize. Verify before acting.

Verdict: DRAFT-WAIT · Risk: MED · State: Draft, depends on #55

Explicitly Draft and stacked on #55 which is still open. Don't merge until #55 lands and this is rebased onto the resulting main. Scope is large (~2.1k additions) — when it moves out of draft, pay particular attention to the local LM Studio discovery code (any network scan or filesystem enumeration could be a side-channel) and confirm session-scoped model switching validates the new model id before committing it. The TODO.md committed at the repo root should be removed before final merge.

andrejvysny and others added 7 commits April 24, 2026 11:02
Incorporate Bedrock adapter, error handling for missing adapters,
upstream main merge, and cleanup from PR#55.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Native LiteLLM adapter with reasoning_effort support (low/medium/high),
minimal->low normalization, suggested models (2.5 Pro, Flash).
Supersedes PR#95 approach with proper adapter pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@akseljoonas
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.

@claude

claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 3m 5s


PR Review: Add new providers via adapters

  • Post initial comment
  • Read CLAUDE.md and gather context
  • Review changed files (provider_adapters.py, llm_params.py, agent_loop.py, model_switcher.py, llm_errors.py)
  • Review backend changes (routes/agent.py, session_manager.py)
  • Review frontend changes (ChatInput.tsx, CustomModelDialog.tsx, model.ts)
  • Review tests
  • Analyze security, correctness, LiteLLM/Bedrock routing impact
  • Post final review

Summary

Solid architecture — the adapter hierarchy is clean and the tests are thorough. Two things need to be fixed before merging: a quota/auth gate that fires incorrectly on OpenRouter→Anthropic routes, and blocking I/O in an async endpoint. Everything else is either low-risk or advisory.


🔴 Blocker 1 — _is_anthropic_model matches OpenRouter Anthropic routes

File: backend/routes/agent.py:44

def _is_anthropic_model(model_id: str) -> bool:
    return "anthropic" in model_id   # ← too broad

"anthropic" in "openrouter/anthropic/claude-sonnet-4.5" is True. This means:

  1. _require_hf_for_anthropic (line 48) will 403 non-HF-org users trying to use their own OpenRouter key to access Claude — they paid for it, and HF isn't billing them.
  2. _enforce_claude_quota (line 89) will charge their daily HF Claude quota for a session that is billed to their OpenRouter account, not the Space's ANTHROPIC_API_KEY. Users could hit their HF daily cap from OpenRouter usage.

The pre-existing check was correct when only anthropic/ was possible. Adding OpenRouter as a new provider in this PR is what breaks it.

Fix: return model_id.startswith("anthropic/")



🔴 Blocker 2 — Blocking urlopen in async GET /api/config/model

File: agent/core/provider_adapters.py:96

with urlopen(f"{api_base}/models", timeout=_DISCOVERY_TIMEOUT_SECONDS) as response:

urllib.request.urlopen is synchronous. It is called from _discover_models() → available_models() → get_available_models() → build_model_catalog() → GET /api/config/model (an async def route). Three adapters probe on each cold cache: Ollama, LM Studio, vLLM. When all three are unreachable this blocks the event loop for up to 6 seconds, freezing every concurrent request/SSE stream on the server.

The cache (30s TTL) limits re-probing but not the cold-start cost.

Fix: Wrap in asyncio.to_thread at the call site in the route, or use httpx.AsyncClient inside _discover_models. The simplest surgical fix is to make GET /api/config/model do the discovery in a thread:

@router.get("/config/model")
async def get_model() -> dict:
    return await asyncio.to_thread(
        build_model_catalog, session_manager.config.model_name
    )
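A self-contained sketch (all names illustrative, not from the PR) of why the offload matters: a heartbeat task keeps ticking while a simulated blocking probe runs in a worker thread via asyncio.to_thread:

```python
import asyncio
import time

def blocking_discovery() -> list[str]:
    # Stand-in for the synchronous urlopen() probe in _discover_models().
    time.sleep(0.2)
    return ["ollama/llama3", "lm_studio/gemma"]

async def heartbeat(ticks: list[float]) -> None:
    # Timestamps only accumulate while the event loop is free,
    # which is exactly what asyncio.to_thread preserves.
    start = time.monotonic()
    for _ in range(10):
        await asyncio.sleep(0.02)
        ticks.append(time.monotonic() - start)

async def main() -> tuple[list[str], list[float]]:
    ticks: list[float] = []
    hb = asyncio.create_task(heartbeat(ticks))
    # Offload the blocking probe; calling blocking_discovery() directly
    # here would freeze the loop (and the heartbeat) for the full 0.2 s.
    models = await asyncio.to_thread(blocking_discovery)
    await hb
    return models, ticks

models, ticks = asyncio.run(main())
```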



🟡 Medium — LM Studio use_raw_model_name=True may send the wrong model id on the wire

File: agent/core/provider_adapters.py:397-399

"model": model_name if self.use_raw_model_name else f"openai/{model_id}",

With use_raw_model_name=True, model_name is lm_studio/google/gemma-3-12b. LiteLLM doesn't know the lm_studio/ prefix, so it either (a) fails to route, or (b) sends the full string — including the prefix — to LM Studio's API, which then can't find lm_studio/google/gemma-3-12b. The Ollama pattern ("openai/{model_id}") avoids this because LiteLLM strips openai/ before putting the id on the wire.

The tests only assert the params dict shape, not that LiteLLM routes correctly. This should be verified against a live LM Studio instance. If LM Studio expects google/gemma-3-12b without the namespace prefix, the fix is to drop use_raw_model_name and use the default f"openai/{model_id}" instead.
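A toy sketch (helper name hypothetical) of the two prefixing strategies this paragraph compares:

```python
def wire_model_param(model_id: str, use_raw_model_name: bool) -> str:
    # Hypothetical helper mirroring the adapter's "model" param logic.
    # With use_raw_model_name=True the provider prefix survives into the
    # param LiteLLM sees; with the default "openai/" route, LiteLLM strips
    # that prefix before the request reaches the local server.
    if use_raw_model_name:
        return f"lm_studio/{model_id}"
    return f"openai/{model_id}"

print(wire_model_param("google/gemma-3-12b", True))   # lm_studio/google/gemma-3-12b
print(wire_model_param("google/gemma-3-12b", False))  # openai/google/gemma-3-12b
```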


🟡 Medium — restore-session-summary has no message-count limit

File: backend/routes/agent.py:278 / backend/session_manager.py:226

messages = body.get("messages")
if not isinstance(messages, list) or not messages:
    raise HTTPException(status_code=400, detail="Missing 'messages' array")

An authenticated user can send an arbitrarily large messages list (e.g. 100k items), triggering an expensive LLM summarization call billed to the Space's API key. A simple guard at the route level (e.g. len(messages) > 2000) would contain this.
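A dependency-free sketch of the suggested guard (the real route would raise FastAPI's HTTPException with a 400/413 instead of ValueError; the cap value is this review's example, not from the PR):

```python
MAX_SUMMARY_MESSAGES = 2000  # example cap from the review, not the PR

def check_summary_messages(messages: object) -> list:
    # Route-level validation before any LLM summarization is billed.
    if not isinstance(messages, list) or not messages:
        raise ValueError("Missing 'messages' array")
    if len(messages) > MAX_SUMMARY_MESSAGES:
        raise ValueError(f"Too many messages (max {MAX_SUMMARY_MESSAGES})")
    return messages

check_summary_messages([{"role": "user", "content": "hi"}])  # passes
```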


🟢 Low — openai-compat/ switch succeeds even when OPENAI_COMPAT_BASE_URL is unset

File: backend/routes/agent.py:363 / agent/core/provider_adapters.py:530-533

is_valid_model_name("openai-compat/my-model") returns True regardless of env state, so the /session/{id}/model endpoint accepts the switch. The ValueError("OPENAI_COMPAT_BASE_URL is required") only fires at the next LLM call, producing an unrelated 500. Better to check at switch time:

adapter = resolve_adapter(model_id)
if adapter and not adapter.should_show():
    raise HTTPException(status_code=400, detail=f"Provider not configured for: {model_id}")

🟢 Low — LiteLLM patch is fragile but well-guarded

File: agent/core/llm_params.py:11-42

The monkey-patch on AnthropicConfig._is_opus_4_6_model is robust: _hf_agent_patched guards double-application, the entire function is wrapped in except Exception: return, and getattr guards against missing attributes. Risk here is low. Leave a changelog note for the next major LiteLLM upgrade.
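The defensive pattern described above can be sketched generically (a dummy class stands in for LiteLLM's AnthropicConfig; the patched behavior is invented for illustration):

```python
def apply_patch(config_cls: type) -> None:
    # Idempotency guard: never patch the same class twice.
    if getattr(config_cls, "_hf_agent_patched", False):
        return
    original = getattr(config_cls, "_is_opus_4_6_model", None)
    if original is None:
        return  # attribute gone in a future LiteLLM: silently no-op

    def patched(self, model: str) -> bool:
        try:
            # Illustrative extension of the original check.
            return original(self, model) or "opus-4.6" in model
        except Exception:
            return False  # a broken patch must never break LLM calls

    config_cls._is_opus_4_6_model = patched
    config_cls._hf_agent_patched = True

class DummyConfig:
    def _is_opus_4_6_model(self, model: str) -> bool:
        return model == "claude-opus-4-6"

apply_patch(DummyConfig)
apply_patch(DummyConfig)  # second call is a no-op thanks to the guard
```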


What's good

  • Architecture: The ProviderAdapter hierarchy is clean. OpenAICompatAdapter as a base for Ollama/LM Studio/OpenRouter/OpenCode is a good reuse story.
  • Discovery: Negative caching (storing empty tuple on failure) + 30s TTL is appropriate for local services.
  • _discover_models error handling: Catches OSError, URLError, TimeoutError, ValueError — covers all realistic failure modes.
  • Test coverage: 391 lines of adapter tests cover happy paths, strict/non-strict effort rejection, and catalog visibility logic. The test_llm_errors.py suite is solid.
  • Session-scoped model switching: Correctly deep-copies config per session; quota gate fires at message-submit not at switch time.
  • CustomModelDialog: Cleans up state on close, disables backdrop click while submitting, shows preview of final model id — UX is sensible.

Merge verdict

Not yet — blockers 1 (quota/gate misfire) and 2 (event-loop blocking) should be fixed first. Both are small targeted changes. Once those land this is in good shape to merge.
