
Add local OpenAI-compatible backend routing#68

Open
Pro777 wants to merge 1 commit into huggingface:main from Spitfire-Cowboy:Pro777/openai-compat-local

Conversation


@Pro777 Pro777 commented Apr 23, 2026

Summary

Implements the narrow local-backend slice of #67 without pulling in the larger provider-registry work from #55 / #60.

This PR adds direct routing for these OpenAI-compatible prefixes:

  • ollama/<model>
  • lm_studio/<model>
  • vllm/<model>
  • openai-compat/<model>

Each prefix resolves to a configurable api_base + API key env var while keeping the existing Hugging Face Router fallback path unchanged.

Why this shape

#67 asks for an escape hatch to local OpenAI-compatible backends. There is already broader work in #55 / #60 around provider registries and additional hosted providers.

This PR intentionally stays smaller:

  • no new provider catalog
  • no frontend model-picker changes
  • no session-manager or restore-summary changes
  • no Rowan-specific defaults

It only teaches the current resolver and CLI validation layer about local/OpenAI-compatible prefixes.

New prefixes

  • ollama/ -> OLLAMA_BASE_URL (default http://localhost:11434/v1)
  • lm_studio/ -> LMSTUDIO_BASE_URL (default http://127.0.0.1:1234/v1)
  • vllm/ -> VLLM_BASE_URL (default http://127.0.0.1:8000/v1)
  • openai-compat/ -> OPENAI_COMPAT_BASE_URL (default http://127.0.0.1:8000/v1)

The local prefixes set placeholder default API keys (ollama, lm-studio, EMPTY) for backends that typically ignore auth, while still allowing override via env.
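The prefix-to-backend mapping described above can be sketched as follows. This is a hypothetical reconstruction, not the PR's actual code: the dict name _OPENAI_COMPAT_PROVIDERS matches the identifier cited later in the review, but the exact structure and the api-key override mechanism in llm_params.py may differ.

```python
import os

# Hypothetical sketch of the prefix -> backend mapping; env var names and
# defaults are taken from the PR description, the dict layout is assumed.
_OPENAI_COMPAT_PROVIDERS = {
    "ollama/": {"api_base_env": "OLLAMA_BASE_URL",
                "default_api_base": "http://localhost:11434/v1",
                "default_api_key": "ollama"},
    "lm_studio/": {"api_base_env": "LMSTUDIO_BASE_URL",
                   "default_api_base": "http://127.0.0.1:1234/v1",
                   "default_api_key": "lm-studio"},
    "vllm/": {"api_base_env": "VLLM_BASE_URL",
              "default_api_base": "http://127.0.0.1:8000/v1",
              "default_api_key": "EMPTY"},
    "openai-compat/": {"api_base_env": "OPENAI_COMPAT_BASE_URL",
                       "default_api_base": "http://127.0.0.1:8000/v1",
                       "default_api_key": "EMPTY"},
}

def resolve_openai_compat_params(model_name):
    """Map a prefixed model id to LiteLLM-style params, or None if no prefix matches."""
    for prefix, cfg in _OPENAI_COMPAT_PROVIDERS.items():
        if model_name.startswith(prefix):
            # Env var wins over the default; trailing slash is stripped.
            api_base = os.environ.get(cfg["api_base_env"],
                                      cfg["default_api_base"]).rstrip("/")
            return {"model": "openai/" + model_name[len(prefix):],
                    "api_base": api_base,
                    "api_key": cfg["default_api_key"]}
    return None
```

Unmatched model ids fall through to None, which is what preserves the existing Hugging Face Router fallback path.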

Tests

Ran:

uv run --with pytest pytest tests/ -v --tb=short

Result:

  • 5 passed

The new tests cover:

  • Ollama param resolution
  • LM Studio param resolution
  • vLLM param resolution
  • generic OpenAI-compatible param resolution
  • /model validation for the new local prefixes

Closes #67.

@statikkkkk

Drive-by triage from a community contributor (no commit access). Sharing one-pass review notes to help maintainers prioritize. Verify before acting.

Verdict: NEEDS-CHANGES · Risk: MED (potential SSRF) · State: Mergeable

_resolve_openai_compat_params in llm_params.py reads api_base directly from env vars with no validation of scheme or host. If this agent ever runs in a shared/multi-tenant context, a malicious OPENAI_COMPAT_BASE_URL could route LLM calls to internal metadata endpoints (SSRF). Even in single-user mode, this assumption should be documented explicitly.

Also, _DIRECT_PREFIXES in model_switcher.py is defined but is_valid_model_id already checks startswith — confirm there's no double-check divergence.

There's significant overlap with #60 (which also adds lm_studio/); coordinate before merge. Tests cover only the happy path; add a test for env-var override.

@akseljoonas
Collaborator

@claude please review this PR. Focus on correctness, security, LiteLLM/Bedrock routing impact, and whether it's safe to merge against current main. Keep it concise and prefer inline comments where it matters.


claude Bot commented Apr 24, 2026

Claude finished @akseljoonas's task in 1m 42s.


PR #68 Review: Add local OpenAI-compatible backend routing

Tasks

  • Gather context (read all changed files)
  • Analyze correctness and logic
  • Review security implications (SSRF, env-var handling)
  • Check LiteLLM/Bedrock routing impact
  • Assess merge safety vs current main

Verdict: Conditionally safe to merge

The core logic is correct and additive — no existing routing paths are touched. Two items need resolution before merge: overlap with #60, and a _DIRECT_PREFIXES duplication that will silently diverge. Everything else is low-risk or informational.


Correctness

Routing logic (llm_params.py:115–145) — clean. Prefix iteration, model stripping (model_name[len(prefix):]), and rstrip("/") on api_base are all correct. LiteLLM's openai/{model} + custom api_base is the right pattern for OpenAI-compat backends.

Effort handling — correct. extra_body is used for reasoning_effort so local servers can silently ignore it. strict=True raises UnsupportedEffortError cleanly, which the cascade in effort_probe.py:165–169 catches and advances past. Integration with the existing cascade is correct with no regressions.
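The effort behavior described above can be illustrated with a small sketch. This is not the repo's code: the helper name effort_kwargs and the _SUPPORTED_EFFORTS set are invented for illustration; only UnsupportedEffortError and the strict flag come from the review text.

```python
# Hedged sketch of the effort handling described above. The supported-effort
# set and the helper signature are assumptions; the raise-on-strict and
# extra_body pass-through behavior mirror the review's description.
class UnsupportedEffortError(ValueError):
    pass

_SUPPORTED_EFFORTS = {"low", "medium", "high"}  # illustrative set

def effort_kwargs(reasoning_effort, strict=False):
    if reasoning_effort is None:
        return {}  # no extra_body key at all, not extra_body=None
    if strict and reasoning_effort not in _SUPPORTED_EFFORTS:
        # strict mode raises so the cascade can catch and advance
        raise UnsupportedEffortError(reasoning_effort)
    # extra_body lets OpenAI-compatible servers silently ignore the field
    return {"extra_body": {"reasoning_effort": reasoning_effort}}
```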

is_valid_model_id (model_switcher.py:46–66) — this function doesn't actually use _DIRECT_PREFIXES. It only checks for two non-empty /-separated parts. That's fine — but _DIRECT_PREFIXES being a separate list from _OPENAI_COMPAT_PROVIDERS is a maintainability trap (see below).
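A minimal reconstruction of the check as the review describes it (two non-empty slash-separated parts); the actual body of is_valid_model_id in model_switcher.py may differ.

```python
# Sketch of the validation described above: accept any "provider/model"
# id whose two slash-separated parts are both non-empty.
def is_valid_model_id(model_id: str) -> bool:
    parts = model_id.split("/", 1)
    return len(parts) == 2 and all(parts)
```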


Issues

[Blocking] Overlap with #60

#60 also adds lm_studio/. Whichever merges second will conflict. Coordinate with the #60 author before merging either.

[Medium] _DIRECT_PREFIXES must be kept in sync manually

model_switcher.py:36–43 defines its own prefix list that duplicates keys from _OPENAI_COMPAT_PROVIDERS. _print_hf_routing_info uses _DIRECT_PREFIXES to skip the HF catalog lookup — if a new prefix is added to _OPENAI_COMPAT_PROVIDERS but not here, /model will print a spurious "isn't in the HF router catalog" warning. The fix is trivial:

# model_switcher.py — instead of a hardcoded tuple:
from agent.core.llm_params import _OPENAI_COMPAT_PROVIDERS
_DIRECT_PREFIXES = ("anthropic/", "openai/", *_OPENAI_COMPAT_PROVIDERS)

[Low] No URL validation on api_base (llm_params.py:127–129)

os.environ.get(config["api_base_env"], ...) flows directly into LiteLLM with no scheme/host check. In a single-user local deployment this is fine. In a multi-tenant hosted context (HF Spaces) where an operator could set env vars, a file:// or http://169.254.169.254/ value would cause the agent to make requests to internal endpoints. Worth a short docstring note that these env vars are admin-controlled and not user-controllable.
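If the project ever does want a guard here, a scheme/host check could look like the sketch below. This is illustrative only: the function name and the loopback-only default allowlist are assumptions, not part of the PR.

```python
from urllib.parse import urlparse

# Hedged sketch of an api_base guard: reject non-HTTP schemes and,
# optionally, hosts outside an allowlist. Not part of the PR under review.
_ALLOWED_SCHEMES = {"http", "https"}

def check_api_base(url, allowed_hosts=None):
    parsed = urlparse(url)
    if parsed.scheme not in _ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme in api_base: {url!r}")
    if allowed_hosts is not None and parsed.hostname not in allowed_hosts:
        raise ValueError(f"api_base host not allowed: {url!r}")
    return url
```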


Test gaps

The existing tests only exercise the happy path. These cases are uncovered:

The missing cases, and why each matters:

  • OLLAMA_BASE_URL override (custom URL): verifies the env-var override actually wins over the default
  • strict=True + unsupported effort (e.g. max on ollama): the cascade depends on UnsupportedEffortError being raised
  • no reasoning_effort arg: confirms the extra_body key is absent (not just None)
  • trailing slash on the env var URL: confirms rstrip("/") actually fires

The 5 passing tests are a reasonable baseline for a first merge, but the strict-mode path in particular is exercised only through integration and should have a unit test.


LiteLLM / Bedrock routing impact

No impact on Bedrock. The PR adds no bedrock/ prefix and doesn't touch any Bedrock-adjacent code paths.

LiteLLM routing — clean integration. openai/{model} + custom api_base is the documented LiteLLM pattern; it doesn't interfere with anthropic/, openai/, or HF router fallback paths. The _patch_litellm_effort_validation() call is unchanged.


Merge safety summary

Concern, severity, and whether it blocks merge:

  • Overlap with #60 (lm_studio/): Medium. Blocks merge: coordinate first.
  • _DIRECT_PREFIXES duplication: Medium. Does not block, but fix before it drifts.
  • SSRF via env-var api_base: Low (admin-controlled env). Does not block.
  • Test coverage gaps: Low. Does not block.
  • LiteLLM/Bedrock routing: no impact.

The implementation is solid and intentionally scoped. Resolve the #60 conflict, fix the _DIRECT_PREFIXES derivation, and this is ready.


Development

Successfully merging this pull request may close these issues.

Feature request: Allow routing to local OpenAI-compatible backends (Ollama / vLLM / LM Studio)
