fix(cloud-proxy): parameter compatibility with newest reasoning models by walcz-de · Pull Request #10640 · mudler/LocalAI

walcz-de · 2026-07-02T07:45:37Z

What

The cloud-proxy translate providers currently send two parameters that the newest cloud reasoning models reject:

- temperature / top_p: Anthropic claude-opus-4-x and OpenAI gpt-5.x return 400 when temperature is present ("'temperature' is deprecated for this model"). Since OpenAI-compatible clients (chat UIs) typically send only the server-side default sampling values rather than real user intent, the translators now forward neither temperature nor top_p and let the upstream apply its own defaults. This extends the existing "drop top_p when both are set" workaround to its logical conclusion.
- max_tokens: OpenAI gpt-5.x rejects it ("Unsupported parameter: 'max_tokens' ... Use 'max_completion_tokens' instead"). The OpenAI translator now serializes the token limit as max_completion_tokens, which current chat-completions models accept.

Verification

Verified live against claude-opus-4-8, gpt-5.5 and gemini-3.1-pro (Gemini OpenAI-compat endpoint) through a LocalAI deployment — all three now answer through the cloud-proxy backend where they previously failed with the 400s above. go test ./backend/go/cloud-proxy/... green (tests updated to the new contract).

Notes

If per-model explicit sampling control is desired later, a model-config opt-in would be the cleaner path than unconditional forwarding — happy to follow up if wanted.

Newest cloud reasoning models reject two parameters the cloud-proxy backend currently sends:

- Anthropic (claude-opus-4-x) and OpenAI (gpt-5.x) return 400 when temperature is present: "'temperature' is deprecated for this model". OpenAI-compatible clients typically send only the server-side DEFAULT sampling values rather than user intent, so the translators now forward neither temperature nor top_p and let the upstream apply its own defaults.
- OpenAI gpt-5.x rejects max_tokens ("Unsupported parameter: 'max_tokens' ... Use 'max_completion_tokens' instead"). The OpenAI translator now serializes the token limit as max_completion_tokens, which current chat-completions models accept.

Verified live against claude-opus-4-8, gpt-5.5 and gemini-3.1-pro (Gemini OpenAI-compat endpoint). Tests updated to the new contract.

Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: stefanwalcz <stefan.walcz@walcz.de>

richiejp

Looks good, thanks!

mudler requested a review from richiejp July 2, 2026 07:46

richiejp approved these changes Jul 2, 2026

mudler merged commit 9d8ff90 into mudler:master Jul 2, 2026
58 checks passed

mudler merged 1 commit into mudler:master from walcz-de:fix/cloud-proxy-newest-model-params
