
fix(azure): single-chunk text stream misclassified as tool-call response #7422

Open
alvinttang wants to merge 1 commit into microsoft:main from alvinttang:fix/azure-streaming-single-chunk-content


Summary

  • Bug: AzureAIChatCompletionClient.create_stream() uses len(content_deltas) > 1 to distinguish text responses from tool-call responses. When a model returns a short answer in a single streaming chunk, content_deltas has exactly 1 element, so the > 1 check fails and the code falls through to the tool-call branch. That branch returns an empty tool-call list [] as content, so the actual text is silently dropped from content (or misassigned to thought).
  • Fix: Rewrite the content/tool-call branching to match the OpenAI client's correct logic: check if full_tool_calls: first, else treat as text content.
  • Test: Add regression test that streams a single-chunk text response and asserts it is returned as string content in the CreateResult.

Root cause

# BEFORE (wrong) — line 562 of _azure_ai_client.py
if len(content_deltas) > 1:        # <-- fails when exactly 1 chunk
    content = "".join(content_deltas)
else:
    content = list(full_tool_calls.values())  # <-- returns [] for text responses

# AFTER (correct) — matches OpenAI client pattern
if full_tool_calls:
    content = list(full_tool_calls.values())
else:
    content = "".join(content_deltas) if content_deltas else ""
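
To make the difference concrete, here is a minimal, self-contained sketch that runs both branching rules on a single-chunk text response. The function names `pick_content_before` and `pick_content_after` are hypothetical, and tool calls are represented by plain strings rather than the client's real tool-call objects; only the branching conditions are taken from the snippet above.

```python
from typing import Dict, List, Union

# Assumption: plain strings stand in for the client's tool-call objects.
Content = Union[str, List[str]]


def pick_content_before(content_deltas: List[str], full_tool_calls: Dict[int, str]) -> Content:
    """Old rule: the response counts as text only when there are 2+ chunks."""
    if len(content_deltas) > 1:
        return "".join(content_deltas)
    return list(full_tool_calls.values())


def pick_content_after(content_deltas: List[str], full_tool_calls: Dict[int, str]) -> Content:
    """New rule: tool calls win if any exist, otherwise the deltas are text."""
    if full_tool_calls:
        return list(full_tool_calls.values())
    return "".join(content_deltas) if content_deltas else ""


single_chunk = ["Paris."]  # a short answer delivered in one streaming chunk
print(pick_content_before(single_chunk, {}))  # prints []
print(pick_content_after(single_chunk, {}))   # prints Paris.
```

With two or more chunks both rules return the joined text, which is likely why the bug only shows up for short answers.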

Test plan

  • New test test_azure_ai_chat_completion_client_create_stream_single_chunk — mocks a single-chunk text stream, asserts CreateResult.content is a string (not an empty list)
  • Existing Azure AI model client tests continue to pass
  • Manual verification with a real Azure endpoint returning short responses
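
The shape of such a regression test can be sketched without the Azure SDK. Everything below is an illustrative assumption, not the test added in this PR: `FakeChunk`, `fake_stream`, and `collect_content` are hypothetical stand-ins for the mocked stream and for the accumulation loop inside `create_stream()`.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Dict, List, Optional, Union


@dataclass
class FakeChunk:
    """Hypothetical stand-in for one streaming chunk (not the Azure SDK type)."""
    text: Optional[str] = None


async def fake_stream(chunks: List[FakeChunk]) -> AsyncIterator[FakeChunk]:
    # Yield the prepared chunks one by one, like a mocked streaming response.
    for chunk in chunks:
        yield chunk


async def collect_content(stream: AsyncIterator[FakeChunk]) -> Union[str, List[str]]:
    """Accumulate text deltas, then apply the tool-calls-first branching."""
    content_deltas: List[str] = []
    full_tool_calls: Dict[int, str] = {}  # stays empty: this fake stream has no tool calls
    async for chunk in stream:
        if chunk.text is not None:
            content_deltas.append(chunk.text)
    if full_tool_calls:
        return list(full_tool_calls.values())
    return "".join(content_deltas) if content_deltas else ""


def test_single_chunk_text_is_string() -> None:
    result = asyncio.run(collect_content(fake_stream([FakeChunk(text="Hi")])))
    assert isinstance(result, str)  # not an empty list
    assert result == "Hi"


test_single_chunk_text_is_string()
```

The real test would drive the actual client with a mocked Azure stream and check `CreateResult.content`; this sketch only mirrors the assertion that a one-chunk text reply comes back as a string.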

The Azure AI client's `create_stream()` uses `len(content_deltas) > 1`
to decide whether the response is text or tool calls. When a model
returns a short answer that fits in a single streaming chunk,
`content_deltas` has exactly 1 element, causing the check to fail. The
code then falls through to return an empty tool-call list as content
and misassigns the actual text to the `thought` field.

Rewrite the branching to match the OpenAI client: check `full_tool_calls`
first (tool-call response), else treat as text. Add a regression test
that streams a single-chunk text reply and asserts it comes back as
string content.

Closes microsoft#7157 (partial — addresses a separate bug in the same streaming
path)