OpenAI Responses API LLM service implementation #3164

vmfullyfaltu · 2025-11-28T19:45:43Z

Overview

Complete migration from OpenAI Chat Completions API to the new Responses API with stateful conversation support and incremental mode implementation.

Motivation

The OpenAI Responses API provides several advantages over the Chat Completions API:

Stateful conversations with server-side storage (when store=True)
Incremental updates using previous_response_id (90% token savings)
Event-based streaming with ResponseStreamEvent
Improved tool calling support
Better multi-turn conversation handling

Core Service: `OpenAIResponsesLLMService`

A new LLM service that inherits from BaseOpenAILLMService and implements the Responses API:

class OpenAIResponsesLLMService(BaseOpenAILLMService):
    """OpenAI Responses API LLM service implementation.
    
    Uses client.responses.create() instead of client.chat.completions.create()
    """

Key Features

1. API Endpoint Migration

Old: client.chat.completions.create()
New: client.responses.create()

2. Message Format Conversion

Converts Chat Completions message format to Responses API input format:

Chat Completions format:

{"role": "user", "content": "Hello"}

Responses API format:

{
    "type": "message",
    "role": "user", 
    "content": [{"type": "input_text", "text": "Hello"}]
}

Key conversions handled:

"text" → "input_text" (user messages)
"text" → "output_text" (assistant messages)
"image_url" → "input_image" (with base64 data extraction)
Tool calls: Nested structure → Flat structure with call_id at top level
Tool results: Added as function_call_output items

3. Stateful Conversations (`store=True`)

Enables server-side conversation storage with incremental updates:

params = OpenAIResponsesLLMService.InputParams(
    temperature=0.7,
    store=True,  # Enable stateful mode
    metadata={"call_sid": "CA1234"}
)

Benefits:

90% token savings: Send only new messages, not entire history
50% latency reduction: Smaller requests, faster processing
Metadata tracking: Associate conversations with business context

4. Incremental Mode

Implements robust context reset detection using dual-signal detection:

Tracking Dictionaries:

self._response_ids: Dict[int, str] = {}           # Maps context_id → response_id
self._sent_message_counts: Dict[int, int] = {}    # Tracks messages sent per context
self._message_content_hashes: Dict[int, str] = {} # Content hash for reset detection

Detection Logic:

# Signal 1: Message count decreased (RESET_WITH_SUMMARY)
if len(messages) < sent_count:
    # Context was reset - clear tracking, send full context
    
# Signal 2: Same count but content changed (RESET edge case)
elif len(messages) == sent_count and hash(messages) != previous_hash:
    # Content changed - clear tracking, send full context
    
# Signal 3: Count increased (APPEND)
elif sent_count < len(messages):
    # Send only new messages (incremental mode)
    
# Signal 4: No change
else:
    # Send nothing

5. Event Stream Processing

Handles 7 event types from Responses API:

ResponseCompletedEvent          # Capture response_id, usage metrics
ResponseTextDeltaEvent          # Stream text deltas
ResponseFunctionCallArgumentsDeltaEvent    # Stream function arguments
ResponseFunctionCallArgumentsDoneEvent     # Arguments complete
ResponseOutputItemAddedEvent    # New function call started (get name)
ResponseOutputItemDoneEvent     # Function call complete

6. Tool Calling Format

Converts nested Chat Completions tool format to flat Responses API format:

Chat Completions (nested):

{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather data",
        "parameters": {...}
    }
}

Responses API (flat):

{
    "type": "function",
    "name": "get_weather",
    "description": "Get weather data", 
    "parameters": {...}
}

Context Aggregators

Provides OpenAI-specific context aggregators:

class OpenAIUserContextAggregator(LLMUserContextAggregator):
    """Handles user message aggregation"""
    
class OpenAIAssistantContextAggregator(LLMAssistantContextAggregator):
    """Handles assistant messages, function calls, and results"""

Integration: `bot.py`

Import and Configuration

from pipecat.services.openai.response_api.llm import OpenAIResponsesLLMService

# Configure with stateful mode
llm_params = OpenAIResponsesLLMService.InputParams(
    temperature=0.7,
    max_tokens=4000,
    store=True,  # Enable incremental mode
    metadata={
        "call_sid": call_sid,
    }
)

llm = OpenAIResponsesLLMService(
    model="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
    params=llm_params
)

Pipecat Flows Compatibility -- IMPORTANT until pipecat_flows is also updated

Uses monkey-patch for seamless integration:

# Monkey-patch for pipecat_flows adapter compatibility
import sys
from pipecat_flows.adapters import create_adapter as _original_create_adapter, OpenAIAdapter

def _patched_create_adapter(llm, context_aggregator):
    """Patched adapter creation to support OpenAIResponsesLLMService"""
    llm_type = type(llm).__name__
    if llm_type == "OpenAIResponsesLLMService":
        from loguru import logger
        logger.debug("Creating OpenAI adapter for OpenAIResponsesLLMService")
        return OpenAIAdapter()
    return _original_create_adapter(llm, context_aggregator)

# Replace the create_adapter function in pipecat_flows
import pipecat_flows.manager
pipecat_flows.manager.create_adapter = _patched_create_adapter

Latency Improvement

Without incremental: 800-1000ms (full context every time)
With incremental: 400-500ms (new messages only)
Improvement: ~50% reduction

Backward Compatibility

Inherits from BaseOpenAILLMService - drop-in replacement
Works with both OpenAILLMContext and universal LLMContext
Compatible with pipecat_flows via monkey-patch
Supports all existing aggregators and frame types

API Compatibility

Works with all GPT models (gpt-4o, gpt-4.1, gpt-3.5-turbo)
Supports o-series models with reasoning_effort parameter
Full tool calling support
Vision capabilities (input_image)

Architecture

Clean inheritance from BaseOpenAILLMService
Separation of concerns (format conversion, tracking, streaming)
Type hints throughout
Comprehensive docstrings

Error Handling

Timeout retry mechanism
JSON parse error handling
Missing field validation (tool_call_id, call_id)
Graceful degradation

Logging

Debug logs for all major operations
Clear reset detection messages
Usage metrics tracking
Input/output validation

Migration Guide

For Existing Users

Before (Chat Completions):

from pipecat.services.openai import OpenAILLMService

llm = OpenAILLMService(
    model="gpt-4o",
    api_key=api_key
)

After (Responses API):

from pipecat.services.openai.response_api.llm import OpenAIResponsesLLMService

llm_params = OpenAIResponsesLLMService.InputParams(
    store=True,  # Enable stateful mode
    metadata={"conversation_id": "conv123"}
)

llm = OpenAIResponsesLLMService(
    model="gpt-4o",
    api_key=api_key,
    params=llm_params
)

Configuration Options

class InputParams(BaseModel):
    temperature: Optional[float] = NOT_GIVEN  # 0.0 to 2.0
    max_tokens: Optional[int] = NOT_GIVEN     # Maximum response tokens
    store: Optional[bool] = False             # Enable stateful mode
    metadata: Optional[Dict[str, str]] = {}   # Conversation metadata
    reasoning_effort: Optional[str] = NOT_GIVEN  # For o-series models
    extra: Optional[Dict[str, Any]] = {}      # Additional parameters

vmfullyfaltu · 2025-12-05T23:43:50Z

@markbackman let me know if any feedback?

markbackman · 2025-12-06T00:04:22Z

Thanks for submitting this, but I have a hunch that this migration is going to be a bit more complex, as there are changes needed to the context as well. I'll leave this PR open, but it's likely that the maintainers will take this task on. We have work planned for this soon.

vmfullyfaltu · 2025-12-06T02:10:16Z

@markbackman appreciate the quick response. I am using it in my development and did not need any context to be updated.

by maintainers, apologize for the lack of understanding, is the pipecat/OpenAI interface maintained by a different set of team (not the daily/pipecat team)?

markbackman · 2025-12-06T02:22:07Z

Glad it's working for you. We haven't had time to review yet.

This PR was a community submission, not from the maintainers. That doesn't mean it's not good. We just haven't had a chance to review yet. He have discussed this need and acknowledge that it's a big change.

OpenAI Responses API LLM service implementation

bcbd970

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

OpenAI Responses API LLM service implementation #3164

OpenAI Responses API LLM service implementation #3164

vmfullyfaltu commented Nov 28, 2025 •

edited

Loading

Uh oh!

vmfullyfaltu commented Dec 5, 2025

Uh oh!

markbackman commented Dec 6, 2025

Uh oh!

vmfullyfaltu commented Dec 6, 2025

Uh oh!

markbackman commented Dec 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

OpenAI Responses API LLM service implementation #3164

Are you sure you want to change the base?

OpenAI Responses API LLM service implementation #3164

Conversation

vmfullyfaltu commented Nov 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Motivation

Core Service: OpenAIResponsesLLMService

Key Features

1. API Endpoint Migration

2. Message Format Conversion

3. Stateful Conversations (store=True)

4. Incremental Mode

5. Event Stream Processing

6. Tool Calling Format

Context Aggregators

Integration: bot.py

Import and Configuration

Pipecat Flows Compatibility -- IMPORTANT until pipecat_flows is also updated

Latency Improvement

Backward Compatibility

API Compatibility

Architecture

Error Handling

Logging

Migration Guide

For Existing Users

Configuration Options

Uh oh!

vmfullyfaltu commented Dec 5, 2025

Uh oh!

markbackman commented Dec 6, 2025

Uh oh!

vmfullyfaltu commented Dec 6, 2025

Uh oh!

markbackman commented Dec 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vmfullyfaltu commented Nov 28, 2025 •

edited

Loading

Core Service: `OpenAIResponsesLLMService`

3. Stateful Conversations (`store=True`)

Integration: `bot.py`