Conversation

@ieaves ieaves commented Nov 25, 2025

Summary by Sourcery

Introduce a provider-agnostic chat abstraction and hosted API transport support while updating file loading and testing to use structured chat messages.

New Features:

  • Add generic chat provider interfaces and OpenAI-specific implementations for both local-compatible and hosted APIs.
  • Introduce an API transport for routing chat calls directly to hosted providers based on model URL schemes (e.g., openai://); a detection sketch follows below.
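
A minimal sketch of what this scheme detection could look like (the provider table and function name here are hypothetical, not the PR's actual code):

```python
# Hypothetical sketch of scheme-based routing for hosted models.
HOSTED_SCHEMES = {"openai": "https://api.openai.com/v1"}  # assumed mapping

def match_api_provider(model: str) -> tuple[str, str, str] | None:
    """Return (scheme, base_url, model_name) for models like openai://gpt-4o-mini."""
    scheme, sep, model_name = model.partition("://")
    if not sep:
        return None  # plain model reference, not a hosted-API URL
    base_url = HOSTED_SCHEMES.get(scheme)
    if base_url is None:
        return None  # unknown scheme; fall through to the other transports
    return scheme, base_url, model_name

assert match_api_provider("openai://gpt-4o-mini") == (
    "openai", "https://api.openai.com/v1", "gpt-4o-mini"
)
```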

Enhancements:

  • Refactor chat handling to use structured ChatMessage/parts, provider-based request creation, and unified streaming response parsing.
  • Extend configuration with provider-specific settings (such as OpenAI API key resolution) and wire them into API transports.
  • Adapt file loader and chat integration logic to emit and consume structured message parts for text and images.
  • Update transports and CLI to recognize API-based models, disallow incompatible rag/serve modes, and route run commands through the new API transport.

Tests:

  • Add and update unit tests for file loaders, chat integration, transport factory behavior, and the new API transport and provider abstractions.

sourcery-ai bot commented Nov 25, 2025

Reviewer's Guide

Introduces a provider-agnostic chat abstraction and hosted API transport (starting with OpenAI), refactors chat message handling to a structured parts-based model, and updates file loaders, CLI, and tests to use the new abstractions and support hosted API models like openai://gpt-4o-mini.

File-Level Changes

Change: Refactor chat shell to use provider-based API calls and structured ChatMessage history instead of raw dicts (a request-flow sketch follows below).
Details:
  • Add ChatProvider injection into RamaLamaShell and chat(), defaulting to OpenAIChatProvider using args.url and api_key.
  • Replace direct urllib payload construction in _make_api_request and _make_request_data with provider.create_request using ChatRequestOptions.
  • Switch conversation_history and summarization logic to operate on ChatMessage objects and helper formatting methods.
  • Update MCP integration to use a serializable history snapshot derived from ChatMessage instances.
  • Replace the legacy streaming helper with stream_response, which delegates parsing to the provider.
Files: ramalama/chat.py, ramalama/chat_utils.py, ramalama/chat_providers/base.py, ramalama/chat_providers/openai.py, ramalama/chat_providers/__init__.py
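
A hedged sketch of the new request flow (constructor and option signatures are inferred from the review excerpts further down and may differ from the merged code):

```python
# Sketch only: how a provider-injected shell might build and stream a request.
import urllib.request

from ramalama.chat_providers.base import ChatRequestOptions
from ramalama.chat_providers.openai import OpenAIChatProvider
from ramalama.chat_utils import ChatMessage, TextPart

provider = OpenAIChatProvider("http://localhost:8080/v1", None)  # base_url, api_key
messages = [ChatMessage(role="user", parts=[TextPart(text="Hello!")])]
options = ChatRequestOptions(model="gpt-4o-mini", stream=True)

request = provider.create_request(messages, options)  # urllib.request.Request
with urllib.request.urlopen(request) as response:
    for chunk in iter(lambda: response.read(4096), b""):
        for event in provider.parse_stream_chunk(chunk):
            if event.text:
                print(event.text, end="", flush=True)
```
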
Change: Introduce hosted API transport support (e.g., openai:// models) and integrate it with the CLI run/serve flows (see the routing sketch below).
Details:
  • Add APITransport and APIProviderSpec to proxy chat to hosted providers without starting a local server.
  • Extend TransportFactory to detect API schemes, create APITransport, and prune model inputs for API models.
  • Wire CLI run_cli and serve_cli to disallow rag/serve for API transports and to avoid building server commands for them.
  • Expose API transport symbols from the transports package and extend the MODEL_TYPES list to include openai.
  • Add unit tests for APITransport behavior and transport factory detection/pruning for openai:// models.
Files: ramalama/transports/api.py, ramalama/transports/transport_factory.py, ramalama/transports/base.py, ramalama/transports/__init__.py, ramalama/cli.py, test/unit/test_api_transport.py, test/unit/test_transport_factory.py
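
A rough sketch of how run_cli could branch on the new transport, pieced together from the diff excerpt quoted under Comment 21 below (names like New and assemble_command come from ramalama/cli.py; this is not the verbatim implementation):

```python
# Sketch assembled from the run_cli excerpt in the Sourcery comments below.
from ramalama.transports.api import APITransport

model = New(args.MODEL, args)  # e.g. args.MODEL == "openai://gpt-4o-mini"
model.ensure_model_exists(args)

if args.rag and isinstance(model, APITransport):
    raise ValueError("ramalama run --rag is not supported for hosted API transports.")

# API transports talk to the hosted provider directly, so no server command is built.
server_cmd = [] if isinstance(model, APITransport) else assemble_command(args)
model.run(args, server_cmd)
```
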
Change: Adopt the ChatMessage and message-part model across file loaders and chat/file integration tests (a message-construction sketch follows below).
Details:
  • Change OpanAIChatAPIMessageBuilder.load to return ChatMessage objects with TextPart and ImageURLPart parts instead of dicts.
  • Update FileManager to build system ChatMessage instances for text and image content using TextPart/ImageURLPart.
  • Refactor file loader unit tests to assert on ChatMessage.role, parts, and helper extractors instead of raw content dicts.
  • Adjust chat/file integration tests to inspect ChatMessage-based conversation history and content extraction helpers.
Files: ramalama/file_loaders/file_manager.py, test/unit/test_file_loader.py, test/unit/test_file_loader_with_data.py, test/unit/test_file_loader_integration.py, ramalama/chat_utils.py
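
For illustration, building a structured system message out of parts might look like this (field names such as ImageURLPart.url are assumptions based on the serialization code quoted below):

```python
# Sketch: a system ChatMessage mixing text and image parts.
from ramalama.chat_utils import ChatMessage, ImageURLPart, TextPart

message = ChatMessage(
    role="system",
    parts=[
        TextPart(text="The user attached the following image:"),
        ImageURLPart(url="data:image/png;base64,iVBORw0KGgo..."),  # hypothetical data URL
    ],
)

# Helper-extractor style used by the refactored tests:
text = "".join(p.text for p in message.parts if isinstance(p, TextPart))
```
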
Change: Add shared chat utilities and configuration for providers, plus minor config cleanups (a key-resolution sketch follows below).
Details:
  • Introduce a chat_utils module with ChatMessage, typed message parts, stream_response, default_prefix, add_api_key, and serialization helpers.
  • Add ProviderConfig to BaseConfig to hold provider-specific settings like openai_api_key, and wire it into CONFIG.
  • Make minor config code cleanups (list literal formatting, removal of extraneous blank lines in loops).
Files: ramalama/chat_utils.py, ramalama/config.py
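
A small sketch of provider-specific key resolution (the environment-variable fallback is an assumption; only the openai_api_key field is named in the summary above):

```python
# Sketch: ProviderConfig-style key resolution; the env-var fallback is assumed.
import os
from dataclasses import dataclass

@dataclass
class ProviderConfig:
    openai_api_key: str | None = None

def resolve_openai_api_key(provider_config: ProviderConfig) -> str | None:
    # Prefer an explicit config value, then the conventional environment variable.
    return provider_config.openai_api_key or os.environ.get("OPENAI_API_KEY")
```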


@gemini-code-assist

Summary of Changes

Hello @ieaves, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the chat and API interaction components of the application. The primary goal is to introduce a robust and extensible abstraction for handling various chat API providers, making it easier to add support for new services and maintain existing ones. This involves creating a new set of structured data models for chat messages and a dedicated API transport layer, which streamlines how the application communicates with external language models.

Highlights

  • Abstracted Chat Providers: Introduced a new abstraction layer for chat providers, allowing for easier integration of different API services (e.g., OpenAI) by standardizing request and response handling. This includes base classes for ChatProvider, ChatRequestOptions, and ChatStreamEvent.
  • Structured Chat Messages: Implemented new ChatMessage and MessagePart dataclasses to represent conversation history and message content in a more structured and type-safe manner, replacing previous dictionary-based message formats (a serialization sketch follows this list).
  • API Transport Integration: Added an APITransport mechanism that enables direct connection to hosted API providers, bypassing local server setup for supported services. This is integrated into the CLI and transport factory.
  • Refactored API Calling Logic: The core API request and response streaming logic in ramalama/chat.py has been refactored to leverage the new chat provider abstraction, leading to cleaner and more modular code.
  • Configuration Updates: The configuration system (ramalama/config.py) now includes a ProviderConfig to manage API keys and settings for different chat providers, such as openai_api_key.
  • Utility Function Relocation: Common utility functions like stream_response, default_prefix, and add_api_key have been moved from ramalama/chat.py to a new ramalama/chat_utils.py module for better organization and reusability.
  • File Loader Modernization: The OpanAIChatAPIMessageBuilder in ramalama/file_loaders/file_manager.py has been updated to produce ChatMessage objects, aligning with the new structured message format.
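
To make the structured-message idea concrete, serializing a message into an OpenAI-style payload follows the _serialize_message shape quoted in the Sourcery review below (sketch, not verbatim):

```python
# Sketch: ChatMessage -> OpenAI-style wire format, mirroring _serialize_message below.
from ramalama.chat_utils import ChatMessage, TextPart, serialize_part

msg = ChatMessage(role="user", parts=[TextPart(text="Hi")])
wire = {"role": msg.role, "content": [serialize_part(p) for p in msg.parts]}
# -> {"role": "user", "content": [{"type": "text", "text": "Hi"}]}
```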

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • In _req, catching Exception and then calling e.read().decode() assumes all exceptions have a .read() method, which will fail for non-HTTP exceptions (e.g., URLError); consider checking for an HTTPError type or hasattr(e, 'read') before accessing it.
  • In OpenAIHostedChatProvider.build_payload, you unconditionally access payload['max_completion_tokens'] after conditionally popping it, which can raise a KeyError when max_tokens is None; guard the zero-check with an 'if "max_completion_tokens" in payload' or similar.

## Individual Comments

### Comment 1
<location> `ramalama/chat.py:62-65` </location>
<code_context>
         self.request_in_process = False
         self.prompt = args.prefix
         self.url = f"{args.url}/chat/completions"
+        self.provider = provider or OpenAIChatProvider(args.url, getattr(args, "api_key", None))
+        add_api_key(self.args)
         self.prep_rag_message()
         self.mcp_agent: LLMAgent | None = None
</code_context>

<issue_to_address>
**suggestion:** Calling add_api_key() here is a no-op and may be misleading.

add_api_key() only returns a headers dict and logs a warning; it doesn’t mutate args. Since its return value is ignored here, the call has no effect and may imply that auth is being wired when it isn’t. Please either remove this call (the provider already gets api_key) or move/refactor it so its returned headers are actually used where requests are built.

```suggestion
        self.url = f"{args.url}/chat/completions"
        self.provider = provider or OpenAIChatProvider(args.url, getattr(args, "api_key", None))
        self.prep_rag_message()
```
</issue_to_address>

### Comment 2
<location> `ramalama/chat.py:380-381` </location>
<code_context>
                 response = urllib.request.urlopen(request)
                 break
-            except Exception:
+            except Exception as e:
+                print(e.read().decode())
                 if sys.stdout.isatty():
                     perror(f"\r{c}", end="", flush=True)
</code_context>

<issue_to_address>
**issue (bug_risk):** Unconditionally calling e.read() on a generic Exception can raise a new error and hide the original failure.

This assumes every Exception has a .read() method (like urllib.error.HTTPError), but many (e.g., connection errors, timeouts) do not, so this can raise AttributeError and hide the real error. Consider either catching HTTPError explicitly or checking hasattr(e, "read") before calling it, and for other exceptions log/print str(e) instead.
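
One possible shape for the fix (sketch only; the surrounding retry loop is elided):

```python
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as e:
    # Only HTTPError carries a readable response body.
    print(e.read().decode())
except urllib.error.URLError as e:
    # Connection errors, timeouts, etc. have no .read(); log the error itself.
    print(str(e))
```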
</issue_to_address>

### Comment 3
<location> `ramalama/chat_utils.py:145-146` </location>
<code_context>
+        if part.detail:
+            payload["detail"] = part.detail
+        return {"type": "image_url", "image_url": payload}
+    if isinstance(part, ImageBytesPart):
+        return {"type": "image_bytes", "image_bytes": {"data": part.data, "mime_type": part.mime_type}}
+    if isinstance(part, ToolCallPart):
+        return {"type": "tool_call", "name": part.name, "arguments": dict(part.arguments)}
</code_context>

<issue_to_address>
**issue (bug_risk):** Serializing ImageBytesPart with raw bytes will break JSON encoding for OpenAI-style providers.

For ImageBytesPart, serialize_part() returns a dict containing raw bytes. When OpenAIChatProvider later calls json.dumps() on the payload, this will fail with "Object of type bytes is not JSON serializable". Please convert part.data to a JSON-serializable form (e.g., base64 string or data URL) before returning it from serialize_part().
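
For example, a JSON-safe encoding could look like this (sketch, assuming part.data is raw bytes):

```python
import base64

if isinstance(part, ImageBytesPart):
    encoded = base64.b64encode(part.data).decode("ascii")
    return {"type": "image_bytes", "image_bytes": {"data": encoded, "mime_type": part.mime_type}}
```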
</issue_to_address>

### Comment 4
<location> `ramalama/chat_providers/openai.py:83-87` </location>
<code_context>
+class OpenAIHostedChatProvider(OpenAIChatProvider):
+    def build_payload(self, messages: Sequence[ChatMessage], options: ChatRequestOptions) -> dict[str, object]:
+        payload = super().build_payload(messages, options)
+        max_tokens = payload.pop("max_tokens", None)
+        if max_tokens is not None:
+            payload["max_completion_tokens"] = max_tokens
+
+        if payload["max_completion_tokens"] == 0:
+            payload.pop("max_completion_tokens")
+
</code_context>

<issue_to_address>
**issue (bug_risk):** Accessing payload["max_completion_tokens"] unconditionally can raise a KeyError when max_tokens is None.

When max_tokens is None (not present), we never set "max_completion_tokens", but we still do `if payload["max_completion_tokens"] == 0:`, which will raise a KeyError. Instead, compare the local variable (`if max_tokens == 0:`) or guard the dict access (`if "max_completion_tokens" in payload and payload["max_completion_tokens"] == 0:`).
</issue_to_address>

### Comment 5
<location> `test/unit/test_api_transport.py:10-19` </location>
<code_context>
+from ramalama.transports.api import APIProviderSpec, APITransport
+
+
+def test_api_transport_run(monkeypatch):
+    provider = APIProviderSpec("openai", "https://api.openai.com/v1")
+    transport = APITransport("gpt-4o-mini", provider)
+    recorded: dict[str, object] = {}
+
+    def fake_chat(args, operational_args=None, provider=None):
+        recorded["args"] = args
+        recorded["operational_args"] = operational_args
+        recorded["provider"] = provider
+
+    monkeypatch.setattr(api_module, "chat", fake_chat)
+
+    args = SimpleNamespace(container=True, engine="podman", url="http://localhost", model=None, api="none")
+    transport.run(args, [])
+
+    assert args.container is False
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests to cover additional APITransport behaviors such as inspect(), exists(), and provider selection

Current tests cover run(), argument mutation, and API key resolution well. To better lock in behavior and prevent regressions, please also add tests for:
- inspect(): asserting the returned dict includes provider, model, and base_url as expected.
- exists(): verifying it always returns True, with a test name that documents this as intentional.
- _build_chat_provider(): confirming that `scheme == "openai"` yields an OpenAIHostedChatProvider and unknown schemes return None.
These will make APITransport’s contract clearer and safer to evolve.

Suggested implementation:

```python
from ramalama.transports import api as api_module
from ramalama.transports.api import APIProviderSpec, APITransport, OpenAIHostedChatProvider

```

```python
def test_api_transport_ensure_exists_mutates_args():
    provider = APIProviderSpec("openai", "https://api.openai.com/v1")
    transport = APITransport("gpt-4", provider)

    args = SimpleNamespace(container=True, engine="podman", url="http://localhost", model=None, api="none")

    # ensure_exists should mutate args in the same way as run()
    transport.ensure_exists(args)

    assert args.container is False
    assert args.engine is None
    assert args.url == provider.base_url
    assert args.model == "gpt-4"
    assert args.api == provider.scheme


def test_api_transport_inspect_includes_provider_model_and_base_url():
    provider = APIProviderSpec("openai", "https://api.openai.com/v1")
    transport = APITransport("gpt-4o-mini", provider)

    info = transport.inspect()

    # Contract: inspect() returns a dict that exposes provider, model, and base_url
    assert isinstance(info, dict)
    assert info["provider"] == provider.scheme
    assert info["model"] == "gpt-4o-mini"
    assert info["base_url"] == provider.base_url


def test_api_transport_exists_always_true():
    provider = APIProviderSpec("openai", "https://api.openai.com/v1")
    transport = APITransport("gpt-4o-mini", provider)
    args = SimpleNamespace(container=True, engine="podman", url="http://localhost", model=None, api="none")

    # Contract: exists() is intentionally always True for API transports
    assert transport.exists(args) is True


def test_api_transport_build_chat_provider_openai_and_unknown_scheme():
    provider = APIProviderSpec("openai", "https://api.openai.com/v1")
    transport = APITransport("gpt-4o-mini", provider)

    # openai scheme should yield an OpenAIHostedChatProvider
    chat_provider = transport._build_chat_provider(provider)
    assert isinstance(chat_provider, OpenAIHostedChatProvider)

    # Unknown schemes should not build any provider and return None
    unknown_provider = APIProviderSpec("unknown-scheme", "https://example.com")
    assert transport._build_chat_provider(unknown_provider) is None

```

These edits assume the following existing signatures and behaviors:
1. APITransport.ensure_exists(args) exists and mutates args in the same way as run() with respect to:
   - args.container (set to False)
   - args.engine (set to None)
   - args.url (set to provider.base_url)
   - args.model (set to the transport's model)
   - args.api (set to provider.scheme)
2. APITransport.inspect() returns a dict containing at least keys "provider", "model", and "base_url", where:
   - "provider" is the provider.scheme
   - "model" is the model string passed to APITransport
   - "base_url" is provider.base_url
3. APITransport.exists(args) exists and always returns True.
4. APITransport._build_chat_provider() accepts an APIProviderSpec instance and:
   - Returns an OpenAIHostedChatProvider when provider.scheme == "openai"
   - Returns None for unknown schemes.
5. OpenAIHostedChatProvider is importable from ramalama.transports.api.

If any of these assumptions differ from the current implementation, you'll need to:
- Adjust the assertions in the tests to match the actual keys/values returned by inspect().
- Update the tests to match the actual signatures of ensure_exists(), exists(), and _build_chat_provider().
- Update the import of OpenAIHostedChatProvider to match its real module path or class name.
</issue_to_address>

### Comment 6
<location> `test/unit/test_file_loader_integration.py:11-12` </location>
<code_context>
 from ramalama.file_loaders.file_types.txt import TXTFileLoader


+def _text_content(message):
+    return "".join(part.text for part in message.parts if isinstance(part, TextPart))
+
+
</code_context>

<issue_to_address>
**nitpick:** Use the shared _text_content helper consistently for readability and to avoid subtle inconsistencies

Some tests in this file still build message text manually (e.g., in test_chat_with_file_input_single_file) instead of calling _text_content. Please use _text_content everywhere you need the message’s text so future structure changes only require updating this helper, not each test.
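
For instance (hypothetical assertion, for illustration only):

```python
# Instead of re-joining parts inline in each test:
text = "".join(part.text for part in message.parts if isinstance(part, TextPart))
# prefer the shared helper:
assert _text_content(message) == "expected file contents"
```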
</issue_to_address>

### Comment 7
<location> `ramalama/chat_utils.py:55` </location>
<code_context>
+    type: Literal["custom"] = "custom"
+
+
+MessagePart = TextPart | ImageURLPart | ImageBytesPart | ToolCallPart | ToolResultPart | CustomPart
+
+
</code_context>

<issue_to_address>
**issue (complexity):** Consider narrowing the message/part API to only the currently used text parts and adding a simple content accessor on ChatMessage to reduce cognitive load while keeping room for future features.

You can trim a lot of surface area here without losing any currently-used behavior.

Right now only simple text messages (and possibly no multimodal/tooling yet) are exercised, but there’s a full `MessagePart` hierarchy and `serialize_part` branches for six different kinds of parts. That makes the mental model heavier for readers and reviewers.

Two concrete simplifications that keep the provider abstraction and streaming logic intact:

1) Narrow `MessagePart` and `serialize_part` to what’s actually used

If only text is currently used in `ChatMessage` and serialization, you can keep the future-proof layout but only expose/handle `TextPart` for now. The other part types can be added back when you introduce multimodal/tooling.

For example:

```python
# Keep the dataclass for future use, but make it internal / unused for now
@dataclass(slots=True)
class TextPart:
    text: str
    type: Literal["text"] = "text"

# Only expose the part(s) you actually use
MessagePart = TextPart  # | OtherPartTypesLater

def serialize_part(part: MessagePart) -> dict[str, Any]:
    # Only handle supported parts
    if isinstance(part, TextPart):
        return {"type": "text", "text": part.text}
    raise TypeError(f"Unsupported message part: {part!r}")
```

And in `__all__`:

```python
__all__ = [
    "ChatMessage",
    "MessagePart",
    "TextPart",
    "add_api_key",
    "default_prefix",
    "stream_response",
    "serialize_part",
]
```

You can keep the other part classes in the file but not include them in `MessagePart` or `__all__` until they’re actually needed. That:

- Removes unused branches from `serialize_part`.
- Reduces the public API surface the reader has to understand.
- Still keeps the file ready to grow when multimodal/tooling support is implemented.

2) Simplify `ChatMessage` usage with a `content` property

Since you always create messages with a single `TextPart`, you can make that pattern explicit and easier to read, while keeping the `parts` structure internally:

```python
@dataclass(slots=True)
class ChatMessage:
    role: RoleType
    parts: list[MessagePart] = field(default_factory=list)
    metadata: MutableMapping[str, Any] = field(default_factory=dict)

    @property
    def content(self) -> str:
        # Simple view for the current use case: first text part or empty string
        for part in self.parts:
            if isinstance(part, TextPart):
                return part.text
        return ""

    @staticmethod
    def system(text: str) -> "ChatMessage":
        return ChatMessage(role="system", parts=[TextPart(text=text)])

    @staticmethod
    def user(text: str) -> "ChatMessage":
        return ChatMessage(role="user", parts=[TextPart(text=text)])

    @staticmethod
    def assistant(text: str) -> "ChatMessage":
        return ChatMessage(role="assistant", parts=[TextPart(text=text)])

    @staticmethod
    def tool(text: str) -> "ChatMessage":
        return ChatMessage(role="tool", parts=[TextPart(text=text)])
```

Callers can then work with `message.content` instead of thinking about parts, while the internal structure remains ready for future expansion.

These two changes keep the new abstraction (provider-agnostic streaming, `ChatMessage` type, etc.) but significantly reduce the immediate cognitive load by:

- Eliminating unused branches and types from the public path.
- Presenting a simple “text message” mental model to users of the API.
</issue_to_address>

### Comment 8
<location> `ramalama/chat_providers/base.py:56` </location>
<code_context>
+        self.payload = payload
+
+
+class ChatProvider(ABC):
+    """Abstract base class for hosted chat providers."""
+
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying the ChatProvider customization hooks by collapsing multiple header and path extension points into single overridable methods to make request construction easier to follow.

You can trim some indirection in the base provider without losing any current/extensibility benefits.

1) Collapse header hooks into one

`prepare_headers` currently orchestrates three sources:

- `self._default_headers`
- `self.provider_headers(options)`
- `self.additional_request_headers(options)`

…but all of that can be expressed as a single customization point. This keeps the same power while making it easier to see where headers come from.

Suggested refactor:

- Replace `provider_headers` and `additional_request_headers` with a single `request_headers` override.
- Wire `create_request` to call only `request_headers`.

Example:

```python
class ChatProvider(ABC):
    ...

    def prepare_headers(
        self,
        *,
        include_auth: bool = True,
        extra: Mapping[str, str] | None = None,
        options: ChatRequestOptions | None = None,
    ) -> dict[str, str]:
        headers: dict[str, str] = {"Content-Type": "application/json"}
        headers.update(self._default_headers)
        # Single customization point
        headers.update(self.request_headers(options))
        if include_auth:
            headers.update(self.auth_headers())
        if extra:
            headers.update(extra)
        return headers

    # Replace the two hooks with one
    def request_headers(self, options: ChatRequestOptions | None = None) -> Mapping[str, str]:
        return {}
```

Then simplify `create_request`:

```python
def create_request(self, messages: Sequence[ChatMessage], options: ChatRequestOptions) -> urllib_request.Request:
    payload = self.build_payload(messages, options)
    headers = self.prepare_headers(options=options)
    body = self.serialize_payload(payload)
    return urllib_request.Request(
        self.build_url(),  # see path suggestion below
        data=body,
        headers=headers,
        method="POST",
    )
```

If a specific provider needs extra headers, it just overrides `request_headers`:

```python
class OpenAIChatProvider(ChatProvider):
    def request_headers(self, options: ChatRequestOptions | None = None) -> Mapping[str, str]:
        return {"OpenAI-Beta": "assistants=v1"}
```

2) Inline `resolve_request_path` into `build_url`

`resolve_request_path` currently just returns `self.default_path` and is only used in `create_request`. You can keep the flexibility by letting `build_url` accept an optional `options` argument and make it the single path hook.

Base class:

```python
class ChatProvider(ABC):
    default_path: str = "/chat/completions"

    def build_url(self, options: ChatRequestOptions | None = None) -> str:
        # Default behavior – subclasses can override if they need different paths
        rel = self.default_path
        if not rel.startswith("/"):
            rel = f"/{rel}"
        return f"{self.base_url}{rel}"
```

Usage in `create_request`:

```python
def create_request(self, messages: Sequence[ChatMessage], options: ChatRequestOptions) -> urllib_request.Request:
    payload = self.build_payload(messages, options)
    headers = self.prepare_headers(options=options)
    body = self.serialize_payload(payload)
    return urllib_request.Request(
        self.build_url(options),
        data=body,
        headers=headers,
        method="POST",
    )
```

If a provider needs to change paths based on options later, it overrides `build_url(options)`:

```python
class AzureChatProvider(ChatProvider):
    def build_url(self, options: ChatRequestOptions | None = None) -> str:
        deployment = getattr(options, "deployment", "default")
        rel = f"/openai/deployments/{deployment}/chat/completions"
        return f"{self.base_url}{rel}"
```

This keeps all current behavior and extensibility, but reduces the number of indirection points a reader has to mentally track for “how do we get from options → headers → URL → request?”.
</issue_to_address>

### Comment 9
<location> `ramalama/chat_providers/openai.py:80` </location>
<code_context>
+        return payload
+
+
+class OpenAIHostedChatProvider(OpenAIChatProvider):
+    def build_payload(self, messages: Sequence[ChatMessage], options: ChatRequestOptions) -> dict[str, object]:
+        payload = super().build_payload(messages, options)
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying the hosted provider payload handling and optionally inlining delta extraction logic to reduce branching and helper indirection.

You can simplify a couple of spots without changing behavior:

1) Avoid the extra `max_completion_tokens` mutation/potential KeyError

You can make the `OpenAIHostedChatProvider.build_payload` logic more direct by only setting `max_completion_tokens` when it has a meaningful value. This removes the second conditional and avoids relying on the key being present:

```python
class OpenAIHostedChatProvider(OpenAIChatProvider):
    def build_payload(
        self,
        messages: Sequence[ChatMessage],
        options: ChatRequestOptions,
    ) -> dict[str, object]:
        payload = super().build_payload(messages, options)

        max_tokens = payload.pop("max_tokens", None)
        # Only include when explicitly set and non-zero
        if max_tokens not in (None, 0):
            payload["max_completion_tokens"] = max_tokens

        return payload
```

This keeps the existing semantics:
- `max_tokens` unset → not present in payload
- `max_tokens == 0` → treated as “no limit” / removed
- `max_tokens > 0` → sent as `max_completion_tokens`  
but reduces branching and the temporary inconsistent state where `max_completion_tokens` is set only to be removed.

2) Inline `_extract_delta` to reduce indirection (optional)

Given that this provider is OpenAI-specific and the JSON shape is fixed, inlining `_extract_delta` into `parse_stream_chunk` can make the flow easier to follow without losing robustness:

Current:

```python
delta = self._extract_delta(parsed)
if delta:
    events.append(ChatStreamEvent(text=delta, raw=parsed))
```

Simplified inline version:

```python
try:
    parsed = json.loads(payload)
except json.JSONDecodeError:
    continue

choices = parsed.get("choices")
if (
    isinstance(choices, list) and choices
    and isinstance(choices[0], Mapping)
):
    delta = choices[0].get("delta")
    if isinstance(delta, Mapping):
        content = delta.get("content")
        if isinstance(content, str):
            events.append(ChatStreamEvent(text=content, raw=parsed))
```

This removes one layer of helper indirection and keeps all the defensive checks, while making it clear that you only care about the first choice’s `delta.content` for now.
</issue_to_address>

### Comment 10
<location> `test/unit/test_file_loader_integration.py:340-343` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))

<details><summary>Explanation</summary>Avoid complex code, like loops, in test functions.

Google's software engineering guidelines says:
"Clear tests are trivially correct upon inspection"
To reach that avoid complex code in tests:
* loops
* conditionals

Some ways to fix this:

* Use parametrized tests to get rid of the loop.
* Move the complex logic into helpers.
* Move the complex part into pytest fixtures.

> Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / [Don't Put Logic in Tests](https://abseil.io/resources/swe-book/html/ch12.html#donapostrophet_put_logic_in_tests)
</details>
</issue_to_address>

### Comment 11
<location> `test/unit/test_file_loader_integration.py:374-379` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid conditionals in tests. ([`no-conditionals-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-conditionals-in-tests))

<details><summary>Explanation</summary>Same guidance as Comment 10, applied to conditionals: avoid complex code, like conditionals, in test functions.</details>
</issue_to_address>

### Comment 12
<location> `test/unit/test_file_loader_integration.py:440-443` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Avoid loops in tests. ([`no-loop-in-tests`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/Python/Default-Rules/no-loop-in-tests))

<details><summary>Explanation</summary>Same guidance as Comment 10.</details>
</issue_to_address>

### Comment 13
<location> `ramalama/chat.py:152-154` </location>
<code_context>
    def _format_message_for_summary(self, msg: ChatMessage) -> str:
        text_parts = [part.text for part in msg.parts if isinstance(part, TextPart)]
        if not text_parts:
            return msg.role
        return f"{msg.role}: {' '.join(text_parts)}"

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        return msg.role if not text_parts else f"{msg.role}: {' '.join(text_parts)}"
```
</issue_to_address>

### Comment 14
<location> `ramalama/chat_providers/base.py:33-34` </location>
<code_context>
    def to_dict(self) -> dict[str, Any]:
        keys = ["model", "temperature", "max_tokens", "stream"]
        result = {k: v for k in keys if (v := getattr(self, k)) is not None}
        result |= {} if self.extra is None else dict(self.extra)
        return result

</code_context>

<issue_to_address>
**suggestion (code-quality):** Merge dictionary updates via the union operator ([`dict-assign-update-to-union`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/dict-assign-update-to-union/))

```suggestion
        result = {k: v for k in keys if (v := getattr(self, k)) is not None} | (
            {} if self.extra is None else dict(self.extra)
        )
```
</issue_to_address>

### Comment 15
<location> `ramalama/chat_providers/base.py:86` </location>
<code_context>
    def prepare_headers(
        self,
        *,
        include_auth: bool = True,
        extra: Mapping[str, str] | None = None,
        options: ChatRequestOptions | None = None,
    ) -> dict[str, str]:
        headers: dict[str, str] = {"Content-Type": "application/json"}
        headers.update(self._default_headers)
        headers.update(self.provider_headers(options))
        if include_auth:
            headers.update(self.auth_headers())
        if extra:
            headers.update(extra)
        return headers

</code_context>

<issue_to_address>
**suggestion (code-quality):** Merge dictionary updates via the union operator ([`dict-assign-update-to-union`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/dict-assign-update-to-union/))

```suggestion
        headers |= self._default_headers
```
</issue_to_address>

### Comment 16
<location> `ramalama/chat_providers/base.py:95-97` </location>
<code_context>
    def auth_headers(self) -> Mapping[str, str]:
        if not self.api_key:
            return {}
        return {"Authorization": f"Bearer {self.api_key}"}

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        return {} if not self.api_key else {"Authorization": f"Bearer {self.api_key}"}
```
</issue_to_address>

### Comment 17
<location> `ramalama/chat_providers/base.py:150-152` </location>
<code_context>
    def parse_response_body(self, body: bytes) -> Any:
        if not body:
            return None
        return json.loads(body.decode("utf-8"))

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Lift code into else after jump in control flow ([`reintroduce-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/reintroduce-else/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
        return None if not body else json.loads(body.decode("utf-8"))
```
</issue_to_address>

### Comment 18
<location> `ramalama/chat_providers/openai.py:50-51` </location>
<code_context>
    def parse_stream_chunk(self, chunk: bytes) -> Iterable[ChatStreamEvent]:
        events: list[ChatStreamEvent] = []
        self._stream_buffer += chunk.decode("utf-8")

        while "\n\n" in self._stream_buffer:
            raw_event, self._stream_buffer = self._stream_buffer.split("\n\n", 1)
            raw_event = raw_event.strip()
            if not raw_event:
                continue
            for line in raw_event.splitlines():
                if not line.startswith("data:"):
                    continue
                payload = line[len("data:") :].strip()
                if not payload:
                    continue
                if payload == "[DONE]":
                    events.append(ChatStreamEvent(done=True))
                    continue
                try:
                    parsed = json.loads(payload)
                except json.JSONDecodeError:
                    continue
                delta = self._extract_delta(parsed)
                if delta:
                    events.append(ChatStreamEvent(text=delta, raw=parsed))

        return events

</code_context>

<issue_to_address>
**suggestion (code-quality):** Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))

```suggestion
                if delta := self._extract_delta(parsed):
```
</issue_to_address>

### Comment 19
<location> `ramalama/chat_providers/openai.py:71-76` </location>
<code_context>
    def _serialize_message(self, message: ChatMessage) -> dict[str, Any]:
        payload: dict[str, Any] = {"role": message.role}
        if message.parts:
            payload["content"] = [serialize_part(part) for part in message.parts]
        else:
            payload["content"] = ""
        payload.update(message.metadata)
        return payload

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Merge dictionary assignment with declaration ([`merge-dict-assign`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/merge-dict-assign/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))
- Merge dictionary updates via the union operator ([`dict-assign-update-to-union`](https://docs.sourcery.ai/Reference/Default-Rules/suggestions/dict-assign-update-to-union/))

```suggestion
        payload: dict[str, Any] = {
            "role": message.role,
            "content": (
                [serialize_part(part) for part in message.parts]
                if message.parts
                else ""
            ),
        }
        payload |= message.metadata
```
</issue_to_address>

### Comment 20
<location> `ramalama/chat_utils.py:115-117` </location>
<code_context>
def default_prefix() -> str:
    if not EMOJI:
        return "> "

    if CONFIG.prefix:
        return CONFIG.prefix

    engine = CONFIG.engine

    if engine:
        if os.path.basename(engine) == "podman":
            return "🦭 > "

        if os.path.basename(engine) == "docker":
            return "🐋 > "

    return "🦙 > "

</code_context>

<issue_to_address>
**suggestion (code-quality):** Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))

```suggestion
    if engine := CONFIG.engine:
```
</issue_to_address>

### Comment 21
<location> `ramalama/cli.py:1147-1150` </location>
<code_context>
def run_cli(args):
    try:
        # detect available port and update arguments
        args.port = compute_serving_port(args)

        model = New(args.MODEL, args)
        model.ensure_model_exists(args)
    except KeyError as e:
        logger.debug(e)
        try:
            args.quiet = True
            model = TransportFactory(args.MODEL, args, ignore_stderr=True).create_oci()
            model.ensure_model_exists(args)
        except Exception as exc:
            raise e from exc

    is_api_transport = isinstance(model, APITransport)

    if args.rag and is_api_transport:
        raise ValueError("ramalama run --rag is not supported for hosted API transports.")

    if args.rag:
        if not args.container:
            raise ValueError("ramalama run --rag cannot be run with the --nocontainer option.")
        args = _rag_args(args)
        model = RagTransport(model, assemble_command(args.model_args), args)
        model.ensure_model_exists(args)

    server_cmd: list[str] = []
    if not isinstance(model, APITransport):
        server_cmd = assemble_command(args)

    model.run(args, server_cmd)

</code_context>

<issue_to_address>
**suggestion (code-quality):** We've found these issues:

- Move setting of default value for variable into `else` branch ([`introduce-default-else`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/introduce-default-else/))
- Swap if/else branches ([`swap-if-else-branches`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/swap-if-else-branches/))
- Replace if statement with if expression ([`assign-if-exp`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/assign-if-exp/))

```suggestion
    server_cmd = [] if isinstance(model, APITransport) else assemble_command(args)
```
</issue_to_address>

### Comment 22
<location> `ramalama/transports/api.py:68` </location>
<code_context>
    def run(self, args, server_cmd: list[str]):  # pragma: no cover - exercised via CLI integration
        """Connect directly to the provider instead of launching a local server."""
        args.container = False
        args.engine = None
        args.url = self.base_url
        if not getattr(args, "api_key", None):
            key = self._resolve_api_key()
            if key:
                args.api_key = key
        args.model = self.model_name
        args.api = self.provider.scheme
        provider = self._build_chat_provider(args)
        chat(args, provider=provider)
        return 0

</code_context>

<issue_to_address>
**issue (code-quality):** Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))
</issue_to_address>

### Comment 23
<location> `ramalama/transports/api.py:99-100` </location>
<code_context>
    def _resolve_api_key(self) -> str | None:
        resolver = PROVIDER_API_KEY_RESOLVERS.get(self.provider.scheme)
        if resolver:
            return resolver()
        return CONFIG.api_key

</code_context>

<issue_to_address>
**suggestion (code-quality):** Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))

```suggestion
        if resolver := PROVIDER_API_KEY_RESOLVERS.get(self.provider.scheme):
```
</issue_to_address>

### Comment 24
<location> `ramalama/transports/transport_factory.py:51-52` </location>
<code_context>
    def detect_model_model_type(self) -> tuple[type[CLASS_MODEL_TYPES], Callable[[], CLASS_MODEL_TYPES]]:
        api_spec = self._match_api_provider()
        if api_spec:
            self._api_provider_spec = api_spec
            return APITransport, self.create_api_transport

        for prefix in ["huggingface://", "hf://", "hf.co/"]:
            if self.model.startswith(prefix):
                return Huggingface, self.create_huggingface
        for prefix in ["modelscope://", "ms://"]:
            if self.model.startswith(prefix):
                return ModelScope, self.create_modelscope
        for prefix in ["ollama://", "ollama.com/library/"]:
            if self.model.startswith(prefix):
                return Ollama, self.create_ollama
        for prefix in ["oci://", "docker://"]:
            if self.model.startswith(prefix):
                return OCI, self.create_oci
        if self.model.startswith("rlcr://"):
            return RamalamaContainerRegistry, self.create_rlcr
        for prefix in ["http://", "https://", "file://"]:
            if self.model.startswith(prefix):
                return URL, self.create_url
        if self.transport == "huggingface":
            return Huggingface, self.create_huggingface
        if self.transport == "modelscope":
            return ModelScope, self.create_modelscope
        if self.transport == "ollama":
            return Ollama, self.create_ollama
        if self.transport == "rlcr":
            return RamalamaContainerRegistry, self.create_rlcr
        if self.transport == "oci":
            return OCI, self.create_oci

        raise KeyError(f'transport "{self.transport}" not supported. Must be oci, huggingface, modelscope, or ollama.')

</code_context>

<issue_to_address>
**suggestion (code-quality):** Use named expression to simplify assignment and conditional ([`use-named-expression`](https://docs.sourcery.ai/Reference/Default-Rules/refactorings/use-named-expression/))

```suggestion
        if api_spec := self._match_api_provider():
```
</issue_to_address>


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-designed architectural improvement by abstracting chat providers. This decouples the core logic from specific API implementations and improves maintainability. The transition to structured ChatMessage objects is a great step for type safety and code clarity. The new APITransport for hosted models is also a valuable addition, and it's well-integrated into the CLI. The accompanying test updates and additions are thorough. I've identified a few potential issues related to error handling and edge cases that could improve the robustness of the new implementation, but overall, this is an excellent contribution.

@ieaves ieaves changed the title from "adds abstracted providers and api calling" to "Add Provider Abstraction with support for Hosted API Calls" on Nov 26, 2025
@ieaves ieaves force-pushed the feat/chat-providers branch from 30cad54 to a3b4047 on December 2, 2025 at 06:06

ieaves commented Dec 2, 2025

@rhatdan would you be willing to take a look at this? The primary goal was supporting multiple API backends with different chat implementations, but the core improvements include:

  • Generic chat provider interfaces for compatibility with API surfaces beyond the static /v1/chat/completions endpoint
  • An implementation of the Responses interface, with an example of calling a model on a remote API (OpenAI, in this case)
  • A new Spinner class that moves spinner updates into a separate thread; the previous implementation coupled the loading animation to API calls (a sketch of the idea follows below)
  • Updated documentation and associated tests
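
A minimal sketch of the decoupled-spinner idea (names and API here are illustrative, not the PR's actual Spinner implementation):

```python
import itertools
import sys
import threading
import time

class Spinner:
    """Animate a terminal spinner in a background thread so API calls aren't blocked."""

    def __init__(self, interval: float = 0.1):
        self._interval = interval
        self._stop = threading.Event()
        self._thread: threading.Thread | None = None

    def __enter__(self) -> "Spinner":
        self._thread = threading.Thread(target=self._spin, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        sys.stderr.write("\r \r")  # clear the last spinner frame

    def _spin(self) -> None:
        for frame in itertools.cycle("|/-\\"):
            if self._stop.is_set():
                break
            sys.stderr.write(f"\r{frame}")
            sys.stderr.flush()
            time.sleep(self._interval)

# Usage: the animation keeps running while the blocking call is in flight.
with Spinner():
    time.sleep(1)  # stand-in for a blocking API request
```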

Signed-off-by: Ian Eaves <[email protected]>

rhatdan commented Dec 3, 2025

@olliewalsh PTAL
