Skip to content

fix: harden OpenAI attachment recovery#7004

Merged
zouyonghe merged 8 commits intoAstrBotDevs:masterfrom
zouyonghe:fix/openai-invalid-attachment
Mar 26, 2026
Merged

fix: harden OpenAI attachment recovery#7004
zouyonghe merged 8 commits intoAstrBotDevs:masterfrom
zouyonghe:fix/openai-invalid-attachment

Conversation

@zouyonghe
Copy link
Member

@zouyonghe zouyonghe commented Mar 26, 2026

Summary

  • materialize historical image_url context parts into data URLs without mutating caller-owned context payloads
  • retry INVALID_ATTACHMENT image failures in text-only mode and preserve original image parts when local materialization cannot prove they are stale
  • add regression coverage for HTTP, file URI, localhost file URI, Windows-style file URI, MIME detection, and invalid attachment fallback handling

Testing

  • uv run ruff check astrbot/core/provider/sources/openai_source.py tests/test_openai_source.py
  • uv run pytest tests/test_openai_source.py -q

Summary by Sourcery

Harden OpenAI image handling by materializing context images into data URLs without mutating caller payloads and by adding robust attachment error recovery.

Bug Fixes:

  • Treat INVALID_ATTACHMENT errors from OpenAI as retriable by falling back to text-only payloads when images are present, while preserving original image parts when materialization fails.
  • Prevent unintended mutation of caller-provided message contexts when preparing OpenAI chat payloads with image content.
  • Handle invalid or non-image local paths and file URIs gracefully when resolving image references.

Enhancements:

  • Introduce a unified image resolution pipeline that supports HTTP URLs, file URIs (including localhost and Windows-specific forms), base64 pseudo-URLs, and MIME type detection for constructing data URLs.
  • Refactor context image materialization to operate on copies of messages and only transform structured image parts, preserving text-only content and metadata.
  • Improve encoding helpers to validate image files, infer MIME types, and support both safe and strict error handling modes for image conversion.

Tests:

  • Add regression tests covering INVALID_ATTACHMENT handling, including text-only fallback behavior and non-retriable scenarios.
  • Add extensive tests for context image materialization across HTTP, file URI, localhost file URI, Windows-style file URI, base64 scheme, and invalid image inputs.
  • Add tests for file URI normalization, image resolution utilities, and strict vs safe image encoding behavior.

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Mar 26, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the OpenAI integration by improving the reliability of image attachment processing and error recovery. It addresses issues where image URLs might fail to load or be rejected by the API, by converting them into data URLs and implementing a fallback mechanism for 'INVALID_ATTACHMENT' errors. This ensures a more stable user experience when interacting with vision models, while also adding extensive test coverage for different image handling scenarios.

Highlights

  • Robust Invalid Attachment Handling: Implemented a mechanism to detect OpenAI's 'INVALID_ATTACHMENT' errors and automatically retry requests in text-only mode, preserving the original image parts if local materialization indicates they are not stale.
  • Image URL Materialization: Materialized historical 'image_url' context parts into data URLs, converting various URI schemes (HTTP, file, localhost) into base64-encoded images directly within the payload without mutating caller-owned context.
  • Comprehensive Test Coverage: Added extensive regression tests to cover HTTP, file URI, localhost file URI, Windows-style file URI parsing, MIME type detection, and the new invalid attachment fallback handling.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In _resolve_image_part the HTTP branch opens the image twice (once via _is_valid_image_file, once via encode_image_bs64, which itself re-opens the file for mime detection and encoding); consider consolidating validation/mime detection/encoding into a single helper to avoid duplicate I/O and simplify error handling.
  • The file:// and local-path branches in _resolve_image_part don’t perform the _is_valid_image_file check that the HTTP branch does; it may be worth aligning these paths so malformed or non-image files are handled consistently regardless of their origin.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_resolve_image_part` the HTTP branch opens the image twice (once via `_is_valid_image_file`, once via `encode_image_bs64`, which itself re-opens the file for mime detection and encoding); consider consolidating validation/mime detection/encoding into a single helper to avoid duplicate I/O and simplify error handling.
- The `file://` and local-path branches in `_resolve_image_part` don’t perform the `_is_valid_image_file` check that the HTTP branch does; it may be worth aligning these paths so malformed or non-image files are handled consistently regardless of their origin.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="196-203" />
<code_context>
+
+    @staticmethod
+    def _file_uri_to_path(file_uri: str) -> str:
+        parsed = urlparse(file_uri)
+        if parsed.scheme != "file":
+            return file_uri
+
+        path = unquote(parsed.path or "")
+        if re.match(r"^/[A-Za-z]:/", path):
+            path = path[1:]
+        if parsed.netloc and parsed.netloc != "localhost":
+            path = f"//{parsed.netloc}{path}"
+        return str(Path(path))
</code_context>
<issue_to_address>
**issue:** File-URI to path conversion may mishandle some Windows-style `file://` URIs.

For URIs like `file://C:/path/to/file` (emitted by some Windows tools), `urlparse` sets `netloc='C'` and `path='/path/to/file'`, so this code treats them as UNC and returns `//C/path/to/file` instead of `C:/path/to/file`. Because the drive-letter regex only runs on `parsed.path`, this case never hits that branch. Consider first normalizing `file://<drive_letter>:/...` to `C:/...` (e.g. by special-casing `scheme=='file'` with a single-letter `netloc` and `path` starting with `:/`) before applying the UNC handling.
</issue_to_address>

### Comment 2
<location path="astrbot/core/provider/sources/openai_source.py" line_range="141" />
<code_context>
                     return True
         return False

+    def _is_invalid_attachment_error(self, error: Exception) -> bool:
+        body = getattr(error, "body", None)
+        if isinstance(body, dict):
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting small helper functions for error parsing, image loading, and safe image opening to flatten nested logic and remove duplication while preserving behavior.

You can keep all new behavior while shaving off some of the incidental complexity by extracting tiny helpers where you already have duplication and deep nesting.

### 1. Flatten `_is_invalid_attachment_error`

You’re doing essentially the same matching twice (from `error.body` and from string candidates), and the body path is heavily nested. A small helper to normalize the error dict and a single list of predicates makes the method much flatter:

```python
def _get_error_info(self, error: Exception) -> tuple[str | None, str | None]:
    body = getattr(error, "body", None)
    if isinstance(body, dict):
        err_obj = body.get("error")
        if isinstance(err_obj, dict):
            code = err_obj.get("code")
            message = err_obj.get("message")
            if isinstance(code, str):
                code = code.lower()
            if isinstance(message, str):
                message = message.lower()
            return code, message
    return None, None

def _is_invalid_attachment_error(self, error: Exception) -> bool:
    code, message = self._get_error_info(error)
    candidates = [
        c.lower()
        for c in self._extract_error_text_candidates(error)
    ]
    if message:
        candidates.append(message)

    for text in candidates:
        if "invalid_attachment" in text:
            return True
        if "download attachment" in text and "404" in text:
            return True
    return code == "invalid_attachment"
```

This keeps all existing rules but removes nested `if isinstance(...)` chains and duplicate `any(...)` scans.

### 2. Split URL resolution from payload construction in `_resolve_image_part`

Right now `_resolve_image_part` both decides how to load the image and builds the final `{"type": "image_url"}` structure, plus you repeat `encode_image_bs64`/validation logic. A tiny loader helper keeps responsibilities separate:

```python
async def _load_image_data(self, image_url: str) -> str | None:
    if image_url.startswith("http"):
        image_path = await download_image_by_url(image_url)
        if not self._is_valid_image_file(image_path):
            logger.warning("图片 URL %s 下载结果不是有效图片,将忽略。", image_url)
            return None
    elif image_url.startswith("file://"):
        image_path = self._file_uri_to_path(image_url)
    else:
        image_path = image_url
    return await self.encode_image_bs64(image_path)

async def _resolve_image_part(
    self,
    image_url: str,
    *,
    image_detail: str | None = None,
) -> dict | None:
    if image_url.startswith("data:"):
        image_payload = {"url": image_url}
    else:
        image_data = await self._load_image_data(image_url)
        if not image_data:
            logger.warning("图片 %s 得到的结果为空,将忽略。", image_url)
            return None
        image_payload = {"url": image_data}

    if image_detail:
        image_payload["detail"] = image_detail
    return {"type": "image_url", "image_url": image_payload}
```

This keeps behavior identical (including `file://` handling and validity check) but flattens `_resolve_image_part` and removes duplicated branches.

### 3. Reduce duplication in `_is_valid_image_file` / `_detect_image_mime_type`

Both functions open the image with PIL and catch the same exceptions. Centralizing that into a safe opener helper reduces code and mental overhead:

```python
@staticmethod
def _safe_open_image(image_path: str) -> PILImage.Image | None:
    try:
        return PILImage.open(image_path)
    except (FileNotFoundError, OSError, UnidentifiedImageError):
        return None

@staticmethod
def _is_valid_image_file(image_path: str) -> bool:
    image = AstrBotOpenAIProvider._safe_open_image(image_path)
    if not image:
        return False
    try:
        image.verify()
    except Exception:
        return False
    return True

@staticmethod
def _detect_image_mime_type(image_path: str) -> str:
    image = AstrBotOpenAIProvider._safe_open_image(image_path)
    if not image:
        return "image/jpeg"
    image_format = str(image.format or "").upper()
    return {
        "JPEG": "image/jpeg",
        "PNG": "image/png",
        "GIF": "image/gif",
        "WEBP": "image/webp",
        "BMP": "image/bmp",
    }.get(image_format, "image/jpeg")
```

The external behavior (defaulting to `image/jpeg` and treating invalid files as invalid) stays the same, but the exception handling is no longer duplicated.

These focused extractions keep all of your new functionality (better MIME detection, `file://` handling, invalid-attachment fallback, and image pre-materialization) while making the code flatter and easier to reason about.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@dosubot dosubot bot added the area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. label Mar 26, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly enhances image handling for OpenAI API calls by introducing a comprehensive mechanism to resolve various image URLs (HTTP, file, data URIs) into base64 encoded data within the chat context. Key changes include new utility functions for image validation, MIME type detection, and file URI to path conversion. It also implements robust error handling for 'invalid_attachment' errors, allowing the system to retry requests with text-only content if image processing fails. The review suggests an improvement to the _detect_image_mime_type method to use PILImage.MIME for more accurate and comprehensive MIME type detection, rather than a limited dictionary, to ensure better compatibility and correctness.

Comment on lines +187 to +193
"JPEG": "image/jpeg",
"PNG": "image/png",
"GIF": "image/gif",
"WEBP": "image/webp",
"BMP": "image/bmp",
}.get(image_format, "image/jpeg")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _detect_image_mime_type method uses a limited dictionary to map image formats to MIME types and defaults to image/jpeg for unrecognized formats. While _is_valid_image_file ensures the file is an image, using a more comprehensive approach like PILImage.MIME would provide more accurate MIME types for a wider range of image formats, improving correctness and compatibility with APIs that might be strict about MIME types. For example, a TIFF image would currently be incorrectly identified as image/jpeg.

        return PILImage.MIME.get(image_format, "image/jpeg")

@zouyonghe
Copy link
Member Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="1166-1167" />
<code_context>
-        with open(image_url, "rb") as f:
-            image_bs64 = base64.b64encode(f.read()).decode("utf-8")
-            return "data:image/jpeg;base64," + image_bs64
+        image_data = self._encode_image_file_to_data_url(image_url)
+        return image_data or ""

     async def terminate(self):
</code_context>
<issue_to_address>
**issue (bug_risk):** `encode_image_bs64` now silently returns an empty string on failure, which changes its error semantics and may hide issues.

This change shifts `encode_image_bs64` from raising on unreadable/missing files to returning `""` via `_encode_image_file_to_data_url` returning `None`. Callers that don’t explicitly treat an empty string as failure may proceed as if they have a valid image, especially if they previously relied on exceptions or `None` checks. Consider preserving the prior behavior (propagate the exception) or returning `None` instead of `""` so failures remain explicit and easier to handle correctly.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@zouyonghe
Copy link
Member Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The new image helpers (_encode_image_file_to_data_url vs encode_image_bs64) now duplicate format-detection and MIME-type logic; consider having encode_image_bs64 delegate to _encode_image_file_to_data_url to avoid divergence in behavior over time.
  • _materialize_context_image_parts runs on every _prepare_chat_payload call even when no images are present; you could short-circuit with _context_contains_image to avoid the extra deep iteration on purely text-only conversations.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new image helpers (`_encode_image_file_to_data_url` vs `encode_image_bs64`) now duplicate format-detection and MIME-type logic; consider having `encode_image_bs64` delegate to `_encode_image_file_to_data_url` to avoid divergence in behavior over time.
- `_materialize_context_image_parts` runs on every `_prepare_chat_payload` call even when no images are present; you could short-circuit with `_context_contains_image` to avoid the extra deep iteration on purely text-only conversations.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="1172-1175" />
<code_context>
-        with open(image_url, "rb") as f:
-            image_bs64 = base64.b64encode(f.read()).decode("utf-8")
-            return "data:image/jpeg;base64," + image_bs64
+        image_bytes = self._read_file_bytes(image_url, suppress_errors=False)
+        if image_bytes is None:
+            raise FileNotFoundError(image_url)
+
</code_context>
<issue_to_address>
**suggestion:** The `image_bytes is None` check in `encode_image_bs64` is unreachable with `suppress_errors=False` and can be simplified.

With `suppress_errors=False`, `_read_file_bytes` will either return bytes or raise `OSError`, never `None`, so the `if image_bytes is None:` branch is dead code. Either remove the `None` check and let `OSError` propagate, or call `_read_file_bytes` with `suppress_errors=True` and handle the missing-file case here, matching `_encode_image_file_to_data_url` for consistent error behavior.

```suggestion
        image_bytes = self._read_file_bytes(image_url, suppress_errors=False)
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@zouyonghe
Copy link
Member Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The interaction between _load_image_data and encode_image_bs64 for base64:// URLs is a bit non-obvious (both now handle that scheme differently from other paths); consider centralizing this base64-handling logic in one place to avoid future regressions or accidental double-handling.
  • In _file_uri_to_path, the UNC/netloc handling (netloc not empty and not localhost leading to //{netloc}{path}) could produce surprising results on non-Windows platforms; it may be worth tightening or documenting the expected behavior for remote file://host/... URIs versus local paths.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The interaction between `_load_image_data` and `encode_image_bs64` for `base64://` URLs is a bit non-obvious (both now handle that scheme differently from other paths); consider centralizing this base64-handling logic in one place to avoid future regressions or accidental double-handling.
- In `_file_uri_to_path`, the UNC/netloc handling (`netloc` not empty and not `localhost` leading to `//{netloc}{path}`) could produce surprising results on non-Windows platforms; it may be worth tightening or documenting the expected behavior for remote `file://host/...` URIs versus local paths.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="187" />
<code_context>
+        }.get(str(image_format or "").upper(), "image/jpeg")
+
+    @staticmethod
+    def _read_file_bytes(
+        image_path: str,
+        *,
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the image handling and error-classification helpers by inlining small functions, centralizing URL dispatch, and extracting validation helpers to make the control flow more linear and easier to follow.

You can trim a fair amount of indirection without losing any behavior. The main opportunities are in image encoding/URL dispatch and error handling.

### 1. Collapse the image encoding helper chain

`_read_file_bytes` and `_detect_image_format` are only used in `_encode_image_file_to_data_url` and are very small. Inlining them makes the core flow easier to follow while preserving behavior:

```python
@classmethod
def _encode_image_file_to_data_url(
    cls,
    image_path: str,
    *,
    suppress_errors: bool = True,
    raise_on_invalid_image: bool = False,
) -> str | None:
    try:
        image_bytes = Path(image_path).read_bytes()
    except OSError:
        if not suppress_errors:
            raise
        return None

    try:
        with PILImage.open(BytesIO(image_bytes)) as image:
            image.verify()
            image_format = str(image.format or "").upper()
    except (OSError, UnidentifiedImageError):
        if raise_on_invalid_image:
            raise ValueError(f"Invalid image file: {image_path}")
        return None

    mime_type = cls._image_format_to_mime_type(image_format)
    image_bs64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{image_bs64}"
```

After this, you can delete `_read_file_bytes` and `_detect_image_format` without changing call sites.

### 2. Centralize URL-type dispatch

You currently have overlapping handling in `_normalize_image_path`, `_load_image_data`, `_resolve_image_part`, and `encode_image_bs64`. You can centralize the URL dispatch into a single helper and reuse it from both `_resolve_image_part` and `encode_image_bs64`:

```python
async def _image_ref_to_data_url(self, image_ref: str) -> str | None:
    if image_ref.startswith("base64://"):
        return image_ref.replace("base64://", "data:image/jpeg;base64,")

    if image_ref.startswith("http"):
        image_path = await download_image_by_url(image_ref)
    else:
        image_path = self._normalize_image_path(image_ref)

    return self._encode_image_file_to_data_url(image_path)
```

Then `_resolve_image_part` simplifies to:

```python
async def _resolve_image_part(
    self,
    image_url: str,
    *,
    image_detail: str | None = None,
) -> dict | None:
    if image_url.startswith("data:"):
        image_payload = {"url": image_url}
    else:
        image_data = await self._image_ref_to_data_url(image_url)
        if not image_data:
            logger.warning("图片 %s 得到的结果为空,将忽略。", image_url)
            return None
        image_payload = {"url": image_data}

    if image_detail:
        image_payload["detail"] = image_detail

    return {
        "type": "image_url",
        "image_url": image_payload,
    }
```

And `encode_image_bs64` can delegate to the same path (keeping its stricter error behavior):

```python
async def encode_image_bs64(self, image_url: str) -> str:
    if image_url.startswith("base64://"):
        return image_url.replace("base64://", "data:image/jpeg;base64,")

    image_path = self._normalize_image_path(image_url)
    image_data = self._encode_image_file_to_data_url(
        image_path,
        suppress_errors=False,
        raise_on_invalid_image=True,
    )
    return cast(str, image_data)
```

This keeps behavior but reduces duplicated decision trees about URL schemes.

### 3. Decompose `_materialize_context_image_parts`

The nested checks/logging and `_resolve_image_part` call can be flattened by extracting validation into a small helper:

```python
def _extract_image_part_info(
    self,
    part: dict,
) -> tuple[str | None, str | None]:
    if not isinstance(part, dict) or part.get("type") != "image_url":
        return None, None

    image_url_data = part.get("image_url")
    if not isinstance(image_url_data, dict):
        logger.warning("图片内容块格式无效,将保留原始内容。")
        return None, None

    url = image_url_data.get("url")
    if not isinstance(url, str) or not url:
        logger.warning("图片内容块缺少有效 URL,将保留原始内容。")
        return None, None

    detail = image_url_data.get("detail")
    return url, detail if isinstance(detail, str) else None
```

Then `_materialize_context_image_parts` becomes more linear:

```python
async def _materialize_context_image_parts(self, context_query: list[dict]) -> None:
    for message in context_query:
        content = message.get("content")
        if not isinstance(content, list):
            continue

        new_content: list[dict] = []
        content_changed = False

        for part in content:
            url, detail = self._extract_image_part_info(part)
            if not url:
                new_content.append(part)
                continue

            try:
                resolved_part = await self._resolve_image_part(url, image_detail=detail)
            except Exception as exc:
                logger.warning("图片 %s 预处理失败,将保留原始内容。错误: %s", url, exc)
                new_content.append(part)
                continue

            new_content.append(resolved_part or part)
            if resolved_part and resolved_part != part:
                content_changed = True

        if content_changed:
            message["content"] = new_content
```

Behavior stays the same (including logging), but the control flow is easier to follow.

### 4. Inline `_get_error_info` into `_is_invalid_attachment_error`

`_get_error_info` is only used once and is simple. Inlining it reduces hops when understanding the error classification:

```python
def _is_invalid_attachment_error(self, error: Exception) -> bool:
    body = getattr(error, "body", None)
    code = message = None
    if isinstance(body, dict):
        err_obj = body.get("error")
        if isinstance(err_obj, dict):
            raw_code = err_obj.get("code")
            raw_message = err_obj.get("message")
            code = raw_code.lower() if isinstance(raw_code, str) else None
            message = raw_message.lower() if isinstance(raw_message, str) else None

    candidates = [
        candidate.lower()
        for candidate in self._extract_error_text_candidates(error)
    ]
    if message:
        candidates.append(message)

    for candidate in candidates:
        if "invalid_attachment" in candidate:
            return True
        if "download attachment" in candidate and "404" in candidate:
            return True

    return code == "invalid_attachment"
```

This keeps all current matching behavior but makes it faster to see when an error is treated as `invalid_attachment`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@zouyonghe
Copy link
Member Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="142" />
<code_context>
                     return True
         return False

+    def _is_invalid_attachment_error(self, error: Exception) -> bool:
+        body = getattr(error, "body", None)
+        code = None
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the new error/image helper logic to flatten conditionals, remove redundant helpers, and introduce clearer strict vs. tolerant APIs to reduce cognitive load while preserving behavior.

You can keep all the new functionality while shaving off some cognitive load with a few targeted refactors.

### 1. Flatten `_is_invalid_attachment_error`

You don’t really need the `candidates` list + helper call on the hot path. You can normalize once into a single string and check that:

```python
def _is_invalid_attachment_error(self, error: Exception) -> bool:
    body = getattr(error, "body", None)
    code: str | None = None
    message: str | None = None

    if isinstance(body, dict):
        err_obj = body.get("error")
        if isinstance(err_obj, dict):
            raw_code = err_obj.get("code")
            raw_message = err_obj.get("message")
            code = raw_code.lower() if isinstance(raw_code, str) else None
            message = raw_message.lower() if isinstance(raw_message, str) else None

    # unified, lowercased error text
    parts: list[str] = []
    if code:
        parts.append(code)
    if message:
        parts.append(message)
    parts.extend(map(str, self._extract_error_text_candidates(error)))
    parts.append(str(error))

    s = " ".join(p.lower() for p in parts if p)

    if "invalid_attachment" in s:
        return True
    if "download attachment" in s and "404" in s:
        return True

    # keep explicit code check for safety
    return code == "invalid_attachment"
```

This preserves your matching behavior but removes the extra list, loop, and branching.

---

### 2. Collapse image-path helpers

`_file_uri_to_path` + `_normalize_image_path` + part of `_image_ref_to_data_url` can be simplified by handling file/base64/http directly in `_image_ref_to_data_url`, and keeping path normalization local:

```python
async def _image_ref_to_data_url(
    self,
    image_ref: str,
    *,
    suppress_errors: bool = True,
    raise_on_invalid_image: bool = False,
) -> str | None:
    if image_ref.startswith("base64://"):
        return image_ref.replace("base64://", "data:image/jpeg;base64,")

    if image_ref.startswith("http"):
        image_path = await download_image_by_url(image_ref)
    elif image_ref.startswith("file://"):
        image_path = self._file_uri_to_path(image_ref)
    else:
        image_path = image_ref

    return self._encode_image_file_to_data_url(
        image_path,
        suppress_errors=suppress_errors,
        raise_on_invalid_image=raise_on_invalid_image,
    )
```

With that, `_normalize_image_path` becomes redundant and can be removed.

If you don’t truly need UNC/drive-letter support, `_file_uri_to_path` can also be trimmed:

```python
@staticmethod
def _file_uri_to_path(file_uri: str) -> str:
    parsed = urlparse(file_uri)
    if parsed.scheme != "file":
        return file_uri
    return str(Path(unquote(parsed.path or "")))
```

---

### 3. Inline `_image_format_to_mime_type`

The extra indirection for 4–5 types doesn’t buy much. Inlining the mapping keeps `_encode_image_file_to_data_url` self-contained:

```python
@classmethod
def _encode_image_file_to_data_url(
    cls,
    image_path: str,
    *,
    suppress_errors: bool = True,
    raise_on_invalid_image: bool = False,
) -> str | None:
    try:
        image_bytes = Path(image_path).read_bytes()
    except OSError:
        if not suppress_errors:
            raise
        return None

    try:
        with PILImage.open(BytesIO(image_bytes)) as image:
            image.verify()
            image_format = (image.format or "").upper()
    except (OSError, UnidentifiedImageError):
        if raise_on_invalid_image:
            raise ValueError(f"Invalid image file: {image_path}")
        return None

    mime_map = {
        "JPEG": "image/jpeg",
        "PNG": "image/png",
        "GIF": "image/gif",
        "WEBP": "image/webp",
        "BMP": "image/bmp",
    }
    mime_type = mime_map.get(image_format, "image/jpeg")

    image_bs64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{image_bs64}"
```

---

### 4. Simplify `_materialize_context_image_parts` (drop `content_changed`)

Given message lists are small, always reassigning `message["content"]` is simpler and cheap:

```python
async def _materialize_context_image_parts(self, context_query: list[dict]) -> None:
    for message in context_query:
        content = message.get("content")
        if not isinstance(content, list):
            continue

        new_content: list[dict] = []
        for part in content:
            url, image_detail = self._extract_image_part_info(part)
            if not url:
                new_content.append(part)
                continue

            try:
                resolved_part = await self._resolve_image_part(
                    url, image_detail=image_detail
                )
            except Exception as exc:
                logger.warning(
                    "图片 %s 预处理失败,将保留原始内容。错误: %s",
                    url,
                    exc,
                )
                new_content.append(part)
                continue

            new_content.append(resolved_part or part)

        message["content"] = new_content
```

This keeps behavior identical but removes the boolean tracking and conditional reassignment.

---

### 5. Reduce flag usage by exposing clearer APIs

You already have two distinct behaviors: strict (`encode_image_bs64`) and tolerant (`_materialize_context_image_parts`). Instead of threading flags everywhere, wrap `_image_ref_to_data_url` with mode-specific helpers:

```python
async def _image_ref_to_data_url_strict(self, image_ref: str) -> str:
    data = await self._image_ref_to_data_url(
        image_ref,
        suppress_errors=False,
        raise_on_invalid_image=True,
    )
    if data is None:
        raise RuntimeError(f"Failed to encode image data: {image_ref}")
    return data

async def _image_ref_to_data_url_safe(self, image_ref: str) -> str | None:
    return await self._image_ref_to_data_url(
        image_ref,
        suppress_errors=True,
        raise_on_invalid_image=False,
    )
```

Then:

```python
async def encode_image_bs64(self, image_url: str) -> str:
    return await self._image_ref_to_data_url_strict(image_url)
```

And `_resolve_image_part` can call `_image_ref_to_data_url_safe` instead of passing flags each time. This keeps functionality but makes the call sites easier to read.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@zouyonghe
Copy link
Member Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="168" />
<code_context>
+            return True
+        return code == "invalid_attachment"
+
+    @classmethod
+    def _encode_image_file_to_data_url(
+        cls,
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the new image helper API and making context image materialization return a new structure instead of mutating in place to reduce branching and make the control flow easier to follow.

A few of the new helpers can be simplified without changing behavior, mainly around image conversion and context traversal.

### 1. Collapse the multiple image-ref helpers into one explicit mode

Right now you have:

- `_image_ref_to_data_url(image_ref, suppress_errors, raise_on_invalid_image)`
- `_image_ref_to_data_url_safe`
- `_image_ref_to_data_url_strict`
- `_encode_image_file_to_data_url` with `suppress_errors` and `raise_on_invalid_image`

This gives four behavioral combinations and requires understanding both flags and which wrapper is used at each call site.

You can keep the same semantics but make the API easier to reason about by using a single explicit mode instead of flag combinations, and by inlining the “safe/strict” wrappers into that function:

```python
# replace _image_ref_to_data_url + _image_ref_to_data_url_safe/_strict
from typing import Literal

class YourProvider(...):
    ...

    @classmethod
    def _encode_image_file_to_data_url(
        cls,
        image_path: str,
        *,
        mode: Literal["safe", "strict"],
    ) -> str | None:
        try:
            image_bytes = Path(image_path).read_bytes()
        except OSError:
            if mode == "strict":
                raise
            return None

        try:
            with PILImage.open(BytesIO(image_bytes)) as image:
                image.verify()
                image_format = str(image.format or "").upper()
        except (OSError, UnidentifiedImageError):
            if mode == "strict":
                raise ValueError(f"Invalid image file: {image_path}")
            return None

        mime_type = {
            "JPEG": "image/jpeg",
            "PNG": "image/png",
            "GIF": "image/gif",
            "WEBP": "image/webp",
            "BMP": "image/bmp",
        }.get(image_format, "image/jpeg")
        image_bs64 = base64.b64encode(image_bytes).decode("utf-8")
        return f"data:{mime_type};base64,{image_bs64}"

    async def _image_ref_to_data_url(
        self,
        image_ref: str,
        *,
        mode: Literal["safe", "strict"] = "safe",
    ) -> str | None:
        if image_ref.startswith("base64://"):
            return image_ref.replace("base64://", "data:image/jpeg;base64,")

        if image_ref.startswith("http"):
            image_path = await download_image_by_url(image_ref)
        elif image_ref.startswith("file://"):
            image_path = self._file_uri_to_path(image_ref)
        else:
            image_path = image_ref

        return self._encode_image_file_to_data_url(image_path, mode=mode)
```

Call sites then become explicit and easier to read, and you can drop `_image_ref_to_data_url_safe` / `_image_ref_to_data_url_strict`:

```python
async def _resolve_image_part(
    self,
    image_url: str,
    *,
    image_detail: str | None = None,
) -> dict | None:
    if image_url.startswith("data:"):
        image_payload = {"url": image_url}
    else:
        image_data = await self._image_ref_to_data_url(image_url, mode="safe")
        if not image_data:
            logger.warning("图片 %s 得到的结果为空,将忽略。", image_url)
            return None
        image_payload = {"url": image_data}

    if image_detail:
        image_payload["detail"] = image_detail
    return {
        "type": "image_url",
        "image_url": image_payload,
    }

async def encode_image_bs64(self, image_url: str) -> str:
    data = await self._image_ref_to_data_url(image_url, mode="strict")
    if data is None:
        # should not happen in strict mode, but keep guard if desired
        raise RuntimeError(f"Failed to encode image data: {image_url}")
    return data
```

This keeps functionality and error behavior but reduces branching and mental overhead around the different helper combinations.

---

### 2. Make the context image materialization a pure helper

`_materialize_context_image_parts` mutates `context_query` in place and is wired into `_prepare_chat_payload`, which already does a `deepcopy`. The control flow is:

- `assemble_context` / other callers → `_prepare_chat_payload`
- `copy.deepcopy(...)`
- `_context_contains_image(...)`
- `_materialize_context_image_parts(...)` (mutating `context_query`)

You can make the traversal easier to follow and test by making it a pure helper that returns a new structure, and keep the in-place update only at the top level:

```python
async def _materialize_context_image_parts(
    self, context_query: list[dict]
) -> list[dict]:
    new_messages: list[dict] = []
    for message in context_query:
        content = message.get("content")
        if not isinstance(content, list):
            new_messages.append(message)
            continue

        new_content: list[dict] = []
        for part in content:
            url, image_detail = self._extract_image_part_info(part)
            if not url:
                new_content.append(part)
                continue

            try:
                resolved_part = await self._resolve_image_part(
                    url, image_detail=image_detail
                )
            except Exception as exc:
                logger.warning(
                    "图片 %s 预处理失败,将保留原始内容。错误: %s",
                    url,
                    exc,
                )
                new_content.append(part)
                continue

            new_content.append(resolved_part or part)

        new_msg = dict(message)
        new_msg["content"] = new_content
        new_messages.append(new_msg)

    return new_messages
```

And in `_prepare_chat_payload` (or equivalent):

```python
context_query = copy.deepcopy(self._ensure_message_to_dicts(contexts))
...
if self._context_contains_image(context_query):
    context_query = await self._materialize_context_image_parts(context_query)
```

This keeps all current behavior (including fallbacks) but localizes the structural manipulation into a pure, testable function, making the provider’s main control flow simpler to read.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@zouyonghe
Copy link
Member Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In _prepare_chat_payload you deep-copy contexts and then _materialize_context_image_parts deep-copies each message again; you can likely avoid one of these copies (for example by letting _materialize_context_image_parts operate on the already-copied list or by having it work without additional deep copies) to reduce overhead on large histories.
  • Inside _materialize_context_image_parts, each message is fully copy.deepcopy'd and then content is replaced with a separately-built list of deep-copied parts, which results in redundant copying of the content; consider shallow-copying the message and only deep-copying the non-image parts you preserve.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `_prepare_chat_payload` you deep-copy `contexts` and then `_materialize_context_image_parts` deep-copies each message again; you can likely avoid one of these copies (for example by letting `_materialize_context_image_parts` operate on the already-copied list or by having it work without additional deep copies) to reduce overhead on large histories.
- Inside `_materialize_context_image_parts`, each message is fully `copy.deepcopy`'d and then `content` is replaced with a separately-built list of deep-copied parts, which results in redundant copying of the content; consider shallow-copying the message and only deep-copying the non-image parts you preserve.

## Individual Comments

### Comment 1
<location path="astrbot/core/provider/sources/openai_source.py" line_range="142" />
<code_context>
                     return True
         return False

+    def _is_invalid_attachment_error(self, error: Exception) -> bool:
+        body = getattr(error, "body", None)
+        code: str | None = None
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the image-materialization pipeline and error-classification helper to flatten control flow, avoid deep copies, and clarify matching logic without changing behavior.

You can reduce complexity meaningfully in two focused spots without changing behavior.

### 1. Flatten `_materialize_context_image_parts` and avoid deepcopies

The current implementation does a lot of `copy.deepcopy` calls and nested branching. Since the function already treats `context_query` as an input → output transformer, you can:

* Introduce a small helper to transform a single message.
* Build new structures directly instead of deep-copying and mutating.
* Reuse non-image parts as-is, assuming they aren’t mutated later (they are replaced by the output list).

For example:

```python
async def _transform_content_part(self, part: dict) -> dict:
    url, image_detail = self._extract_image_part_info(part)
    if not url:
        return part  # non-image or invalid; keep original

    try:
        resolved_part = await self._resolve_image_part(url, image_detail=image_detail)
    except Exception as exc:
        logger.warning(
            "图片 %s 预处理失败,将保留原始内容。错误: %s",
            url,
            exc,
        )
        return part

    return resolved_part or part


async def _materialize_message_image_parts(self, message: dict) -> dict:
    content = message.get("content")
    if not isinstance(content, list):
        # no content list → just return a shallow copy
        return {**message}

    new_content = [
        await self._transform_content_part(part)
        for part in content
    ]
    return {**message, "content": new_content}


async def _materialize_context_image_parts(
    self,
    context_query: list[dict],
) -> list[dict]:
    return [
        await self._materialize_message_image_parts(message)
        for message in context_query
    ]
```

This:

* Eliminates repeated `copy.deepcopy` calls.
* Makes the control flow flatter (one place that decides “transform or keep original”).
* Highlights that this is a pure “map over messages” transformation.

If you still need defensive copying for some fields, you can selectively copy those inside `_materialize_message_image_parts` instead of deep-copying the entire structure.


### 2. Simplify `_is_invalid_attachment_error` with early returns

Current logic builds an aggregate `parts` list and then checks both `code` directly and via `error_text`, which is harder to reason about. You can keep the same matching behavior but reduce branching and make the priority clearer:

```python
def _is_invalid_attachment_error(self, error: Exception) -> bool:
    body = getattr(error, "body", None)
    code: str | None = None
    message: str | None = None

    if isinstance(body, dict):
        err = body.get("error")
        if isinstance(err, dict):
            raw_code = err.get("code")
            raw_message = err.get("message")
            code = raw_code.lower() if isinstance(raw_code, str) else None
            message = raw_message.lower() if isinstance(raw_message, str) else None

    # 1. Prefer explicit code if present
    if code == "invalid_attachment":
        return True

    # 2. Then inspect error message fields
    text_sources: list[str] = []
    if message:
        text_sources.append(message)
    if code:
        text_sources.append(code)
    text_sources.extend(map(str, self._extract_error_text_candidates(error)))

    error_text = " ".join(s.lower() for s in text_sources if s)
    if "invalid_attachment" in error_text:
        return True
    if "download attachment" in error_text and "404" in error_text:
        return True

    return False
```

This keeps all existing checks but:

* Avoids building `parts` when an explicit code is already sufficient.
* Removes the redundant final `return code == "invalid_attachment"` (now handled early).
* Makes the source priority (`code` → messages → generic text) explicit.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@zouyonghe
Copy link
Member Author

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The image materialization helpers (_materialize_message_image_parts / _materialize_context_image_parts) run all async work sequentially; if you expect multiple remote images per request, consider using asyncio.gather to resolve parts concurrently to avoid introducing latency bottlenecks.
  • In _image_ref_to_data_url, checking image_ref.startswith("http") and image_ref.startswith("file://") is fairly ad-hoc; using urlparse to distinguish schemes would make the URL handling more robust (including schemes like https, HTTP, or other valid URL forms).
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The image materialization helpers (`_materialize_message_image_parts` / `_materialize_context_image_parts`) run all async work sequentially; if you expect multiple remote images per request, consider using `asyncio.gather` to resolve parts concurrently to avoid introducing latency bottlenecks.
- In `_image_ref_to_data_url`, checking `image_ref.startswith("http")` and `image_ref.startswith("file://")` is fairly ad-hoc; using `urlparse` to distinguish schemes would make the URL handling more robust (including schemes like `https`, `HTTP`, or other valid URL forms).

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@zouyonghe zouyonghe merged commit cd4e999 into AstrBotDevs:master Mar 26, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant