Python codegen: _compat_to_python_key / _compat_to_json_key are not inverses for keys with common abbreviations (URL, ID, IP, XML, OAuth) #1138

@007bsd

Summary

The two helper functions used by RawSessionEventData / SessionEventData (the "Data" shim that handles unknown event types from the CLI) are intended to convert JSON keys to Python attribute names and back. They are not inverses for common abbreviation patterns. Any unknown event with field names like userURL, sessionID, XMLPayload, serverIP, OAuthToken, etc. gets silently rewritten when it round-trips through Data.from_dict(...).to_dict().

This is a correctness bug, not a security issue.

Affected versions

main at commit dd2dcbc439256acfb9feb2cff07c0b9c820091b8. The helpers are auto-generated, so the same logic ships in every Python release built from this codegen.

Affected source

  • python/copilot/generated/session_events.py:202-216 (_compat_to_python_key and _compat_to_json_key)
  • python/copilot/generated/session_events.py:239-253 (Data shim that exercises both helpers)
  • Reachable in production at session_events.py:4536 (the case _: branch in SessionEvent.from_dict that wraps unknown event types in RawSessionEventData).

Reproduction

from copilot.generated.session_events import (
    _compat_to_python_key,
    _compat_to_json_key,
    Data,
)

# 1) Helper-level: round-trip JSON key → Python → JSON should be a no-op
for k in ["userURL", "sessionID", "XMLPayload", "serverIP", "OAuthToken", "IPv6Address"]:
    py = _compat_to_python_key(k)
    back = _compat_to_json_key(py)
    print(f"  {k!r:<14} -> {py!r:<18} -> {back!r:<14}  {'OK' if back == k else 'CORRUPTED'}")

# 2) Production-path: Data.from_dict(x).to_dict() should equal x
for k in ["userURL", "sessionID", "XMLPayload", "serverIP", "OAuthToken"]:
    incoming = {k: 42}
    via_data = Data.from_dict(incoming).to_dict()
    print(f"  {incoming!s:<28} -> {via_data!s:<28}  {'OK' if via_data == incoming else 'KEY MUTATED'}")

Output:

  'userURL'      -> 'user_url'         -> 'userUrl'      CORRUPTED
  'sessionID'    -> 'session_id'       -> 'sessionId'    CORRUPTED
  'XMLPayload'   -> 'xml_payload'      -> 'xmlPayload'   CORRUPTED
  'serverIP'     -> 'server_ip'        -> 'serverIp'     CORRUPTED
  'OAuthToken'   -> 'o_auth_token'     -> 'oAuthToken'   CORRUPTED
  'IPv6Address'  -> 'i_pv6_address'    -> 'iPv6Address'  CORRUPTED

  {'userURL': 42}              -> {'userUrl': 42}              KEY MUTATED
  {'sessionID': 42}            -> {'sessionId': 42}            KEY MUTATED
  {'XMLPayload': 42}           -> {'xmlPayload': 42}           KEY MUTATED
  {'serverIP': 42}             -> {'serverIp': 42}             KEY MUTATED
  {'OAuthToken': 42}           -> {'oAuthToken': 42}           KEY MUTATED

Common abbreviation patterns aren't the only problem: pure snake_case names with single-character segments also fail to round-trip in the other direction (Python → JSON → Python). For instance, _compat_to_json_key turns 'a_a_a' into 'aAA', and _compat_to_python_key then turns that into 'a_aa', losing the second separator.

Root cause

_compat_to_python_key (JSON → Python) inserts an underscore before a non-initial uppercase letter when either the previous character is not uppercase or the next character is lowercase:

# session_events.py:202-209
def _compat_to_python_key(name: str) -> str:
    normalized = name.replace(".", "_")
    result: list[str] = []
    for index, char in enumerate(normalized):
        if char.isupper() and index > 0 and (
            not normalized[index - 1].isupper()
            or (index + 1 < len(normalized) and normalized[index + 1].islower())
        ):
            result.append("_")
        result.append(char.lower())
    return "".join(result)

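This collapses any uppercase run (URL, ID, IP) into a single lowercase segment, so the snake_case name no longer records which letters were uppercase. Mixed-case abbreviations are instead split after their first letter (OAuthToken becomes o_auth_token, IPv6Address becomes i_pv6_address).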

_compat_to_json_key (Python → JSON) splits on _ and title-cases each part:

# session_events.py:212-216
def _compat_to_json_key(name: str) -> str:
    parts = name.split("_")
    if not parts:
        return name
    return parts[0] + "".join(part[:1].upper() + part[1:] for part in parts[1:])

Converting user_url back produces userUrl, because only the first letter of each part is uppercased. That differs from the original userURL.
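The asymmetry can be checked without the SDK installed. The sketch below copies the two helpers as quoted in this issue under the local names to_python_key and to_json_key, and adds a hypothetical changed_by_round_trip function (not part of the SDK) that reports every key altered by a round trip in either direction.

```python
from __future__ import annotations

from typing import Callable


def to_python_key(name: str) -> str:
    # Copy of _compat_to_python_key as quoted in this issue.
    normalized = name.replace(".", "_")
    result: list[str] = []
    for index, char in enumerate(normalized):
        if char.isupper() and index > 0 and (
            not normalized[index - 1].isupper()
            or (index + 1 < len(normalized) and normalized[index + 1].islower())
        ):
            result.append("_")
        result.append(char.lower())
    return "".join(result)


def to_json_key(name: str) -> str:
    # Copy of _compat_to_json_key as quoted in this issue.
    parts = name.split("_")
    return parts[0] + "".join(part[:1].upper() + part[1:] for part in parts[1:])


def changed_by_round_trip(
    keys: list[str],
    forward: Callable[[str], str],
    backward: Callable[[str], str],
) -> dict[str, str]:
    """Hypothetical helper: map each key to its round-trip result if it changed."""
    changed: dict[str, str] = {}
    for key in keys:
        result = backward(forward(key))
        if result != key:
            changed[key] = result
    return changed


# JSON -> Python -> JSON
json_changed = changed_by_round_trip(
    ["userURL", "sessionId", "XMLPayload", "requestId"], to_python_key, to_json_key
)
print(json_changed)  # {'userURL': 'userUrl', 'XMLPayload': 'xmlPayload'}

# Python -> JSON -> Python
python_changed = changed_by_round_trip(
    ["user_url", "a_a_a"], to_json_key, to_python_key
)
print(python_changed)  # {'a_a_a': 'a_aa'}
```

Keys in conventional camelCase (sessionId, requestId) survive, which is probably why the problem goes unnoticed for most known event fields.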

Impact

The Data shim is used for unknown event types — events the SDK doesn't have a hardcoded dataclass for. These come from:

  • MCP servers emitting custom event types
  • Newer Copilot CLI versions emitting events the installed SDK doesn't know about (forward compatibility)
  • Custom Copilot extensions

Any consumer of unknown events that:

  • Calls event.data.to_dict() (e.g., to log/cache/replay) gets mutated keys
  • Compares incoming vs outgoing event payloads gets false negatives
  • Echoes events back to the CLI (some hook patterns do this) sends a different shape than received

Two distinct fields can also collide (e.g., userURL and userUrl both end up as userUrl after round-trip).
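One possible direction for a fix, offered only as a sketch: keep the original payload and build the Python-name lookup on top of it, so that to_dict() never has to invert the key conversion. Everything below is hypothetical. KeyPreservingData is not an SDK class, and to_python_key is a simplified regex stand-in for _compat_to_python_key, not the generated helper.

```python
from __future__ import annotations

import re
from typing import Any


def to_python_key(name: str) -> str:
    # Simplified stand-in for _compat_to_python_key (illustration only).
    pattern = r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"
    return re.sub(pattern, "_", name.replace(".", "_")).lower()


class KeyPreservingData:
    """Hypothetical shim: attribute access by Python name, lossless to_dict()."""

    def __init__(self, raw: dict[str, Any]) -> None:
        self._raw = dict(raw)  # original keys and values, untouched
        self._index: dict[str, str] = {}  # python name -> original JSON key
        for json_key in self._raw:
            # On a name collision the first JSON key wins for attribute access.
            self._index.setdefault(to_python_key(json_key), json_key)

    @classmethod
    def from_dict(cls, obj: dict[str, Any]) -> KeyPreservingData:
        return cls(obj)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        index = self.__dict__.get("_index", {})
        if name in index:
            return self.__dict__["_raw"][index[name]]
        raise AttributeError(name)

    def to_dict(self) -> dict[str, Any]:
        # Emit the keys exactly as received; no reverse conversion needed.
        return dict(self._raw)


payload = {"userURL": 1, "userUrl": 2, "XMLPayload": 3}
data = KeyPreservingData.from_dict(payload)
print(data.to_dict() == payload)  # True
print(data.user_url, data.xml_payload)  # 1 3
```

With this shape, colliding fields such as userURL and userUrl both survive serialization. Only attribute access has to pick one of them, and the first-wins rule used here is an arbitrary choice.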
