Summary
The two helper functions used by RawSessionEventData / SessionEventData (the "Data" shim that handles unknown event types from the CLI) are intended to convert JSON keys to Python attribute names and back. They are not inverses for common abbreviation patterns. Any unknown event with field names like userURL, sessionID, XMLPayload, serverIP, OAuthToken, etc. gets silently rewritten when it round-trips through Data.from_dict(...).to_dict().
This is a correctness bug, not a security issue.
Affected versions
main at commit dd2dcbc439256acfb9feb2cff07c0b9c820091b8. The helpers are auto-generated, so the same logic ships in every Python release built from this codegen.
Affected source
- python/copilot/generated/session_events.py:202-216 (_compat_to_python_key and _compat_to_json_key)
- python/copilot/generated/session_events.py:239-253 (Data shim that exercises both helpers)
- Reachable in production at session_events.py:4536 (the case _: branch in SessionEvent.from_dict that wraps unknown event types in RawSessionEventData).
Reproduction
from copilot.generated.session_events import (
    _compat_to_python_key,
    _compat_to_json_key,
    Data,
)

# 1) Helper-level: round-trip JSON key → Python → JSON should be a no-op
for k in ["userURL", "sessionID", "XMLPayload", "serverIP", "OAuthToken", "IPv6Address"]:
    py = _compat_to_python_key(k)
    back = _compat_to_json_key(py)
    print(f" {k!r:<14} -> {py!r:<18} -> {back!r:<14} {'OK' if back == k else 'CORRUPTED'}")

# 2) Production-path: Data.from_dict(x).to_dict() should equal x
for k in ["userURL", "sessionID", "XMLPayload", "serverIP", "OAuthToken"]:
    incoming = {k: 42}
    via_data = Data.from_dict(incoming).to_dict()
    print(f" {incoming!s:<28} -> {via_data!s:<28} {'OK' if via_data == incoming else 'KEY MUTATED'}")
Output:
'userURL' -> 'user_url' -> 'userUrl' CORRUPTED
'sessionID' -> 'session_id' -> 'sessionId' CORRUPTED
'XMLPayload' -> 'xml_payload' -> 'xmlPayload' CORRUPTED
'serverIP' -> 'server_ip' -> 'serverIp' CORRUPTED
'OAuthToken' -> 'o_auth_token' -> 'oAuthToken' CORRUPTED
'IPv6Address' -> 'i_pv6_address' -> 'iPv6Address' CORRUPTED
{'userURL': 42} -> {'userUrl': 42} KEY MUTATED
{'sessionID': 42} -> {'sessionId': 42} KEY MUTATED
{'XMLPayload': 42} -> {'xmlPayload': 42} KEY MUTATED
{'serverIP': 42} -> {'serverIp': 42} KEY MUTATED
{'OAuthToken': 42} -> {'oAuthToken': 42} KEY MUTATED
Common abbreviation patterns aren't the only problem — pure snake_case names with single-character segments also fail to round-trip. For instance 'a_a_a' (snake_case) becomes 'aAA' in JSON, then 'a_aa' (separator lost) when converted back.
Root cause
_compat_to_python_key (JSON → Python) inserts an underscore before an uppercase letter only when the previous character is not uppercase OR the next character is lowercase:
# session_events.py:202-209
def _compat_to_python_key(name: str) -> str:
    normalized = name.replace(".", "_")
    result: list[str] = []
    for index, char in enumerate(normalized):
        if char.isupper() and index > 0 and (
            not normalized[index - 1].isupper()
            or (index + 1 < len(normalized) and normalized[index + 1].islower())
        ):
            result.append("_")
        result.append(char.lower())
    return "".join(result)
This collapses any uppercase run (URL, ID, IP) to a single segment in snake_case, losing the original casing.
_compat_to_json_key (Python → JSON) splits on _ and capitalizes only the first letter of every part after the first:
# session_events.py:212-216
def _compat_to_json_key(name: str) -> str:
    parts = name.split("_")
    if not parts:
        return name
    return parts[0] + "".join(part[:1].upper() + part[1:] for part in parts[1:])
When converting user_url back, it produces userUrl (only first letter uppercased per part) — different from the original userURL.
Impact
The Data shim is used for unknown event types — events the SDK doesn't have a hardcoded dataclass for. These come from:
- MCP servers emitting custom event types
- Newer Copilot CLI versions emitting events the installed SDK doesn't know about (forward compatibility)
- Custom Copilot extensions
Any consumer of unknown events that:
- Calls event.data.to_dict() (e.g., to log/cache/replay) gets mutated keys
- Compares incoming vs outgoing event payloads gets false negatives
- Echoes events back to the CLI (some hook patterns do this) sends a different shape than received
Two distinct fields can also collide (e.g., userURL and userUrl both end up as userUrl after round-trip).
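One possible direction for a fix is to stop deriving the JSON key from the Python name and instead remember the original key at from_dict time. The sketch below is an assumption about how that could look, not the SDK's implementation: KeyPreservingData is a hypothetical stand-in for the Data shim, whose real internals are not shown in this report. Only _compat_to_python_key is copied from the Root cause listing. Raising on a collision is one choice among several; a real fix might prefer to keep both fields.

```python
from typing import Any


def _compat_to_python_key(name: str) -> str:
    # Copied from the Root cause listing (JSON -> Python).
    normalized = name.replace(".", "_")
    result: list[str] = []
    for index, char in enumerate(normalized):
        if char.isupper() and index > 0 and (
            not normalized[index - 1].isupper()
            or (index + 1 < len(normalized) and normalized[index + 1].islower())
        ):
            result.append("_")
        result.append(char.lower())
    return "".join(result)


class KeyPreservingData:
    """Hypothetical stand-in for the Data shim; not the SDK's actual class."""

    def __init__(self, values: dict[str, Any], json_keys: dict[str, str]) -> None:
        self._values = values        # Python attribute name -> value
        self._json_keys = json_keys  # Python attribute name -> original JSON key

    @classmethod
    def from_dict(cls, obj: dict[str, Any]) -> "KeyPreservingData":
        values: dict[str, Any] = {}
        json_keys: dict[str, str] = {}
        for json_key, value in obj.items():
            py_key = _compat_to_python_key(json_key)
            if py_key in json_keys:
                # Two JSON keys map to one attribute name, e.g. userURL and userUrl.
                raise ValueError(
                    f"{json_key!r} and {json_keys[py_key]!r} both map to {py_key!r}"
                )
            values[py_key] = value
            json_keys[py_key] = json_key
        return cls(values, json_keys)

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal lookup fails; private names never delegate.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None

    def to_dict(self) -> dict[str, Any]:
        # Emit the keys exactly as received instead of re-deriving them.
        return {self._json_keys[key]: value for key, value in self._values.items()}


incoming = {"userURL": 42, "sessionID": "abc"}
data = KeyPreservingData.from_dict(incoming)
print(data.user_url)                # 42: attribute access still uses snake_case
print(data.to_dict() == incoming)   # True: the original keys are restored
```

With this approach, a payload containing both userURL and userUrl raises ValueError at parse time instead of silently merging the two fields.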