Skip to content

fix(install): make JSONC sanitizer string-aware (#553)#578

Open
dlwjdghkszkf-png wants to merge 1 commit into
tirth8205:mainfrom
dlwjdghkszkf-png:fix/553-jsonc-string-safe
Open

fix(install): make JSONC sanitizer string-aware (#553)#578
dlwjdghkszkf-png wants to merge 1 commit into
tirth8205:mainfrom
dlwjdghkszkf-png:fix/553-jsonc-string-safe

Conversation

@dlwjdghkszkf-png

Copy link
Copy Markdown

Fixes #553.

Problem

install_platform_configs reduces an existing JSONC config to strict JSON before merging, using two regexes that don't track string boundaries:

re.sub(r'//.*?$', '', raw, flags=re.MULTILINE)
re.sub(r',(\s*[}\]])', r'\1', stripped)

Both corrupt string values:

  • the // stripper truncates a "https://..." URL at the //
  • the trailing-comma stripper deletes a comma inside "foo, bar" when the next non-space char is } or ]

The damaged text then fails json.loads, or — worse — parses to silently wrong data, corrupting the user's config on write-back.

Fix

Replace both regexes with _strip_jsonc(): a single character-walking pass that tracks in-string state (honoring \ escapes) and only removes // + /* */ comments and trailing commas in structural position. Content inside string values is preserved verbatim. The now-unused import re is dropped.

Tests

New TestStripJsonc (10 cases): the #553 repro (comma inside a string before }), https:// URL preservation, escaped quotes, in-string comment markers ("x // y", "p /* q */ r"), and real comment / trailing-comma removal.

Validation:

  • pytest tests/test_skills.py146 passed (10 new)
  • mypy code_review_graph/skills.py → clean
  • ruff check code_review_graph/skills.py → clean

The config merge in install_platform_configs reduced JSONC to strict JSON
with two regexes that ignored string boundaries:

    re.sub(r'//.*?$', '', raw, flags=re.MULTILINE)
    re.sub(r',(\s*[}\]])', r'\1', stripped)

Both corrupt string *values*. The first truncates any "https://..." URL at
the "//"; the second deletes a comma inside "foo, bar" when the next
non-space char is "}" or "]". The damaged text then parses to silently
wrong data (or fails json.loads outright).

Replace both with _strip_jsonc(): a single character-walking pass that
tracks in-string state (honoring backslash escapes) and only removes "//"
and "/* */" comments and trailing commas in structural position. Content
inside string values is preserved verbatim. Drop the now-unused `import re`.

Add TestStripJsonc covering the tirth8205#553 repro, URL preservation, escaped
quotes, in-string comment markers, and real comment / trailing-comma
removal.
@itxaiohanglover

Copy link
Copy Markdown

This is solid — the character-by-character walk with string state tracking is the right approach. I had opened #571 with a simpler version that only handled trailing commas, but this covers both comments and commas in one pass. Just closed mine in favor of this.

The _skip_comment helper is clean. One edge case: unterminated /* block comments will run off the end — the idx + 2 could overshoot. Probably fine since it's best-effort, but worth a note.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(install): trailing-comma regex breaks on commas inside string values

2 participants