Skip to content

FIX: Make RenderContext parent-queries answer is_a? semantics#42

Merged
gschlager merged 1 commit into
mainfrom
fix/render-context-subclass-support
May 12, 2026
Merged

FIX: Make RenderContext parent-queries answer is_a? semantics#42
gschlager merged 1 commit into
mainfrom
fix/render-context-subclass-support

Conversation

@gschlager
Copy link
Copy Markdown
Member

@gschlager gschlager commented May 12, 2026

Summary

Two RenderContext changes, second motivated by benchmarking the first:

1. Make parent-queries answer is_a? semantics. RenderContext#has_parent?, #find_parent, and #count_parents were keyed on exact class via parent_cache, so a user-defined subclass like class CustomUrl < AST::Url slipped past has_parent?(AST::Url). That's surprising — callers reasonably expect "do I have a Url ancestor" to include subclasses — and it makes built-in checks like the renderer's in_link_label? (which drives label-aware ] escaping) silently miss custom AST nodes. The three predicates now use is_a? semantics.

2. Drop the parent_cache. After (1), the cache was only earning its keep on has_parent?'s exact-class fast-path; find_parent and count_parents already had to scan @parents to answer polymorphic queries. Benchmarking showed the cache cost more than it saved — the linear scan over realistic parent depths (3–8) is faster than maintaining the hash. The class loses the parent_cache: kwarg, build_cache, and the incremental cache-update logic in with_parent.

Benchmarks

End-to-end (bench/bench.rb --isolated, YJIT, Ruby 3.4.8):

Report Baseline (with cache) No cache Delta
simple 36.8k/s 39.3k/s +6.7%
nested 38.3k/s 41.4k/s +8.2%
list 35.3k/s 38.4k/s +8.6%
table 11.3k/s 12.1k/s +6.9%
quote_nested 26.0k/s 27.7k/s +6.4%
code 83.7k/s 88.9k/s +6.1%
url 44.9k/s 47.0k/s +4.8%
mixed 13.1k/s 14.0k/s +6.7%
large_doc 744/s 797/s +7.2%
escape_plain 269.1k/s 268.6k/s -0.2% (noise; escaper-only path)
escape_mixed 5.65k/s 5.67k/s +0.3% (noise; escaper-only path)

Micro-bench (RenderContext push + query at depth 3 / 8):

  • with_parent push: 2.1× faster — no hash dup, no array allocation
  • has_parent? exact hit: -15% (43 ns vs 36 ns)
  • has_parent? miss: +9–12% faster — the key? probe was strictly extra work before the scan ran anyway
  • find_parent / count_parents: unchanged (already scanned)

Real workloads hit the miss path more than the hit path (has_parent?(Code) / (Url) / (Email) on every Text node — most text isn't inside those), so the miss-side win dominates. with_parent's allocation savings compound across every nested element.

Test plan

  • bundle exec rspec — 3122 examples, 0 failures
  • bin/mutant run on all RenderContext methods plus the now-dependent renderer/escape subjects — 100% mutation coverage (93/93 + 391/391 kills)
  • bin/lint — no offenses
  • Benchmarks above

Related

gschlager added a commit that referenced this pull request May 12, 2026
The cache existed to make has_parent? / find_parent / count_parents
O(1) instead of O(depth). After PR #42 it was only earning its keep
on has_parent?'s exact-class fast-path; find_parent and count_parents
already scan @Parents to answer is_a? semantics.

Benchmarking shows the cache cost more than it saved at realistic
depths:

End-to-end (bench/bench.rb --isolated, YJIT, Ruby 3.4.8):

  report         before     after    delta
  simple         36.8k/s    39.3k/s  +6.7%
  nested         38.3k/s    41.4k/s  +8.2%
  list           35.3k/s    38.4k/s  +8.6%
  table          11.3k/s    12.1k/s  +6.9%
  quote_nested   26.0k/s    27.7k/s  +6.4%
  code           83.7k/s    88.9k/s  +6.1%
  url            44.9k/s    47.0k/s  +4.8%
  mixed          13.1k/s    14.0k/s  +6.7%
  large_doc       744/s      797/s   +7.2%

(escape_plain / escape_mixed unchanged — they don't touch
RenderContext.)

Micro (RenderContext push + query, depth 3 / 8):

  - with_parent push:       2.1× faster (no hash dup, no array alloc)
  - has_parent? exact hit:  -15% (43 ns vs 36 ns)
  - has_parent? miss:       +9–12% (the key? probe was strictly extra
                            work before the scan ran anyway)
  - find_parent / count_parents: unchanged (already scanned)

Real workloads hit the miss path more than the hit path
(has_parent?(Code) / (Url) / (Email) on every Text node — most
text isn't inside those), so the miss-side win dominates.
with_parent's allocation savings compound across every nested
element. The two effects together produce the 5–8% E2E gain.

The class is also significantly simpler: no parent_cache: kwarg,
no build_cache, no incremental cache update in with_parent, no
documentation of the cache's fast-path semantics.
@gschlager gschlager closed this May 12, 2026
@gschlager gschlager force-pushed the fix/render-context-subclass-support branch from 07da9f0 to a8945bc Compare May 12, 2026 17:13
`#has_parent?`, `#find_parent`, and `#count_parents` were keyed on
exact class via `parent_cache`, so a user-defined subclass like
`class CustomUrl < AST::Url` slipped past `has_parent?(AST::Url)`.
That's surprising — callers reasonably expect "do I have a Url
ancestor" to include subclasses — and it makes built-in checks
like Renderer's in_link_label? (which drives label-aware ]
escaping) silently miss custom AST nodes.

Fix: the three predicates now use `is_a?` semantics.

While verifying that the cache could still earn its keep on the
has_parent? fast-path, benchmarking showed it didn't: at realistic
depths (3–8) the linear scan is faster than maintaining the hash.
Dropped the cache entirely. The class loses parent_cache:,
build_cache, and the incremental cache-update logic in with_parent.

End-to-end (bench/bench.rb --isolated, YJIT, Ruby 3.4.8):

  report         before     after    delta
  simple         36.8k/s    39.3k/s  +6.7%
  nested         38.3k/s    41.4k/s  +8.2%
  list           35.3k/s    38.4k/s  +8.6%
  table          11.3k/s    12.1k/s  +6.9%
  quote_nested   26.0k/s    27.7k/s  +6.4%
  code           83.7k/s    88.9k/s  +6.1%
  url            44.9k/s    47.0k/s  +4.8%
  mixed          13.1k/s    14.0k/s  +6.7%
  large_doc       744/s      797/s   +7.2%

(escape_plain / escape_mixed unchanged — they don't touch
RenderContext.)

Micro (RenderContext push + query, depth 3 / 8):

  - with_parent push:       2.1× faster (no hash dup, no array alloc)
  - has_parent? exact hit:  -15% (43 ns vs 36 ns)
  - has_parent? miss:       +9–12% (the key? probe was strictly extra
                            work before the scan ran anyway)
  - find_parent / count_parents: unchanged (already scanned)

Real workloads hit the miss path more than the hit path
(has_parent?(Code) / (Url) / (Email) on every Text node — most
text isn't inside those), so the miss-side win dominates.
with_parent's allocation savings compound across every nested
element. The two effects together produce the 5–8% E2E gain.
@gschlager gschlager reopened this May 12, 2026
@gschlager gschlager merged commit df26237 into main May 12, 2026
8 checks passed
@gschlager gschlager deleted the fix/render-context-subclass-support branch May 12, 2026 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant