Skip to content

fix: release interning state before terminal output to fix hash aggregate regression (#23178)#23180

Closed
bhonepyisone wants to merge 1 commit into
apache:mainfrom
bhonepyisone:fix/hash-aggregate-output-regression
Closed

fix: release interning state before terminal output to fix hash aggregate regression (#23178)#23180
bhonepyisone wants to merge 1 commit into
apache:mainfrom
bhonepyisone:fix/hash-aggregate-output-regression

Conversation

@bhonepyisone

Copy link
Copy Markdown

What

Fixes the hash aggregate output performance regression introduced by #23055.

Problem

PR #23055 introduced EmitTo::First(batch_size) for incremental hash aggregate output. During the terminal output phase (Outputting state), no more intern() calls happen, but emit(EmitTo::First(n)) still maintained the hash lookup state on every output batch:

Implementation Wasted work per batch
GroupValuesColumn (multi-col) map.retain() with complex index-list manipulation
GroupValuesPrimitive (single-col) map.retain() shifting all remaining indexes
GroupValuesRows (fallback) Rebuilds rows + map.retain()

This caused O(groups × output_batches) unnecessary work.

Performance Impact

On TPC-DS SF10 query 23 with 24 target partitions:

The problematic node spent ~32s in emitting_time vs ~3s in actual aggregation time.

Fix

Added release_interning_state() to the GroupValues trait, called once during the Building → Outputting state transition in start_outputting(). It clears the hash map and interning buffers, making retain a no-op during terminal output.

The group values themselves are stored separately (in values / group_values / column builders) and are not cleared, so output data is preserved.

Files Changed

  • group_values/mod.rs — Added release_interning_state() to the GroupValues trait
  • aggregate_hash_table/common.rs — Call it in start_outputting()
  • multi_group_by/mod.rs — Clear map, hashes, group-index lists (the main hot path)
  • row.rs — Clear map, hashes, row buffers
  • single_group_by/primitive.rs — Clear map
  • single_group_by/boolean.rs — No-op (no hash map)
  • single_group_by/bytes.rs — No-op (map rebuilt on emit)
  • single_group_by/bytes_view.rs — No-op (map rebuilt on emit)

Closes #23178

…gate regression

Closes apache#23178

PR apache#23055 introduced EmitTo::First(batch_size) for incremental hash
aggregate output. During the terminal output phase (Outputting state),
no more intern() calls happen, but emit(EmitTo::First(n)) still
maintained the hash lookup state on every output batch:

- GroupValuesColumn: map.retain() with complex index-list manipulation
- GroupValuesPrimitive: map.retain() shifting all remaining indexes
- GroupValuesRows: rebuilds rows + map.retain()

This caused O(groups * output_batches) unnecessary work, resulting in
a ~3.3x performance regression (2.43s -> 8.04s on TPC-DS Q23).

Fix: add release_interning_state() to the GroupValues trait, called
once during the Building -> Outputting state transition. It clears the
hash map and interning buffers, making retain a no-op during terminal
output. The group values themselves are stored separately and are not
cleared, so output data is preserved.

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label Jun 25, 2026
@2010YOUY01

Copy link
Copy Markdown
Contributor

Fixed in #23182

@2010YOUY01 2010YOUY01 closed this Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HashAggregate output regression after #23055: repeated EmitTo::First maintains unused GroupValues lookup state

2 participants