perf: Extend WindowTopN to support RANK #22885

Open
SubhamSinghal wants to merge 3 commits into apache:main from SubhamSinghal:window-topn-rank

@SubhamSinghal commented Jun 10, 2026

Which issue does this PR close?

Rationale for this change

PR #21479 introduced WindowTopN for ROW_NUMBER only; RANK and DENSE_RANK were explicitly out of scope. This PR extends the rule to RANK, replacing the full sort under Filter(rk≤K) → Window(RANK) → Sort with a per-partition heap-of-K plus a boundary-tie buffer.
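
To make the "heap-of-K plus boundary-tie buffer" idea concrete, here is a minimal sketch using only the standard library. It is not the PR's `PartitionedTopKRank`: the names `RankState` and `rank_top_k`, the `i64` order key, the `u32` partition key, and the ascending order are all assumptions made for illustration. A row has RANK <= K exactly when fewer than K rows in its partition sort strictly before it, so it is enough to keep the K smallest rows in a max-heap plus any extra rows tied with the heap's largest key.

```rust
use std::collections::{BinaryHeap, HashMap};

/// Hypothetical per-partition state (illustrative, not the PR's RankPartitionState).
struct RankState {
    /// Max-heap holding the K smallest (order key, row id) pairs seen so far.
    heap: BinaryHeap<(i64, usize)>,
    /// Row ids outside the heap whose key equals the heap's largest key.
    ties: Vec<usize>,
}

impl RankState {
    fn insert(&mut self, k: usize, key: i64, row: usize) {
        // Until the heap holds K rows, every row qualifies.
        if self.heap.len() < k {
            self.heap.push((key, row));
            return;
        }
        let boundary = match self.heap.peek() {
            Some(&(b, _)) => b,
            None => return, // k == 0: no row can qualify
        };
        if key == boundary {
            // Tied with the K-th row: same rank, so it must be kept.
            self.ties.push(row);
        } else if key < boundary {
            let (old_key, old_row) = self.heap.pop().expect("heap is non-empty");
            self.heap.push((key, row));
            let new_boundary = self.heap.peek().expect("heap is non-empty").0;
            if old_key == new_boundary {
                // The evicted row still ties with the boundary.
                self.ties.push(old_row);
            } else {
                // The boundary moved down; rows tied with the old one now rank > K.
                self.ties.clear();
            }
        }
        // key > boundary: at least K rows sort strictly before it, so RANK > K.
    }
}

/// Returns, per partition, the sorted ids of rows with RANK <= k (ascending order key).
fn rank_top_k(rows: &[(u32, i64)], k: usize) -> HashMap<u32, Vec<usize>> {
    let mut states: HashMap<u32, RankState> = HashMap::new();
    for (row, &(part, key)) in rows.iter().enumerate() {
        states
            .entry(part)
            .or_insert_with(|| RankState { heap: BinaryHeap::new(), ties: Vec::new() })
            .insert(k, key, row);
    }
    states
        .into_iter()
        .map(|(part, st)| {
            let mut ids: Vec<usize> = st.heap.into_iter().map(|(_, r)| r).collect();
            ids.extend(st.ties);
            ids.sort_unstable();
            (part, ids)
        })
        .collect()
}
```

In this sketch the heap never grows beyond K entries per partition, while the tie buffer is unbounded in the worst case (every row tied at the boundary). That is presumably why tie density is one of the dimensions the benchmarks vary.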

What changes are included in this PR?

  • datafusion/physical-plan/src/topk/mod.rs: new pub(crate) struct PartitionedTopKRank (sibling of
    PartitionedTopK from "perf: share encoder/reservation across PartitionedTopKExec partition …" #23096) with per-partition RankPartitionState { TopKHeap, Vec<TieEntry> }.
  • datafusion/physical-plan/src/sorts/partitioned_topk.rs: new WindowFnKind enum (RowNumber / Rank).
    do_partitioned_topk dispatches on fn_kind to PartitionedTopK::try_new or PartitionedTopKRank::try_new.
  • datafusion/physical-optimizer/src/window_topn.rs: is_row_number → supported_window_fn(expr) -> Option<WindowFnKind>; empty-order_by guard for RANK; WindowFnKind plumbed through PartitionedTopKExec::try_new.
  • datafusion/sqllogictest/test_files/window_topn.slt: RANK SLT cases: basic, strict (<), flipped (>= / >),
    boundary ties, ties spanning ob values, empty-ORDER BY (rule must NOT fire), mixed window functions, ASC/DESC × NULLS FIRST/LAST, QUALIFY.
  • datafusion/core/tests/physical_optimizer/window_topn.rs: 6 new RANK rule unit tests covering predicate
    matching, partition-by/order-by guards, and the dense_rank skip.
  • benchmarks/queries/h2o/window.sql: six new RANK queries (Q14–Q17, Q22, Q23) covering partition counts from ~100 to ~100K, with low and heavy tie densities.
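
The matching behaviour listed above (function classification, the empty-ORDER BY guard, and the strict and flipped predicate forms) can be sketched roughly as follows. This is a simplified illustration rather than the code in window_topn.rs: `classify_window_fn`, `CmpOp`, and `limit_from_predicate` are hypothetical names, the real rule inspects physical expressions rather than strings, and treating a resulting limit of 0 as "do not fire" is an assumption.

```rust
/// Variants as named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WindowFnKind {
    RowNumber,
    Rank,
}

/// Hypothetical stand-in for the rule's classification step; it works on a
/// function name and ORDER BY length instead of a physical expression.
fn classify_window_fn(name: &str, order_by_len: usize) -> Option<WindowFnKind> {
    match name.to_ascii_lowercase().as_str() {
        "row_number" => Some(WindowFnKind::RowNumber),
        // With an empty ORDER BY every row in a partition is a peer, so RANK is 1
        // for all of them and `rk <= K` keeps every row: a top-K heap would be wrong.
        "rank" if order_by_len > 0 => Some(WindowFnKind::Rank),
        // dense_rank and everything else are skipped.
        _ => None,
    }
}

/// Comparison operators that can appear in the filter (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CmpOp {
    Lt,
    LtEq,
    Gt,
    GtEq,
}

/// Normalizes `rk <op> K` (rank_on_left = true) or `K <op> rk` (false)
/// into an inclusive upper bound on the rank, if the predicate is one.
fn limit_from_predicate(op: CmpOp, rank_on_left: bool, k: u64) -> Option<u64> {
    let limit = match (op, rank_on_left) {
        // rk <= K, or flipped: K >= rk
        (CmpOp::LtEq, true) | (CmpOp::GtEq, false) => k,
        // rk < K, or flipped: K > rk
        (CmpOp::Lt, true) | (CmpOp::Gt, false) => k.checked_sub(1)?,
        // rk > K, rk >= K, K < rk, K <= rk are lower bounds, not top-N filters.
        _ => return None,
    };
    // Assumption: a limit of 0 selects nothing, so the sketch declines to match.
    if limit == 0 {
        None
    } else {
        Some(limit)
    }
}
```

For example, `limit_from_predicate(CmpOp::Gt, false, 3)` models `3 > rk` and yields a limit of 2, the same as `rk < 3`.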

h2o window benchmark, 10M-row large table, RANK top-2, 3-iteration average. Toggle via DATAFUSION_OPTIMIZER_ENABLE_WINDOW_TOPN.

| Variant | Partitions | OFF (rule disabled) | ON (rule enabled) | Δ |
|---|---|---|---|---|
| RANK low ties (id3 % 100) | ~100 | 305 ms | 107 ms | 2.84× faster |
| RANK low ties (id3 % 1000) | ~1K | 263 ms | 120 ms | 2.19× faster |
| RANK heavy ties (id3 % 1000, v2 % 10 OB) | ~1K | 282 ms | 125 ms | 2.25× faster |
| RANK low ties (id2) | ~10K | 363 ms | 140 ms | 2.59× faster |
| RANK heavy ties (id2, v2 % 10 OB) | ~10K | 291 ms | 143 ms | 2.04× faster |
| RANK low ties (id3 % 100K) | ~100K | 241 ms | 422 ms | 1.75× slower |

Are these changes tested?

Yes:

  • cargo test -p datafusion-physical-plan --lib — 1455 passed
  • cargo test -p datafusion-physical-optimizer --lib — 27 passed
  • cargo test -p datafusion --test core_integration physical_optimizer::window_topn:: — 13 passed (7 ROW_NUMBER + 6 RANK)
  • cargo test --test sqllogictests -- window_topn — passed

Are there any user-facing changes?

The existing optimizer.enable_window_topn config flag (default false) now also covers RANK queries. No public API additions.

@github-actions github-actions Bot added the optimizer, core, sqllogictest, physical-plan, and auto detected api change labels Jun 10, 2026
@SubhamSinghal

@kumarUjjawal @2010YOUY01 would you be able to review this PR?

@comphead

This might be related to #23021.

@github-actions github-actions Bot removed the optimizer, core, and physical-plan labels Jun 30, 2026
@github-actions github-actions Bot added the optimizer, core, and physical-plan labels and removed the auto detected api change label Jun 30, 2026
github-actions Bot commented Jun 30, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion v54.0.0 (current)
       Built [ 108.198s] (current)
     Parsing datafusion v54.0.0 (current)
      Parsed [   0.035s] (current)
    Building datafusion v54.0.0 (baseline)
       Built [ 109.654s] (baseline)
     Parsing datafusion v54.0.0 (baseline)
      Parsed [   0.035s] (baseline)
    Checking datafusion v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.558s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 219.973s] datafusion
    Building datafusion-physical-optimizer v54.0.0 (current)
       Built [  41.031s] (current)
     Parsing datafusion-physical-optimizer v54.0.0 (current)
      Parsed [   0.022s] (current)
    Building datafusion-physical-optimizer v54.0.0 (baseline)
       Built [  41.534s] (baseline)
     Parsing datafusion-physical-optimizer v54.0.0 (baseline)
      Parsed [   0.022s] (baseline)
    Checking datafusion-physical-optimizer v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.118s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  84.160s] datafusion-physical-optimizer
    Building datafusion-physical-plan v54.0.0 (current)
       Built [  38.274s] (current)
     Parsing datafusion-physical-plan v54.0.0 (current)
      Parsed [   0.134s] (current)
    Building datafusion-physical-plan v54.0.0 (baseline)
       Built [  38.585s] (baseline)
     Parsing datafusion-physical-plan v54.0.0 (baseline)
      Parsed [   0.134s] (baseline)
    Checking datafusion-physical-plan v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.607s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure method_parameter_count_changed: pub method parameter count changed ---

Description:
A publicly-visible method now takes a different number of parameters, not counting the receiver (self) parameter.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#fn-change-arity
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/method_parameter_count_changed.ron

Failed in:
  datafusion_physical_plan::sorts::partitioned_topk::PartitionedTopKExec::try_new takes 4 parameters in /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/24b9a8b1b9103b5df7d3ca5afd4287d7de221a0d/datafusion/physical-plan/src/sorts/partitioned_topk.rs:196, but now takes 5 parameters in /home/runner/work/datafusion/datafusion/datafusion/physical-plan/src/sorts/partitioned_topk.rs:225

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  79.358s] datafusion-physical-plan
    Building datafusion-sqllogictest v54.0.0 (current)
       Built [ 185.470s] (current)
     Parsing datafusion-sqllogictest v54.0.0 (current)
      Parsed [   0.026s] (current)
    Building datafusion-sqllogictest v54.0.0 (baseline)
       Built [ 182.375s] (baseline)
     Parsing datafusion-sqllogictest v54.0.0 (baseline)
      Parsed [   0.023s] (baseline)
    Checking datafusion-sqllogictest v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.083s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [ 371.586s] datafusion-sqllogictest
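
For readers unfamiliar with the `method_parameter_count_changed` lint: adding a parameter to a public method breaks every downstream caller that passes the old argument list. The sketch below illustrates one common way to add an option without changing an existing signature. It is purely illustrative and is not what this PR does; `TopKExecSketch`, `try_new_with_kind`, and the single `fetch` argument are hypothetical stand-ins, not the real `PartitionedTopKExec` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WindowFnKind {
    RowNumber,
    Rank,
}

/// Hypothetical stand-in for an operator with a public constructor.
#[derive(Debug, PartialEq)]
pub struct TopKExecSketch {
    pub fetch: usize,
    pub fn_kind: WindowFnKind,
}

impl TopKExecSketch {
    /// Original arity is preserved, so existing callers keep compiling.
    /// It delegates with the previous behaviour (ROW_NUMBER) as the default.
    pub fn try_new(fetch: usize) -> Result<Self, String> {
        Self::try_new_with_kind(fetch, WindowFnKind::RowNumber)
    }

    /// New entry point carrying the extra argument.
    pub fn try_new_with_kind(fetch: usize, fn_kind: WindowFnKind) -> Result<Self, String> {
        if fetch == 0 {
            return Err("fetch must be positive".to_string());
        }
        Ok(Self { fetch, fn_kind })
    }
}
```

Whether a compatibility shim like this is worthwhile depends on the project's release policy; if a major version bump is already planned, changing the arity directly may be acceptable.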

@github-actions github-actions Bot added the auto detected api change label Jun 30, 2026