diff --git a/content/blog/2026-07-05-sort-pushdown.md b/content/blog/2026-07-05-sort-pushdown.md
new file mode 100644
index 00000000..14d00987
--- /dev/null
+++ b/content/blog/2026-07-05-sort-pushdown.md
@@ -0,0 +1,625 @@
+---
+layout: post
+title: Sort Pushdown in DataFusion: Skip Sorts, Skip Decode, Skip I/O
+date: 2026-07-05
+author: Qi Zhu
+categories: [performance]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+*Qi Zhu, [Massive](https://www.massive.com/)*
+
+**[Apache DataFusion] now automatically takes advantage of sortedness in the
+data — even when the data is only *partially* sorted, and even when
+DataFusion has not been told about the ordering ahead of time.** This post
+explains why that matters and walks through how DataFusion achieves it,
+through a combination of plan-time sort pushdown, runtime scan reordering,
+and mid-scan row-group pruning driven by [dynamic filters][dyn-filters-blog].
+
+[Apache DataFusion]: https://datafusion.apache.org/
+[dyn-filters-blog]: https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/
+
+## Why sort pushdown matters
+
+Many real datasets are at least partly sorted on disk:
+
+- Time-series files are written in ingestion-time order.
+- Event logs are sharded and sorted by event id.
+- Partitioned tables have a natural ordering by partition key.
+- Modern data lakes based on [Apache Iceberg] and similar formats
+  often have to work with data **as it was written** — resorting the
+  whole table isn't an option.
+
+But that "pre-existing sortedness" is only useful if the query engine can
+**notice** it and **use** it. Two common failure modes:
+
+1. The engine doesn't know about the ordering — the writer didn't set
+   Parquet `sorting_columns`, and the table definition doesn't include a
+   [`WITH ORDER`](https://datafusion.apache.org/user-guide/sql/ddl.html#create-external-table) clause.
+2. The engine knows the *per-file* ordering, but the file *listing* on
+   disk is in a different order, so global sortedness can't be proven at
+   plan time.
+
+In both cases, an `ORDER BY` or `ORDER BY ... LIMIT N` query pays the
+cost of a full external `SortExec` — a pipeline-blocking operator that
+must see every input row before emitting anything, dominating both
+latency and peak memory on large scans.
+
+Using min/max statistics for *predicate* pushdown is well known and
+widely implemented across databases. Using them to *reason about sort
+order* (deleting redundant sorts, biasing scan order toward the
+most-promising data) is less common. This post is about how DataFusion
+does the latter.
+
+[Apache Iceberg]: https://iceberg.apache.org/
+
+## What DataFusion could already do — and what was missing
+
+DataFusion has always been able to skip the sort in the **exact** case,
+using the machinery covered in [@akurmustafa's earlier post on
+ordering analysis][ordering-analysis]: when the table definition
+declares an ordering (via `WITH ORDER` or Parquet `sorting_columns`)
+**and** the on-disk file listing already matches that order, the
+existing `EnsureRequirements` rule sees that the scan's
+`output_ordering` satisfies the request and **removes the redundant
+`SortExec`** entirely.
+
+This post is about **everything else** — the messier real-world cases
+where sortedness exists but isn't provable up front:
+
+- Files listed in the "wrong" order on disk (each file internally
+  sorted, but the listing doesn't match).
+- Declared ordering with **overlapping** ranges across files.
+- **No** declared ordering at all.
+- `ORDER BY ... DESC` on ASC-sorted data.
+
+Three complementary techniques close these gaps:
+
+1. **Statistics-based sort elimination** (`Exact` path). Extend the
+   optimizer to prove ordering from min/max statistics after
+   reordering the file list, then delete the `SortExec` entirely.
+2. **Runtime scan reorder** (`Inexact` path). Keep the `SortExec`, but
+   bias scan order so the *most-promising* data is read first —
+   `TopK`'s [dynamic filter][dyn-filters-blog] tightens quickly and
+   downstream data is pruned by statistics before it's read.
+3. **Runtime row-group dynamic pruning** ([#22450]). Inside the
+   parquet decoder loop, re-check the live `TopK` threshold at every
+   row-group boundary and physically remove pruned row groups before
+   any bytes are fetched.
+
+Together these compose into a **three-layer pruning stack**
+(file-level, row-group-level, row-level), all driven by the same
+`TopK` dynamic filter. Headline results:
+
+- **Sort elimination**: 2×–49× faster on ascending `ORDER BY` queries
+  (27×–49× with `LIMIT`) where the file list was in the wrong disk order.
+- **Runtime row-group pruning ([#22450])**: 5 of 11 `topk_tpch`
+  queries run 3–4× faster with zero regressions; total runtime drops
+  by 44%.
+
+The rest of this post walks through each technique in turn.
+
+[#22450]: https://github.com/apache/datafusion/pull/22450
+[#20839]: https://github.com/apache/datafusion/pull/20839
+[Apache Parquet]: https://parquet.apache.org/
+[ordering-analysis]: https://datafusion.apache.org/blog/2025/03/11/ordering-analysis/
+
+## How DataFusion Tracks Ordering
+
+<img src="/blog/images/sort-pushdown/plan-diff.svg" alt="EXPLAIN before / after: SortExec eliminated once ordering is Exact" width="100%" class="img-fluid"/>
+
+DataFusion's [`FileScanConfig`](https://docs.rs/datafusion-datasource/latest/datafusion_datasource/file_scan_config/struct.FileScanConfig.html) carries an ordering claim for
+each scan's output, which is one of:
+
+- **`Exact`** — the optimizer is *certain* the output is in this order,
+  and removes redundant [`SortExec`](https://docs.rs/datafusion-physical-plan/latest/datafusion_physical_plan/sorts/sort/struct.SortExec.html) operators entirely.
+  `LIMIT N` becomes a static fetch on the source (the reader stops the
+  moment N rows are emitted).
+- **`Inexact`** — the optimizer believes the output is probably ordered
+  but cannot prove it. Downstream operators like
+  [`SortPreservingMergeExec`](https://docs.rs/datafusion-physical-plan/latest/datafusion_physical_plan/sorts/sort_preserving_merge/struct.SortPreservingMergeExec.html) can still benefit, but the
+  explicit `SortExec` stays for correctness. In this case `TopK`'s
+  [dynamic filter][dyn-filters-blog] tightens as the heap fills, and
+  data whose min/max cannot beat the threshold is pruned before it is
+  fully read.
+
+For example, given a query that returns the 10 most recent trades:
+
+```sql
+SELECT ts, symbol, amount FROM trades ORDER BY ts DESC LIMIT 10;
+```
+
+- With no ordering knowledge, DataFusion scans everything and uses a
+  `TopK` heap to keep the running best 10.
+- With **`Exact`** ordering, DataFusion drops the sort entirely and
+  stops reading after emitting 10 rows.
+- With **`Inexact`** ordering, the `SortExec` stays but scans start
+  from the most-promising data, so the `TopK` threshold tightens fast
+  and the rest is pruned by statistics.
+
+The optimizer rule that upgrades a scan from `Unsupported` (the source
+cannot help with the requested order) to `Exact`/`Inexact`, and that
+removes the resulting redundant `SortExec`, is [`PushdownSort`](https://github.com/apache/datafusion/blob/main/datafusion/physical-optimizer/src/pushdown_sort.rs). `PushdownSort`
+runs late, after `EnsureRequirements` has finalised the plan shape.
+It walks each `SortExec`, asks the child leaf via `try_pushdown_sort`
+which flavour the source can produce, and rewrites accordingly.
+
+## The `Exact` Path · Sort Elimination via Statistics
+
+<img src="/blog/images/sort-pushdown/phase1-file-reorder.svg" alt="File reorder: rearranging files within a partition by min/max statistics so the file list is in range order" width="100%" class="img-fluid" /><br/>
+*Figure: file reorder by per-file `min/max` puts the file list in range
+order without touching file contents.*
+
+DataFusion could already recognize the *exact* sortedness case (declared
+ordering + matching on-disk file list). The new capability is recognizing
+sortedness when the **file list is in the wrong order** on disk, using
+the min/max statistics that the Parquet writer already stored per row
+group. Implemented across two PRs on `PushdownSort`:
+[apache/datafusion#19064][#19064] (rule scaffolding), and
+[apache/datafusion#21182][#21182] (stats-based file reorder).
+
+[#19064]: https://github.com/apache/datafusion/pull/19064
+[#21182]: https://github.com/apache/datafusion/pull/21182
+
+For example, consider three files `a.parquet`, `b.parquet`,
+`c.parquet`. Each is internally sorted by `ts`, and the table declares
+`WITH ORDER (ts ASC)`, but the files were written by different jobs and
+end up listed alphabetically on disk (which does *not* match sort order).
+The old machinery has no way to prove global sortedness, so an
+`ORDER BY ts` query pays for a full external sort even though the
+underlying data is already sorted.
+
+`PushdownSort` fixes this in three steps at the file-scan node:
+
+1. **Sort the file list by per-file `min`** on the sort column.
+2. **Check adjacency**: does `file[i].max ≤ file[i+1].min` hold for
+   every adjacent pair? If yes, the sorted file list produces a globally
+   sorted stream.
+3. **Upgrade the source's ordering claim to `Exact`** and remove the
+   surrounding `SortExec`.
+
+<img src="/blog/images/sort-pushdown/phase2-stats-overlap.svg" alt="Detecting non-overlapping ranges via min/max statistics" width="100%" class="img-fluid" /><br/>
+*Figure: after reorder, the left case has non-overlapping ranges (safe
+to upgrade to `Exact`); the right case has overlaps (upgrade skipped,
+falls through to the `Inexact` path).*
+
+Two conservative bail-outs: (a) sort keys must be plain columns
+(`ORDER BY date_trunc('hour', ts)` doesn't qualify — no per-file min/max
+for the function output), and (b) sort columns must be null-free, so
+`NULLS FIRST`/`NULLS LAST` semantics are preserved across file
+boundaries. The overlap case falls through to the `Inexact` path
+covered later.
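+
+The three steps and the null-free bail-out can be sketched in a few lines
+of Rust. This is a minimal illustration, not DataFusion's implementation:
+`FileStats` and `try_exact_order` are hypothetical names, the sort key is
+assumed to be a single `i64` column, and only the ascending case is shown.
+
+```rust
+/// Hypothetical per-file statistics for the sort column (not DataFusion's types).
+#[derive(Debug, Clone, PartialEq)]
+struct FileStats {
+    name: String,
+    min: i64,
+    max: i64,
+    null_count: u64,
+}
+
+/// Returns the files in range order if that order is provably globally
+/// sorted (ascending), or `None` if the `Exact` upgrade must be skipped.
+fn try_exact_order(mut files: Vec<FileStats>) -> Option<Vec<FileStats>> {
+    // Bail-out (b): nulls in the sort column make min/max reasoning unsafe.
+    if files.iter().any(|f| f.null_count > 0) {
+        return None;
+    }
+    // Step 1: sort the file list by per-file min.
+    files.sort_by_key(|f| f.min);
+    // Step 2: every adjacent pair must satisfy file[i].max <= file[i+1].min.
+    let non_overlapping = files.windows(2).all(|w| w[0].max <= w[1].min);
+    // Step 3: only then is it safe to claim `Exact` and drop the sort.
+    non_overlapping.then_some(files)
+}
+```
+
+A `None` result corresponds to the overlap case in the figure: the file
+list can still be reordered, but the `SortExec` has to stay.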
+
+### `BufferExec` · a subtle multi-partition side effect
+
+<img src="/blog/images/sort-pushdown/buffer-exec-stall.svg" alt="SPM stalls when SortExec is removed in multi-partition plans" width="100%" class="img-fluid" /><br/>
+*Figure: removing the per-partition `SortExec` leaves the top-of-plan
+merge (`SortPreservingMergeExec`) directly consuming raw I/O; a stall
+on any partition stalls the whole plan.*
+
+Removing the `SortExec` looked like a pure win, but the first
+multi-partition benchmarks showed something counter-intuitive: **some
+queries got slower**. The root cause is that the removed `SortExec`
+was doing two jobs — sorting *and* implicitly buffering. Each
+per-partition `SortExec` runs as its own task, greedily draining its
+source in the background; the top-of-plan `SortPreservingMergeExec`
+picks from those large in-memory buffers and never blocks on I/O in
+any single partition.
+
+Once the `SortExec` is deleted, the merge sits directly on the raw
+parquet streams. It's a lazy consumer — a k-way merge must see the
+head row from every input before deciding which to emit. A stall in
+*any one* partition now stalls the entire merge.
+
+<img src="/blog/images/sort-pushdown/buffer-exec.svg" alt="BufferExec replaces the deleted SortExec with a bounded streaming buffer per partition" width="100%" class="img-fluid" /><br/>
+*Figure: `BufferExec` is inserted where the `SortExec` used to live —
+same greedy per-partition prefill, but no blocking sort.*
+
+The fix is [`BufferExec`](https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/buffer.rs): a bounded per-partition
+prefill buffer that plays the same "greedy parallel I/O driver" role
+the `SortExec` implicitly did. No sort, no blocking, and strictly
+less memory than the `SortExec` it replaces. The capacity is bounded
+(default 1 GB, configurable via
+[`sort_pushdown_buffer_capacity`](https://github.com/apache/datafusion/pull/21426)) and grows via the
+global memory pool, so it back-pressures the source instead of
+OOMing.
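+
+The buffering idea can be illustrated with plain standard-library
+channels. The sketch below is an analogy under stated assumptions, not the
+`BufferExec` code: `spawn_prefetch` and `merge_sorted` are hypothetical
+names, partitions are in-memory `Vec<i64>` values, and capacity is counted
+in values rather than bytes from a memory pool.
+
+```rust
+use std::cmp::Reverse;
+use std::collections::BinaryHeap;
+use std::sync::mpsc::{sync_channel, Receiver};
+use std::thread;
+
+/// Spawns a producer that greedily pushes an already-sorted partition into a
+/// bounded channel. `send` blocks while the buffer is full, which gives
+/// back-pressure instead of unbounded memory growth.
+fn spawn_prefetch(sorted_partition: Vec<i64>, capacity: usize) -> Receiver<i64> {
+    let (tx, rx) = sync_channel(capacity);
+    thread::spawn(move || {
+        for value in sorted_partition {
+            if tx.send(value).is_err() {
+                break; // the consumer hung up, stop producing
+            }
+        }
+    });
+    rx
+}
+
+/// K-way merge: it needs the head value of every input before it can emit,
+/// so it benefits from each input having values ready in its buffer.
+fn merge_sorted(inputs: Vec<Receiver<i64>>) -> Vec<i64> {
+    let mut heap = BinaryHeap::new();
+    for (idx, rx) in inputs.iter().enumerate() {
+        if let Ok(v) = rx.recv() {
+            heap.push(Reverse((v, idx)));
+        }
+    }
+    let mut out = Vec::new();
+    while let Some(Reverse((v, idx))) = heap.pop() {
+        out.push(v);
+        // Refill from the partition that just supplied the smallest value.
+        if let Ok(next) = inputs[idx].recv() {
+            heap.push(Reverse((next, idx)));
+        }
+    }
+    out
+}
+```
+
+Each producer runs ahead of the merge by at most `capacity` values, which
+mirrors the "greedy but bounded" role described above.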
+
+### Benchmark: `sort_pushdown` suite
+
+<img src="/blog/images/sort-pushdown/benchmark.svg" alt="Sort pushdown benchmark: 2x-49x speedup across four queries" width="100%" class="img-fluid" /><br/>
+*Figure: `sort_pushdown` results (`--partitions 1`, release build). ASC
+queries with the file list reversed against sort-key ranges.*
+
+Numbers below are from the [`sort_pushdown`](https://github.com/apache/datafusion/tree/main/benchmarks/queries/sort_pushdown) suite,
+run with `--partitions 1` and compared against `main`:
+
+| Query                                       | Before  | After   | Speedup  |
+| ------------------------------------------- | -------:| -------:| -------: |
+| Q1 — `ORDER BY key` (full scan)             | 259 ms  | 122 ms  | **2.1×** |
+| Q2 — `ORDER BY key LIMIT 100`               |  80 ms  |   3 ms  | **27×**  |
+| Q3 — `SELECT * ORDER BY key`                | 700 ms  | 313 ms  | **2.2×** |
+| Q4 — `SELECT * ORDER BY key LIMIT 100`      | 342 ms  |   7 ms  | **49×**  |
+
+- **Full-scan queries (Q1, Q3)** save the cost of the sort itself
+  (~½ end-to-end latency for in-memory sorts).
+- **`LIMIT` queries (Q2, Q4)** benefit dramatically because deleting
+  the `SortExec` turns `LIMIT N` into a **static fetch** on the source:
+  the reader stops after N rows. In Q4, a 342 ms full-file scan
+  collapses into a 7 ms 100-row read.
+
+## The `Inexact` Path · Runtime Reorder for `TopK` and `DESC`
+
+Stats-based sort elimination handles the `Exact` upgrade, which is
+provably correct and removes the sort, but it only applies when the
+table has a declared `output_ordering` *and* the files are provably
+non-overlapping after sorting by min. Three classes of queries
+fall outside that window:
+
+* **Unsorted data**: no `WITH ORDER`, no parquet `sorting_columns`.
+  The `Exact` upgrade cannot fire because there is no ordering
+  claim to upgrade.
+* **Overlapping ranges**: files written by different ingestion
+  jobs share time windows. The `Exact` upgrade is skipped and the
+  `SortExec` stays because the global ordering can't be proven, even
+  though the files often do contain large stretches of in-order data.
+* **`ORDER BY ... DESC` on ASC-sorted data**: flipping iteration
+  at the row-group level emits "RGs descending × rows ascending",
+  close to the requested order but not strictly DESC, so the
+  `SortExec` has to stay for correctness.
+
+For all three, a full external `SortExec` is overkill. The parquet
+metadata is right there, and reading the *most-promising* data
+first lets `TopK`'s dynamic filter threshold tighten quickly so the
+rest gets pruned. Runtime reorder wires that up by generalising
+the `Inexact` path the rule introduced.
+
+### When `Inexact` fires
+
+<img src="/blog/images/sort-pushdown/pr21956-decision.svg" alt="try_pushdown_sort decision tree: Exact, Inexact, or Unsupported" width="100%" class="img-fluid" /><br/>
+*Figure: for each `SortExec`, the leaf source returns `Exact` (drop
+the sort), `Inexact` (bias the scan and keep the sort), or
+`Unsupported`.*
+
+The `Inexact` verdict fires when either of two independent signals is
+true:
+
+- **Stats-based reorder available**: the leading sort key is a plain
+  column in the file schema, so the scan can sort files and row
+  groups by `min(col)` from Parquet statistics.
+- **Reverse satisfies the request**: the source's declared ordering,
+  when reversed, satisfies what the query asks for. This uses
+  DataFusion's [equivalence-properties][ordering-analysis] reasoning
+  and covers function monotonicity (`ts DESC` declared, `date_trunc('day', ts) ASC`
+  requested), constants inferred from filters, and multi-column
+  composite orderings.
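+
+Put together with the `Exact` conditions from the previous section, the
+decision can be summarised as a small function. This is a simplified
+sketch: the verdict names come from this post, while `SourceFacts`, its
+fields, and `classify` are hypothetical and collapse the real analysis
+into four booleans.
+
+```rust
+/// The three verdicts a source can return for a requested sort.
+#[derive(Debug, PartialEq)]
+enum SortPushdown {
+    Exact,
+    Inexact,
+    Unsupported,
+}
+
+/// Hypothetical summary of what the source knows about the requested sort.
+struct SourceFacts {
+    /// The declared ordering satisfies the request as-is.
+    declared_satisfies: bool,
+    /// Files are provably non-overlapping after sorting by min.
+    ranges_non_overlapping: bool,
+    /// The leading sort key is a plain column in the file schema.
+    leading_key_is_plain_column: bool,
+    /// The declared ordering, when reversed, satisfies the request.
+    reverse_satisfies: bool,
+}
+
+fn classify(facts: &SourceFacts) -> SortPushdown {
+    if facts.declared_satisfies && facts.ranges_non_overlapping {
+        // Provable order: the sort can be dropped.
+        SortPushdown::Exact
+    } else if facts.leading_key_is_plain_column || facts.reverse_satisfies {
+        // Either signal is enough to bias the scan; the sort stays.
+        SortPushdown::Inexact
+    } else {
+        SortPushdown::Unsupported
+    }
+}
+```
+
+For instance, an undeclared table sorted on a plain column yields
+`Inexact`, while a sort on a function output with no declared ordering
+yields `Unsupported`.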
+
+### How the scan reorders data
+
+<img src="/blog/images/sort-pushdown/pr21956-runtime-pipeline.svg" alt="Runtime reorder pipeline: file reorder, RG reorder, then optional reverse" width="100%" class="img-fluid" /><br/>
+*Figure: the parquet opener applies file-level reorder → row-group-level
+reorder → optional iteration reverse.*
+
+The parquet opener applies up to three composable steps as the scan runs:
+
+1. **File-level reorder** — across a shared work-stealing queue, the
+   file list is sorted by `min(col)`, so the most-promising file is
+   picked first across all partitions.
+2. **Row-group-level reorder** — once a file is opened, its row groups
+   are sorted by `min(col)`.
+3. **Iteration reverse** — flip row-group iteration order for `DESC`
+   requests (and for the reverse-satisfies-the-request cases above).
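+
+As a rough illustration of how the three steps compose, the sketch below
+computes a visit order over `(file, row group)` pairs from per-row-group
+minimums. `visit_order` is a hypothetical helper; it ignores the shared
+work-stealing queue, and flipping the file order for `DESC` is an
+assumption made here so that the highest values are visited first.
+
+```rust
+/// `files[f][rg]` holds min(col) for row group `rg` of file `f`.
+/// Returns the (file, row group) pairs in the order the scan would visit them.
+fn visit_order(files: &[Vec<i64>], descending: bool) -> Vec<(usize, usize)> {
+    // Step 1: file-level reorder by the smallest row-group min in each file.
+    let mut file_order: Vec<usize> = (0..files.len()).collect();
+    file_order.sort_by_key(|&f| files[f].iter().copied().min());
+    if descending {
+        file_order.reverse(); // assumption: files are flipped for DESC too
+    }
+    let mut visits = Vec::new();
+    for f in file_order {
+        // Step 2: row-group-level reorder by min(col) inside the file.
+        let mut rg_order: Vec<usize> = (0..files[f].len()).collect();
+        rg_order.sort_by_key(|&rg| files[f][rg]);
+        // Step 3: iteration reverse for DESC requests.
+        if descending {
+            rg_order.reverse();
+        }
+        visits.extend(rg_order.into_iter().map(|rg| (f, rg)));
+    }
+    visits
+}
+```
+
+Only metadata is consulted here; rows inside each row group stay in their
+on-disk order, which is why the `SortExec` remains on this path.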
+
+### File-level early stop already works
+
+<img src="/blog/images/sort-pushdown/desc_walk_file.png" alt="Tier 1 file-level reorder with early stop via file_pruner" width="100%" class="img-fluid" /><br/>
+*Figure: after file reorder, low-value files at the tail of the queue
+are cut by the file-level pruner before they are ever opened — no
+metadata I/O.*
+
+Once files are ordered "most-promising first", `TopK`'s heap fills
+quickly and its dynamic filter threshold tightens. Low-value files at
+the tail of the queue are then checked against the live threshold
+by the [`FilePruner`](https://github.com/apache/datafusion/blob/main/datafusion/pruning/src/file_pruner.rs) before they are ever opened —
+never loading their footer, page index, or any data.
+
+### Row-group-level: the gap [#22450] fills
+
+<img src="/blog/images/sort-pushdown/desc_walk_rg.png" alt="Tier 2 RG-level reorder — filter column still read for every RG pre-#22450" width="100%" class="img-fluid" /><br/>
+*Figure: inside a file, the first row group tightens the threshold —
+subsequent row groups have their projection columns short-circuited,
+but the filter column still has to be read to discover that no rows
+qualify.*
+
+Inside a file, the story is almost identical — but with one gap.
+After the first row group fills the heap, subsequent row groups
+whose values can't beat the threshold evaluate to an empty
+`RowSelection`, and arrow-rs's reader short-circuits: no projection
+columns fetched, no decompress, no decode.
+
+However, **the filter column still gets read for every row group**,
+because the dynamic filter has to be evaluated row-by-row to
+*discover* that no rows survive. On a large file with many row
+groups, that's a meaningful tax — most of which is redundant, since
+metadata alone could have proven the row group unwinnable. Closing
+that gap is what [#22450] does.
+
+## #22450 · Runtime Row-Group Dynamic Pruning
+
+The merge that just landed — [apache/datafusion#22450][#22450] —
+re-checks the dynamic filter **at every row-group boundary** inside
+an open file, converts the live threshold into a fresh
+`PruningPredicate`, and physically removes any row group whose
+min/max can't possibly beat the threshold. The pruned row groups are
+**never decoded, not even on the filter column**.
+
+### Architecture · who drives the IO + decode loop
+
+<img src="/blog/images/sort-pushdown/arch_one_glance.png" alt="Three eras of who drives the parquet IO + decode loop" width="100%" class="img-fluid"/>
+
+The interesting backstory is that **DataFusion didn't actually own
+this loop until recently**. Three eras:
+
+* **Pre-[#20839]**: arrow-rs owned the I/O + decode loop as a black
+  box; DataFusion only called `.next()` and served byte ranges. The
+  row-group list was frozen at construction, so once the loop started,
+  no mid-stream decisions were possible.
+* **[#20839]**: the push-based parquet decoder moved the loop into
+  DataFusion. The capability to insert a decision mid-loop now
+  existed — but the loop went from `drain` straight to `drive`, with
+  no decision point.
+* **[#22450]**: adds the missing decision point. At every row-group
+  boundary, the loop pauses to ask the runtime pruner whether the
+  remaining row groups are still worth reading.
+
+### The loop, and the decision point [#22450] adds
+
+<img src="/blog/images/sort-pushdown/transition_anatomy.png" alt="transition() loop: drain, decide, drive — Step 2 is the #22450 addition" width="100%" class="img-fluid" /><br/>
+*Figure: the decoder loop has three steps. Step 2 (DECIDE) is what
+[#22450] adds — it only fires at row-group boundaries.*
+
+The loop body reads: **drain** the current row group's batches until
+it's exhausted; **decide** at the boundary whether any of the
+remaining row groups can be dropped based on the live threshold; then
+**drive** the decoder into the next row group and repeat. Inside a
+row group, only drain and drive run — no decision point.
+
+<img src="/blog/images/sort-pushdown/pruner_loop.png" alt="RowGroupPruner: watch (cheap), rebuild (expensive, only if changed), prune (cheap)" width="100%" class="img-fluid" /><br/>
+*Figure: the pruner has a cheap "check if the filter changed" step, a
+moderately expensive "rebuild the predicate if so" step, and a cheap
+"apply the predicate to remaining row groups" step.*
+
+The pruner is designed so the expensive work only fires when it can
+possibly help: a cheap epoch check tells it whether the dynamic filter
+has actually changed since last time, and only then does it rebuild
+the pruning predicate. The predicate is then applied to remaining
+row groups' min/max statistics — pure metadata comparison, no I/O.
+Errors always fall back to "keep the row group" — a flaky pruner
+never drops live data.
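+
+The watch / rebuild / prune cycle might look roughly like the sketch
+below. All names are hypothetical (`DynamicThreshold`, `RowGroupPruner`,
+`epoch`), the "predicate" is reduced to a single cached threshold for an
+`ORDER BY x DESC` query, and missing statistics stand in for the error
+case that falls back to keeping the row group.
+
+```rust
+/// Hypothetical stand-in for the TopK dynamic filter: a threshold plus an
+/// epoch counter that is bumped whenever the threshold changes.
+struct DynamicThreshold {
+    epoch: u64,
+    /// For `ORDER BY x DESC`, a row group needs max >= this value to matter.
+    min_to_beat: Option<i64>,
+}
+
+/// Hypothetical pruner that caches the last predicate it built.
+struct RowGroupPruner {
+    seen_epoch: Option<u64>,
+    cached_min_to_beat: Option<i64>,
+    rebuilds: usize,
+}
+
+impl RowGroupPruner {
+    fn new() -> Self {
+        Self { seen_epoch: None, cached_min_to_beat: None, rebuilds: 0 }
+    }
+
+    /// Called at a row-group boundary with the max statistic of each
+    /// remaining row group (`None` = statistics unavailable).
+    /// Returns the indices of the row groups to keep.
+    fn prune(&mut self, filter: &DynamicThreshold, remaining_max: &[Option<i64>]) -> Vec<usize> {
+        // Watch: cheap check, has the filter changed since last time?
+        if self.seen_epoch != Some(filter.epoch) {
+            // Rebuild: only now pay for a fresh predicate.
+            self.cached_min_to_beat = filter.min_to_beat;
+            self.seen_epoch = Some(filter.epoch);
+            self.rebuilds += 1;
+        }
+        // Prune: metadata-only comparison, no I/O.
+        let mut keep = Vec::new();
+        for (idx, max) in remaining_max.iter().enumerate() {
+            let prunable = match (self.cached_min_to_beat, *max) {
+                (Some(threshold), Some(max)) => max < threshold,
+                _ => false, // no threshold yet or no statistics: keep it
+            };
+            if !prunable {
+                keep.push(idx);
+            }
+        }
+        keep
+    }
+}
+```
+
+Calling `prune` twice with an unchanged filter reuses the cached
+predicate, so the rebuild cost is paid only when the heap actually moved.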
+
+### Cascading prune · how the heap eats row groups
+
+<img src="/blog/images/sort-pushdown/rg_cascade.png" alt="Cascading prune: one row group fills the heap, threshold snaps, all subsequent row groups are pruned in a single pass" width="100%" class="img-fluid" /><br/>
+*Figure: for `ORDER BY x DESC LIMIT 10`, opening the first row group
+(values [90..100)) is enough to fill the heap; at the next boundary,
+every remaining row group with `max < 90` is pruned in one pass.*
+
+The savings compound because the threshold moves in **steps**, not
+smoothly. For `ORDER BY x DESC LIMIT 10` on a 10-row-group file
+where reorder puts high-value row groups first:
+
+1. RG 9 (values `[90..100)`) opens. One row group is enough to fill
+   the heap of size 10 — the threshold jumps into RG 9's range (≥ 90).
+2. At the next row-group boundary, the pruner sees that all of RG 8
+   through RG 0 have `max < 90` and drops them in one pass.
+3. Bytes for those nine row groups were **never fetched** — not
+   projection columns, not the filter column. Full I/O + decompress +
+   decode all skipped.
+
+This is the core value of [#22450]: when reorder lines up
+disjoint per-RG ranges (the common case for time-series or
+partition-key sorts), a single row group can cascade-eliminate every
+remaining row group at the next boundary.
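+
+The cascade is easy to reproduce in a toy simulation. The sketch below is
+not the DataFusion operator: `topk_desc_with_pruning` is a hypothetical
+function over in-memory row groups, and the max statistic is computed
+from the values rather than read from parquet metadata.
+
+```rust
+use std::cmp::Reverse;
+use std::collections::BinaryHeap;
+
+/// Simulates `ORDER BY x DESC LIMIT k` over row groups that were already
+/// reordered so high-value groups come first. Returns the top-k values
+/// (descending) and how many row groups were actually read.
+fn topk_desc_with_pruning(row_groups: &[Vec<i64>], k: usize) -> (Vec<i64>, usize) {
+    // Min-heap of the best k values so far; its root is the threshold.
+    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::new();
+    let mut groups_read = 0;
+    for rg in row_groups {
+        // Decision point at the row-group boundary: statistics only.
+        if heap.len() == k {
+            let threshold = heap.peek().map(|r| r.0);
+            let max = rg.iter().copied().max(); // stands in for the max statistic
+            if let (Some(t), Some(m)) = (threshold, max) {
+                if m < t {
+                    continue; // cannot beat the threshold: never read
+                }
+            }
+        }
+        groups_read += 1;
+        for &v in rg {
+            heap.push(Reverse(v));
+            if heap.len() > k {
+                heap.pop(); // evict the current smallest
+            }
+        }
+    }
+    let mut top: Vec<i64> = heap.into_iter().map(|r| r.0).collect();
+    top.sort_unstable_by(|a, b| b.cmp(a));
+    (top, groups_read)
+}
+```
+
+With ten row groups covering `[90..100)` down to `[0..10)` and `k = 10`,
+only the first row group is read; the other nine are skipped at their
+boundaries because each has `max < 90`.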
+
+## Three-Layer Pruning · file + RG + row, stacked
+
+<img src="/blog/images/sort-pushdown/pruning_stack.png" alt="Three-layer pruning: file-level, RG-level, row-level, all driven by the same TopK dynamic filter" width="100%" class="img-fluid"/>
+
+A common question at this point: "if [#22450] prunes whole row
+groups, does that replace the `RowFilter` row-level prune that the
+`Inexact` path was already using?" **No.** The three layers stack,
+and they're driven by the **same** `TopK` dynamic filter. (The
+"Tier 1 / Tier 2" labels in the earlier figures map to "Layer 0 /
+Layer A" below; they describe the same split from a different angle.
+Layer B is what runs on each row group after Layer A keeps it.)
+
+* **Layer 0 · file-level** (`file_pruner` + `EarlyStoppingStream`).
+  Cuts dead files before they're opened. The only layer that skips
+  parquet metadata I/O entirely. Already shipped before [#22450] —
+  this is Tier 1.
+* **Layer A · row-group-level** ([#22450]). Cuts dead row groups
+  inside open files at every row-group boundary. Bytes never
+  fetched, filter column never decoded. **This is the layer that
+  fills the Tier 2 gap** ("× no early stop yet" pre-[#22450]).
+* **Layer B · row-level** (`RowFilter`). For row groups that
+  survive Layer A, the filter is still evaluated row-by-row to
+  build a `RowSelection`. Rows that fail the predicate get their
+  *projection* columns short-circuited via arrow-rs's
+  `selects_any()`, but the *filter* column is necessarily read.
+  This layer has the highest residual cost (the filter column),
+  but also the finest granularity.
+
+The same dynamic filter drives all three. A single insertion into
+the `TopK` heap becomes a new threshold that Layer B applies
+per-row immediately (in the currently-open row group), and Layer A
+re-applies to remaining row groups at the next boundary. No layer
+subsumes another — Layer A prunes on metadata alone (never touching
+the filter column), while Layer B is finer-grained but has to read
+the filter column to decide.
+
+### Benchmark · `topk_tpch` (TPC-H SF1, `LIMIT 100`)
+
+<img src="/blog/images/sort-pushdown/topk_tpch_bench.png" alt="topk_tpch benchmark results: 5 of 11 queries 3-4× faster, 0 regressions, total -44%" width="100%" class="img-fluid"/>
+
+The [`topk_tpch`](https://github.com/apache/datafusion/blob/main/benchmarks/src/sort_tpch.rs) benchmark runs 11 TPC-H SF1 queries, all of the
+shape `ORDER BY ... LIMIT 100`, comparing `main` as the baseline against
+the same code with [#22450] enabled. Headline numbers:
+
+| Metric                              | Value                              |
+| ----------------------------------- | ---------------------------------- |
+| Total wall-clock (sum of 11 queries) | 248.8 ms → 139.1 ms (**−44%**)    |
+| Queries with ≥2× speedup vs main    | **5 of 11** (Q2, Q4, Q8, Q9, Q10) |
+| Queries with regression vs main     | **0**                              |
+| Best single-query speedup           | **~4×**                            |
+
+The five queries with significant speedups all use `l_orderkey`
+as the **leading** sort key. It is lineitem's physical sort key, a
+`BIGINT` with ~1.5M distinct values at SF1, so per-RG `min/max`
+ranges are cleanly disjoint and Layer A can cascade-prune
+aggressively. The non-winners (Q1, Q3, Q5, Q6, Q7, Q11) lead with
+`l_linenumber` (cardinality 7), `l_comment`, or `l_shipmode`:
+columns whose per-RG ranges overlap heavily because they're not the
+physical sort order. (Q5–Q7 still *include* `l_orderkey`, but only
+as a third-key tie-breaker; the leading key is what controls RG-level
+disjointness.) For these queries a tighter threshold doesn't translate
+into clean RG-level boundaries to prune at, so Layer B (row-level)
+still does its share of the work.
+
+The takeaway isn't "5 out of 11", it's "**zero regressions and
+no-op when the data doesn't help, 3–4× when it does**". The sweet
+spot — sort key aligned with the physical layout — is the common
+case for time-series, partitioned tables, and ingestion-ordered
+event logs.
+
+## Future Directions
+
+Two complementary directions are open. The first needs an upstream
+arrow-rs primitive; the second is pure DataFusion plumbing on top
+of [#22450]:
+
+### A · Page-level `Exact` reverse · arrow-rs [#9937]
+
+<img src="/blog/images/sort-pushdown/reverse-scan.svg" alt="Row-group reverse (128 MB peak) vs page-level reverse (1 MB peak)" width="100%" class="img-fluid"/>
+
+[#9937]: https://github.com/apache/arrow-rs/pull/9937
+
+Today's `DESC` query support lives in the `Inexact` path: the
+row-group reverse emits "RGs descending × rows ascending", which is
+close to DESC but not strictly so. `SortExec` stays.
+
+A page-level reverse primitive in arrow-rs would let the reader
+walk the parquet offset index in reverse — decoding each page
+forward, reversing its `RecordBatch`, and emitting before moving to
+the previous page. Peak buffer drops from ~128 MB (one
+row group) to ~1 MB (one page); per-page decode stays forward (RLE,
+dictionary, delta, and bit-packed encodings are all forward-only
+by construction — page *traversal* is what gets reversed). Once
+each batch already comes out in DESC order, `PushdownSort` can
+finally return `Exact` for `DESC`, the `SortExec` is removed
+outright, and `LIMIT N` becomes a static fetch.
+
+In flight upstream as [arrow-rs#9937]. The killer use case is
+**filtered reverse `TopK`** — e.g. `WHERE user_id = 42 ORDER BY ts
+DESC LIMIT 10`. You can't pre-compute a `RowSelection::with_limit`
+because matching rows are sparse; the only correct strategy is to
+stream pages backward, filter, and stop when 10 matches accumulate.
+Row-group reverse stops at ~128 MB granularity; page reverse stops
+at ~1 MB.
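+
+The streaming strategy can be sketched over in-memory "pages". This is a
+conceptual illustration only: `filtered_reverse_topk` is a hypothetical
+function, pages are plain `Vec<i64>` values sorted ascending, and no
+arrow-rs API is used or implied.
+
+```rust
+/// Walks pages from last to first. Each page is "decoded" forward and then
+/// reversed, and the walk stops as soon as `limit` matching rows are
+/// collected. Returns the matches and the number of pages decoded.
+fn filtered_reverse_topk<F>(pages: &[Vec<i64>], keep: F, limit: usize) -> (Vec<i64>, usize)
+where
+    F: Fn(i64) -> bool,
+{
+    let mut out = Vec::with_capacity(limit);
+    let mut pages_decoded = 0;
+    // Page *traversal* is reversed; decoding within a page stays forward.
+    for page in pages.iter().rev() {
+        if out.len() >= limit {
+            break; // enough matches: earlier pages are never touched
+        }
+        pages_decoded += 1;
+        let mut batch: Vec<i64> = page.clone(); // forward decode
+        batch.reverse(); // emit the batch in DESC order
+        for v in batch {
+            if keep(v) {
+                out.push(v);
+                if out.len() == limit {
+                    break;
+                }
+            }
+        }
+    }
+    (out, pages_decoded)
+}
+```
+
+The number of pages decoded depends on how sparse the matches are, which
+is why the stopping granularity (page versus row group) matters here.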
+
+[arrow-rs#9937]: https://github.com/apache/arrow-rs/pull/9937
+
+### B · Page-level dynamic prune at the row-group boundary
+
+<img src="/blog/images/sort-pushdown/future_page_level.png" alt="Page-level dynamic prune: extends #22450 to skip individual pages, not just whole row groups" width="100%" class="img-fluid"/>
+
+[#22450] prunes whole row groups at row-group boundaries. The
+finer-grained extension prunes whole **pages** within a surviving
+row group. The signal is the same dynamic filter, just re-applied
+at page granularity — for any page whose `max(col)` is already
+below the threshold, the filter column's bytes for that page can be
+skipped along with the projection columns.
+
+Today's page-level pruning runs once at file open using the static
+query predicate. Future B extends [#22450]'s "refresh at RG
+boundary" pattern to also rebuild the page-level filter with the
+live threshold, so upcoming row groups get tighter page selections
+mid-scan. Same arrow-rs API [#22450] already uses — no new
+primitive needed. Tracked in [apache/datafusion#23216].
+
+[apache/datafusion#23216]: https://github.com/apache/datafusion/issues/23216
+
+Conceptually this is the same idea as [#22450] stepped down one
+level: every level of the parquet hierarchy gets to chip off its
+share of the residue from the level above.
+
+## Acknowledgements
+
+Thank you to [@adriangb], [@alamb], [@xudong963], [@2010YOUY01], and
+[@Dandandan] for reviewing the design and the patches across many
+iterations. The DataFusion community's willingness to engage deeply
+with optimizer changes — including the ones that touch foundational
+invariants like who-drives-the-decode-loop — is what made this work
+possible.
+
+[@alamb]: https://github.com/alamb
+[@adriangb]: https://github.com/adriangb
+[@xudong963]: https://github.com/xudong963
+[@2010YOUY01]: https://github.com/2010YOUY01
+[@Dandandan]: https://github.com/Dandandan
+
+## References
+
+Umbrella issue tracking the entire effort:
+
+* **[EPIC] Sort Pushdown · skip sorts and skip IO for ORDER BY / TopK queries: [apache/datafusion#23036](https://github.com/apache/datafusion/issues/23036)** — phase-by-phase status of all the PRs and follow-ups.
+
+Prior post this work builds on:
+
+* [Dynamic Filters: Passing Information Between Operators During Execution for 25x Faster Queries][dyn-filters-blog] — the dynamic filter primitive `TopK` uses.
+
+Landed PRs that make up the merged work:
+
+* `MinMaxStatistics` foundation: [apache/datafusion#9593](https://github.com/apache/datafusion/pull/9593)
+* `PushdownSort` rule + row-group reverse: [apache/datafusion#19064](https://github.com/apache/datafusion/pull/19064)
+* Reverse-output redesign: [apache/datafusion#19446](https://github.com/apache/datafusion/pull/19446), [apache/datafusion#19557](https://github.com/apache/datafusion/pull/19557)
+* Sort elimination via statistics: [apache/datafusion#21182](https://github.com/apache/datafusion/pull/21182)
+* `BufferExec` capacity for sort elimination: [apache/datafusion#21426](https://github.com/apache/datafusion/pull/21426)
+* Push-based parquet decoder (DataFusion owns the loop): [apache/datafusion#20839](https://github.com/apache/datafusion/pull/20839)
+* Morsel-style work scheduling: [apache/datafusion#21351](https://github.com/apache/datafusion/pull/21351)
+* Runtime reorder for `TopK` convergence: [apache/datafusion#21956](https://github.com/apache/datafusion/pull/21956)
+* **Runtime row-group dynamic pruning ([#22450])** — the centerpiece of this post.
+
+In flight / open:
+
+* Page-level reverse (arrow-rs): [apache/arrow-rs#9937](https://github.com/apache/arrow-rs/pull/9937), discussion in [apache/arrow-rs#9934](https://github.com/apache/arrow-rs/issues/9934)
+* `peek_next_row_group` API for per-RG `fully_matched` RowFilter skip (arrow-rs): [apache/arrow-rs#10158](https://github.com/apache/arrow-rs/pull/10158)
+* Page-level dynamic prune at RG boundary (Future B): [apache/datafusion#23216](https://github.com/apache/datafusion/issues/23216)
+* Per-RG `fully_matched` RowFilter skip on top of [#22450] (blocked on arrow-rs#10158): [apache/datafusion#23067](https://github.com/apache/datafusion/issues/23067)
+* Multi-column / function-wrapped stats reorder follow-ups: [apache/datafusion#22198](https://github.com/apache/datafusion/issues/22198)
+
+A good starting point for new contributors:
+
+* [Add more `ExecutionPlan` impls to support sort pushdown][more-impls-issue].
+
+[more-impls-issue]: https://github.com/apache/datafusion/issues/19394
+
+Benchmark suites: [sort_pushdown](https://github.com/apache/datafusion/tree/main/benchmarks/queries/sort_pushdown), [topk_tpch](https://github.com/apache/datafusion/blob/main/benchmarks/src/sort_tpch.rs).
diff --git a/content/images/sort-pushdown/arch_one_glance.png b/content/images/sort-pushdown/arch_one_glance.png
new file mode 100644
index 00000000..7e8dafc3
Binary files /dev/null and b/content/images/sort-pushdown/arch_one_glance.png differ
diff --git a/content/images/sort-pushdown/benchmark.svg b/content/images/sort-pushdown/benchmark.svg
new file mode 100644
index 00000000..30afb7b2
--- /dev/null
+++ b/content/images/sort-pushdown/benchmark.svg
@@ -0,0 +1,75 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 380">
+  <style>
+    text { font-family: 'Segoe UI', Arial, sans-serif; }
+    .title { font-size: 14px; font-weight: 700; fill: #222; }
+    .axis-label { font-size: 11px; fill: #555; }
+    .qlabel { font-size: 12px; font-weight: 600; fill: #222; }
+    .qdesc { font-size: 10px; fill: #777; font-family: 'Courier New', monospace; }
+    .bartext-light { font-size: 10px; fill: #333; }
+    .bartext-on { font-size: 10px; font-weight: 700; fill: #fff; }
+    .speedup { font-size: 13px; font-weight: 700; fill: #27ae60; }
+    .legend { font-size: 11px; fill: #444; }
+  </style>
+
+  <text class="title" x="410" y="22" text-anchor="middle">sort_pushdown benchmark (single partition, release, reversed-name data)</text>
+
+  <!-- y-axis line + ticks (log-ish scale via manual placement) -->
+  <line x1="160" y1="60" x2="160" y2="320" stroke="#888" stroke-width="1"/>
+  <text class="axis-label" x="155" y="65" text-anchor="end">700ms</text>
+  <line x1="155" y1="62" x2="160" y2="62" stroke="#888"/>
+  <text class="axis-label" x="155" y="135" text-anchor="end">259ms</text>
+  <line x1="155" y1="132" x2="160" y2="132" stroke="#888"/>
+  <text class="axis-label" x="155" y="200" text-anchor="end">80ms</text>
+  <line x1="155" y1="197" x2="160" y2="197" stroke="#888"/>
+  <text class="axis-label" x="155" y="260" text-anchor="end">7ms</text>
+  <line x1="155" y1="257" x2="160" y2="257" stroke="#888"/>
+  <text class="axis-label" x="155" y="320" text-anchor="end">0</text>
+
+  <!-- bars per query: main (gray) and sort pushdown (green), side-by-side -->
+  <!-- Q1: 259 -> 122, full ORDER BY -->
+  <text class="qlabel" x="225" y="345" text-anchor="middle">Q1</text>
+  <text class="qdesc" x="225" y="358" text-anchor="middle">ORDER BY full</text>
+  <rect x="180" y="135" width="40" height="185" fill="#bdc3c7"/>
+  <text class="bartext-on" x="200" y="155" text-anchor="middle">259</text>
+  <rect x="230" y="222" width="40" height="98" fill="#27ae60"/>
+  <text class="bartext-on" x="250" y="242" text-anchor="middle">122</text>
+  <text class="speedup" x="225" y="120" text-anchor="middle">2.1×</text>
+
+  <!-- Q2: 80 -> 3, ORDER BY LIMIT -->
+  <text class="qlabel" x="345" y="345" text-anchor="middle">Q2</text>
+  <text class="qdesc" x="345" y="358" text-anchor="middle">ORDER BY LIMIT</text>
+  <rect x="300" y="199" width="40" height="121" fill="#bdc3c7"/>
+  <text class="bartext-on" x="320" y="219" text-anchor="middle">80</text>
+  <rect x="350" y="313" width="40" height="7" fill="#27ae60"/>
+  <text class="bartext-light" x="370" y="313" text-anchor="start">3</text>
+  <text class="speedup" x="345" y="186" text-anchor="middle">27×</text>
+
+  <!-- Q3: 700 -> 313, SELECT * ORDER BY -->
+  <text class="qlabel" x="465" y="345" text-anchor="middle">Q3</text>
+  <text class="qdesc" x="465" y="358" text-anchor="middle">SELECT * ORDER BY</text>
+  <rect x="420" y="62" width="40" height="258" fill="#bdc3c7"/>
+  <text class="bartext-on" x="440" y="82" text-anchor="middle">700</text>
+  <rect x="470" y="174" width="40" height="146" fill="#27ae60"/>
+  <text class="bartext-on" x="490" y="194" text-anchor="middle">313</text>
+  <text class="speedup" x="465" y="50" text-anchor="middle">2.2×</text>
+
+  <!-- Q4: 342 -> 7, SELECT * ORDER BY LIMIT -->
+  <text class="qlabel" x="585" y="345" text-anchor="middle">Q4</text>
+  <text class="qdesc" x="585" y="358" text-anchor="middle">SELECT * ORDER BY LIMIT</text>
+  <rect x="540" y="98" width="40" height="222" fill="#bdc3c7"/>
+  <text class="bartext-on" x="560" y="118" text-anchor="middle">342</text>
+  <rect x="590" y="313" width="40" height="7" fill="#27ae60"/>
+  <text class="bartext-light" x="610" y="313" text-anchor="start">7</text>
+  <text class="speedup" x="585" y="85" text-anchor="middle">49×</text>
+
+  <!-- y-axis label -->
+  <text class="axis-label" x="78" y="190" text-anchor="middle" transform="rotate(-90 78 190)">latency (ms)</text>
+
+  <!-- legend -->
+  <rect x="660" y="80" width="14" height="14" fill="#bdc3c7"/>
+  <text class="legend" x="680" y="92">main (before)</text>
+  <rect x="660" y="105" width="14" height="14" fill="#27ae60"/>
+  <text class="legend" x="680" y="117">sort pushdown phase 2</text>
+  <text class="legend" x="660" y="145" fill="#777">Lower is better</text>
+  <text class="legend" x="660" y="160" fill="#777">--partitions 1, release</text>
+</svg>
diff --git a/content/images/sort-pushdown/buffer-exec-stall.svg b/content/images/sort-pushdown/buffer-exec-stall.svg
new file mode 100644
index 00000000..bbcc9a24
--- /dev/null
+++ b/content/images/sort-pushdown/buffer-exec-stall.svg
@@ -0,0 +1,160 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1080 460">
+  <style>
+    text { font-family: 'Segoe UI', Arial, sans-serif; }
+    .title { font-size: 14px; font-weight: 700; fill: #222; }
+    .header-bad { font-size: 18px; font-weight: 800; fill: #c0392b; }
+    .header-good { font-size: 18px; font-weight: 800; fill: #27ae60; }
+    .panel { font-size: 13px; font-weight: 700; fill: #1d1d1f; }
+    .lbl { font-size: 12px; fill: #555; }
+    .lbl-poll { font-size: 11px; fill: #c0392b; font-style: italic; }
+    .lbl-fast { font-size: 12px; fill: #27ae60; font-weight: 700; }
+    .small { font-size: 11px; fill: #666; font-family: 'Courier New', monospace; }
+    .stall { font-size: 13px; font-weight: 700; fill: #c0392b; }
+    .arr { fill: none; stroke: #555; stroke-width: 1.4; marker-end: url(#arr); }
+    .arr-good { fill: none; stroke: #27ae60; stroke-width: 1.6; marker-end: url(#arr-g); }
+    .arr-blocked { fill: none; stroke: #c0392b; stroke-width: 1.4; stroke-dasharray: 4,3; marker-end: url(#arr-r); }
+    .arr-prefill { fill: none; stroke: #c0392b; stroke-width: 1.2; stroke-dasharray: 3,3; marker-end: url(#arr-r); }
+  </style>
+  <defs>
+    <marker id="arr" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#555"/>
+    </marker>
+    <marker id="arr-g" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#27ae60"/>
+    </marker>
+    <marker id="arr-r" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#c0392b"/>
+    </marker>
+  </defs>
+
+  <!-- ============ LEFT PANEL: WITHOUT BufferExec ============ -->
+  <text class="header-bad" x="270" y="30" text-anchor="middle">✗ Without BufferExec</text>
+  <text class="lbl" x="270" y="50" text-anchor="middle">SPM polls I/O directly — k-way merge stalls on any slow partition</text>
+
+  <!-- Source side -->
+  <rect x="20" y="70" width="120" height="320" rx="6" fill="#fafafa" stroke="#bbb" stroke-width="1"/>
+  <text class="panel" x="80" y="92" text-anchor="middle">DataSource</text>
+  <text class="small" x="80" y="108" text-anchor="middle">(I/O bound)</text>
+
+  <!-- P0: waiting on I/O -->
+  <rect x="160" y="110" width="220" height="46" rx="4" fill="#fff5f5" stroke="#c0392b" stroke-width="1.5"/>
+  <text class="lbl" x="172" y="130">Partition 0</text>
+  <text class="stall" x="370" y="138" text-anchor="end">⏳ I/O in flight</text>
+  <text class="small" x="370" y="152" text-anchor="end">no batch ready</text>
+
+  <!-- P1: ready -->
+  <rect x="160" y="170" width="220" height="46" rx="4" fill="#f5f5f7" stroke="#bbb"/>
+  <text class="lbl" x="172" y="190">Partition 1</text>
+  <rect x="290" y="180" width="20" height="26" fill="#5b8def"/>
+  <text class="small" x="370" y="200" text-anchor="end">ready</text>
+
+  <!-- P2: ready -->
+  <rect x="160" y="230" width="220" height="46" rx="4" fill="#f5f5f7" stroke="#bbb"/>
+  <text class="lbl" x="172" y="250">Partition 2</text>
+  <rect x="290" y="240" width="20" height="26" fill="#5b8def"/>
+  <text class="small" x="370" y="260" text-anchor="end">ready</text>
+
+  <!-- PN: ready -->
+  <rect x="160" y="290" width="220" height="46" rx="4" fill="#f5f5f7" stroke="#bbb"/>
+  <text class="lbl" x="172" y="310">Partition N</text>
+  <rect x="290" y="300" width="20" height="26" fill="#5b8def"/>
+  <text class="small" x="370" y="320" text-anchor="end">ready</text>
+
+  <!-- Arrows source → partitions -->
+  <line class="arr" x1="140" y1="133" x2="158" y2="133"/>
+  <line class="arr" x1="140" y1="193" x2="158" y2="193"/>
+  <line class="arr" x1="140" y1="253" x2="158" y2="253"/>
+  <line class="arr" x1="140" y1="313" x2="158" y2="313"/>
+
+  <!-- Blocked arrows partitions → SPM -->
+  <line class="arr-blocked" x1="380" y1="133" x2="430" y2="190"/>
+  <line class="arr-blocked" x1="380" y1="193" x2="430" y2="195"/>
+  <line class="arr-blocked" x1="380" y1="253" x2="430" y2="220"/>
+  <line class="arr-blocked" x1="380" y1="313" x2="430" y2="245"/>
+
+  <!-- SPM blocked -->
+  <rect x="430" y="170" width="90" height="100" rx="6" fill="#fff5f5" stroke="#c0392b" stroke-width="1.8"/>
+  <text class="panel" x="475" y="194" text-anchor="middle" fill="#c0392b">SPM</text>
+  <text class="small" x="475" y="212" text-anchor="middle">k-way merge</text>
+  <text class="stall" x="475" y="234" text-anchor="middle">STALLED</text>
+  <text class="small" x="475" y="252" text-anchor="middle">waits for P0</text>
+
+  <!-- Output: nothing -->
+  <text class="stall" x="475" y="300" text-anchor="middle">↓</text>
+  <text class="stall" x="475" y="320" text-anchor="middle">no progress</text>
+
+  <!-- Divider -->
+  <line x1="555" y1="20" x2="555" y2="440" stroke="#ddd" stroke-width="1.5" stroke-dasharray="2,4"/>
+
+  <!-- ============ RIGHT PANEL: WITH BufferExec ============ -->
+  <text class="header-good" x="820" y="30" text-anchor="middle">✓ With BufferExec</text>
+  <text class="lbl" x="820" y="50" text-anchor="middle">Background prefill — SPM always has rows in hand</text>
+
+  <!-- Source side -->
+  <rect x="575" y="70" width="100" height="320" rx="6" fill="#fafafa" stroke="#bbb" stroke-width="1"/>
+  <text class="panel" x="625" y="92" text-anchor="middle">DataSource</text>
+  <text class="small" x="625" y="108" text-anchor="middle">(I/O bound)</text>
+
+  <!-- BufferExec container -->
+  <rect x="700" y="78" width="280" height="304" rx="6" fill="#fff" stroke="#5b8def" stroke-width="1.8"/>
+  <text class="panel" x="840" y="98" text-anchor="middle" fill="#5b8def">BufferExec (bounded)</text>
+
+  <!-- P0 with queue -->
+  <rect x="715" y="110" width="250" height="46" rx="4" fill="#f5f5f7" stroke="#bbb"/>
+  <text class="lbl" x="728" y="130">Partition 0</text>
+  <rect x="820" y="120" width="18" height="26" fill="#5b8def"/>
+  <rect x="842" y="120" width="18" height="26" fill="#5b8def"/>
+  <rect x="864" y="120" width="18" height="26" fill="#5b8def"/>
+  <rect x="886" y="120" width="18" height="26" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="908" y="120" width="18" height="26" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <text class="lbl-fast" x="960" y="140" text-anchor="end">ready</text>
+
+  <!-- P1 -->
+  <rect x="715" y="170" width="250" height="46" rx="4" fill="#f5f5f7" stroke="#bbb"/>
+  <text class="lbl" x="728" y="190">Partition 1</text>
+  <rect x="820" y="180" width="18" height="26" fill="#5b8def"/>
+  <rect x="842" y="180" width="18" height="26" fill="#5b8def"/>
+  <rect x="864" y="180" width="18" height="26" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="886" y="180" width="18" height="26" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="908" y="180" width="18" height="26" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <text class="lbl-fast" x="960" y="200" text-anchor="end">ready</text>
+
+  <!-- P2 -->
+  <rect x="715" y="230" width="250" height="46" rx="4" fill="#f5f5f7" stroke="#bbb"/>
+  <text class="lbl" x="728" y="250">Partition 2</text>
+  <rect x="820" y="240" width="18" height="26" fill="#5b8def"/>
+  <rect x="842" y="240" width="18" height="26" fill="#5b8def"/>
+  <rect x="864" y="240" width="18" height="26" fill="#5b8def"/>
+  <rect x="886" y="240" width="18" height="26" fill="#5b8def"/>
+  <rect x="908" y="240" width="18" height="26" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <text class="lbl-fast" x="960" y="260" text-anchor="end">ready</text>
+
+  <!-- PN -->
+  <rect x="715" y="290" width="250" height="46" rx="4" fill="#f5f5f7" stroke="#bbb"/>
+  <text class="lbl" x="728" y="310">Partition N</text>
+  <rect x="820" y="300" width="18" height="26" fill="#5b8def"/>
+  <rect x="842" y="300" width="18" height="26" fill="#5b8def"/>
+  <rect x="864" y="300" width="18" height="26" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="886" y="300" width="18" height="26" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="908" y="300" width="18" height="26" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <text class="lbl-fast" x="960" y="320" text-anchor="end">ready</text>
+
+  <!-- Background poll arrows source → queues -->
+  <line class="arr-prefill" x1="675" y1="133" x2="818" y2="133"/>
+  <line class="arr-prefill" x1="675" y1="193" x2="818" y2="193"/>
+  <line class="arr-prefill" x1="675" y1="253" x2="818" y2="253"/>
+  <line class="arr-prefill" x1="675" y1="313" x2="818" y2="313"/>
+
+  <!-- Drain arrows queues → SPM (off-panel arrow indicator) -->
+  <text class="lbl-fast" x="1010" y="220" text-anchor="middle">→ SPM</text>
+  <text class="lbl-fast" x="1010" y="238" text-anchor="middle">unblocked</text>
+
+  <!-- Legend -->
+  <rect x="20" y="412" width="1040" height="36" rx="4" fill="#fafafa" stroke="#ddd"/>
+  <line class="arr-blocked" x1="35" y1="430" x2="75" y2="430"/>
+  <text class="lbl" x="82" y="434">SPM polls partition (blocked when no batch)</text>
+  <line class="arr-prefill" x1="430" y1="430" x2="470" y2="430"/>
+  <text class="lbl" x="477" y="434">background prefill keeps queue warm</text>
+  <rect x="790" y="420" width="14" height="14" fill="#5b8def"/>
+  <text class="lbl" x="810" y="432">buffered batch (ready to drain)</text>
+</svg>
diff --git a/content/images/sort-pushdown/buffer-exec.svg b/content/images/sort-pushdown/buffer-exec.svg
new file mode 100644
index 00000000..fe8a744d
--- /dev/null
+++ b/content/images/sort-pushdown/buffer-exec.svg
@@ -0,0 +1,89 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 920 380">
+  <style>
+    text { font-family: 'Segoe UI', Arial, sans-serif; }
+    .title { font-size: 14px; font-weight: 700; fill: #222; }
+    .panel { font-size: 14px; font-weight: 700; fill: #1d1d1f; }
+    .panel-buf { font-size: 13px; font-weight: 800; fill: #5b8def; }
+    .panel-spm { font-size: 14px; font-weight: 800; fill: #27ae60; }
+    .lbl { font-size: 12px; fill: #555; }
+    .lbl-poll { font-size: 11px; fill: #c0392b; font-style: italic; }
+    .small { font-size: 11px; fill: #666; font-family: 'Courier New', monospace; }
+    .arr { fill: none; stroke: #555; stroke-width: 1.4; marker-end: url(#arr); }
+    .arr-bg { fill: none; stroke: #c0392b; stroke-width: 1.2; stroke-dasharray: 3,3; marker-end: url(#arr-r); }
+  </style>
+  <defs>
+    <marker id="arr" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#555"/>
+    </marker>
+    <marker id="arr-r" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#c0392b"/>
+    </marker>
+  </defs>
+
+  <text class="title" x="460" y="26" text-anchor="middle">BufferExec — per-partition bounded queues</text>
+
+  <!-- ============ LEFT: Upstream box ============ -->
+  <rect x="30" y="60" width="120" height="280" rx="8" fill="#fafafa" stroke="#bbb" stroke-width="1"/>
+  <text class="panel" x="90" y="92" text-anchor="middle">DataSource</text>
+  <text class="lbl" x="90" y="112" text-anchor="middle">(I/O bound)</text>
+
+  <!-- ============ MIDDLE: BufferExec ============ -->
+  <rect x="200" y="60" width="500" height="280" rx="8" fill="#fff" stroke="#5b8def" stroke-width="2"/>
+  <text class="panel-buf" x="450" y="84" text-anchor="middle">BufferExec</text>
+  <text class="small" x="450" y="100" text-anchor="middle">capacity: sort_pushdown_buffer_capacity · default 1 GB</text>
+
+  <!-- Partition 0 -->
+  <rect x="220" y="118" width="460" height="48" rx="6" fill="#f5f5f7" stroke="#bbb"/>
+  <text class="lbl" x="234" y="139">Partition 0</text>
+  <rect x="400" y="129" width="22" height="28" fill="#5b8def"/>
+  <rect x="426" y="129" width="22" height="28" fill="#5b8def"/>
+  <rect x="452" y="129" width="22" height="28" fill="#5b8def"/>
+  <rect x="478" y="129" width="22" height="28" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="504" y="129" width="22" height="28" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <text class="small" x="660" y="146" text-anchor="end">bounded queue</text>
+
+  <!-- Partition 1 -->
+  <rect x="220" y="178" width="460" height="48" rx="6" fill="#f5f5f7" stroke="#bbb"/>
+  <text class="lbl" x="234" y="199">Partition 1</text>
+  <rect x="400" y="189" width="22" height="28" fill="#5b8def"/>
+  <rect x="426" y="189" width="22" height="28" fill="#5b8def"/>
+  <rect x="452" y="189" width="22" height="28" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="478" y="189" width="22" height="28" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="504" y="189" width="22" height="28" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+
+  <!-- ... -->
+  <text class="lbl" x="450" y="252" text-anchor="middle">…</text>
+
+  <!-- Partition N -->
+  <rect x="220" y="272" width="460" height="48" rx="6" fill="#f5f5f7" stroke="#bbb"/>
+  <text class="lbl" x="234" y="293">Partition N</text>
+  <rect x="400" y="283" width="22" height="28" fill="#5b8def"/>
+  <rect x="426" y="283" width="22" height="28" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="452" y="283" width="22" height="28" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="478" y="283" width="22" height="28" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+  <rect x="504" y="283" width="22" height="28" fill="none" stroke="#bbb" stroke-dasharray="2,2"/>
+
+  <!-- Background poll arrows from upstream -->
+  <line class="arr-bg" x1="150" y1="142" x2="398" y2="142"/>
+  <line class="arr-bg" x1="150" y1="202" x2="398" y2="202"/>
+  <line class="arr-bg" x1="150" y1="296" x2="398" y2="296"/>
+
+  <!-- Drain arrows to SPM -->
+  <line class="arr" x1="680" y1="142" x2="750" y2="142"/>
+  <line class="arr" x1="680" y1="202" x2="750" y2="202"/>
+  <line class="arr" x1="680" y1="296" x2="750" y2="296"/>
+
+  <!-- ============ RIGHT: SPM box ============ -->
+  <rect x="750" y="60" width="140" height="280" rx="8" fill="#fff" stroke="#27ae60" stroke-width="2"/>
+  <text class="panel-spm" x="820" y="92" text-anchor="middle">SPM</text>
+  <text class="lbl" x="820" y="112" text-anchor="middle">k-way merge</text>
+
+  <!-- Legend -->
+  <rect x="20" y="350" width="880" height="26" rx="4" fill="#fafafa" stroke="#eee"/>
+  <line class="arr-bg" x1="40" y1="363" x2="80" y2="363"/>
+  <text class="lbl-poll" x="86" y="367">background prefill</text>
+  <line class="arr" x1="280" y1="363" x2="320" y2="363"/>
+  <text class="lbl" x="326" y="367">consumer drains on demand (no I/O stall)</text>
+  <rect x="650" y="356" width="14" height="14" fill="#5b8def"/>
+  <text class="lbl" x="670" y="367">buffered batch</text>
+</svg>
diff --git a/content/images/sort-pushdown/desc_walk_file.png b/content/images/sort-pushdown/desc_walk_file.png
new file mode 100644
index 00000000..913aa86c
Binary files /dev/null and b/content/images/sort-pushdown/desc_walk_file.png differ
diff --git a/content/images/sort-pushdown/desc_walk_rg.png b/content/images/sort-pushdown/desc_walk_rg.png
new file mode 100644
index 00000000..7bfff559
Binary files /dev/null and b/content/images/sort-pushdown/desc_walk_rg.png differ
diff --git a/content/images/sort-pushdown/future_page_level.png b/content/images/sort-pushdown/future_page_level.png
new file mode 100644
index 00000000..e41f781b
Binary files /dev/null and b/content/images/sort-pushdown/future_page_level.png differ
diff --git a/content/images/sort-pushdown/phase1-file-reorder.svg b/content/images/sort-pushdown/phase1-file-reorder.svg
new file mode 100644
index 00000000..3ea69648
--- /dev/null
+++ b/content/images/sort-pushdown/phase1-file-reorder.svg
@@ -0,0 +1,88 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 320">
+  <style>
+    text { font-family: 'Segoe UI', Arial, sans-serif; }
+    .title { font-size: 14px; font-weight: 700; fill: #222; }
+    .sub { font-size: 12px; fill: #555; }
+    .filename { font-size: 11px; font-weight: 600; fill: #fff; }
+    .range { font-size: 11px; fill: #333; font-family: 'Courier New', monospace; }
+    .label { font-size: 11px; fill: #555; }
+    .arrow { fill: none; stroke: #555; stroke-width: 1.6; marker-end: url(#arrow); }
+    .verdict-good { font-size: 12px; font-weight: 600; fill: #27ae60; }
+    .verdict-bad { font-size: 12px; font-weight: 600; fill: #c0392b; }
+  </style>
+  <defs>
+    <marker id="arrow" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#555"/>
+    </marker>
+  </defs>
+
+  <!-- Title -->
+  <text class="title" x="410" y="22" text-anchor="middle">File rearrangement by min/max statistics</text>
+
+  <!-- BEFORE column -->
+  <text class="sub" x="20" y="55">Before — directory order:</text>
+
+  <!-- file boxes (before) -->
+  <rect x="20" y="65" width="160" height="36" rx="4" fill="#5b8def"/>
+  <text class="filename" x="100" y="80" text-anchor="middle">a.parquet</text>
+  <text class="range" x="100" y="95" text-anchor="middle" fill="#fff">ts ∈ [200, 300]</text>
+
+  <rect x="20" y="111" width="160" height="36" rx="4" fill="#5b8def"/>
+  <text class="filename" x="100" y="126" text-anchor="middle">b.parquet</text>
+  <text class="range" x="100" y="141" text-anchor="middle" fill="#fff">ts ∈ [100, 200]</text>
+
+  <rect x="20" y="157" width="160" height="36" rx="4" fill="#5b8def"/>
+  <text class="filename" x="100" y="172" text-anchor="middle">c.parquet</text>
+  <text class="range" x="100" y="187" text-anchor="middle" fill="#fff">ts ∈ [0, 100]</text>
+
+  <text class="verdict-bad" x="100" y="220" text-anchor="middle">validated_output_ordering() = None</text>
+  <text class="label" x="100" y="238" text-anchor="middle">→ SortExec required</text>
+
+  <!-- arrow -->
+  <line class="arrow" x1="210" y1="130" x2="320" y2="130"/>
+  <text class="label" x="265" y="120" text-anchor="middle">PushdownSort</text>
+  <text class="label" x="265" y="146" text-anchor="middle">sort by min(ts)</text>
+
+  <!-- AFTER column -->
+  <text class="sub" x="350" y="55">After — sorted by stats:</text>
+
+  <rect x="350" y="65" width="160" height="36" rx="4" fill="#27ae60"/>
+  <text class="filename" x="430" y="80" text-anchor="middle">c.parquet</text>
+  <text class="range" x="430" y="95" text-anchor="middle" fill="#fff">ts ∈ [0, 100]</text>
+
+  <rect x="350" y="111" width="160" height="36" rx="4" fill="#27ae60"/>
+  <text class="filename" x="430" y="126" text-anchor="middle">b.parquet</text>
+  <text class="range" x="430" y="141" text-anchor="middle" fill="#fff">ts ∈ [100, 200]</text>
+
+  <rect x="350" y="157" width="160" height="36" rx="4" fill="#27ae60"/>
+  <text class="filename" x="430" y="172" text-anchor="middle">a.parquet</text>
+  <text class="range" x="430" y="187" text-anchor="middle" fill="#fff">ts ∈ [200, 300]</text>
+
+  <text class="verdict-good" x="430" y="220" text-anchor="middle">validated_output_ordering() = Exact</text>
+  <text class="label" x="430" y="238" text-anchor="middle">→ SortExec removed</text>
+
+  <!-- Right column: number line -->
+  <text class="sub" x="640" y="55" text-anchor="middle">Range layout</text>
+  <line x1="560" y1="130" x2="800" y2="130" stroke="#888" stroke-width="1"/>
+  <line x1="560" y1="125" x2="560" y2="135" stroke="#888"/>
+  <line x1="640" y1="125" x2="640" y2="135" stroke="#888"/>
+  <line x1="720" y1="125" x2="720" y2="135" stroke="#888"/>
+  <line x1="800" y1="125" x2="800" y2="135" stroke="#888"/>
+  <text class="range" x="560" y="148" text-anchor="middle">0</text>
+  <text class="range" x="640" y="148" text-anchor="middle">100</text>
+  <text class="range" x="720" y="148" text-anchor="middle">200</text>
+  <text class="range" x="800" y="148" text-anchor="middle">300</text>
+
+  <!-- range bars -->
+  <rect x="560" y="80" width="80" height="14" rx="3" fill="#27ae60"/>
+  <text class="range" x="600" y="91" text-anchor="middle" fill="#fff">c</text>
+  <rect x="640" y="98" width="80" height="14" rx="3" fill="#27ae60"/>
+  <text class="range" x="680" y="109" text-anchor="middle" fill="#fff">b</text>
+  <rect x="720" y="116" width="80" height="14" rx="3" fill="#27ae60"/>
+  <text class="range" x="760" y="127" text-anchor="middle" fill="#fff">a</text>
+
+  <text class="label" x="680" y="170" text-anchor="middle">Non-overlapping → ordering provable</text>
+
+  <!-- Bottom: SQL hint -->
+  <text x="410" y="290" text-anchor="middle" font-size="12" fill="#555" font-family="'Courier New', monospace">SELECT * FROM events ORDER BY ts</text>
+</svg>
diff --git a/content/images/sort-pushdown/phase2-stats-overlap.svg b/content/images/sort-pushdown/phase2-stats-overlap.svg
new file mode 100644
index 00000000..ab03f133
--- /dev/null
+++ b/content/images/sort-pushdown/phase2-stats-overlap.svg
@@ -0,0 +1,92 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1000 480">
+  <style>
+    text { font-family: 'Segoe UI', Arial, sans-serif; }
+    .title { font-size: 20px; font-weight: 800; fill: #222; }
+    .panel-title { font-size: 18px; font-weight: 800; }
+    .label { font-size: 14px; fill: #555; }
+    .axis-num { font-size: 14px; fill: #333; font-family: 'Courier New', monospace; }
+    .file-label { font-size: 15px; fill: #1d1d1f; font-family: 'Courier New', monospace; font-weight: 700; }
+    .verdict-good { font-size: 18px; font-weight: 800; fill: #27ae60; }
+    .verdict-bad { font-size: 18px; font-weight: 800; fill: #c0392b; }
+    .verdict-sub { font-size: 14px; fill: #555; }
+    .axis { stroke: #888; stroke-width: 1.4; }
+    .tick { stroke: #888; stroke-width: 1.4; }
+    .gap { stroke: #27ae60; stroke-dasharray: 4,4; stroke-width: 1.4; }
+    .footer { font-size: 14px; fill: #555; font-style: italic; }
+  </style>
+
+  <text class="title" x="500" y="32" text-anchor="middle">Using min/max statistics to prove non-overlap</text>
+
+  <!-- ============ LEFT: Non-overlapping (Exact) ============ -->
+  <rect x="30" y="55" width="460" height="370" rx="10" fill="#e8f5e9" stroke="#27ae60" stroke-width="2"/>
+  <text class="panel-title" x="260" y="84" text-anchor="middle" fill="#27ae60">Non-overlapping ranges</text>
+
+  <!-- file_c -->
+  <text class="file-label" x="135" y="112" text-anchor="middle">file_c  [0..100]</text>
+  <rect x="70"  y="120" width="130" height="28" rx="4" fill="#27ae60"/>
+
+  <!-- file_b -->
+  <text class="file-label" x="265" y="172" text-anchor="middle">file_b  [100..200]</text>
+  <rect x="200" y="180" width="130" height="28" rx="4" fill="#27ae60"/>
+
+  <!-- file_a -->
+  <text class="file-label" x="395" y="232" text-anchor="middle">file_a  [200..300]</text>
+  <rect x="330" y="240" width="130" height="28" rx="4" fill="#27ae60"/>
+
+  <!-- axis -->
+  <line class="axis" x1="70" y1="300" x2="460" y2="300"/>
+  <line class="tick" x1="70"  y1="294" x2="70"  y2="306"/>
+  <line class="tick" x1="200" y1="294" x2="200" y2="306"/>
+  <line class="tick" x1="330" y1="294" x2="330" y2="306"/>
+  <line class="tick" x1="460" y1="294" x2="460" y2="306"/>
+  <text class="axis-num" x="70"  y="322" text-anchor="middle">0</text>
+  <text class="axis-num" x="200" y="322" text-anchor="middle">100</text>
+  <text class="axis-num" x="330" y="322" text-anchor="middle">200</text>
+  <text class="axis-num" x="460" y="322" text-anchor="middle">300</text>
+  <text class="label" x="265" y="344" text-anchor="middle">min(ts) / max(ts)</text>
+
+  <!-- gap markers -->
+  <line class="gap" x1="200" y1="112" x2="200" y2="298"/>
+  <line class="gap" x1="330" y1="172" x2="330" y2="298"/>
+
+  <text class="verdict-good" x="260" y="383" text-anchor="middle">Ordering: Exact ✓</text>
+  <text class="verdict-sub"  x="260" y="406" text-anchor="middle">SortExec can be removed</text>
+
+  <!-- ============ RIGHT: Overlapping (Inexact) ============ -->
+  <rect x="510" y="55" width="460" height="370" rx="10" fill="#fde8e8" stroke="#c0392b" stroke-width="2"/>
+  <text class="panel-title" x="740" y="84" text-anchor="middle" fill="#c0392b">Overlapping ranges</text>
+
+  <!-- file_x -->
+  <text class="file-label" x="667" y="112" text-anchor="middle">file_x  [0..180]</text>
+  <rect x="550" y="120" width="234" height="28" rx="4" fill="#c0392b" opacity="0.85"/>
+
+  <!-- file_y -->
+  <text class="file-label" x="771" y="172" text-anchor="middle">file_y  [80..260]</text>
+  <rect x="654" y="180" width="234" height="28" rx="4" fill="#c0392b" opacity="0.85"/>
+
+  <!-- file_z -->
+  <text class="file-label" x="836" y="232" text-anchor="middle">file_z  [140..300]</text>
+  <rect x="732" y="240" width="208" height="28" rx="4" fill="#c0392b" opacity="0.85"/>
+
+  <!-- overlap shading -->
+  <rect x="654" y="120" width="130" height="28" fill="#c0392b" opacity="0.30"/>
+  <rect x="732" y="180" width="156" height="28" fill="#c0392b" opacity="0.30"/>
+
+  <!-- axis -->
+  <line class="axis" x1="550" y1="300" x2="940" y2="300"/>
+  <line class="tick" x1="550" y1="294" x2="550" y2="306"/>
+  <line class="tick" x1="680" y1="294" x2="680" y2="306"/>
+  <line class="tick" x1="810" y1="294" x2="810" y2="306"/>
+  <line class="tick" x1="940" y1="294" x2="940" y2="306"/>
+  <text class="axis-num" x="550" y="322" text-anchor="middle">0</text>
+  <text class="axis-num" x="680" y="322" text-anchor="middle">100</text>
+  <text class="axis-num" x="810" y="322" text-anchor="middle">200</text>
+  <text class="axis-num" x="940" y="322" text-anchor="middle">300</text>
+  <text class="label" x="745" y="344" text-anchor="middle">min(ts) / max(ts)</text>
+
+  <text class="verdict-bad" x="740" y="383" text-anchor="middle">Ordering: Inexact (or stripped)</text>
+  <text class="verdict-sub" x="740" y="406" text-anchor="middle">SortExec stays</text>
+
+  <!-- footer -->
+  <text class="footer" x="500" y="455" text-anchor="middle">PushdownSort sorts files by min, checks adjacency, upgrades to Exact only when ranges don't overlap.</text>
+</svg>
diff --git a/content/images/sort-pushdown/plan-diff.svg b/content/images/sort-pushdown/plan-diff.svg
new file mode 100644
index 00000000..8fdd8bc4
--- /dev/null
+++ b/content/images/sort-pushdown/plan-diff.svg
@@ -0,0 +1,53 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 360">
+  <style>
+    text { font-family: 'Segoe UI', Arial, sans-serif; }
+    .title { font-size: 14px; font-weight: 700; fill: #222; }
+    .panel { font-size: 13px; font-weight: 700; }
+    .node { font-size: 12px; font-weight: 600; fill: #fff; }
+    .detail { font-size: 10px; font-family: 'Courier New', monospace; fill: #fff; }
+    .annot { font-size: 11px; fill: #c0392b; font-style: italic; }
+    .annot-g { font-size: 11px; fill: #27ae60; font-style: italic; font-weight: 600; }
+    .arrow { fill: none; stroke: #555; stroke-width: 1.4; marker-end: url(#arr2); }
+  </style>
+  <defs>
+    <marker id="arr2" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#555"/>
+    </marker>
+  </defs>
+
+  <text class="title" x="410" y="22" text-anchor="middle">EXPLAIN before / after sort pushdown (single-partition)</text>
+
+  <!-- BEFORE -->
+  <rect x="20" y="40" width="370" height="300" rx="8" fill="#fff" stroke="#c0392b" stroke-width="1.5"/>
+  <text class="panel" x="205" y="62" text-anchor="middle" fill="#c0392b">Before — SortExec on top of scan</text>
+
+  <rect x="65" y="100" width="280" height="58" rx="4" fill="#c0392b"/>
+  <text class="node" x="205" y="122" text-anchor="middle">SortExec: TopK(fetch=3)</text>
+  <text class="detail" x="205" y="140" text-anchor="middle">expr=[ts@0 ASC NULLS LAST]</text>
+  <text class="detail" x="205" y="154" text-anchor="middle">preserve_partitioning=[false]</text>
+
+  <line class="arrow" x1="205" y1="164" x2="205" y2="184"/>
+
+  <rect x="65" y="186" width="280" height="58" rx="4" fill="#5b8def"/>
+  <text class="node" x="205" y="208" text-anchor="middle">DataSourceExec</text>
+  <text class="detail" x="205" y="226" text-anchor="middle">file_groups={a, b, c}</text>
+  <text class="detail" x="205" y="240" text-anchor="middle">file_type=parquet</text>
+
+  <text class="annot" x="205" y="284" text-anchor="middle">blocking TopK must consume the entire scan</text>
+  <text class="annot" x="205" y="302" text-anchor="middle">LIMIT takes effect only after all rows are read</text>
+
+  <!-- AFTER -->
+  <rect x="430" y="40" width="370" height="300" rx="8" fill="#fff" stroke="#27ae60" stroke-width="1.5"/>
+  <text class="panel" x="615" y="62" text-anchor="middle" fill="#27ae60">After — SortExec eliminated</text>
+
+  <rect x="475" y="120" width="280" height="78" rx="4" fill="#27ae60"/>
+  <text class="node" x="615" y="142" text-anchor="middle">DataSourceExec</text>
+  <text class="detail" x="615" y="160" text-anchor="middle">file_groups={c, b, a}</text>
+  <text class="detail" x="615" y="174" text-anchor="middle">limit=3</text>
+  <text class="detail" x="615" y="188" text-anchor="middle">output_ordering=[ts ASC NULLS LAST]</text>
+
+  <text class="annot-g" x="615" y="232" text-anchor="middle">files reordered by min/max stats</text>
+  <text class="annot-g" x="615" y="250" text-anchor="middle">LIMIT becomes a static fetch on the source</text>
+  <text class="annot-g" x="615" y="268" text-anchor="middle">reader stops the moment 3 rows are emitted</text>
+  <text class="annot-g" x="615" y="286" text-anchor="middle">no blocking sort, no full scan</text>
+</svg>
diff --git a/content/images/sort-pushdown/pr21956-decision.svg b/content/images/sort-pushdown/pr21956-decision.svg
new file mode 100644
index 00000000..f03aa853
--- /dev/null
+++ b/content/images/sort-pushdown/pr21956-decision.svg
@@ -0,0 +1,87 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1080 500">
+  <style>
+    text { font-family: 'Segoe UI', Arial, sans-serif; }
+    .title { font-size: 20px; font-weight: 800; fill: #222; }
+    .entry-label { font-size: 16px; font-weight: 800; fill: #fff; }
+    .entry-sub { font-size: 13px; fill: #d8e0e8; font-family: 'Courier New', monospace; }
+    .check-label { font-size: 14px; fill: #1d1d1f; font-weight: 700; }
+    .check-sub { font-size: 13px; fill: #555; font-family: 'Courier New', monospace; }
+    .term-label { font-size: 17px; font-weight: 800; fill: #fff; }
+    .term-sub { font-size: 13px; fill: #fff; font-family: 'Courier New', monospace; }
+    .branch-yes { font-size: 14px; font-weight: 800; fill: #27ae60; }
+    .branch-no { font-size: 14px; font-weight: 800; fill: #c0392b; }
+    .effect { font-size: 13px; fill: #555; }
+    .arr { fill: none; stroke: #555; stroke-width: 1.8; marker-end: url(#arr); }
+    .arr-yes { fill: none; stroke: #27ae60; stroke-width: 2; marker-end: url(#arr-g); }
+    .arr-no { fill: none; stroke: #888; stroke-width: 1.8; marker-end: url(#arr); }
+  </style>
+  <defs>
+    <marker id="arr" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#555"/>
+    </marker>
+    <marker id="arr-g" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#27ae60"/>
+    </marker>
+  </defs>
+
+  <text class="title" x="540" y="32" text-anchor="middle">try_pushdown_sort — Exact / Inexact / Unsupported</text>
+
+  <!-- ============ Entry pill ============ -->
+  <rect x="40" y="60" width="280" height="60" rx="30" fill="#34495e"/>
+  <text class="entry-label" x="180" y="86" text-anchor="middle">PushdownSort rule</text>
+  <text class="entry-sub" x="180" y="106" text-anchor="middle">source.try_pushdown_sort(req)</text>
+
+  <!-- arrow entry → check1 -->
+  <line class="arr" x1="320" y1="90" x2="370" y2="90"/>
+
+  <!-- ============ Check 1 ============ -->
+  <rect x="370" y="50" width="320" height="80" rx="10" fill="#fff" stroke="#5b8def" stroke-width="2"/>
+  <text class="check-label" x="530" y="80" text-anchor="middle">Natural ordering satisfies request?</text>
+  <text class="check-sub" x="530" y="104" text-anchor="middle">eq.ordering_satisfy(req)</text>
+
+  <!-- Branch yes → Exact (right) -->
+  <line class="arr-yes" x1="690" y1="90" x2="800" y2="90"/>
+  <text class="branch-yes" x="745" y="78" text-anchor="middle">yes</text>
+
+  <rect x="800" y="58" width="240" height="64" rx="10" fill="#27ae60"/>
+  <text class="term-label" x="920" y="84" text-anchor="middle">Exact</text>
+  <text class="term-sub" x="920" y="106" text-anchor="middle">drop SortExec</text>
+
+  <!-- Branch no (down to check 2) -->
+  <line class="arr-no" x1="530" y1="130" x2="530" y2="200"/>
+  <text class="branch-no" x="544" y="170" text-anchor="start">no</text>
+
+  <!-- ============ Check 2 ============ -->
+  <rect x="370" y="200" width="320" height="100" rx="10" fill="#fff" stroke="#5b8def" stroke-width="2"/>
+  <text class="check-label" x="530" y="232" text-anchor="middle">Pushdown still applies?</text>
+  <text class="check-sub" x="530" y="256" text-anchor="middle">column_in_file_schema</text>
+  <text class="check-sub" x="530" y="276" text-anchor="middle">OR reversed_satisfies(req)</text>
+
+  <!-- Branch yes → Inexact (right) -->
+  <line class="arr-yes" x1="690" y1="250" x2="800" y2="250"/>
+  <text class="branch-yes" x="745" y="238" text-anchor="middle">yes</text>
+
+  <rect x="800" y="218" width="240" height="64" rx="10" fill="#e67e22"/>
+  <text class="term-label" x="920" y="244" text-anchor="middle">Inexact</text>
+  <text class="term-sub" x="920" y="266" text-anchor="middle">set runtime-reorder flags</text>
+
+  <!-- Branch no (down to Unsupported) -->
+  <line class="arr-no" x1="530" y1="300" x2="530" y2="370"/>
+  <text class="branch-no" x="544" y="340" text-anchor="start">no</text>
+
+  <!-- ============ Unsupported terminal ============ -->
+  <rect x="370" y="370" width="320" height="64" rx="10" fill="#95a5a6"/>
+  <text class="term-label" x="530" y="396" text-anchor="middle">Unsupported</text>
+  <text class="term-sub" x="530" y="418" text-anchor="middle">SortExec stays · full external sort</text>
+
+  <!-- ============ Effects legend (right side, bottom) ============ -->
+  <rect x="800" y="320" width="240" height="114" rx="8" fill="#fafafa" stroke="#e3e3e3"/>
+  <text class="check-label" x="820" y="345">Outcomes</text>
+  <circle cx="826" cy="368" r="6" fill="#27ae60"/>
+  <text class="effect" x="840" y="372">Exact · static limit on source</text>
+  <circle cx="826" cy="392" r="6" fill="#e67e22"/>
+  <text class="effect" x="840" y="396">Inexact · TopK + RG pruner</text>
+  <circle cx="826" cy="416" r="6" fill="#95a5a6"/>
+  <text class="effect" x="840" y="420">Unsupported · no benefit</text>
+
+</svg>
diff --git a/content/images/sort-pushdown/pr21956-runtime-pipeline.svg b/content/images/sort-pushdown/pr21956-runtime-pipeline.svg
new file mode 100644
index 00000000..6490ea96
--- /dev/null
+++ b/content/images/sort-pushdown/pr21956-runtime-pipeline.svg
@@ -0,0 +1,82 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1180 620">
+  <style>
+    text { font-family: 'Segoe UI', Arial, sans-serif; }
+    .title { font-size: 22px; font-weight: 800; fill: #222; }
+    .source-title { font-size: 17px; font-weight: 800; fill: #34495e; }
+    .source-flag-name { font-size: 14px; fill: #1d1d1f; font-family: 'Courier New', monospace; font-weight: 700; }
+    .source-flag-val { font-size: 13px; fill: #555; font-family: 'Courier New', monospace; }
+    .step-circle { font-size: 22px; font-weight: 800; }
+    .step-title { font-size: 19px; font-weight: 800; fill: #fff; }
+    .step-scope { font-size: 13px; font-weight: 700; fill: #fff; opacity: 0.85; }
+    .step-body { font-size: 15px; fill: #fff; }
+    .step-code { font-size: 13px; font-family: 'Courier New', monospace; fill: #fff; opacity: 0.85; }
+    .branch-label { font-size: 14px; fill: #c0392b; font-style: italic; font-weight: 700; }
+    .footer-title { font-size: 17px; font-weight: 800; fill: #fff; }
+    .footer-sub { font-size: 14px; fill: #fff; opacity: 0.9; font-style: italic; }
+    .arr { fill: none; stroke: #555; stroke-width: 2; marker-end: url(#arr); }
+  </style>
+  <defs>
+    <marker id="arr" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#555"/>
+    </marker>
+  </defs>
+
+  <text class="title" x="590" y="36" text-anchor="middle">Two flags drive a three-step runtime pipeline</text>
+
+  <!-- ============ Source state ============ -->
+  <rect x="90" y="60" width="1000" height="100" rx="10" fill="#f6f7fa" stroke="#34495e" stroke-width="2"/>
+  <text class="source-title" x="590" y="90" text-anchor="middle">ParquetSource carries the inexact-pushdown decision</text>
+
+  <text class="source-flag-name" x="160" y="124">sort_order_for_reorder</text>
+  <text class="source-flag-val" x="370" y="124">= Some([req_col ASC | DESC])</text>
+
+  <text class="source-flag-name" x="160" y="148">reverse_row_groups</text>
+  <text class="source-flag-val" x="370" y="148">= bool</text>
+
+  <text class="source-flag-val" x="820" y="124" text-anchor="end" style="font-style: italic; font-family: 'Segoe UI', Arial, sans-serif">set by</text>
+  <text class="source-flag-name" x="830" y="124">try_pushdown_sort</text>
+  <text class="source-flag-val" x="1020" y="124" style="font-style: italic; font-family: 'Segoe UI', Arial, sans-serif">read by opener</text>
+
+  <!-- arrow to step 1 -->
+  <line class="arr" x1="590" y1="164" x2="590" y2="190"/>
+
+  <!-- ============ Step 1: File-level reorder ============ -->
+  <rect x="90" y="195" width="1000" height="85" rx="10" fill="#5b8def"/>
+  <circle cx="135" cy="237" r="22" fill="#fff"/>
+  <text class="step-circle" x="135" y="245" text-anchor="middle" fill="#5b8def">1</text>
+  <text class="step-title" x="180" y="220">File-level reorder</text>
+  <text class="step-scope" x="180" y="240">across the whole scan · shared morsel queue</text>
+  <text class="step-body" x="180" y="263">Reorder files by <tspan font-family="Courier New" font-weight="700">min(col)</tspan> so the most-promising file opens first</text>
+
+  <!-- arrow with label -->
+  <line class="arr" x1="590" y1="284" x2="590" y2="305"/>
+  <text class="branch-label" x="605" y="300" text-anchor="start">for each opened file</text>
+
+  <!-- ============ Step 2: RG reorder ============ -->
+  <rect x="90" y="310" width="1000" height="85" rx="10" fill="#27ae60"/>
+  <circle cx="135" cy="352" r="22" fill="#fff"/>
+  <text class="step-circle" x="135" y="360" text-anchor="middle" fill="#27ae60">2</text>
+  <text class="step-title" x="180" y="335">Row-group-level reorder</text>
+  <text class="step-scope" x="180" y="355">per opened file</text>
+  <text class="step-body" x="180" y="378">Sort RGs ASC by <tspan font-family="Courier New" font-weight="700">min(col)</tspan> from the parquet column statistics</text>
+
+  <!-- arrow with label -->
+  <line class="arr" x1="590" y1="399" x2="590" y2="420"/>
+  <text class="branch-label" x="605" y="415" text-anchor="start">if reverse_row_groups</text>
+
+  <!-- ============ Step 3: Reverse iteration ============ -->
+  <rect x="90" y="425" width="1000" height="85" rx="10" fill="#e67e22"/>
+  <circle cx="135" cy="467" r="22" fill="#fff"/>
+  <text class="step-circle" x="135" y="475" text-anchor="middle" fill="#e67e22">3</text>
+  <text class="step-title" x="180" y="450">Reverse iteration</text>
+  <text class="step-scope" x="180" y="470">DESC requests only</text>
+  <text class="step-body" x="180" y="493">Iterate RG list in reverse → <tspan font-family="Courier New" font-weight="700">"RGs DESC × rows ASC"</tspan></text>
+
+  <!-- arrow to decoder -->
+  <line class="arr" x1="590" y1="514" x2="590" y2="535"/>
+
+  <!-- ============ Final: decoder ============ -->
+  <rect x="90" y="540" width="1000" height="70" rx="10" fill="#34495e"/>
+  <text class="footer-title" x="590" y="570" text-anchor="middle">Decoder reads row groups in this order</text>
+  <text class="footer-sub" x="590" y="594" text-anchor="middle">approximate ordering — SortExec / TopK above the source still enforces final ordering</text>
+</svg>
diff --git a/content/images/sort-pushdown/pruner_loop.png b/content/images/sort-pushdown/pruner_loop.png
new file mode 100644
index 00000000..c2e273b0
Binary files /dev/null and b/content/images/sort-pushdown/pruner_loop.png differ
diff --git a/content/images/sort-pushdown/pruning_stack.png b/content/images/sort-pushdown/pruning_stack.png
new file mode 100644
index 00000000..31f85b53
Binary files /dev/null and b/content/images/sort-pushdown/pruning_stack.png differ
diff --git a/content/images/sort-pushdown/reverse-scan.svg b/content/images/sort-pushdown/reverse-scan.svg
new file mode 100644
index 00000000..443a0a1c
--- /dev/null
+++ b/content/images/sort-pushdown/reverse-scan.svg
@@ -0,0 +1,100 @@
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 380">
+  <style>
+    text { font-family: 'Segoe UI', Arial, sans-serif; }
+    .title { font-size: 14px; font-weight: 700; fill: #222; }
+    .panel-title { font-size: 13px; font-weight: 700; }
+    .label { font-size: 11px; fill: #555; }
+    .small { font-size: 10px; fill: #444; }
+    .badge-good { font-size: 12px; font-weight: 700; fill: #27ae60; }
+    .badge-bad { font-size: 12px; font-weight: 700; fill: #c0392b; }
+    .arrow { fill: none; stroke: #c0392b; stroke-width: 1.8; marker-end: url(#arr3); }
+    .arrow-green { fill: none; stroke: #27ae60; stroke-width: 1.8; marker-end: url(#arr3g); }
+  </style>
+  <defs>
+    <marker id="arr3" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#c0392b"/>
+    </marker>
+    <marker id="arr3g" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto">
+      <path d="M0,0 L10,5 L0,10 Z" fill="#27ae60"/>
+    </marker>
+  </defs>
+
+  <text class="title" x="410" y="22" text-anchor="middle">ORDER BY ts DESC LIMIT 10 — row-group reverse vs page reverse</text>
+
+  <!-- row group reverse panel -->
+  <rect x="20" y="40" width="380" height="320" rx="8" fill="#fff8e1" stroke="#e67e22" stroke-width="1.5"/>
+  <text class="panel-title" x="210" y="62" text-anchor="middle" fill="#e67e22">Row-group reverse (today, merged)</text>
+
+  <!-- row group block with pages -->
+  <rect x="40" y="90" width="340" height="120" rx="6" fill="#fff" stroke="#bbb"/>
+  <text class="small" x="50" y="106">RowGroup (last, ~128 MB)</text>
+
+  <!-- pages inside the row group -->
+  <g>
+    <rect x="50"  y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="92"  y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="134" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="176" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="218" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="260" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="302" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="344" y="115" width="36" height="80" fill="#bdc3c7"/>
+    <text class="small" x="69" y="160" text-anchor="middle">P0</text>
+    <text class="small" x="111" y="160" text-anchor="middle">P1</text>
+    <text class="small" x="153" y="160" text-anchor="middle">P2</text>
+    <text class="small" x="195" y="160" text-anchor="middle">P3</text>
+    <text class="small" x="237" y="160" text-anchor="middle">P4</text>
+    <text class="small" x="279" y="160" text-anchor="middle">P5</text>
+    <text class="small" x="321" y="160" text-anchor="middle">P6</text>
+    <text class="small" x="362" y="160" text-anchor="middle">P7</text>
+  </g>
+
+  <text class="label" x="210" y="235" text-anchor="middle">Decode the entire row group, reverse in memory, take 10.</text>
+  <text class="badge-bad" x="210" y="265" text-anchor="middle">Peak buffer: ~128 MB</text>
+  <text class="badge-bad" x="210" y="285" text-anchor="middle">Pages decoded: 8</text>
+  <text class="badge-bad" x="210" y="305" text-anchor="middle">Time-to-first-N: ~29 µs</text>
+
+  <!-- arrows: all pages read -->
+  <line class="arrow" x1="362" y1="80" x2="362" y2="115"/>
+  <line class="arrow" x1="69"  y1="80" x2="69"  y2="115"/>
+  <line class="arrow" x1="111" y1="80" x2="111" y2="115"/>
+  <line class="arrow" x1="153" y1="80" x2="153" y2="115"/>
+  <line class="arrow" x1="195" y1="80" x2="195" y2="115"/>
+  <line class="arrow" x1="237" y1="80" x2="237" y2="115"/>
+  <line class="arrow" x1="279" y1="80" x2="279" y2="115"/>
+  <line class="arrow" x1="321" y1="80" x2="321" y2="115"/>
+
+  <!-- PAGE REVERSE panel -->
+  <rect x="420" y="40" width="380" height="320" rx="8" fill="#e8f5e9" stroke="#27ae60" stroke-width="1.5"/>
+  <text class="panel-title" x="610" y="62" text-anchor="middle" fill="#27ae60">Page reverse (upstream POC, arrow-rs #9937)</text>
+
+  <rect x="440" y="90" width="340" height="120" rx="6" fill="#fff" stroke="#bbb"/>
+  <text class="small" x="450" y="106">RowGroup (last)</text>
+
+  <g>
+    <rect x="450" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="492" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="534" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="576" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="618" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="660" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="702" y="115" width="38" height="80" fill="#bdc3c7"/>
+    <rect x="744" y="115" width="36" height="80" fill="#27ae60"/>
+    <text class="small" x="469" y="160" text-anchor="middle">P0</text>
+    <text class="small" x="511" y="160" text-anchor="middle">P1</text>
+    <text class="small" x="553" y="160" text-anchor="middle">P2</text>
+    <text class="small" x="595" y="160" text-anchor="middle">P3</text>
+    <text class="small" x="637" y="160" text-anchor="middle">P4</text>
+    <text class="small" x="679" y="160" text-anchor="middle">P5</text>
+    <text class="small" x="721" y="160" text-anchor="middle">P6</text>
+    <text class="small" x="762" y="160" text-anchor="middle" style="fill: #fff">P7</text>
+  </g>
+
+  <!-- only one green arrow on last page -->
+  <line class="arrow-green" x1="762" y1="80" x2="762" y2="115"/>
+
+  <text class="label" x="610" y="235" text-anchor="middle">Seek to last page only via OffsetIndex, decode, reverse, return.</text>
+  <text class="badge-good" x="610" y="265" text-anchor="middle">Peak buffer: ~1 MB</text>
+  <text class="badge-good" x="610" y="285" text-anchor="middle">Pages decoded: 1</text>
+  <text class="badge-good" x="610" y="305" text-anchor="middle">Time-to-first-N: ~565 ns  (≈ 50× faster)</text>
+</svg>
diff --git a/content/images/sort-pushdown/rg_cascade.png b/content/images/sort-pushdown/rg_cascade.png
new file mode 100644
index 00000000..db3dd011
Binary files /dev/null and b/content/images/sort-pushdown/rg_cascade.png differ
diff --git a/content/images/sort-pushdown/topk_tpch_bench.png b/content/images/sort-pushdown/topk_tpch_bench.png
new file mode 100644
index 00000000..e3914fd1
Binary files /dev/null and b/content/images/sort-pushdown/topk_tpch_bench.png differ
diff --git a/content/images/sort-pushdown/transition_anatomy.png b/content/images/sort-pushdown/transition_anatomy.png
new file mode 100644
index 00000000..065edaba
Binary files /dev/null and b/content/images/sort-pushdown/transition_anatomy.png differ