sql/colexec: add multi-level spill join with robust file lifecycle by aunjgr · Pull Request #23915 · matrixorigin/matrixone

aunjgr · 2026-03-20T08:17:01Z

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

Implements recursive (multi-level) spill for hash join when a single
spill pass is insufficient, along with comprehensive fixes to the spill
file lifecycle to prevent orphaned files under all cancellation paths.

Multi-level spill (hashjoin/spill.go)

- rebuildHashmapForBucket: after reading a build bucket, if memory still exceeds the threshold and depth < spillMaxPass, re-spills the bucket to the next depth instead of running out of memory.
- reSpillBucket: scatters both the build and probe sides of a bucket into sub-buckets and enqueues them for the next pass.
- Sub-bucket naming follows join_<uuid>_<i0>_<i1>_..._<iN>_build/probe, so each level's ancestry is encoded in the filename.
- Seed-based XXHash (computeXXHash) uses a per-depth seed to avoid degenerate distributions when the same keys re-spill at deeper levels.
- shouldReSpill checks live memory usage against the threshold to decide whether another spill pass is needed.
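A small sketch shows why a per-depth seed matters. It is not the PR's code: the MurmurHash3 fmix64 finalizer stands in for the seeded XXHash in computeXXHash, and spillBuckets, seedForDepth, bucketOf, and hotKeys are hypothetical names. Keys that all share bucket 0 at depth 0 would land in bucket 0 again if re-hashed with the same seed and fan-out. A new seed for depth 1 spreads them out.

```go
package main

import "fmt"

// spillBuckets is a hypothetical fan-out per spill pass.
const spillBuckets = 8

// fmix64 is the MurmurHash3 64-bit finalizer, used here only as a
// stand-in for the seeded XXHash in computeXXHash.
func fmix64(k uint64) uint64 {
	k ^= k >> 33
	k *= 0xff51afd7ed558ccd
	k ^= k >> 33
	k *= 0xc4ceb9fe1a85ec53
	k ^= k >> 33
	return k
}

// seedForDepth gives each spill depth its own seed (hypothetical scheme).
func seedForDepth(depth int) uint64 {
	return 0x9E3779B97F4A7C15 * uint64(depth+1)
}

// bucketOf picks the bucket for key at the given spill depth.
func bucketOf(key uint64, depth int) int {
	return int(fmix64(key^seedForDepth(depth)) % spillBuckets)
}

// hotKeys returns n keys that all land in bucket 0 at depth 0.
func hotKeys(n int) []uint64 {
	var keys []uint64
	for k := uint64(0); len(keys) < n; k++ {
		if bucketOf(k, 0) == 0 {
			keys = append(keys, k)
		}
	}
	return keys
}

func main() {
	hot := hotKeys(64)
	// Re-hashing with the depth-0 seed would send every key back to
	// bucket 0. The depth-1 seed spreads them across sub-buckets.
	counts := make([]int, spillBuckets)
	for _, k := range hot {
		counts[bucketOf(k, 1)]++
	}
	fmt.Println("depth-1 sub-bucket sizes:", counts)
}
```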

Spill file lifecycle hardening

context.Background() for all cleanup deletions

cleanupSpillFiles in both hashbuild and hashjoin previously used
proc.Ctx, which is already cancelled by the time cleanup runs on
abnormal client exit. Changed to context.Background().

hashbuild only cleans build files when JoinMap was never sent

Reset() in hashbuild/types.go now calls cleanupSpillFiles only when
!mapSucceed. When mapSucceed is true, hashjoin owns the files.
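The ownership rule can be modeled in a few lines. hashBuild and its fields are hypothetical simplifications. Only the mapSucceed check mirrors the description above.

```go
package main

import "fmt"

// hashBuild is a hypothetical, trimmed-down model of the hashbuild operator
// state; only the fields needed for file ownership are shown.
type hashBuild struct {
	spillFiles []string // build bucket files written by this operator
	mapSucceed bool     // true once the JoinMap was sent to hashjoin
	deleted    []string // records deletions for the demo
}

func (hb *hashBuild) cleanupSpillFiles() {
	hb.deleted = append(hb.deleted, hb.spillFiles...)
}

// Reset deletes the build files only if the JoinMap never reached hashjoin;
// otherwise hashjoin owns them and is responsible for deleting them.
func (hb *hashBuild) Reset() {
	if !hb.mapSucceed {
		hb.cleanupSpillFiles()
	}
	hb.spillFiles = nil
	hb.mapSucceed = false
}

func main() {
	failed := &hashBuild{spillFiles: []string{"join_u_0_build"}}
	failed.Reset()
	fmt.Println("map never sent, deleted:", failed.deleted)

	handedOff := &hashBuild{spillFiles: []string{"join_u_0_build"}, mapSucceed: true}
	handedOff.Reset()
	fmt.Println("map sent, deleted:", handedOff.deleted)
}
```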

spillQueue pre-populated before probe loop (Gap 1)

build() in hashjoin/join.go now pre-populates spillQueue with build
file names before starting the probe loop. Previously the queue was
populated after the loop, so a mid-loop cancellation left build files
untracked.
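The following sketch shows a queue with the shape this PR describes: buckets are popped from the front for FIFO processing, and re-spilled sub-buckets are prepended. The queue is pre-populated before probing. The types and method names are hypothetical.

```go
package main

import "fmt"

// spillBucket is a hypothetical queue entry: a file base name plus depth.
type spillBucket struct {
	name  string
	depth int
}

// spillQueue pops from the front; re-spilled sub-buckets are prepended so
// they are processed before the remaining root buckets.
type spillQueue []spillBucket

func (q *spillQueue) push(b spillBucket) { *q = append(*q, b) }

func (q *spillQueue) popFront() (spillBucket, bool) {
	if len(*q) == 0 {
		return spillBucket{}, false
	}
	b := (*q)[0]
	*q = (*q)[1:]
	return b, true
}

func (q *spillQueue) prepend(bs ...spillBucket) {
	*q = append(append(spillQueue{}, bs...), *q...)
}

func main() {
	var q spillQueue
	// Pre-populate with every root build file before probing starts, so a
	// cancellation mid-loop still leaves each file tracked in the queue.
	for i := 0; i < 3; i++ {
		q.push(spillBucket{name: fmt.Sprintf("join_u_%d_build", i)})
	}
	b, _ := q.popFront()
	// Suppose bucket 0 is still too large: its sub-buckets go to the front.
	q.prepend(
		spillBucket{name: "join_u_0_0_build", depth: b.depth + 1},
		spillBucket{name: "join_u_0_1_build", depth: b.depth + 1},
	)
	for b, ok := q.popFront(); ok; b, ok = q.popFront() {
		fmt.Println(b.depth, b.name)
	}
}
```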

defer + ownsBuildFile in rebuildHashmapForBucket (Gap 2)

The build file for a bucket is removed from spillQueue (popped) before
processing begins. A deferred cleanup with an ownsBuildFile flag and
context.Background() ensures the named build file is always deleted,
even on early return or cancellation.

Build file cleanup in reSpillBucket defer (Gap 3)

Moved inline RemoveFile(proc.Ctx, ...) into the existing deferred
cleanup with context.Background().

JoinMap.spillCleanup for cancel-before-receive (Gap 4)

Added a spillCleanup func() field to JoinMap (message/joinMapMsg.go).
hashbuild sets it via SetSpillCleanup() with a clone of the bucket
list. FreeMemory(), which is called via MessageBoard.Reset() → Destroy()
when a pipeline tears down, invokes the cleanup and deletes the build
files even when hashjoin cancels before calling ReceiveJoinMap.
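The following is a minimal model of the cancel-before-receive hook. This JoinMap has only the cleanup field. Clearing the callback after the first call is an assumption of this sketch and is not stated in the PR.

```go
package main

import "fmt"

// JoinMap is a trimmed-down, hypothetical model showing only the
// spill-cleanup hook.
type JoinMap struct {
	spillCleanup func()
}

// SetSpillCleanup registers the callback that deletes the build files.
func (jm *JoinMap) SetSpillCleanup(f func()) { jm.spillCleanup = f }

// FreeMemory is reached on pipeline teardown (MessageBoard.Reset -> Destroy).
// It runs the cleanup and clears it, so a repeated call is a no-op.
func (jm *JoinMap) FreeMemory() {
	if f := jm.spillCleanup; f != nil {
		jm.spillCleanup = nil
		f()
	}
}

func main() {
	buckets := []string{"join_u_0_build", "join_u_1_build"}
	var deleted []string

	jm := &JoinMap{}
	// Clone the bucket list so later changes to hashbuild's slice cannot
	// affect which files the cleanup deletes.
	files := append([]string(nil), buckets...)
	jm.SetSpillCleanup(func() { deleted = append(deleted, files...) })
	buckets[0] = "reused-slot"

	// hashjoin cancels before ReceiveJoinMap; teardown still frees the map.
	jm.FreeMemory()
	jm.FreeMemory()
	fmt.Println("deleted:", deleted)
}
```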

Early stop for empty buckets (hashjoin/spill.go)

- Skip a bucket entirely when the build side is empty and the join type is not left outer, left single, or left anti (these require probe rows to pass through regardless).
- Skip a bucket entirely when the probe side is empty and the join type is not right outer, right single, or right anti (these require build rows to be emitted regardless).
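The early-stop rule reduces to a small predicate. The JoinType constants and function names below are hypothetical. The logic follows the two rules above.

```go
package main

import "fmt"

// JoinType values are hypothetical labels for this sketch.
type JoinType int

const (
	Inner JoinType = iota
	LeftOuter
	LeftSingle
	LeftAnti
	RightOuter
	RightSingle
	RightAnti
)

// preservesProbe: probe rows must be emitted even with no build match.
func preservesProbe(t JoinType) bool {
	return t == LeftOuter || t == LeftSingle || t == LeftAnti
}

// preservesBuild: build rows must be emitted even with no probe match.
func preservesBuild(t JoinType) bool {
	return t == RightOuter || t == RightSingle || t == RightAnti
}

// canSkipBucket applies the early-stop rule when one side of a bucket is empty.
func canSkipBucket(t JoinType, buildRows, probeRows int) bool {
	if buildRows == 0 && !preservesProbe(t) {
		return true
	}
	if probeRows == 0 && !preservesBuild(t) {
		return true
	}
	return false
}

func main() {
	fmt.Println(canSkipBucket(Inner, 0, 100))      // true: nothing can match
	fmt.Println(canSkipBucket(LeftOuter, 0, 100))  // false: probe rows pass through
	fmt.Println(canSkipBucket(RightOuter, 100, 0)) // false: build rows are emitted
}
```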

IO optimizations

- CreateAndRemoveFile (which unlinks the directory entry immediately on open) is used for all probe bucket files and re-spill build files, so the OS reclaims them automatically when the fd is closed, regardless of whether explicit cleanup runs.
- Spill expression executors (spillExprExecs) are initialized once per build phase and reused across all batches, avoiding the overhead of re-creating them for every batch.
- acquireSpillBuffers reuses pre-allocated batch buffers for spill writes.

Naming scheme

- hashbuild root build files: join_<uuid>_<i>_build
- hashjoin root probe files: join_<uuid>_<i>_probe
- Sub-bucket files at depth N: join_<uuid>_<i0>_..._<iN>_build/probe
- makeSpillBucketWriters(uid, suffix) generates the full set of bucket writers for a given parent base name and build/probe suffix.
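The following sketch shows how the naming scheme composes across levels. spillFileName and bucketNames are hypothetical helpers that return names instead of writers. The bucket count of 4 and the uuid placeholder are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
)

// spillFileName appends a bucket index to a parent base name and adds the
// build/probe suffix (hypothetical helper).
func spillFileName(base string, idx int, suffix string) string {
	return base + "_" + strconv.Itoa(idx) + "_" + suffix
}

// bucketNames lists all sub-bucket names under a parent base, in the spirit
// of makeSpillBucketWriters(uid, suffix) but returning names, not writers.
func bucketNames(base, suffix string, n int) []string {
	names := make([]string, n)
	for i := range names {
		names[i] = spillFileName(base, i, suffix)
	}
	return names
}

func main() {
	root := "join_" + "0f8e" // "0f8e" is a placeholder for the query uuid
	level0 := bucketNames(root, "build", 4)
	fmt.Println(level0[2]) // join_0f8e_2_build

	// Re-spilling bucket 2 uses its own base name as the parent.
	level1 := bucketNames(root+"_2", "build", 4)
	fmt.Println(level1[1]) // join_0f8e_2_1_build
}
```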

Metrics and logging

- Fixed SpillSize / SpillRows metrics to account for re-spill passes.
- Added logutil.Infof lines in hashjoin/spill.go for bucket rebuild and re-spill events, reporting the bucket name and 1-based depth.

Dead code removal

- Deleted the unused ClearHashmap() method from hashbuild/hashmap.go.
- Removed the unused vecs [][]*vector.Vector and delVecs fields from HashmapBuilder; replaced them with curVecs []*vector.Vector.

Tests

- Added hashjoin/spill_integration_test.go with end-to-end spill and multi-level re-spill scenarios.
- Updated hashjoin/spill_test.go and hashbuild/spill_test.go to match the new APIs and naming scheme.

When hash join spills build data to disk under memory pressure, the rebuilt hashmap may itself need to spill. This change adds multi-level spill support by replacing the simple bucket index with a spillQueue that processes buckets in FIFO order and prepends re-spilled sub-buckets.

- Add spillQueue (slice with front pop/prepend) replacing spilledBuildBuckets
- Add spillMaxPass constant (3) to limit re-spill recursion depth
- Refactor getSpilledInputBatch to use spillQueue and support re-spill
- Add spill_integration_test.go for rebuild and re-spill flow tests
- Minor cleanup: remove unused logutil import, fix probe file cleanup

aunjgr requested a review from ouyuanning as a code owner March 20, 2026 08:17

matrix-meow added the size/XL label (Denotes a PR that changes [1000, 1999] lines) Mar 20, 2026

mergify bot added the kind/feature label Mar 20, 2026

aunjgr added 7 commits March 20, 2026 19:33

- fix SpillSize/SpillRows metrics (f8fd56d)
- Merge branch 'main' into multi_spill (acc388e)
- optimize IO (de89c96)
- Merge branch 'main' into multi_spill (c9e9c16)
- leverage CreateAndRemove (5fe4e46)
- ensure cleanup (551f886)
- Merge branch 'main' into multi_spill (0b217bc)

matrix-meow added the size/XXL label (Denotes a PR that changes 2000+ lines) and removed the size/XL label (Denotes a PR that changes [1000, 1999] lines) Mar 23, 2026

- fix bvt (1edd775)

aunjgr requested a review from heni02 as a code owner March 23, 2026 15:07

aunjgr changed the title from "feat(hashjoin): add multi-level spill support for hash join" to "sql/colexec: add multi-level spill join with robust file lifecycle" Mar 23, 2026

aunjgr added 2 commits March 23, 2026 23:29

- fix sca (e65ec8b)
- replace Delete with RemoveFile to avoid frequent filesystem sync (010acdf)

aunjgr wants to merge 11 commits into matrixorigin:main from aunjgr:multi_spill
