sql/colexec: add multi-level spill join with robust file lifecycle#23915
Open
aunjgr wants to merge 11 commits into matrixorigin:main from
Conversation
When hash join spills build data to disk during memory pressure, the rebuilt hashmap may itself need to spill. This change adds multi-level spill support by replacing the simple bucket index with a spillQueue that supports FIFO processing with prepend for re-spilled sub-buckets.

- Add spillQueue (slice with front pop/prepend) replacing spilledBuildBuckets
- Add spillMaxPass constant (3) to limit re-spill recursion depth
- Refactor getSpilledInputBatch to use spillQueue and support re-spill
- Add spill_integration_test.go for rebuild and re-spill flow tests
- Minor cleanup: remove unused logutil import, fix probe file cleanup
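The spillQueue described above can be sketched as a plain slice with front pop and prepend. The type name and its FIFO-with-prepend behavior come from this PR; the element type and method names here are assumptions for illustration only.

```go
package main

import "fmt"

// spillQueue is a sketch of the structure described in the PR: buckets are
// popped from the front in FIFO order, and re-spilled sub-buckets are
// prepended so a deeper pass is drained before its siblings.
type spillQueue struct {
	items []string // spill file base names
}

// push appends a bucket to the back (normal FIFO enqueue).
func (q *spillQueue) push(name string) { q.items = append(q.items, name) }

// pop removes and returns the front bucket; ok is false when empty.
func (q *spillQueue) pop() (name string, ok bool) {
	if len(q.items) == 0 {
		return "", false
	}
	name = q.items[0]
	q.items = q.items[1:]
	return name, true
}

// prepend inserts re-spilled sub-buckets at the front of the queue.
func (q *spillQueue) prepend(names ...string) {
	q.items = append(names, q.items...)
}

func main() {
	var q spillQueue
	q.push("b0")
	q.push("b1")
	first, _ := q.pop()       // "b0"
	q.prepend("b0_0", "b0_1") // re-spilled children of b0 go first
	next, _ := q.pop()        // "b0_0", ahead of the remaining "b1"
	fmt.Println(first, next)
}
```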
What type of PR is this?
Which issue(s) this PR fixes:
issue #3433 #23353
What this PR does / why we need it:
Implements recursive (multi-level) spill for hash join when a single
spill pass is insufficient, along with comprehensive fixes to the spill
file lifecycle to prevent orphaned files under all cancellation paths.
Multi-level spill (hashjoin/spill.go)
rebuildHashmapForBucket: after reading a build bucket, if memory still exceeds the threshold and depth < spillMaxPass, re-spills the bucket to the next depth instead of OOM-ing.
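The decision at the end of rebuildHashmapForBucket can be sketched as a three-way branch. spillMaxPass and its value of 3 are from this PR; the function name, memory-accounting parameters, and string results below are illustrative assumptions.

```go
package main

import "fmt"

// spillMaxPass caps re-spill recursion depth, as in the PR.
const spillMaxPass = 3

// rebuildOrReSpill sketches the choice made after a build bucket is read:
// build the hashmap in memory, re-spill to the next depth, or stop
// re-spilling once the depth budget is exhausted.
func rebuildOrReSpill(memUsed, threshold int64, depth int) string {
	if memUsed <= threshold {
		return "rebuild" // fits in memory: build the hashmap directly
	}
	if depth < spillMaxPass {
		return "respill" // scatter into sub-buckets at depth+1
	}
	return "in-memory" // depth budget exhausted; no further re-spill
}

func main() {
	fmt.Println(rebuildOrReSpill(100, 200, 0)) // rebuild
	fmt.Println(rebuildOrReSpill(300, 200, 1)) // respill
	fmt.Println(rebuildOrReSpill(300, 200, 3)) // in-memory
}
```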
reSpillBucket: scatters both the build and probe sides of a bucket into sub-buckets and enqueues them for the next pass.
Sub-bucket files are named join_<uuid>_<i0>_<i1>_..._<iN>_build/probe so each level's ancestry is encoded in the filename.
The bucket hash (computeXXHash) uses a per-depth seed to avoid degenerate distributions when the same keys re-spill at deeper levels.
shouldReSpill checks live memory against the threshold to decide whether another spill pass is needed.
Spill file lifecycle hardening
context.Background() for all cleanup deletions
cleanupSpillFiles in both hashbuild and hashjoin previously used proc.Ctx, which is already cancelled by the time cleanup runs on abnormal client exit. Changed to context.Background().

hashbuild only cleans build files when JoinMap was never sent
Reset() in hashbuild/types.go now calls cleanupSpillFiles only when !mapSucceed. When mapSucceed=true, hashjoin owns the files.

spillQueue pre-populated before probe loop (Gap 1)
build() in hashjoin/join.go now pre-populates spillQueue with build file names before starting the probe loop. Previously the queue was populated after the loop, so a mid-loop cancellation left build files untracked.
defer + ownsBuildFile in rebuildHashmapForBucket (Gap 2)
The build file for a bucket is removed from spillQueue (popped) before processing begins. A deferred cleanup with an ownsBuildFile flag and context.Background() ensures the named build file is always deleted, even on early return or cancellation.
Build file cleanup in reSpillBucket defer (Gap 3)
Moved the inline RemoveFile(proc.Ctx, ...) into the existing deferred cleanup with context.Background().

JoinMap.spillCleanup for cancel-before-receive (Gap 4)
Added a spillCleanup func() field to JoinMap (message/joinMapMsg.go). hashbuild sets it via SetSpillCleanup() with a clone of the bucket list.
FreeMemory() — called via MessageBoard.Reset() → Destroy() when a pipeline tears down — invokes the cleanup, deleting build files even when hashjoin cancels before calling ReceiveJoinMap.

Early stop for empty buckets (hashjoin/spill.go)
A bucket whose build side is empty skips probing when the join is not left outer / left single / left anti (which require probe rows to pass through regardless). A bucket whose probe side is empty is skipped when the join is not right outer / right single / right anti.
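The empty-build-side check can be sketched as a switch over join types. The join-type names mirror the ones mentioned above; the enum values and function name are illustrative assumptions, not matrixone's actual plan types.

```go
package main

import "fmt"

type joinType int

const (
	inner joinType = iota
	leftOuter
	leftSingle
	leftAnti
	rightOuter
	rightSingle
	rightAnti
)

// canSkipProbeOnEmptyBuild sketches the early-stop rule: with an empty
// build side, only the left outer / left single / left anti family still
// needs to emit probe rows, so every other join type can skip the probe
// pass for that bucket entirely.
func canSkipProbeOnEmptyBuild(t joinType) bool {
	switch t {
	case leftOuter, leftSingle, leftAnti:
		return false // probe rows must pass through regardless
	default:
		return true
	}
}

func main() {
	fmt.Println(canSkipProbeOnEmptyBuild(inner))     // true
	fmt.Println(canSkipProbeOnEmptyBuild(leftOuter)) // false
	fmt.Println(canSkipProbeOnEmptyBuild(leftAnti))  // false
}
```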
IO optimizations
CreateAndRemoveFile (unlinking the directory entry immediately on open) is used for all probe bucket files and re-spill build files so the OS reclaims them automatically when the fd is closed, regardless of whether explicit cleanup runs.
Spill expression executors (spillExprExecs) are initialized once per build phase and reused across all batches, avoiding repeated re-evaluation overhead.
acquireSpillBuffers reuses pre-allocated batch buffers for spill.

Naming scheme
join_<uuid>_<i>_build / join_<uuid>_<i>_probe for first-pass buckets; join_<uuid>_<i0>_..._<iN>_build/probe for re-spilled sub-buckets. makeSpillBucketWriters(uid, suffix) generates the full set of bucket writers for a given parent base name and build/probe suffix.
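The naming scheme above can be sketched as a small helper: uuid, then each ancestor bucket index in order, then the build/probe suffix, so a file's full re-spill ancestry is readable from its name. spillFileName and its signature are illustrative assumptions, not the PR's actual helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// spillFileName sketches the join_<uuid>_<i0>_..._<iN>_<suffix> scheme:
// the path slice holds the bucket index chosen at each spill depth.
func spillFileName(uid string, path []int, suffix string) string {
	parts := []string{"join_" + uid}
	for _, i := range path {
		parts = append(parts, strconv.Itoa(i))
	}
	parts = append(parts, suffix)
	return strings.Join(parts, "_")
}

func main() {
	fmt.Println(spillFileName("ab12", []int{3}, "build"))       // join_ab12_3_build
	fmt.Println(spillFileName("ab12", []int{3, 0, 7}, "probe")) // join_ab12_3_0_7_probe
}
```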
Metrics and logging
Extended the SpillSize/SpillRows metrics to account for re-spill passes. Added logutil.Infof lines in hashjoin/spill.go for bucket rebuild and re-spill events, reporting bucket name and 1-based depth.
Dead code removal
Removed the ClearHashmap() method from hashbuild/hashmap.go. Removed the vecs [][]*vector.Vector and delVecs fields from HashmapBuilder; replaced with curVecs []*vector.Vector.

Tests
Added hashjoin/spill_integration_test.go with end-to-end spill and multi-level re-spill scenarios.
Updated hashjoin/spill_test.go and hashbuild/spill_test.go to match the new APIs and naming scheme.