interpolate: build batch spline 3D coeffs in place to halve construction peak memory #341
Merged
…ion peak memory

construct_batch_splines_3d_legacy held temp_coeff (full coefficient-array size, ~24 GB for the reactor chartmap rho=501,theta=55,zeta=300, order 5) alongside the equal-size spl%coeff for the whole build, doubling peak memory and causing OOM during chartmap field loading. Build each quantity directly into spl%coeff(iq,...); the spl_reg/spl_per sequence is unchanged so coefficients are bit-identical.
What
`construct_batch_splines_3d_legacy` (the non-OpenACC CPU path) held `temp_coeff`, a full-size copy of the coefficient array, alongside `spl%coeff` for the whole build. `temp_coeff` has shape `(0:N1,0:N2,0:N3,n1,n2,n3)`, equal to `spl%coeff` per quantity, so construction peaked at twice the final spline size.
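
The factor of two can be made concrete with a small sketch that allocates both arrays at toy extents and reports their sizes. Everything specific here is an assumption for illustration: the extents, the single-quantity batch (`nq = 1`), the `real64` element kind, the leading `iq` dimension of `coeff` (inferred from `spl%coeff(iq,...)`), and the names `ord1..ord3` and `np1..np3`, which stand in for `N1..N3` and `n1..n3` because Fortran identifiers are case-insensitive.

```fortran
program peak_memory_sketch
    use, intrinsic :: iso_fortran_env, only: dp => real64, int64
    implicit none

    ! Hypothetical toy extents; the real ones come from the spline setup.
    integer, parameter :: nq = 1
    integer, parameter :: ord1 = 3, ord2 = 3, ord3 = 3   ! stand-ins for N1, N2, N3
    integer, parameter :: np1 = 8, np2 = 4, np3 = 6      ! stand-ins for n1, n2, n3

    real(dp), allocatable :: coeff(:, :, :, :, :, :, :)   ! batch array, quantity index first
    real(dp), allocatable :: temp_coeff(:, :, :, :, :, :) ! per-quantity scratch copy
    integer(int64) :: bytes_coeff, bytes_temp

    allocate(coeff(nq, 0:ord1, 0:ord2, 0:ord3, np1, np2, np3))
    allocate(temp_coeff(0:ord1, 0:ord2, 0:ord3, np1, np2, np3))

    ! storage_size returns bits per element, hence the division by 8.
    bytes_coeff = size(coeff, kind=int64) * (storage_size(coeff) / 8)
    bytes_temp  = size(temp_coeff, kind=int64) * (storage_size(temp_coeff) / 8)

    print '(a, i0)', 'coeff bytes:               ', bytes_coeff
    print '(a, i0)', 'temp_coeff bytes:          ', bytes_temp
    print '(a, i0)', 'peak with scratch copy:    ', bytes_coeff + bytes_temp
    print '(a, i0)', 'peak when built in place:  ', bytes_coeff
end program peak_memory_sketch
```

With a single quantity the two arrays have the same element count, so holding both for the whole build doubles the peak; dropping the scratch array leaves only `coeff`.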
For the reactor chartmap (`rho=501, theta=55, zeta=300, order 5`) one quantity is ~24 GB, so the build held ~48 GB and pushed the chartmap field-load phase to a 64.5 GB resident peak, which the OOM-killer terminated (`simple.x`, `anon-rss 7.4 GB` mid-climb, `total-vm 72 GB`).

This builds each quantity directly into `spl%coeff(iq,...)`. The `spl_reg`/`spl_per` call sequence is unchanged, so the coefficients are bit-identical; only the destination array changes. `temp_coeff` and the final copy are removed. Peak construction memory drops from `2 * spl%coeff` to `spl%coeff`.

The OpenACC path (`construct_batch_splines_3d_resident_device`) and the streaming `construct_batch_splines_3d_lines` variant are untouched.

Verification
Before: a 16-marker guiding-centre run on the reactor chartmap peaked at 64.5 GB
during field loading and was OOM-killed when more than one ran concurrently.
After: the batch-spline tests that cover the changed routine pass (`fo test`, gfortran).
The oracle tests compare coefficients against reference tables, so a bit-for-bit
match confirms the result is unchanged.
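
The reasoning that changing only the destination array leaves the coefficients bit-identical can be illustrated with a toy program. This is a sketch under stated assumptions, not the routine from this PR: `toy_transform` is a hypothetical, order-dependent stand-in for the `spl_reg`/`spl_per` calls, and the array names, extents, and random input data are made up for the example.

```fortran
module toy_build
    use, intrinsic :: iso_fortran_env, only: dp => real64
    implicit none
contains
    ! Hypothetical stand-in for a spline solve: an in-place recurrence
    ! along each dimension. It is NOT spl_reg or spl_per.
    subroutine toy_transform(a)
        real(dp), intent(inout) :: a(:, :)   ! assumed shape: accepts strided sections
        integer :: i, j
        do j = 1, size(a, 2)
            do i = 2, size(a, 1)
                a(i, j) = a(i, j) + 0.5_dp * a(i - 1, j)
            end do
        end do
        do j = 2, size(a, 2)
            do i = 1, size(a, 1)
                a(i, j) = a(i, j) + 0.5_dp * a(i, j - 1)
            end do
        end do
    end subroutine toy_transform
end module toy_build

program in_place_sketch
    use, intrinsic :: iso_fortran_env, only: dp => real64
    use toy_build, only: toy_transform
    implicit none

    integer, parameter :: nq = 3, n1 = 6, n2 = 5   ! toy sizes (assumed)
    real(dp) :: samples(n1, n2, nq)
    real(dp), allocatable :: via_temp(:, :, :), direct(:, :, :), temp(:, :)
    integer :: iq

    call random_number(samples)
    allocate(via_temp(nq, n1, n2), direct(nq, n1, n2), temp(n1, n2))

    do iq = 1, nq
        ! Old pattern: build in a scratch array, then copy into the batch slot.
        temp = samples(:, :, iq)
        call toy_transform(temp)
        via_temp(iq, :, :) = temp

        ! New pattern: load the batch slot and transform it where it lives.
        direct(iq, :, :) = samples(:, :, iq)
        call toy_transform(direct(iq, :, :))
    end do

    ! Exact comparison on purpose: the claim is bit-for-bit equality.
    if (any(via_temp /= direct)) error stop 'results differ'
    print '(a)', 'direct build matches temp-and-copy'
end program in_place_sketch
```

Both paths run the same arithmetic in the same order on the same inputs, so the exact comparison should hold. The assumed-shape dummy argument matters for the memory goal: it lets the compiler pass the strided section `direct(iq, :, :)` by descriptor, whereas an explicit-shape or `contiguous` dummy may force a hidden copy-in/copy-out temporary of the same size as the scratch array that was removed. How the actual routine addresses `spl%coeff(iq,...)` is not shown in this description.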
The 11 failing tests (`setup_chartmap_volume`, `test_chartmap_coordinates`, `test_vector_conversion`, and the `*_map2disc_*` validate/plot cases) stem from the pre-existing map2disc fixture gap: the ctest interpreter is the system Python, which lacks `map2disc`. They fail identically on `origin/main` and are unrelated to this change. (A uv-based fix for that is a separate PR.)