
interpolate: build batch spline 3D coeffs in place to halve construction peak memory #341

Merged
krystophny merged 1 commit into main from perf/batch-spline-inplace-coeff on Jun 25, 2026

@krystophny


What

construct_batch_splines_3d_legacy (the non-OpenACC CPU path) held temp_coeff,
a full-size copy of the coefficient array, alongside spl%coeff for the whole
build. temp_coeff has shape (0:N1,0:N2,0:N3,n1,n2,n3), equal to spl%coeff
per quantity, so construction peaked at twice the final spline size.

For the reactor chartmap (rho=501, theta=55, zeta=300, order 5) one quantity
is ~24 GB, so the build held ~48 GB and pushed the chartmap field-load phase to
a 64.5 GB resident peak, at which point the OOM killer terminated the process
(simple.x; anon-rss 7.4 GB mid-climb, total-vm 72 GB).

This PR builds each quantity directly into spl%coeff(iq,...). The
spl_reg/spl_per call sequence is unchanged, so the coefficients are
bit-identical; only the destination array changes. temp_coeff and the final
copy from it into spl%coeff are removed, so peak construction memory drops
from 2 * spl%coeff to spl%coeff.
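The difference between the two construction patterns can be sketched in NumPy, which here stands in for the Fortran arrays. This is a minimal illustration under stated assumptions, not the project's code: build_coeffs is a hypothetical stand-in for the spl_reg/spl_per sequence with made-up arithmetic, and construct_legacy/construct_inplace are hypothetical names. The sketch shows that the same builder writing into a scratch buffer or into a view of the destination produces identical bits, while only the legacy pattern keeps a second buffer alive.

```python
import numpy as np


def build_coeffs(samples, out):
    """Hypothetical stand-in for the spl_reg/spl_per call sequence.

    Fills `out`, shape (order + 1, n), for one quantity. `out` may be a
    scratch buffer or a view into the final array; the routine cannot tell
    the difference. The arithmetic (repeated periodic forward differences)
    is illustrative only, not the real spline recurrence.
    """
    out[0] = samples
    for k in range(1, out.shape[0]):
        out[k] = np.roll(out[k - 1], -1) - out[k - 1]


def construct_legacy(data, order):
    """Old pattern: build each quantity in a scratch buffer, then copy it over."""
    nq, n = data.shape
    coeff = np.empty((nq, order + 1, n))
    temp_coeff = np.empty((order + 1, n))  # one quantity's shape, as in the PR
    for iq in range(nq):
        build_coeffs(data[iq], temp_coeff)
        coeff[iq] = temp_coeff  # the copy the PR removes
    # Both buffers are alive at the same time during the build.
    return coeff, coeff.nbytes + temp_coeff.nbytes


def construct_inplace(data, order):
    """New pattern: build each quantity directly into its slice of coeff."""
    nq, n = data.shape
    coeff = np.empty((nq, order + 1, n))
    for iq in range(nq):
        build_coeffs(data[iq], coeff[iq])  # coeff[iq] is a view, not a copy
    return coeff, coeff.nbytes


rng = np.random.default_rng(0)
data = rng.standard_normal((1, 64))  # a single quantity
old, old_bytes = construct_legacy(data, order=5)
new, new_bytes = construct_inplace(data, order=5)
print(np.array_equal(old, new))  # True: same operations on the same inputs
print(old_bytes / new_bytes)     # 2.0
```

The builder itself is untouched; only its destination argument changes from the scratch buffer to coeff[iq], which mirrors the PR's point that only the destination array changes.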

The OpenACC path (construct_batch_splines_3d_resident_device) and the
streaming construct_batch_splines_3d_lines variant are untouched.

Verification

Before: a 16-marker guiding-centre run on the reactor chartmap peaked at 64.5 GB
during field loading and was OOM-killed when more than one ran concurrently.

After the change, the batch-spline tests that cover the changed routine pass
(fo test, gfortran):

20/85 Test #20: test_batch_interpolate_der3 ......  Passed
24/85 Test  #6: test_batch_eval_oracle ...........  Passed
32/85 Test #14: test_spl_three_to_five ...........  Passed
37/85 Test #22: test_batch_interpolate_rmix ......  Passed
39/85 Test #23: test_batch_interpolate_oracle ....  Passed
 8/85 Test #25: test_spline_performance ..........  Passed

The oracle tests compare coefficients against reference tables, so a bit-for-bit
match confirms the result is unchanged.
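A tolerance-based comparison such as np.allclose accepts values that differ in the last bit, and == treats 0.0 and -0.0 as equal, so neither checks a bit-for-bit claim on its own. One way to check such a claim directly is to compare the raw IEEE-754 bit patterns. The following is a minimal NumPy sketch; bitwise_equal is a hypothetical helper, not part of the project's test suite, and it does not reflect how the oracle tests compare values internally.

```python
import numpy as np


def bitwise_equal(a, b):
    """True only if a and b have the same shape and identical float64 bit patterns."""
    a = np.ascontiguousarray(a, dtype=np.float64)
    b = np.ascontiguousarray(b, dtype=np.float64)
    if a.shape != b.shape:
        return False
    # Reinterpret each 8-byte double as a 64-bit unsigned integer and compare those.
    return bool(np.array_equal(a.view(np.uint64), b.view(np.uint64)))


ref = np.array([1.0, 0.0, 2.5])

nudged = ref.copy()
nudged[0] = np.nextafter(1.0, 2.0)  # one ulp above 1.0
signed = ref.copy()
signed[1] = -0.0                    # differs from 0.0 only in the sign bit

print(np.allclose(ref, nudged), bitwise_equal(ref, nudged))    # True False
print(np.array_equal(ref, signed), bitwise_equal(ref, signed))  # True False
print(bitwise_equal(ref, ref.copy()))                            # True
```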

The 11 failing tests (setup_chartmap_volume, test_chartmap_coordinates,
test_vector_conversion, and the *_map2disc_* validate/plot cases) are the
pre-existing map2disc fixture gap: ctest runs them with the system Python,
which lacks map2disc. They fail identically on origin/main and are unrelated
to this change. (A uv-based fix for that is in a separate PR.)

…ion peak memory

construct_batch_splines_3d_legacy held temp_coeff (full coefficient-array size,
~24 GB for the reactor chartmap rho=501,theta=55,zeta=300, order 5) alongside the
equal-size spl%coeff for the whole build, doubling peak memory and causing OOM
during chartmap field loading. Build each quantity directly into spl%coeff(iq,...);
the spl_reg/spl_per sequence is unchanged so coefficients are bit-identical.
@krystophny krystophny merged commit 04e0e22 into main Jun 25, 2026
4 checks passed
@krystophny krystophny deleted the perf/batch-spline-inplace-coeff branch June 25, 2026 07:39