interpolate: build batch spline 3D coeffs in place to halve construction peak memory by krystophny · Pull Request #341 · itpplasma/libneo

krystophny · 2026-06-25T07:28:25Z

What

construct_batch_splines_3d_legacy (the non-OpenACC CPU path) held temp_coeff,
a full-size copy of the coefficient array, alongside spl%coeff for the whole
build. temp_coeff has shape (0:N1,0:N2,0:N3,n1,n2,n3), the same size as
spl%coeff for one quantity, so construction peaked at twice the final spline size.

For the reactor chartmap (rho=501, theta=55, zeta=300, order 5) one quantity
is ~24 GB, so the build held ~48 GB and pushed the chartmap field-load phase to
a 64.5 GB resident peak, at which point the OOM killer terminated the process
(simple.x, anon-rss 7.4 GB mid-climb, total-vm 72 GB).

This builds each quantity directly into spl%coeff(iq,...). The spl_reg/
spl_per call sequence is unchanged, so the coefficients are bit-identical;
only the destination array changes. temp_coeff and the final copy are removed.
Peak construction memory drops from 2 * spl%coeff to spl%coeff.
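
The difference between the two build strategies can be sketched in a few lines. This is a minimal Python stand-in, not the Fortran routine: fill_quantity is a hypothetical placeholder for the spl_reg/spl_per passes, the flat array('d') buffers stand in for temp_coeff and spl%coeff with a simplified contiguous layout, and the byte counts are plain bookkeeping rather than measured RSS.

```python
from array import array


def fill_quantity(dest, offset, n, iq):
    """Hypothetical stand-in for the spl_reg/spl_per passes: writes n values."""
    for i in range(n):
        dest[offset + i] = (iq + 1) * 0.1 * i


def build_with_temp(n_quantities, n):
    """Old scheme: build each quantity in a scratch array, then copy it over."""
    coeff = array("d", [0.0]) * (n_quantities * n)
    temp = array("d", [0.0]) * n  # scratch buffer, one quantity in size
    for iq in range(n_quantities):
        fill_quantity(temp, 0, n, iq)
        coeff[iq * n:(iq + 1) * n] = temp  # the final copy that the PR removes
    held_bytes = (len(coeff) + len(temp)) * coeff.itemsize
    return coeff, held_bytes


def build_in_place(n_quantities, n):
    """New scheme: the same fill sequence, written straight into coeff."""
    coeff = array("d", [0.0]) * (n_quantities * n)
    for iq in range(n_quantities):
        fill_quantity(coeff, iq * n, n, iq)
    held_bytes = len(coeff) * coeff.itemsize
    return coeff, held_bytes


old, old_bytes = build_with_temp(1, 1000)
new, new_bytes = build_in_place(1, 1000)

# Same arithmetic in the same order, so the raw bytes match exactly.
assert old.tobytes() == new.tobytes()
# With a single quantity the scratch buffer doubles what is held.
assert old_bytes == 2 * new_bytes
print(old_bytes, new_bytes)
```

With one quantity the sketch holds twice as many bytes in the scratch-buffer variant, which mirrors the drop from 2 * spl%coeff to spl%coeff described above. Only the destination of the writes differs between the two functions.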

The OpenACC path (construct_batch_splines_3d_resident_device) and the
streaming construct_batch_splines_3d_lines variant are untouched.

Verification

Before: a 16-marker guiding-centre run on the reactor chartmap peaked at 64.5 GB
during field loading and was OOM-killed when more than one ran concurrently.

After: the batch-spline tests that cover the changed routine pass (fo test,
gfortran):

20/85 Test #20: test_batch_interpolate_der3 ......  Passed
24/85 Test  #6: test_batch_eval_oracle ...........  Passed
32/85 Test #14: test_spl_three_to_five ...........  Passed
37/85 Test #22: test_batch_interpolate_rmix ......  Passed
39/85 Test #23: test_batch_interpolate_oracle ....  Passed
 8/85 Test #25: test_spline_performance ..........  Passed

The oracle tests compare coefficients against reference tables, so a bit-for-bit
match confirms the result is unchanged.

The 11 failing tests (setup_chartmap_volume, test_chartmap_coordinates,
test_vector_conversion, and the *_map2disc_* validate/plot cases) are the
pre-existing map2disc fixture gap: the ctest interpreter is system python, which
lacks map2disc. They fail identically on origin/main and are unrelated to
this change. (A uv-based fix for that is a separate PR.)



krystophny merged commit 04e0e22 into main Jun 25, 2026
4 checks passed

krystophny deleted the perf/batch-spline-inplace-coeff branch June 25, 2026 07:39
