
Sparse pullback for big performance gain #2170

Open
unalmis wants to merge 87 commits into master from ku/sparse_pullback

Conversation

@unalmis
Collaborator

@unalmis unalmis commented Apr 17, 2026


  • Resolves Sparsity preserving pullbacks #2168.
    • Yields a factor of $N$ improvement in memory and compute speed when computing the Jacobian in $R^{N \times M}$ of a map $f \colon R^M \to R^N$. In other words, for an expensive function, with proper application throughout, the full Jacobian costs about what a single row cost before.
    • The factor is $N^2$ for checkpointed functions.
    • Even for a single vjp, it is much faster and more memory efficient, for reasons discussed in the linked threads.
    • Sparse linear algebra could also be used for the cotangent update, but I did not assume that the cotangent is sparse.
    • Functions added are sparse_pullback and sparse_pullback_map.
  • Removes
    • bounce1d optimization.
    • Unneeded flags (is_reshaped, is_fourier) that users said were confusing (backwards compatible) as well as the developer flags Bref, Lref that should not be there.
  • Previously, pitch_batch_size was getting ignored. This fixes that by adding strip_dim0 flag to batch_map.
  • Switch resolution to per field period to simplify use and analysis #2182
  • Resolves the fixme comment so that gradients are consistent #2185
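To illustrate why a gain of this size is possible, here is a minimal JAX sketch. It is not the `sparse_pullback` implementation from this PR. It assumes a toy map `f` (hypothetical) in which output `i` depends only on input row `i`, so the Jacobian is block sparse and a single pullback of a cotangent of ones recovers every nonzero entry, where dense reverse mode needs one pullback per output.

```python
import jax
import jax.numpy as jnp


def f(x):
    # Toy map (assumption for illustration): x has shape (N, K) and
    # output i depends only on row i of x, so the Jacobian is block sparse.
    return jnp.sum(jnp.sin(x) * x**2, axis=1)


N, K = 4, 3
x = jnp.arange(1.0, N * K + 1).reshape(N, K) / 10.0

# Dense route: reverse mode assembles the Jacobian from N pullbacks,
# one per output. J_dense[i, j, k] = d f_i / d x[j, k].
J_dense = jax.jacrev(f)(x)  # shape (N, N, K)

# Sparse route: because the inputs feeding each output are disjoint,
# a single pullback of ones returns all nonzero entries at once.
y, pullback = jax.vjp(f, x)
(rows,) = pullback(jnp.ones_like(y))  # shape (N, K); rows[i] = d f_i / d x[i]

# Scatter the compressed result into dense form only to compare.
idx = jnp.arange(N)
J_from_sparse = jnp.zeros((N, N, K)).at[idx, idx].set(rows)
```

Here `rows` stores `N * K` numbers instead of the `N * N * K` held by `J_dense`, and it comes from one backward pass instead of `N`.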

notes

  • They are decorator functions, so they are non-invasive and can improve many objectives such as bounce, balloon, qs, omni, surface integral, free surface etc., but more generally can apply to any atomic computation. Trivial to use now that I have done all the math and resolved the implementation sharp bits.
  • For example, the 4D singular integral kernel can be reduced to a 2D pullback.
  • For this PR, I only apply the decorator at one point in bounce. Because it's done at an intermediate point, I needed to change some code. Note that the CI benchmarks won't show the improvement since those objectives map onto $R^1$, but benchmarking with TensorBoard does show the factor of $N$ improvement in speed and memory when computing the Jacobian. And this is not yet applied to the full pipeline. See Single contangent pullback through compute pipeline #2171.
  • Unrelated to this PR since it has always been like this, but computing the proximal projection Jacobian with the force balance constraint is more expensive than computing the bigger unconstrained Jacobian, since the SVD solve is now far more expensive than computing a vjp of bounce integrals. Perhaps the decorator could be applied to stuff there; I didn't look.
  • If the partial summation in Poloidal FFT Implementation #1508 is done so that the transforms are factorized, then we can extend the pullback wrappers to include the spectral to real space transform on each surface by decorating the larger function with a sparse_pullback as well. Or use Fourier-Chebyshev, as I had suggested in Making transforms more efficient #1243, to make it simpler. This would have the advantage of further reducing memory.
  • This should renew interest in Upsample data above midplane to full grid assuming stellarator symmetry #1206 and Compute function and gradient simultaneously for reverse mode #1872 and Generalize toroidal angle beyond phi cylindrical #465.
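As a rough picture of what a non-invasive decorator of this kind can look like, here is a sketch built on `jax.custom_vjp`. The name `rowwise_pullback` and its logic are hypothetical and are not the `sparse_pullback` added in this PR. It assumes a scalar-valued function applied independently to each row of a batch, so the backward pass only needs the per-row gradients.

```python
import jax
import jax.numpy as jnp


def rowwise_pullback(fun_row):
    """Hypothetical decorator: lift a per-row scalar function to a batch
    whose custom VJP keeps only per-row gradients as residuals."""

    @jax.custom_vjp
    def batched(x):
        return jax.vmap(fun_row)(x)

    def fwd(x):
        # Residuals are the nonzero Jacobian entries, shape (N, K).
        y, grads = jax.vmap(jax.value_and_grad(fun_row))(x)
        return y, grads

    def bwd(grads, ct):
        # Output i only touches row i, so the pullback is a row scaling.
        return (ct[:, None] * grads,)

    batched.defvjp(fwd, bwd)
    return batched


def energy_row(row):
    # Toy per-row computation (assumption for illustration).
    return jnp.sum(jnp.sin(row) * row**2)


# Same effect as writing @rowwise_pullback above a function definition.
energy = rowwise_pullback(energy_row)

x = jnp.arange(1.0, 13.0).reshape(4, 3) / 10.0
g = jax.grad(lambda x: jnp.sum(energy(x)))(x)  # shape (4, 3)
```

The caller's code does not change: `energy` is used like any other JAX function, and only the backward rule differs.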

@unalmis unalmis requested review from ddudt and f0uriest May 4, 2026 20:32
@unalmis
Collaborator Author

unalmis commented May 7, 2026

Please just review desc/batching.py and desc/derivatives.py then.

Yea so those are the only things that need reviewing before approval which should take like 10 min. Other stuff is just stuff reviewers requested on #2157 and #2147.

@unalmis unalmis requested review from ddudt, dpanici, f0uriest and rahulgaur104 and removed request for ddudt, dpanici, f0uriest and rahulgaur104 May 9, 2026 19:37
@unalmis unalmis added the P∞ P_infty. Ready to merge > 1 years. Top priority to merge to prevent further delay of research. label May 9, 2026
@unalmis
Collaborator Author

unalmis commented May 15, 2026

when is this getting merged

@unalmis unalmis mentioned this pull request May 16, 2026
Member

@f0uriest f0uriest left a comment


Still about 1/2 to go but leaving these here for now.

One big point is that if we're messing with custom AD stuff, I think it would be good practice to add tests comparing AD of the relevant objectives to finite differences (can use very low res, don't care about physics convergence), both to check that the implementation is correct and also to guard against us accidentally applying the sparse pullback in places where it's not strictly correct.

Comment thread desc/compute/_old.py
Comment thread tests/baseline/test_plot_gammac.png
Comment thread CHANGELOG.md Outdated
Comment thread CHANGELOG.md
Comment thread desc/objectives/_fast_ion.py
Comment thread desc/objectives/_neoclassical.py
Comment thread desc/batching.py
Comment thread desc/derivatives.py
Comment thread desc/derivatives.py
Comment thread desc/derivatives.py
@unalmis
Collaborator Author

unalmis commented May 17, 2026

Still about 1/2 to go but leaving these here for now.

One big point is that if we're messing with custom AD stuff, I think it would be good practice to add tests comparing AD of the relevant objectives to finite differences (can use very low res, don't care about physics convergence), both to check that the implementation is correct and also to guard against us accidentally applying the sparse pullback in places where it's not strictly correct.

  • Locally I tested with the jacobian equivalent of test_compute_everything. It worked to machine precision.
  • I tested finite difference a bunch for these objectives, and it works well.
  • I prefer to keep the tests I add as independent unit tests that test correctness of code.
  • There are enough monolithic smoke tests. Part of the reason these PRs are big is that I have to plumb through changes for the 100 different tests I have for these objectives. Adding more smoke tests makes the suite harder to maintain, annoys reviewers because of PR length, and my hunch is someone would eventually decide to delete them for Reduce testing time/memory #914.

(can use very low res, don't care about physics convergence),

That is not true/possible. See the supplementary information in publications. Briefly, for nontrivial computational problems where not everything is $C^\infty$, an algorithm to solve a problem needs to have amazing convergence properties, and be robust to topology changes, for the discretization error to be correlated enough near a given point in the optimization space for the finite difference derivative to have any chance of being accurate. (Again, explained better in the PDF.)

You can see that finite difference derivatives only make sense at high resolution computations of the algorithm. Auto diff makes sense at any resolution because it estimates the derivative from only information at a single point in the optimization landscape. (Of course, if discretization error is high, then an optimization could still stall, as varying discretization error can affect the descent direction, but that's unrelated to this discussion.) In general you'll need high res to get finite diff to match auto diff.
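For reference, the kind of AD versus finite difference comparison being discussed can be sketched as below. This is only an illustration on a smooth toy function (`toy_objective` and `central_difference_jacobian` are hypothetical names, not DESC code), which is the easy case. It says nothing about the resolution needed for the non-smooth computations described above.

```python
import jax
import jax.numpy as jnp

# Double precision keeps finite difference rounding error small.
jax.config.update("jax_enable_x64", True)


def toy_objective(x):
    # Hypothetical smooth map R^3 -> R^2 standing in for an objective.
    return jnp.array([jnp.sum(jnp.sin(x) ** 2), jnp.prod(jnp.cosh(x))])


def central_difference_jacobian(fun, x, h=1e-6):
    # Column j approximates d fun / d x_j with a second-order stencil.
    eye = jnp.eye(x.size)
    cols = [(fun(x + h * e) - fun(x - h * e)) / (2 * h) for e in eye]
    return jnp.stack(cols, axis=1)


x0 = jnp.array([0.3, -0.7, 1.1])
J_ad = jax.jacrev(toy_objective)(x0)  # shape (2, 3)
J_fd = central_difference_jacobian(toy_objective, x0)
max_err = float(jnp.max(jnp.abs(J_ad - J_fd)))
```

For this smooth function the two Jacobians agree closely. For a discretized, non-smooth computation the finite difference side also carries the change in discretization error between the two evaluation points.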

@unalmis unalmis requested a review from f0uriest May 18, 2026 05:25
@unalmis unalmis requested review from ddudt, dpanici, f0uriest and rahulgaur104 and removed request for ddudt, dpanici, f0uriest and rahulgaur104 May 27, 2026 23:06

Labels

  • AD: related to automatic differentiation
  • P∞: P_infty. Ready to merge > 1 years. Top priority to merge to prevent further delay of research.
  • performance: New feature or request to make the code faster
  • stable: Besides merging master, other updates require a child PR that should be merged to master later.
  • theory: Requires theory work before coding

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sparsity preserving pullbacks

3 participants