Support PyTorch 2.12 (fx codegen, autograd_function_apply) by bhimrazy · Pull Request #2825 · Lightning-AI/lightning-thunder

bhimrazy · 2026-05-29T17:29:43Z

Before submitting

Was this discussed/approved via a GitHub issue? (related to torch-nightly compat, e.g. Remove xfails once PyTorch autograd stabilizes #2807)
Did you read the contributor guideline, Pull Request section?
Did you make sure to update the docs? (no docs changes needed)
Did you write any new necessary tests?

What does this PR do?

Restores PyTorch 2.12 compatibility. On torch >= 2.12 the core CI suites go red (latest/nightly) while oldest stays green — affecting all test_nanogpt_*_DynamoThunder_*, ~40 test_dynamo.py cases, test_autograd_function_apply*, and test_higher_order_inplace_alias_update.

Root cause: tracing the dynamo GraphModule makes thunder interpret torch's fx codegen and autograd_function_apply, both of which changed internals in 2.12.

Interpreter (thunder/core/interpreter.py)

generator has no len() — fx unpacks a generator as *args; materialize it before indexing.
NotImplementedError: Sequence.insert — implement the stubbed list.insert.
MappingKeysView has no len() — add __len__ to the mapping views.
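Each of the three failures corresponds to ordinary Python behaviour that the interpreter has to reproduce while tracing fx codegen. The following is a minimal plain-Python sketch of those behaviours, not thunder code; `describe` and the variable names are illustrative only.

```python
def describe(*args):
    # Stand-in for a callee that receives unpacked positional arguments.
    return "{} {} {}".format(*args)


gen = (str(x) for x in [1, 2, 3])

# A generator is iterable but not sized, so len() on it raises TypeError.
try:
    len(gen)
    generator_has_len = True
except TypeError:
    generator_has_len = False

# Materializing the generator first makes it sized and indexable.
materialized = tuple(gen)
unpacked = describe(*materialized)

# list.insert places an item at an index and shifts later items right.
free_vars = ["x", "y"]
free_vars.insert(0, "self")

# Dictionary views support len() and report the size of the mapping.
mapping = {"a": 1, "b": 2}
view_lengths = (len(mapping.keys()), len(mapping.values()), len(mapping.items()))
```

An interpreter that wraps these objects needs an equivalent for each operation, which is what the three fixes above provide.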

autograd_function_apply (thunder/core/jit_ext.py, thunder/torch/__init__.py)

KeyError: 'saved_for_backward_idx' — accept (and ignore) the new kwarg.
trying to set ._code … — fx recompile() writes _code/_graph into __dict__; stop recording non-trackable module-member writes.
wrong shape ([] vs [2]) — dynamo wraps the output in a tuple and indexes it (…apply(...)[0]); preserve the output structure instead of collapsing it.
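The shape mismatch in the last item can be reproduced without torch. The sketch below uses nested lists as a stand-in for tensors; `forward`, `collapsing_lookaside`, and `preserving_lookaside` are hypothetical names for illustration and do not mirror thunder's actual lookaside.

```python
def shape(value):
    # Shape of a nested-list stand-in for a tensor:
    # [] for a scalar, [n] for a vector of length n.
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        if not value:
            break
        value = value[0]
    return dims


def forward():
    # Hypothetical forward whose output is wrapped in a 1-tuple,
    # as the PR describes dynamo doing on torch 2.12.
    return ([1.0, 2.0],)


def collapsing_lookaside():
    out = forward()
    # Old behaviour: unwrap a single-element output.
    return out[0] if isinstance(out, tuple) and len(out) == 1 else out


def preserving_lookaside():
    # New behaviour: return the output with its structure intact.
    return forward()


# The caller always indexes the result with [0].
collapsed_shape = shape(collapsing_lookaside()[0])  # indexes into the "tensor"
preserved_shape = shape(preserving_lookaside()[0])  # indexes into the tuple
```

With the collapsed output, the caller's `[0]` selects the first element of the vector, giving shape `[]`; with the preserved tuple it selects the vector itself, giving shape `[2]`.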

Cleanup: drop the unused torchaudio dev dependency.

All fixes are version-agnostic (no version gating): inputs are just handled more generally, the new kwarg is optional, and a bare output stays bare while a tuple stays a tuple. oldest/latest/nightly CI covers both ends.
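One way to read "the new kwarg is optional" is a keyword-only parameter with a default that the function never uses. The sketch below assumes a hypothetical `apply_sketch` signature; it is not thunder's symbol or torch's higher-order op.

```python
def apply_sketch(fwd, *args, saved_for_backward_idx=None):
    # Hypothetical signature: the kwarg is accepted so that callers on
    # newer torch can pass it, and callers on older torch can omit it.
    del saved_for_backward_idx  # accepted for API parity, then ignored
    return fwd(*args)


def double(x):
    return 2 * x


without_kwarg = apply_sketch(double, 21)
with_kwarg = apply_sketch(double, 21, saved_for_backward_idx=[0])
```

Both calls take the same code path, so no version check is needed.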

Tests: new focused interpreter tests (generator unpack, list.insert, mapping-view len), each verified to fail pre-fix; autograd tests extended for saved_for_backward_idx (mirroring _detect_has_args_tensor_mask).

CI: ❌ before → ✅ after.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

torch 2.12 changed fx code generation in ways the interpreter traces through when thunder compiles a GraphModule:

- CALL_FUNCTION_EX may unpack a generator as *args (e.g. fx emits `"{}".format(*(_get_repr(a) for a in node.args))`). wrap_args_from_list called len() on it, raising "object of type 'generator' has no len()". Materialize non-sequence iterables via interpreted iteration before indexing.
- fx codegen calls list.insert (e.g. `free_vars.insert(0, "self")`), which was an unimplemented Sequence.insert stub. Implement it following the append/pop item-wrapper pattern.
- MappingKeysView / MappingValuesWrapper / MappingItemsWrapper lacked __len__, so len() on a dict view raised during tracing. Delegate to the mapping.

Adds focused interpreter tests for each pattern.
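The materialization and delegation ideas in this commit message can be sketched as follows. `as_indexable` and `KeysViewSketch` are hypothetical names, and the sketch uses plain `list()` where the commit describes interpreted iteration, so it illustrates the shape of the fix rather than thunder's implementation.

```python
from collections.abc import Sequence


def as_indexable(args):
    # Sequences can be indexed and sized directly; any other iterable
    # (for example a generator) is materialized into a list first.
    if isinstance(args, Sequence):
        return args
    return list(args)


class KeysViewSketch:
    # Hypothetical stand-in for a wrapper around a mapping's keys view.
    def __init__(self, mapping):
        self._mapping = mapping

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        # Delegate to the underlying mapping, as the commit describes.
        return len(self._mapping)


original = (1, 2)
from_tuple = as_indexable(original)
from_generator = as_indexable(x * x for x in range(3))
keys_len = len(KeysViewSketch({"a": 1, "b": 2}))
```

Existing sequences pass through unchanged, so only inputs that previously failed take the new path.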

torch 2.12 changed torch.ops.higher_order.autograd_function_apply and the GraphModule dynamo produces for it:

- A new required `saved_for_backward_idx` kwarg. Accept (and ignore) it on the thunder symbol and augmented forward for API parity; supply it in the tests (mirroring the existing _detect_has_args_tensor_mask pattern).
- fx GraphModule recompile() (triggered while interpreting the module) writes framework-internal state -- `_code` (a source string) and `_graph` (an fx.Graph) -- into the module __dict__. These were recorded as module-member modifications and corrupted prologue/epilogue provenance unpacking. Only record writes whose value is trackable computational state.
- dynamo now wraps the forward output in a tuple and indexes it (`autograd_function_apply(...)[0]`). The lookaside unpacked a single-element output, collapsing the tuple, so the index then sliced into the tensor and produced a wrong (scalar) result. Return the output preserving its structure.
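The "only record trackable writes" rule can be pictured as a filter on the recording step while the write itself still happens. Everything in this sketch is an assumption for illustration: `FakeTensor`, `is_trackable`, and `WriteRecorder` are hypothetical, and the PR does not state which value types thunder actually treats as trackable.

```python
class FakeTensor:
    """Hypothetical stand-in for a value the tracer can follow."""


def is_trackable(value):
    # Assumption for illustration only: treat FakeTensor instances as
    # trackable computational state and everything else as internal.
    return isinstance(value, FakeTensor)


class WriteRecorder:
    def __init__(self):
        self.recorded = []

    def write(self, module_dict, name, value):
        # The write always takes effect on the module's __dict__ stand-in.
        module_dict[name] = value
        # Only trackable values are recorded as member modifications.
        if is_trackable(value):
            self.recorded.append(name)


module_dict = {}
recorder = WriteRecorder()
recorder.write(module_dict, "_code", "def forward(self, x): return x")
recorder.write(module_dict, "_graph", object())
recorder.write(module_dict, "weight", FakeTensor())
```

Here `_code` and `_graph` are stored but leave no record, so later bookkeeping only sees `weight`.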

torchaudio is not imported anywhere in the codebase.

The temporary xfail added in Lightning-AI#2805 (tracked by Lightning-AI#2807) is no longer needed: the preceding commits make test_splitter_autograd_function pass on torch 2.12 (it was xfailing, now xpasses). Drop the marker, its import, and the now-unused _pytorch_removed_args_tensor_mask / xfail_if_args_tensor_mask_removed helpers. Closes Lightning-AI#2807.

bhimrazy added 3 commits May 29, 2026 23:10

Remove unused torchaudio dev dependency

d46f075


bhimrazy requested a review from mruberry as a code owner May 29, 2026 17:29

github-actions Bot added the dependencies label May 29, 2026

bhimrazy marked this pull request as draft May 29, 2026 17:34

bhimrazy wants to merge 4 commits into Lightning-AI:main from bhimrazy:fix/torch-2.12-compat
