
fix(megatron): only build the MTP loss mask when MTP is enabled#2876

Open
yfw wants to merge 2 commits into main from yifu/fix-mtp-loss-mask-gate

yfw commented Jun 19, 2026

Contributor

What does this PR do?

Gate mtp_loss_mask creation on mtp_num_layers, and add a regression test for the VLM sequence-packing guard.
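The gating described above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code from this PR: `build_mtp_loss_mask`, `prepare_batch`, the plain-list masks, and the test function names are all hypothetical. Only `mtp_loss_mask`, `mtp_num_layers`, and the assertion message come from this PR and the linked issue.

```python
from typing import Dict, List, Optional, Sequence


def build_mtp_loss_mask(
    loss_mask: Sequence[int], mtp_num_layers: Optional[int]
) -> Optional[List[int]]:
    """Hypothetical helper: build the MTP loss mask only when MTP is enabled."""
    # Assumption: None or 0 MTP layers means "MTP disabled", so no mask is built.
    if not mtp_num_layers:
        return None
    # Placeholder construction; the real mask logic is not shown on this page.
    return list(loss_mask)


def prepare_batch(
    loss_mask: Sequence[int],
    mtp_num_layers: Optional[int],
    is_vlm: bool,
    sequence_packing: bool,
) -> Dict[str, Optional[List[int]]]:
    """Hypothetical batch prep with a guard like the one quoted in the issue."""
    mtp_loss_mask = build_mtp_loss_mask(loss_mask, mtp_num_layers)
    # The guard can only fire when an MTP mask actually exists.
    if mtp_loss_mask is not None and is_vlm and sequence_packing:
        raise AssertionError("MTP not supported with VLM sequence packing")
    return {"loss_mask": list(loss_mask), "mtp_loss_mask": mtp_loss_mask}


def test_non_mtp_vlm_packing_does_not_trip_guard() -> None:
    # Regression case: VLM + sequence packing with MTP disabled must succeed.
    batch = prepare_batch([1, 1, 0], mtp_num_layers=0, is_vlm=True, sequence_packing=True)
    assert batch["mtp_loss_mask"] is None


def test_mtp_vlm_packing_still_trips_guard() -> None:
    # With MTP enabled, the unsupported combination is still rejected.
    try:
        prepare_batch([1, 1, 0], mtp_num_layers=1, is_vlm=True, sequence_packing=True)
    except AssertionError:
        return
    raise AssertionError("expected the VLM sequence-packing guard to fire")
```

The design point is that the guard stays in place for genuinely unsupported MTP configurations; the fix only stops non-MTP recipes from producing a mask that the guard would then reject.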

Issues

Closes #2869

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Gate mtp_loss_mask creation on mtp_num_layers, and add a regression
test for the VLM sequence-packing guard.

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
yfw requested review from a team as code owners June 19, 2026 19:48

copy-pr-bot Bot commented Jun 19, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yfw added the CI:Lfast label (runs a fast test suite and reuses the nightly `main` container, but syncs dependencies to the PR's version) Jun 22, 2026

yfw commented Jun 22, 2026

Contributor Author

/ok to test b3884ed


Labels

CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version)

Development

Successfully merging this pull request may close these issues.

[bug] MTP loss mask trips "MTP not supported with VLM sequence packing" assertion on non-MTP Qwen3.5 megatron recipes
