fix(gdn): only reject THD packed_seq_params, allow BSHD #52

Merged
Zhichenzzz merged 1 commit into fix/1293-te-fp8-bytesio from fix/1292-gdn-bshd-packed
Jun 19, 2026

@Zhichenzzz

Problem

GatedDeltaNet.forward rejects any non-None packed_seq_params, but Megatron passes a PackedSeqParams object to every attention module, even in BSHD mode (qkv_format="bshd"). As a result, Qwen3.5/3.6 GDN crashes with NotImplementedError: GDN does not support packed sequence for now.

Fix

Guard on packed_seq_params.qkv_format == "thd" specifically, matching upstream NVIDIA/Megatron-LM PR NVIDIA#2645. BSHD now proceeds (GDN ignores the params and computes seq_len from shapes); genuine THD packing still raises (full THD support is a follow-up port of PR NVIDIA#2645).
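The narrowed guard can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `PackedSeqParamsStub` and `check_packed_seq_params` are hypothetical names introduced here so the snippet runs without Megatron installed, and the stub models only the `qkv_format` field mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PackedSeqParamsStub:
    """Hypothetical stand-in for Megatron's PackedSeqParams (only qkv_format is modeled)."""

    qkv_format: Optional[str] = None


def check_packed_seq_params(packed_seq_params: Optional[PackedSeqParamsStub]) -> None:
    """Raise only for genuine THD packing; None and BSHD params pass through."""
    if packed_seq_params is not None and packed_seq_params.qkv_format == "thd":
        raise NotImplementedError("GDN does not support packed sequence for now.")


# BSHD params (and None) are accepted; the caller then ignores them.
check_packed_seq_params(None)
check_packed_seq_params(PackedSeqParamsStub(qkv_format="bshd"))
```

Keying the check on the format string rather than on the mere presence of the object is what lets the uniform BSHD params through while keeping the THD path closed.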

Validation

Built the real GatedDeltaNet on GPU and exercised the guard with real PackedSeqParams:

  • before: BSHD → GDN does not support packed sequence
  • after: BSHD passes the guard and completes a full forward; THD still raises.
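The before/after behaviour of the guard alone can be reproduced on CPU with a small comparison. This is a sketch under stated assumptions, not the GPU validation described above: `old_guard` and `new_guard` are illustrative reconstructions of the two checks, and `SimpleNamespace` objects stand in for real PackedSeqParams instances.

```python
from types import SimpleNamespace

MSG = "GDN does not support packed sequence for now."


def old_guard(params):
    # Illustrative reconstruction of the previous check: reject any non-None params.
    if params is not None:
        raise NotImplementedError(MSG)


def new_guard(params):
    # Illustrative reconstruction of the narrowed check: reject THD only.
    if params is not None and params.qkv_format == "thd":
        raise NotImplementedError(MSG)


def outcome(guard, params):
    """Return 'raises' if the guard throws NotImplementedError, else 'passes'."""
    try:
        guard(params)
    except NotImplementedError:
        return "raises"
    return "passes"


# Hypothetical stand-ins for the three inputs a GDN layer can receive.
cases = {
    "none": None,
    "bshd": SimpleNamespace(qkv_format="bshd"),
    "thd": SimpleNamespace(qkv_format="thd"),
}

# Maps each case to (old behaviour, new behaviour).
results = {name: (outcome(old_guard, p), outcome(new_guard, p)) for name, p in cases.items()}
for name, (before, after) in results.items():
    print(f"{name}: before={before}, after={after}")
```

Only the BSHD row changes between the two guards, which matches the before/after results listed above.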

Fixes radixark/miles#1292

Zhichenzzz force-pushed the fix/1292-gdn-bshd-packed branch from 8b0dd99 to 2dc2612 on June 8, 2026 21:00
Megatron passes a bshd-format PackedSeqParams uniformly even when no real
packing is happening, which GDN can safely ignore. Restrict the
NotImplementedError to genuine THD packing instead of any packed_seq_params,
so GDN runs under the default bshd path.
Zhichenzzz force-pushed the fix/1292-gdn-bshd-packed branch from c2ab1aa to fc7ae0e on June 18, 2026 21:39
Zhichenzzz changed the base branch from miles-main to fix/1293-te-fp8-bytesio on June 18, 2026 21:40
Zhichenzzz merged commit a660935 into fix/1293-te-fp8-bytesio on Jun 19, 2026
Development

Successfully merging this pull request may close these issues.

GDN rejects packed_seq_params in BSHD mode — upstream fixed in Megatron-LM PR #2645, missed by fork sync