@simondong1 simondong1 commented Jan 27, 2026

Issue

When using recent Megatron-LM versions (since Dec 10, 2025, NVIDIA/Megatron-LM#2608), model initialization fails with `TypeError: ... got an unexpected keyword argument 'pg_collection'` (or `config`), because the upstream Megatron `get_model` interface now passes these arguments to the model provider explicitly. This causes `convert_hf_to_torch_dist.py` to fail.

From Megatron-LM:

```python
def get_model(
    model_provider_func,
    model_type=ModelType.encoder_or_decoder,
    wrap_with_ddp=True,
    config=None,        # <--- NEW PARAMETER
    pg_collection=None, # <--- NEW PARAMETER
):
```

Fix

  • Custom providers: Updated `wrapped_model_provider` to dynamically inspect signatures and only pass `pg_collection`, `vp_stage`, or `config` if the custom function accepts them.
  • Standard provider: Updated `model_provider` to accept `pg_collection` and `config`, and to forward `pg_collection` to the `GPTModel` kwargs for correct distributed initialization.
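The signature-inspection approach for custom providers can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: `make_wrapped_model_provider` is a hypothetical helper name, and the provider signatures are assumptions; only the kwarg names (`pg_collection`, `vp_stage`, `config`) come from the description above.

```python
import inspect

def make_wrapped_model_provider(custom_provider):
    """Hypothetical sketch: forward the new Megatron kwargs
    (pg_collection / vp_stage / config) only if the wrapped
    custom provider declares them in its signature."""
    accepted = set(inspect.signature(custom_provider).parameters)

    def wrapped_model_provider(pre_process=True, post_process=True, **extra):
        # Keep only the new kwargs that the custom function accepts;
        # silently drop the rest so legacy providers keep working.
        kwargs = {
            k: v for k, v in extra.items()
            if k in ("pg_collection", "vp_stage", "config") and k in accepted
        }
        return custom_provider(pre_process, post_process, **kwargs)

    return wrapped_model_provider
```

A legacy provider that knows nothing about `pg_collection` is then callable through the new `get_model` interface without raising a `TypeError`, while an updated provider still receives the argument.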


Development

Successfully merging this pull request may close these issues.

An error occurred while converting the Huggingface to a form that can be loaded by Megatron.