fix: is_tt_moe_model checks text_config for VLM-style MoE models #2780

Open
Levichev wants to merge 2 commits into PrimeIntellect-ai:main from Levichev:fix/is-tt-moe-vlm-text-config


Levichev commented Jun 11, 2026

VLM composite configs (e.g. Qwen3.5-MoE VL) nest language-model fields such as num_experts under config.text_config. As a result, is_tt_moe_model returns False for MoE VLMs, which silently disables the MoE branches in both trainers (max_vio / routing_confidence logging). This PR resolves text_config first, the same pattern already used in model.py for num_hidden_layers.
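
For readers without the diff at hand, here is a minimal sketch of the guard described above. It is not the repository's code: the function name `is_tt_moe_model_sketch`, the use of `hasattr` for the field check, and the `SimpleNamespace` stand-ins for real model configs are all assumptions made for illustration.

```python
from types import SimpleNamespace


def is_tt_moe_model_sketch(config) -> bool:
    """Hypothetical reconstruction of the MoE check; not the actual implementation."""
    # Composite VLM configs keep language-model fields on a nested text_config.
    # Fall back to the top-level config when that attribute is absent or None.
    text_config = getattr(config, "text_config", None)
    if text_config is None:
        text_config = config
    # Assumption: the presence of either field marks the model as MoE.
    return hasattr(text_config, "num_experts") or hasattr(text_config, "n_routed_experts")


# Stand-in configs (illustrative only).
plain_moe = SimpleNamespace(num_experts=8)                             # plain LM MoE
vlm_moe = SimpleNamespace(text_config=SimpleNamespace(num_experts=8))  # MoE VLM
dense = SimpleNamespace(hidden_size=1024)                              # no MoE fields

# A top-level-only check misses the nested field on the VLM config.
old_check_on_vlm = hasattr(vlm_moe, "num_experts")
```

With these stand-ins, the top-level check misses the MoE VLM while the sketch detects it, and the result for the plain LM MoE config stays the same as with the top-level check.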


Note

Low Risk
Single guard change in MoE detection; behavior for plain LM MoE models is unchanged when text_config is absent.

Overview
is_tt_moe_model now resolves config.text_config when present before checking for num_experts / n_routed_experts, matching the existing VLM pattern used for num_hidden_layers.

MoE vision-language models (e.g. Qwen3.5-MoE VL) store LM MoE fields on the nested text config, so the old top-level check returned False and the RL/SFT trainers skipped MoE load-balance metrics (max_vio, routing_confidence) via get_load_balance_stats.

Reviewed by Cursor Bugbot for commit a730525. Bugbot is set up for automated code reviews on this repo.

Levichev and others added 2 commits June 12, 2026 02:18
VLM-style model configs (e.g. qwen3_5_moe) nest MoE fields like
num_experts under config.text_config. Without this fix,
is_tt_moe_model returns False for these models, which disables
max_vio logging and any MoE-specific training code paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>