fix: is_tt_moe_model checks text_config for VLM-style MoE models #2780
Open
Levichev wants to merge 2 commits into
VLM-style model configs (e.g. qwen3_5_moe) nest MoE fields like num_experts under config.text_config. Without this fix, is_tt_moe_model returns False for these models, which disables max_vio logging and any MoE-specific training code paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
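For illustration, the failure mode can be reproduced with stand-in config objects. This is a sketch, not the repository's code: SimpleNamespace stands in for the real config classes, is_tt_moe_model_old is a hypothetical name for the pre-fix check, the field values are placeholders, and it assumes the old check only inspected top-level num_experts / n_routed_experts.

```python
from types import SimpleNamespace

MOE_FIELDS = ("num_experts", "n_routed_experts")


def is_tt_moe_model_old(config) -> bool:
    # Hypothetical pre-fix check: only top-level attributes are inspected.
    return any(getattr(config, name, None) for name in MOE_FIELDS)


# Plain LM MoE config: MoE fields sit at the top level (placeholder values).
lm_cfg = SimpleNamespace(num_experts=8, num_hidden_layers=48)

# VLM composite config: LM fields, including num_experts, are nested.
vlm_cfg = SimpleNamespace(
    text_config=SimpleNamespace(num_experts=8, num_hidden_layers=48),
    vision_config=SimpleNamespace(),
)

print(is_tt_moe_model_old(lm_cfg))   # True
print(is_tt_moe_model_old(vlm_cfg))  # False: the nested num_experts is never seen
```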
VLM composite configs (e.g. Qwen3.5-MoE VL) nest LM fields like num_experts under config.text_config. As a result, is_tt_moe_model returns False for MoE VLMs and silently disables the MoE branches in both trainers (max_vio / routing_confidence logging). The fix resolves text_config first, the same pattern already used in model.py for num_hidden_layers.
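A minimal sketch of the "resolve text_config first" pattern, assuming MoE detection is a truthiness check on num_experts (Qwen-style MoE configs) or n_routed_experts (DeepSeek-style configs). The helper names _text_config and num_hidden_layers are hypothetical illustrations, not the actual code in model.py, and SimpleNamespace stands in for real config objects.

```python
from types import SimpleNamespace

MOE_FIELDS = ("num_experts", "n_routed_experts")


def _text_config(config):
    # Hypothetical helper: composite (VLM) configs nest LM fields under
    # text_config; plain LM configs carry them at the top level.
    text_config = getattr(config, "text_config", None)
    return text_config if text_config is not None else config


def is_tt_moe_model(config) -> bool:
    cfg = _text_config(config)
    # A missing, None, or zero expert count is treated as a dense model.
    return any(getattr(cfg, name, None) for name in MOE_FIELDS)


def num_hidden_layers(config) -> int:
    # Same resolution step reused for another LM field.
    return _text_config(config).num_hidden_layers


lm_cfg = SimpleNamespace(num_experts=8, num_hidden_layers=48)
vlm_cfg = SimpleNamespace(
    text_config=SimpleNamespace(num_experts=8, num_hidden_layers=48),
    vision_config=SimpleNamespace(),
)
dense_cfg = SimpleNamespace(num_hidden_layers=32)

print(is_tt_moe_model(lm_cfg))     # True
print(is_tt_moe_model(vlm_cfg))    # True
print(is_tt_moe_model(dense_cfg))  # False
print(num_hidden_layers(vlm_cfg))  # 48
```

Routing every LM-field lookup through one resolver keeps flat LM configs working unchanged, since the fallback returns the config itself when text_config is absent.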
Note
Low Risk
Single guard change in MoE detection; behavior for plain LM MoE models is unchanged when text_config is absent.
Overview
is_tt_moe_model now resolves config.text_config, when present, before checking for num_experts / n_routed_experts, matching the existing VLM pattern used for num_hidden_layers. MoE vision-language models (e.g. Qwen3.5-MoE VL) store LM MoE fields on the nested text config, so the old top-level check returned False and the RL/SFT trainers skipped the MoE load-balance metrics (max_vio, routing_confidence) via get_load_balance_stats.
Reviewed by Cursor Bugbot for commit a730525. Bugbot is set up for automated code reviews on this repo.