plan_vit: add the muP / scaling-study ViT as a torchtitan experiment #10
Open
utkarshgill wants to merge 10 commits into
Self-contained plan ViT for the prune-10m muP and scaling study, mirroring path's structure: model + config_registry (standard and muP flavors, n_embd 128..2048 at head_dim 64) + a thin trainer. Two cameras are channel-stacked into in_channels=24, matching the production worldmodel I/O. Registered as the "plan_vit" experiment so it launches like path:
run.sh torchtitan/run_train.sh -e MODULE=plan_vit -e CONFIG=plan_vit_mup_w512
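The width sweep can be sketched as a small registry. This is a minimal sketch, assuming a power-of-two ladder of widths and a muP base width of 128. `PlanViTConfig`, `build_registry`, and the `plan_vit_w*` names for the standard flavor are hypothetical; only `plan_vit_mup_w512` appears in this PR. Holding head_dim at 64 means n_head grows with width (2 heads at 128, 32 at 2048), and in_channels=24 works out to 12 channels per camera.

```python
from dataclasses import dataclass, replace

HEAD_DIM = 64  # every width in the sweep keeps head_dim = 64
WIDTHS = (128, 256, 512, 1024, 2048)  # assumed power-of-two ladder from 128 to 2048
BASE_WIDTH = 128  # assumed muP base width; m = n_embd / BASE_WIDTH


@dataclass(frozen=True)
class PlanViTConfig:
    # Hypothetical fields for illustration, not the PR's actual config schema.
    n_embd: int
    n_head: int
    in_channels: int = 24  # two cameras channel-stacked: 2 x 12 = 24
    mup: bool = False
    base_width: int = BASE_WIDTH

    @property
    def width_mult(self) -> float:
        # muP width multiplier m, used by the readout and LR scaling
        return self.n_embd / self.base_width


def build_registry() -> dict:
    """Map config names to configs: one standard and one muP flavor per width."""
    registry = {}
    for width in WIDTHS:
        assert width % HEAD_DIM == 0, "n_embd must be a multiple of head_dim"
        standard = PlanViTConfig(n_embd=width, n_head=width // HEAD_DIM)
        registry[f"plan_vit_w{width}"] = standard
        registry[f"plan_vit_mup_w{width}"] = replace(standard, mup=True)
    return registry


REGISTRY = build_registry()
```

With these assumptions, `REGISTRY["plan_vit_mup_w512"]` has n_embd=512, n_head=8, and width_mult=4.0.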
The config hardcoded dp_shard=8, which is only valid at world_size=8 (i.e. N=1). Launching with N=2 (world_size=16) tripped the parallel-dims assertion at startup. Derive replicate=num_nodes and shard=local_world_size from the environment, as path does.
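A minimal sketch of deriving the data-parallel dims from the launcher environment. It assumes torchrun-style WORLD_SIZE and LOCAL_WORLD_SIZE variables and homogeneous nodes, so num_nodes is inferred as WORLD_SIZE // LOCAL_WORLD_SIZE. `derive_parallel_dims` is a hypothetical helper, not torchtitan's API. The final assertion mirrors the product check a parallel-dims validator performs when TP, PP, and CP are all 1.

```python
import os
from typing import Mapping, Optional, Tuple


def derive_parallel_dims(env: Optional[Mapping[str, str]] = None) -> Tuple[int, int]:
    """Return (dp_replicate, dp_shard) from the environment instead of a hardcoded shard=8."""
    env = os.environ if env is None else env
    world_size = int(env["WORLD_SIZE"])              # total ranks across all nodes
    local_world_size = int(env["LOCAL_WORLD_SIZE"])  # ranks (GPUs) on this node
    if world_size % local_world_size != 0:
        raise ValueError(
            f"WORLD_SIZE={world_size} is not a multiple of LOCAL_WORLD_SIZE={local_world_size}"
        )
    num_nodes = world_size // local_world_size       # assumption: every node has the same GPU count
    # Shard within a node, replicate across nodes (HSDP-style layout).
    replicate, shard = num_nodes, local_world_size
    assert replicate * shard == world_size
    return replicate, shard
```

At N=1 with 8 GPUs this gives (1, 8). At N=2 it gives (2, 8), whereas a hardcoded dp_shard=8 with replicate=1 accounts for only 8 of the 16 ranks.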
A pinned total_steps wrapped the cosine schedule, so the LR oscillated whenever training.steps exceeded it. A total_steps of None now falls back to the real training_steps.
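The failure mode and the fallback can be sketched in plain Python. This assumes a warmup-free cosine decay, and the helper names are hypothetical. The clamp on `progress` is an extra guard added in this sketch, not necessarily part of the PR.

```python
import math
from typing import Optional


def wrapped_cosine_lr(step: int, base_lr: float, total_steps: int, min_lr: float = 0.0) -> float:
    # Buggy pattern: with no clamp, progress > 1 once step passes total_steps,
    # so cos(pi * progress) climbs back toward +1 and the LR rises again.
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def cosine_lr(step: int, base_lr: float, training_steps: int,
              total_steps: Optional[int] = None, min_lr: float = 0.0) -> float:
    # Fallback: when total_steps is None, decay over the real run length.
    horizon = total_steps if total_steps is not None else training_steps
    progress = min(step / horizon, 1.0)  # extra guard: never wrap past the minimum
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With total_steps pinned at 100 and a 200-step run, `wrapped_cosine_lr` bottoms out at step 100 and returns to base_lr by step 200. `cosine_lr(step, base_lr, 200)` decays monotonically over the whole run.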
…ase lr)
With output_mult=1 the coord check was flat for the wrong reason: compensating errors that cancel only at low step counts. Switch to the canonical muP readout: forward multiplier 1/m, base-width init, and base lr (vector-like under Adam, ninf==1). Verified by coord check: init output scales as ~1/sqrt(m), and the trained output is flat.
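Below is a pure-Python sketch of the readout and an init-time coord check. It assumes i.i.d. N(0, 1) hidden coordinates and a hypothetical base width of 128. With base-width init the weight std stays 1/sqrt(base_width), so the dot product over `width` coordinates has std sqrt(m). The 1/m forward multiplier then leaves ~1/sqrt(m), the init scaling quoted above. The sketch does not model the Adam/base-lr half of the claim (flat trained output). `mup_readout` and `init_output_std` are illustrative names.

```python
import math
import random


def mup_readout(hidden, weight_row, width_mult):
    """One output coordinate of a muP readout: (W h) scaled by the forward multiplier 1/m."""
    return sum(w * h for w, h in zip(weight_row, hidden)) / width_mult


def init_output_std(width, base_width, n_samples=400, seed=0):
    """Empirical std of the readout output at init, for hidden size `width`."""
    rng = random.Random(seed)
    m = width / base_width
    w_std = 1.0 / math.sqrt(base_width)  # base-width init: std set by base fan-in, not actual width
    outs = []
    for _ in range(n_samples):
        hidden = [rng.gauss(0.0, 1.0) for _ in range(width)]  # O(1) activations
        weight_row = [rng.gauss(0.0, w_std) for _ in range(width)]
        outs.append(mup_readout(hidden, weight_row, m))
    mean = sum(outs) / len(outs)
    return math.sqrt(sum((o - mean) ** 2 for o in outs) / len(outs))
```

Evaluating `init_output_std(w, 128)` for w in 128, 512, 2048 (m = 1, 4, 16) should give roughly 1, 0.5, and 0.25, up to sampling noise.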
Add a default-off plan_target_last_frame flag so the single-frame ViT is supervised on the last plan frame; ConvNeXt behavior is unchanged when the flag is off.
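A minimal sketch of how a default-off flag like this can gate target selection. `PlanTargetConfig` and `select_plan_target` are hypothetical. In a real tensor pipeline the "on" branch would be an index along the time axis, not a list slice.

```python
from dataclasses import dataclass
from typing import List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class PlanTargetConfig:
    # Hypothetical container for the flag; default off so existing runs are unaffected.
    plan_target_last_frame: bool = False


def select_plan_target(plan_frames: Sequence[T], cfg: PlanTargetConfig) -> List[T]:
    """Choose which plan frames supervise the model.

    Off: keep every frame (prior behavior). On: keep only the last frame,
    for a single-frame model that predicts one plan.
    """
    if not plan_frames:
        raise ValueError("expected at least one plan frame")
    return list(plan_frames[-1:]) if cfg.plan_target_last_frame else list(plan_frames)
```

Because the default is False, configs that never set the flag see the same targets as before.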
# Conflicts:
#	torchtitan/experiments/__init__.py
Drop the Meta copyright headers, the formatter churn inherited from the Meta code, and the dangling plan_vit registry entry; the ViT resolves via --module path --config vit_*.