mup: model-agnostic muP sweep/scale routine by utkarshgill · Pull Request #12 · commaai/torchtitan

utkarshgill · 2026-06-23T20:53:37Z

A torchtitan experiment that drives a muP learning-rate sweep for any model and reports the transferred lr plus a width-scaling loss predictor, so the routine lives in the torchtitan code path instead of a per-user project script.

- spec.py: MuPSweepSpec (config-name and training-id schemes, per-user report_dir) and SPECS for plan_vit (ready), convnext and fastvit (ready=False until their muP configs land).
- routine.py: collect final losses from reporterv2, hp_table (the transferred lr), fit_predictor (loss(w) = L_inf + A*w^-alpha), build_report (plain plotly html, no project-specific infra).
- `__main__.py`: `python -m torchtitan.experiments.mup grid <model>` prints the launch grid for any launcher to submit; `report <model>` collects and writes the report and prints the transferred lr plus the predicted loss.
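The predictor form loss(w) = L_inf + A*w^-alpha can be fitted in several ways; the PR text does not say which one fit_predictor uses. Below is a minimal stdlib-only sketch of one option: scan alpha over a grid and, for each alpha, solve for L_inf and A by ordinary least squares. The names `fit_power_law` and `predict_loss`, the alpha grid, and the example widths and losses are all assumptions for illustration, not code from this PR.

```python
def fit_power_law(widths, losses, alphas=None):
    """Fit loss(w) = L_inf + A * w**(-alpha).

    Hypothetical helper: for a fixed alpha the model is linear in
    (L_inf, A), so we scan alpha and solve a 1-D linear regression
    of loss against x = w**(-alpha) for each candidate.
    """
    if alphas is None:
        alphas = [i / 100 for i in range(1, 301)]  # assumed grid: 0.01 .. 3.00
    n = len(widths)
    mean_y = sum(losses) / n
    best = None
    for alpha in alphas:
        x = [w ** -alpha for w in widths]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # all widths identical; the slope is undefined
        a = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, losses)) / sxx
        l_inf = mean_y - a * mean_x
        sse = sum((l_inf + a * xi - yi) ** 2 for xi, yi in zip(x, losses))
        if best is None or sse < best[0]:
            best = (sse, l_inf, a, alpha)
    if best is None:
        raise ValueError("need at least two distinct widths")
    _, l_inf, a, alpha = best
    return l_inf, a, alpha


def predict_loss(w, l_inf, a, alpha):
    """Evaluate the fitted predictor at width w."""
    return l_inf + a * w ** -alpha


if __name__ == "__main__":
    # Synthetic data generated from the model itself, not real sweep results.
    widths = [64, 128, 256, 512, 1024]
    losses = [2.0 + 5.0 * w ** -0.5 for w in widths]
    l_inf, a, alpha = fit_power_law(widths, losses)
    print(l_inf, a, alpha, predict_loss(4096, l_inf, a, alpha))
```

With three free parameters, at least three distinct widths are needed for the fit to be meaningful, and more are better when the losses are noisy.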

report_dir defaults to a per-user path (via getpass), so the routine is not bound to one report mount; override it with MUP_REPORT_DIR. Submission stays the caller's job, with no cluster coupling.
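A rough sketch of how such a default might be resolved, assuming the environment variable takes precedence over a per-user fallback. Only `MUP_REPORT_DIR` and the use of getpass come from the description above; the function name `resolve_report_dir`, the base path, and the per-model subdirectory are hypothetical.

```python
import getpass
import os
from pathlib import Path


def resolve_report_dir(model, base="/tmp/mup_reports"):
    """Return the report directory for a model.

    Hypothetical layout: <root>/<model>, where <root> is MUP_REPORT_DIR
    if set, else <base>/<current user>. The base path is an assumption.
    """
    override = os.environ.get("MUP_REPORT_DIR")
    root = Path(override) if override else Path(base) / getpass.getuser()
    return root / model
```

Keeping the lookup in one small function means launchers and the report step agree on the location without sharing any cluster-specific configuration.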


utkarshgill wants to merge 1 commit into commaai:main from utkarshgill:mup-routine
