mup: model-agnostic muP sweep/scale routine #12
Draft
utkarshgill wants to merge 1 commit into
A torchtitan experiment that drives a muP learning-rate sweep for any model and reports the transferred lr plus a width-scaling loss predictor, so the routine lives in the torchtitan code path instead of a per-user project script.

- spec.py: MuPSweepSpec (config-name and training-id schemes, per-user report_dir) and SPECS for plan_vit (ready), convnext and fastvit (ready=False until their muP configs land).
- routine.py: collect final losses from reporterv2, hp_table (the transferred lr), fit_predictor (loss(w) = L_inf + A*w^-alpha), build_report (plain plotly html, no project-specific infra).
- __main__.py: `python -m torchtitan.experiments.mup grid <model>` prints the launch grid for any launcher to submit; `report <model>` collects and writes the report and prints the transferred lr plus the predicted loss.

report_dir defaults per-user (getpass) so this is not bound to one report mount; override with MUP_REPORT_DIR. Submission stays the caller's job, no cluster coupling.
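For reviewers who want to sanity-check the predictor form loss(w) = L_inf + A*w^-alpha, here is a minimal standalone sketch of one way such a fit could be done. It is not the code in routine.py: the function names `fit_width_power_law` and `predict_loss`, the alpha grid, and the synthetic data are all assumptions made for illustration. The sketch scans candidate alpha values and, for each one, solves for L_inf and A in closed form, since the model is linear in those two once alpha is fixed.

```python
def fit_width_power_law(widths, losses, alphas=None):
    """Fit loss(w) = L_inf + A * w**(-alpha) to (width, loss) pairs.

    Hypothetical helper, not the PR's fit_predictor. For a fixed alpha the
    model is linear in (L_inf, A), so each candidate alpha is solved with
    ordinary least squares and the alpha with the smallest squared error wins.
    """
    if len(widths) != len(losses) or len(widths) < 3:
        raise ValueError("need at least three (width, loss) pairs")
    if alphas is None:
        # Assumed search grid: 0.01, 0.02, ..., 3.00
        alphas = [i / 100 for i in range(1, 301)]

    n = len(widths)
    mean_y = sum(losses) / n
    best = None
    for alpha in alphas:
        x = [w ** -alpha for w in widths]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate: all widths identical
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, losses))
        a = sxy / sxx                 # slope is A
        l_inf = mean_y - a * mean_x   # intercept is L_inf
        sse = sum((l_inf + a * xi - yi) ** 2 for xi, yi in zip(x, losses))
        if best is None or sse < best[0]:
            best = (sse, l_inf, a, alpha)

    if best is None:
        raise ValueError("could not fit: widths must not all be equal")
    _, l_inf, a, alpha = best
    return l_inf, a, alpha


def predict_loss(params, width):
    """Evaluate the fitted predictor at a new width."""
    l_inf, a, alpha = params
    return l_inf + a * width ** -alpha


# Synthetic example: losses generated from L_inf=2.0, A=5.0, alpha=0.5.
widths = [64, 128, 256, 512, 1024]
losses = [2.0 + 5.0 * w ** -0.5 for w in widths]
params = fit_width_power_law(widths, losses)
```

A grid scan is coarse but has no optimizer dependencies and cannot diverge; a real implementation might refine alpha with a nonlinear least-squares solver instead.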
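The per-user report_dir default with a MUP_REPORT_DIR override could be resolved roughly as in the sketch below. This is an illustration only: the helper name `resolve_report_dir` and the base path `/tmp/mup_reports` are assumptions, and the PR may lay out directories differently. Only `MUP_REPORT_DIR` and the use of getpass come from the description.

```python
import getpass
import os
from pathlib import Path


def resolve_report_dir(base="/tmp/mup_reports", env=None):
    """Return the report directory for the current user.

    Hypothetical helper: `base` is an assumed default, not a path from the PR.
    MUP_REPORT_DIR, when set and non-empty, takes precedence over the default.
    """
    env = os.environ if env is None else env
    override = env.get("MUP_REPORT_DIR")
    if override:
        return Path(override)
    # Per-user default so reports are not tied to one shared mount.
    return Path(base) / getpass.getuser()
```

Passing `env` explicitly keeps the function easy to test without mutating the process environment.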