Fix/readme trainer torchrun #2787
The trainer console script is an internal entrypoint that expects torchrun-provided env vars (RANK, WORLD_SIZE, etc.) and fails with 'RANK expected, but not set' when invoked directly. Update README validation step 4 to launch it through torchrun.
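For context, here is a minimal sketch of the precondition behind that error. torchrun exports `RANK`, `WORLD_SIZE`, `LOCAL_RANK`, `MASTER_ADDR` and `MASTER_PORT` to every worker process it spawns, and a process started from a plain shell has none of them. The helper names below (`missing_torchrun_vars`, `check_launched_by_torchrun`) are hypothetical and not part of prime_rl. They only illustrate why the console script cannot be invoked directly.

```python
import os

# Standard variables torchrun exports to each worker process it launches.
TORCHRUN_VARS = ("RANK", "WORLD_SIZE", "LOCAL_RANK", "MASTER_ADDR", "MASTER_PORT")


def missing_torchrun_vars(env=None):
    """Return the torchrun-provided variables absent from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in TORCHRUN_VARS if name not in env]


def check_launched_by_torchrun(env=None):
    """Hypothetical pre-flight check: fail fast with a launch hint instead of a
    low-level distributed-init error."""
    missing = missing_torchrun_vars(env)
    if missing:
        raise RuntimeError(
            f"{', '.join(missing)} not set; launch the trainer through torchrun "
            "instead of invoking the console script directly"
        )
```

Run from a bare shell, `check_launched_by_torchrun()` raises. Under torchrun all five variables are present and it returns `None`.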
…inating The README orchestrator validation step could never complete:

1. The default-enforcing post-batch zero_advantage filter dropped all rollouts (16-token completions on Qwen3-0.6B all score 0 reward, so group advantages are uniformly zero), aborting after 10 empty batches.
2. With filters in monitor mode, the dispatch gate then paused forever waiting for a trainer weight broadcast that never comes when running the orchestrator standalone.

Add a wait_for_trainer config flag (default true, preserving existing behavior); when false the dispatch gate stays open so a standalone orchestrator ships all max_steps batches and exits cleanly. Set it in configs/debug/orch.toml along with monitor-mode post-batch filters, and note the standalone behavior in the README.
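The following is a rough sketch of failure mode 1. It assumes a group-relative advantage of the form "reward minus the group's mean reward", and the orchestrator's exact normalization may differ. `zero_advantage_filter` and its `mode` argument are hypothetical stand-ins for the enforce/monitor behavior described above. When every rollout in a group scores 0, every advantage is 0. An enforcing filter then empties the batch, while a monitoring one only counts the flagged groups.

```python
from statistics import mean


def group_advantages(rewards):
    """Group-relative advantages: each rollout's reward minus the group's mean reward.
    Assumed formulation for illustration only."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def is_zero_advantage(rewards):
    """True when no rollout in the group differs from the group mean (no learning signal)."""
    return all(a == 0 for a in group_advantages(rewards))


def zero_advantage_filter(groups, mode="enforce"):
    """Hypothetical post-batch filter. Returns (groups_to_keep, num_flagged).

    "enforce": drop zero-advantage groups from the batch.
    "monitor": keep the batch unchanged and only report how many groups were flagged.
    """
    flagged = sum(is_zero_advantage(g) for g in groups)
    if mode == "monitor":
        return groups, flagged
    return [g for g in groups if not is_zero_advantage(g)], flagged
```

With a batch where every reward is 0, the enforcing filter returns an empty batch on every step, which is how the run reaches the empty-batch abort limit.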
The `eval` console script doesn't exist and configs/debug/eval.toml was removed in #1714, so README step 5.2 fails with 'Failed to spawn: eval'. Point it at the verifiers CLI (vf-eval) against the running debug inference server, matching the reverse-text example.
samsja left a comment:
Not clear to me why `wait_for_trainer` is needed. I understand what it does, but where do we need to use it?
4. Check that you can run the RL trainer (*this requires 1 GPU*)

```bash
uv run trainer @ configs/debug/rl/train.toml
```
This should work on a single GPU, no?
It used to, but maybe it fails now?
```bash
uv run orchestrator @ configs/debug/orch.toml
```

*This runs the orchestrator standalone (no trainer); the debug config sets `wait_for_trainer = false` so it completes 5 steps and exits cleanly on its own.*
[[post_batch_filters]]
type = "gibberish"

[[post_batch_filters]]
type = "repetition"
This is only needed for README step 5.1 under "Validate your environment setup" (`uv run orchestrator @ configs/debug/orch.toml`). Without a trainer, the dispatch gate pauses at `TARGET_LAG` waiting for a weight broadcast that never comes, so the step hangs after ~2 steps instead of finishing. `wait_for_trainer = false` keeps the gate open, so the run ships all `max_steps` batches and exits 0.
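Here is a toy model of the gate behavior described above. `DispatchGate`, `run_standalone` and the `target_lag` value of 2 are illustrative assumptions, not the prime_rl implementation of `update_dispatch_gate`. The model only shows that, when no trainer ever broadcasts weights, a lag-bounded gate closes after a fixed number of steps, while `wait_for_trainer = False` lets all `max_steps` batches through.

```python
from dataclasses import dataclass


@dataclass
class DispatchGate:
    """Hypothetical model of the orchestrator's dispatch gate (names are illustrative).

    Dispatch may run at most `target_lag` steps ahead of the newest weights the
    trainer has broadcast. With `wait_for_trainer=False` the gate never closes."""

    target_lag: int
    wait_for_trainer: bool = True
    latest_weights_step: int = 0  # advanced only when a trainer broadcasts new weights

    def is_open(self, step: int) -> bool:
        if not self.wait_for_trainer:
            return True
        return step - self.latest_weights_step < self.target_lag


def run_standalone(max_steps: int, gate: DispatchGate) -> int:
    """Simulate an orchestrator with no trainer attached, so no broadcast ever arrives.
    Returns the number of batches shipped before the gate closed (at most max_steps)."""
    shipped = 0
    for step in range(max_steps):
        if not gate.is_open(step):
            # A real orchestrator would block here indefinitely waiting for a broadcast.
            break
        shipped += 1
    return shipped
```

With a trainer attached, each broadcast advances `latest_weights_step` and reopens the gate, which is why the default stays `True`.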
Removed unnecessary post_batch_filters for 'gibberish' and 'repetition'.

Signed-off-by: Anirudh Ravichandran <ani@primeintellect.ai>
Note
Low Risk
Default orchestrator behavior is unchanged; changes are README/debug-config plus an opt-out flag for standalone validation runs.
Overview
Fixes the environment validation docs so the RL trainer step matches how `prime_rl.entrypoints.trainer` must be started (via `torchrun`), and swaps the eval smoke test to `vf-eval` against a running inference server.

Adds `wait_for_trainer` on the orchestrator (default `true`). When `false`, `update_dispatch_gate` keeps dispatch open so `configs/debug/orch.toml` can finish `max_steps` without a trainer consuming weight updates. That debug config also sets the `zero_advantage` post-batch filter to monitor mode, so short completions that yield all-zero advantages do not abort the run.

Reviewed by Cursor Bugbot for commit 08a169a. Bugbot is set up for automated code reviews on this repo.