Fix/readme trainer torchrun #2787

Open
anravich13-cloud wants to merge 5 commits into main from fix/readme-trainer-torchrun

@anravich13-cloud anravich13-cloud commented Jun 12, 2026


Note

Low Risk
Default orchestrator behavior is unchanged; changes are README/debug-config plus an opt-out flag for standalone validation runs.

Overview
Fixes environment validation docs so the RL trainer step matches how prime_rl.entrypoints.trainer must be started (torchrun), and swaps the eval smoke test to vf-eval against a running inference server.

Adds wait_for_trainer on the orchestrator (default true). When false, update_dispatch_gate keeps dispatch open so configs/debug/orch.toml can finish max_steps without a trainer consuming weight updates. That debug config also sets zero_advantage post-batch filter to monitor mode so short completions that yield all-zero advantages do not abort the run.

Reviewed by Cursor Bugbot for commit 08a169a.

The trainer console script is an internal entrypoint that expects
torchrun-provided env vars (RANK, WORLD_SIZE, etc.) and fails with
'RANK expected, but not set' when invoked directly. Update README
validation step 4 to launch it through torchrun.
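
As a rough illustration of why the direct invocation fails, the check below lists which launcher variables are absent from an environment. It is a sketch, not code from prime_rl: `LAUNCHER_VARS` and `missing_launcher_vars` are hypothetical names, and the variable list assumes the usual set that torchrun exports to each worker process.

```python
import os
from typing import Mapping

# Assumed subset of the variables torchrun exports to each worker process.
LAUNCHER_VARS = ("RANK", "WORLD_SIZE", "LOCAL_RANK", "MASTER_ADDR", "MASTER_PORT")


def missing_launcher_vars(env: Mapping[str, str]) -> list[str]:
    """Return the launcher variables that are not set in `env`."""
    return [name for name in LAUNCHER_VARS if name not in env]


if __name__ == "__main__":
    missing = missing_launcher_vars(os.environ)
    if missing:
        # This is the situation a direct `uv run trainer ...` ends up in.
        print(f"not launched via torchrun? missing: {', '.join(missing)}")
    else:
        print(f"rank {os.environ['RANK']} of {os.environ['WORLD_SIZE']}")
```

Run from a plain shell, the script reports the missing variables; run under torchrun, it should print the rank instead.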
…inating

The README orchestrator validation step could never complete:

1. The default-enforcing post-batch zero_advantage filter dropped all
   rollouts (16-token completions on Qwen3-0.6B all score 0 reward, so
   group advantages are uniformly zero), aborting after 10 empty batches.
2. With filters in monitor mode, the dispatch gate then paused forever
   waiting for a trainer weight broadcast that never comes when running
   the orchestrator standalone.
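
The first failure mode can be reproduced in a few lines. This is a minimal sketch, assuming advantages are each reward minus its group mean and that an enforcing filter drops groups whose advantages are all zero; `group_advantages` and `zero_advantage_filter` are hypothetical names, not the prime_rl implementation.

```python
from statistics import fmean


def group_advantages(rewards: list[float]) -> list[float]:
    """Mean-centered advantages for one group of rollouts (assumed formula)."""
    mean = fmean(rewards)
    return [r - mean for r in rewards]


def zero_advantage_filter(groups: list[list[float]], enforce: bool) -> list[list[float]]:
    """Return the reward groups that survive the filter.

    enforce=True drops groups whose advantages are all zero;
    enforce=False (monitor mode) keeps every group.
    """
    kept = []
    for rewards in groups:
        all_zero = all(a == 0.0 for a in group_advantages(rewards))
        if all_zero and enforce:
            continue  # dropped: the group carries no learning signal
        kept.append(rewards)
    return kept


# Every short completion scores 0, so every group has zero advantage.
batch = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
assert zero_advantage_filter(batch, enforce=True) == []      # empty batch
assert zero_advantage_filter(batch, enforce=False) == batch  # monitor mode keeps it
```

Under enforcement every batch comes back empty, which is the condition that triggers the abort after 10 empty batches.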

Add a wait_for_trainer config flag (default true, preserving existing
behavior); when false the dispatch gate stays open so a standalone
orchestrator ships all max_steps batches and exits cleanly. Set it in
configs/debug/orch.toml along with monitor-mode post-batch filters, and
note the standalone behavior in the README.

The `eval` console script doesn't exist and configs/debug/eval.toml was
removed in #1714, so README step 5.2 fails with 'Failed to spawn: eval'.
Point it at the verifiers CLI (vf-eval) against the running debug
inference server, matching the reverse-text example.

anravich13-cloud requested review from S1ro1 and samsja June 12, 2026 18:46

@samsja samsja left a comment

Member

Not clear to me why wait_for_trainer is needed. I understand what it does, but where do we need to use it?

Comment thread README.md
4. Check that you can run the RL trainer (*this requires 1 GPU*)

```bash
uv run trainer @ configs/debug/rl/train.toml
```

Member

This should work for a single GPU, no?

Member

It used to, but maybe it fails now?

Comment thread README.md Outdated
```bash
uv run orchestrator @ configs/debug/orch.toml
```

*This runs the orchestrator standalone (no trainer); the debug config sets `wait_for_trainer = false` so it completes 5 steps and exits cleanly on its own.*

Member

remove

Comment thread configs/debug/orch.toml Outdated
Comment on lines +14 to +18
[[post_batch_filters]]
type = "gibberish"

[[post_batch_filters]]
type = "repetition"

Member

can remove

anravich13-cloud commented Jun 13, 2026

Author

> Not clear to me why wait_for_trainer is needed. I understand what it does, but where do we need to use it?

@samsja

This is only needed for README "Validate your environment setup" step 5.1 (uv run orchestrator @ configs/debug/orch.toml). Without a trainer, the dispatch gate pauses at TARGET_LAG waiting for a weight broadcast that never comes, so the run hangs after ~2 steps instead of finishing. wait_for_trainer = false keeps the gate open so the run ships all max_steps batches and exits 0.
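
The hang can be sketched with a toy model of the gate. Everything here is an assumption made for illustration: `DispatchGate`, `run_standalone`, the value `target_lag=2`, and the rule that the gate closes once the orchestrator is `target_lag` steps ahead of the last trainer broadcast are stand-ins, not the actual `update_dispatch_gate` logic.

```python
from dataclasses import dataclass


@dataclass
class DispatchGate:
    target_lag: int                # hypothetical stand-in for TARGET_LAG
    wait_for_trainer: bool = True  # default preserves the existing behavior

    def is_open(self, orchestrator_step: int, trainer_step: int) -> bool:
        if not self.wait_for_trainer:
            return True  # standalone run: never pause dispatch
        return orchestrator_step - trainer_step < self.target_lag


def run_standalone(gate: DispatchGate, max_steps: int) -> int:
    """Count batches shipped when no trainer ever broadcasts weights."""
    trainer_step = 0  # never advances: there is no trainer
    shipped = 0
    while shipped < max_steps and gate.is_open(shipped, trainer_step):
        shipped += 1
    return shipped  # fewer than max_steps means a real run would hang here


# Default behavior stalls once the lag reaches target_lag.
assert run_standalone(DispatchGate(target_lag=2), max_steps=5) == 2
# With the flag off, all max_steps batches ship.
assert run_standalone(DispatchGate(target_lag=2, wait_for_trainer=False), max_steps=5) == 5
```

In this model the default gate stops after 2 of 5 batches, while the flag lets all 5 through, which mirrors the difference between the hang and the clean exit.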

Removed unnecessary post_batch_filters for 'gibberish' and 'repetition'.

Signed-off-by: Anirudh Ravichandran <ani@primeintellect.ai>