Fix/readme trainer torchrun by anravich13-cloud · Pull Request #2787 · PrimeIntellect-ai/prime-rl

anravich13-cloud · 2026-06-12T18:05:48Z

Note

Low Risk
Default orchestrator behavior is unchanged; changes are README/debug-config plus an opt-out flag for standalone validation runs.

Overview
Fixes environment validation docs so the RL trainer step matches how prime_rl.entrypoints.trainer must be started (torchrun), and swaps the eval smoke test to vf-eval against a running inference server.

Adds wait_for_trainer on the orchestrator (default true). When false, update_dispatch_gate keeps dispatch open so configs/debug/orch.toml can finish max_steps without a trainer consuming weight updates. That debug config also sets the zero_advantage post-batch filter to monitor mode, so short completions that yield all-zero advantages do not abort the run.
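The gating behavior described above can be sketched as a small simulation. This is only an illustration under stated assumptions: `GateState`, `dispatch_open`, `run_standalone`, and the `target_lag` parameter are hypothetical names, not the actual `update_dispatch_gate` implementation in prime-rl.

```python
from dataclasses import dataclass


@dataclass
class GateState:
    """Hypothetical bookkeeping for the dispatch gate (illustrative names)."""

    batches_shipped: int = 0  # batches the orchestrator has dispatched so far
    trainer_step: int = 0     # latest weight version broadcast by the trainer


def dispatch_open(state: GateState, target_lag: int, wait_for_trainer: bool) -> bool:
    """Return True if the orchestrator may ship another batch."""
    if not wait_for_trainer:
        # Standalone validation run: never block on weight broadcasts.
        return True
    # Assumed default behavior: stay within `target_lag` batches of the trainer.
    return state.batches_shipped - state.trainer_step < target_lag


def run_standalone(max_steps: int, target_lag: int, wait_for_trainer: bool) -> int:
    """Simulate an orchestrator with no trainer; return the batches shipped."""
    state = GateState()
    while state.batches_shipped < max_steps:
        if not dispatch_open(state, target_lag, wait_for_trainer):
            break  # a real run would block here, waiting for the trainer
        state.batches_shipped += 1
    return state.batches_shipped
```

With an assumed lag of 2 and no trainer advancing `trainer_step`, `run_standalone(5, 2, True)` stops after 2 batches, while `run_standalone(5, 2, False)` ships all 5.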

Reviewed by Cursor Bugbot for commit 08a169a. Bugbot is set up for automated code reviews on this repo.

The trainer console script is an internal entrypoint that expects torchrun-provided env vars (RANK, WORLD_SIZE, etc.) and fails with 'RANK expected, but not set' when invoked directly. Update README validation step 4 to launch it through torchrun.
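To illustrate why a direct invocation fails, here is a minimal sketch of an entrypoint that depends on launcher-provided environment variables. The helper `read_dist_env` is hypothetical and only mimics the failure mode quoted above; it is not the trainer's actual startup code.

```python
import os
from typing import Mapping

# torchrun exports these for every worker it spawns (alongside LOCAL_RANK,
# MASTER_ADDR, and MASTER_PORT); a bare `python` or console-script launch does not.
REQUIRED_VARS = ("RANK", "WORLD_SIZE")


def read_dist_env(env: Mapping[str, str] = os.environ) -> dict[str, int]:
    """Read the distributed settings, failing loudly if the launcher is missing."""
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        # Mirrors the 'RANK expected, but not set' style of error.
        raise RuntimeError(f"{missing[0]} expected, but not set; launch via torchrun")
    return {name: int(env[name]) for name in REQUIRED_VARS}
```

When the process is started through torchrun, both variables are populated even for a single-GPU run (`RANK=0`, `WORLD_SIZE=1`), so the same check passes.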

…inating

The README orchestrator validation step could never complete:

1. The default-enforcing post-batch zero_advantage filter dropped all rollouts (16-token completions on Qwen3-0.6B all score 0 reward, so group advantages are uniformly zero), aborting after 10 empty batches.
2. With filters in monitor mode, the dispatch gate then paused forever waiting for a trainer weight broadcast that never comes when running the orchestrator standalone.

Add a wait_for_trainer config flag (default true, preserving existing behavior); when false the dispatch gate stays open so a standalone orchestrator ships all max_steps batches and exits cleanly. Set it in configs/debug/orch.toml along with monitor-mode post-batch filters, and note the standalone behavior in the README.
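The zero-advantage failure can be reproduced with a short sketch. It assumes a mean-baseline group advantage (reward minus the group's mean reward); the function names and the `enforce` flag are hypothetical and stand in for the enforcing and monitor modes mentioned above, not for the actual prime-rl filter API.

```python
from statistics import mean


def group_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each rollout relative to its group's mean reward (assumed form)."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def zero_advantage_filter(
    groups: list[list[float]], enforce: bool
) -> tuple[list[list[float]], int]:
    """Flag groups whose advantages are all zero; drop them only when enforcing."""
    kept: list[list[float]] = []
    flagged = 0
    for rewards in groups:
        if all(a == 0.0 for a in group_advantages(rewards)):
            flagged += 1
            if enforce:
                continue  # enforcing mode: the group never reaches the batch
        kept.append(rewards)  # monitor mode: counted, but still shipped
    return kept, flagged
```

If every completion in every group scores 0 reward, enforcing mode returns an empty batch each time, whereas monitor mode keeps the groups and only reports the count.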

The `eval` console script doesn't exist and configs/debug/eval.toml was removed in #1714, so README step 5.2 fails with 'Failed to spawn: eval'. Point it at the verifiers CLI (vf-eval) against the running debug inference server, matching the reverse-text example.

samsja

not clear to me why the wait_for_trainer is needed, I do understand what it does but where we need to use it

mikasenghaas · 2026-06-12T20:21:13Z

````diff
 4. Check that you can run the RL trainer (*this requires 1 GPU*)

 ```bash
-uv run trainer @ configs/debug/rl/train.toml
````

this should work for single gpu no?

it used to but maybe it failed now?

mikasenghaas · 2026-06-12T20:21:34Z

````diff
 uv run orchestrator @ configs/debug/orch.toml
 ```

+*This runs the orchestrator standalone (no trainer); the debug config sets `wait_for_trainer = false` so it completes 5 steps and exits cleanly on its own.*
````


mikasenghaas · 2026-06-12T20:21:58Z

```diff
+[[post_batch_filters]]
+type = "gibberish"
+
+[[post_batch_filters]]
+type = "repetition"
```


anravich13-cloud · 2026-06-13T01:18:39Z

> not clear to me why the wait_for_trainer is needed, I do understand what it does but where we need to use it

@samsja

This is only needed for README "Validate your environment setup" step 5.1 (`uv run orchestrator @ configs/debug/orch.toml`). Without a trainer, the dispatch gate pauses at TARGET_LAG waiting for a weight broadcast that never comes, so the step hangs after ~2 steps instead of finishing. Setting wait_for_trainer = false keeps the gate open so the run ships all max_steps batches and exits 0.

anravich13-cloud added 3 commits June 12, 2026 16:41

anravich13-cloud requested review from S1ro1 and samsja June 12, 2026 18:46

samsja reviewed Jun 12, 2026

mikasenghaas reviewed Jun 12, 2026

anravich13-cloud pushed a commit that referenced this pull request Jun 13, 2026

Merge pull request #2787 into repetition filter branch

62894cf

anravich13-cloud added 2 commits June 12, 2026 20:37

Clean up post_batch_filters in orch.toml

3bd111c

Removed unnecessary post_batch_filters for 'gibberish' and 'repetition'. Signed-off-by: Anirudh Ravichandran <ani@primeintellect.ai>

Update README.md

08a169a

Signed-off-by: Anirudh Ravichandran <ani@primeintellect.ai>

anravich13-cloud wants to merge 5 commits into main from fix/readme-trainer-torchrun
