[codex] add fake R3 orchestrator debug mode#2801

Open
samsja wants to merge 1 commit into main from codex/orchestrator-debug-fake-r3

@samsja samsja commented Jun 13, 2026

Member

Summary

Adds a reusable orchestrator debug mode for memory and scheduling ablations without requiring real inference or trainer processes.

Memory reductions are split into the stacked follow-up PR: #2807.

What changed

  • Adds debug.no_inference, debug.no_trainer, debug.fake_tokenizer, and debug.log_memory orchestrator flags.
  • Lets rl local run the full orchestrator without spawning inference or trainer subprocesses when debug flags are enabled.
  • Adds no-op inference, training transport, and weight watcher implementations for debug runs.
  • Adds tests/debug_envs/fake_r3_trajectory, a deterministic fake environment that emits heavy pre-tokenized multi-turn trajectories with optional GLM5-shaped R3 payloads.
  • Keeps local ablation TOML configs out of git; the committed reusable artifact is the debug environment plus orchestrator debug plumbing.
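A minimal sketch of how the debug flags above could gate subprocess spawning and swap in a no-op transport. Only the four flag names come from this PR; the class names, the send method, and the helper functions are hypothetical and are not taken from the prime_rl code.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DebugConfig:
    # Field names mirror the flags listed in the PR; the defaults are assumed.
    no_inference: bool = False
    no_trainer: bool = False
    fake_tokenizer: bool = False
    log_memory: bool = False


class TrainingTransport(Protocol):
    def send(self, batch: list[dict]) -> int: ...


class RealTransport:
    """Stand-in for a real trainer transport (hypothetical)."""

    def send(self, batch: list[dict]) -> int:
        raise RuntimeError("no trainer process exists in this sketch")


class NoOpTransport:
    """Accepts batches and drops them, so no trainer process is needed."""

    def send(self, batch: list[dict]) -> int:
        return len(batch)


def build_transport(debug: DebugConfig) -> TrainingTransport:
    # Pick the no-op implementation when the trainer is disabled.
    return NoOpTransport() if debug.no_trainer else RealTransport()


def subprocesses_to_spawn(debug: DebugConfig) -> list[str]:
    # The orchestrator always runs; the other two are skipped under debug flags.
    procs = ["orchestrator"]
    if not debug.no_inference:
        procs.append("inference")
    if not debug.no_trainer:
        procs.append("trainer")
    return procs
```

With both no_inference and no_trainer set, this sketch spawns only the orchestrator and every batch sent through the transport is discarded.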

Fake R3 environment

The fake env is user-configurable through env args:

  • turns
  • seq_len
  • prompt_len
  • completion_fraction
  • include_r3
  • routed_layers
  • routed_topk
  • n_routed_experts
  • num_examples
  • vocab_size

Defaults model the GLM5 R3 shape: 78 routed layers, top-k 8, 256 routed experts.
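As a rough illustration of what such an environment might emit, the sketch below generates a deterministic multi-turn trajectory with per-token routed-expert ids. The three R3 defaults (78, 8, 256) come from this PR; every other default, the field names in the output, and the interpretation of seq_len and completion_fraction are assumptions. prompt_len and num_examples are left out because their exact semantics are not described here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeR3Args:
    turns: int = 4                    # assumed default
    seq_len: int = 1024               # assumed: tokens per turn
    completion_fraction: float = 0.5  # assumed: share of seq_len that is completion
    include_r3: bool = True           # assumed default
    routed_layers: int = 78           # from the PR
    routed_topk: int = 8              # from the PR
    n_routed_experts: int = 256       # from the PR
    vocab_size: int = 32000           # assumed default


def r3_ids_per_token(args: FakeR3Args) -> int:
    # Each token carries top-k expert ids for every routed layer.
    return args.routed_layers * args.routed_topk


def make_trajectory(args: FakeR3Args, example_idx: int) -> list[dict]:
    # Seeding by example index makes repeated calls return identical data.
    rng = random.Random(example_idx)
    n_completion = int(args.seq_len * args.completion_fraction)
    n_prompt = args.seq_len - n_completion
    steps = []
    for _ in range(args.turns):
        step = {
            "prompt_ids": [rng.randrange(args.vocab_size) for _ in range(n_prompt)],
            "completion_ids": [rng.randrange(args.vocab_size) for _ in range(n_completion)],
        }
        if args.include_r3:
            # Shape: [seq_len][routed_layers][routed_topk], distinct experts per layer.
            step["routed_experts"] = [
                [rng.sample(range(args.n_routed_experts), args.routed_topk)
                 for _ in range(args.routed_layers)]
                for _ in range(args.seq_len)
            ]
        steps.append(step)
    return steps
```

With the GLM5-shaped defaults, each token carries 78 × 8 = 624 expert ids, which is why the R3 payload dominates the memory footprint of a trajectory in a sketch like this.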

Validation

Validated earlier on this branch with orchestrator unit tests and fake-R3 orchestrator stress runs. Memory optimizations are intentionally split into #2807.

@samsja samsja force-pushed the codex/orchestrator-debug-fake-r3 branch 3 times, most recently from 9c67853 to 56440cc Compare June 13, 2026 22:41
@samsja samsja marked this pull request as ready for review June 13, 2026 22:44

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 56440cc.

Comment thread src/prime_rl/orchestrator/orchestrator.py
@samsja samsja force-pushed the codex/orchestrator-debug-fake-r3 branch from 56440cc to 80b0a66 Compare June 13, 2026 22:47
@samsja samsja changed the title [codex] add orchestrator fake R3 debug mode [codex] add fake R3 debug mode and prune train traces Jun 14, 2026
@samsja samsja force-pushed the codex/orchestrator-debug-fake-r3 branch from 88b9f20 to 80b0a66 Compare June 14, 2026 01:13
@samsja samsja changed the title [codex] add fake R3 debug mode and prune train traces [codex] add fake R3 orchestrator debug mode Jun 14, 2026
samsja added a commit that referenced this pull request Jun 14, 2026
…ed on #2801)

Prunes heavy raw train-trajectory token/R3 payloads after samples, advantages,
and pre-batch filters have run (keeping lightweight summaries for logging),
adds compact scalar completion_temperature, releases sent train-batch and
finalized rollout references, and calls trim_process_memory() after each batch
so freed glibc heap returns to the OS. Includes the fake-R3 orchestrator debug
harness (#2801) it is stacked on.

Squashed from codex/orchestrator-r3-memory.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
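
The commit message above mentions calling trim_process_memory() so that freed glibc heap returns to the OS. One common way to do this from Python is glibc's malloc_trim, called through ctypes. The sketch below is an assumption about how such a helper could look, not the implementation in the referenced commit; the function name carries a _sketch suffix to make that explicit.

```python
import ctypes
import ctypes.util
import gc
import sys


def trim_process_memory_sketch() -> bool:
    """Ask glibc to release free heap pages; returns False where unsupported."""
    if not sys.platform.startswith("linux"):
        return False
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        malloc_trim = libc.malloc_trim
    except (OSError, AttributeError):
        # For example a libc without malloc_trim, such as musl.
        return False
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    # malloc_trim returns 1 if memory was released to the system, 0 otherwise.
    return bool(malloc_trim(0))


def release_batch(batch: list) -> bool:
    # Drop references first, collect cycles, then ask the allocator to trim.
    batch.clear()
    gc.collect()
    return trim_process_memory_sketch()
```

Trimming only helps after the Python references are gone, which is why the sketch clears the batch and runs the garbage collector before calling into libc.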