[codex] align hosted training config examples #15

Draft
tim0120 wants to merge 1 commit into main from codex/hosted-distillation-configs

tim0120 commented Jun 1, 2026

Summary

  • update SFT and OPD cookbook configs to match the hosted CLI distillation surface
  • add loss = "opd" and remove unsupported public teacher knobs like teacher_tau, save, and replay
  • route training environment overrides through env.args so examples validate with the hosted train config parser
  • align the related RL continuation examples/docs with the same Hosted Training env schema

Why

The new OPD setup uses the public hosted CLI shape (loss = "opd" plus [teacher]) while PrimeRL/runtime handles teacher endpoint wiring internally. The cookbook should document that hosted surface instead of PrimeRL internal config fields or stale env override tables.

Hosted Training uses one env config schema across rl, sft, and opd: per-env taskset/harness overrides are passed through env.args, not top-level [env.taskset] or [env.harness] tables. Keeping the related RL examples on the old shape would leave cookbook training examples that the new parser rejects.

Validation

  • python3 + tomllib: parsed all 38 cookbook TOML files
  • uv run python: loaded changed hosted training TOMLs with prime_cli.commands.rl.load_config from the local OPD CLI branch
  • git diff --check

Related

tim0120 changed the title from "[codex] update hosted distillation configs" to "[codex] align hosted training config examples" on Jun 1, 2026