fix: force intervention frames to Advantage: positive (pi*0.6 spec) #14

Open
jiabinq wants to merge 1 commit into OpenDriveLab:main from jiabinq:fix/intervention-forcing

jiabinq commented Mar 10, 2026

Summary

  • Force human intervention frames (intervention=1) to the highest advantage bin in both staged and non-staged labeling paths, matching the pi*0.6 specification
  • Add --task-text CLI arg to replace hardcoded "fold the cloth" in tasks.jsonl
  • Add 5 unit tests for intervention forcing (binary/n_slices x staged/non-staged + backward compat)

Problem

discretize_advantage.py labels frames purely by advantage percentile, so human expert corrections in DAgger episodes get no special treatment. About 70% of them end up labeled "Advantage: negative", because the advantage estimator assigns low values at intervention moments (the robot was failing right before the human took over). Inference conditions on "Advantage: positive", so the model, having seen those corrective actions paired with the negative label, avoids reproducing them, resulting in weak recovery behavior.

Pi*0.6 specifies that intervention frames must always be forced to positive. Evo-RL implements this correctly (force_intervention_positive=True).
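
The mislabeling described above can be reproduced on toy data. The following is a minimal sketch, not the code in discretize_advantage.py: the helper `label_by_percentile`, its `positive_fraction` parameter, and the advantage values are all made up for illustration, and the real script may choose its thresholds differently.

```python
def label_by_percentile(advantages, positive_fraction=0.3):
    """Label the top `positive_fraction` of frames positive (1), the rest negative (0).

    Hypothetical stand-in for percentile-only binary labeling.
    """
    n_positive = max(1, round(len(advantages) * positive_fraction))
    # The threshold is the smallest advantage among the top n_positive frames.
    threshold = sorted(advantages, reverse=True)[n_positive - 1]
    return [1 if a >= threshold else 0 for a in advantages]


# Toy episode (illustrative values): the policy drifts, the advantage estimate
# drops, and a human takes over on frames 4-6 to recover.
advantages = [0.9, 0.6, 0.2, -0.4, -0.8, -0.5, -0.1, 0.3, 0.7, 0.8]
intervention = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]

labels = label_by_percentile(advantages)

# Intervention frames that ended up in the negative bin.
mislabeled = [
    i for i, (label, flag) in enumerate(zip(labels, intervention))
    if flag == 1 and label == 0
]
print(labels)      # [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(mislabeled)  # [4, 5, 6]
```

In this toy episode every intervention frame lands in the negative bin, because the estimator scores the moments around a takeover as low-advantage.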

Test plan

  • 5 pytest tests passing (test_discretize_advantage.py)
  • Backward compatible: no-op when the intervention column is absent

Human expert intervention frames in DAgger episodes were being labeled
purely by advantage percentile, causing ~70% of expert corrections to
be associated with "Advantage: negative". This contradicts the pi*0.6
specification, which requires intervention frames to always be forced
to positive.

The fix adds intervention forcing to both assign_task_index (non-staged)
and assign_task_index_staged (staged) code paths. When an "intervention"
column is present and a frame has intervention=1, it is forced to the
highest advantage bin regardless of the estimator's output.
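
The forcing step can be sketched as a small post-processing pass over the per-frame bin labels. This is a hedged illustration rather than the body of assign_task_index or assign_task_index_staged: the function name `force_intervention_labels` is hypothetical, and it assumes bins are numbered 0 (lowest) to n_bins - 1 (highest), which may not match how the script maps bins to task indices.

```python
def force_intervention_labels(bin_labels, intervention, n_bins):
    """Return a copy of `bin_labels` with intervention frames moved to the top bin.

    bin_labels:   per-frame advantage bin, assumed 0 (lowest) .. n_bins - 1 (highest)
    intervention: per-frame 0/1 flags, or None when the column is absent
    """
    if intervention is None:
        # Backward compatible: without an intervention column, change nothing.
        return list(bin_labels)
    if len(intervention) != len(bin_labels):
        raise ValueError("intervention and bin_labels must have the same length")
    top_bin = n_bins - 1
    # A frame with intervention == 1 ignores the estimator's bin entirely.
    return [
        top_bin if flag == 1 else label
        for label, flag in zip(bin_labels, intervention)
    ]


# Binary mode: 0 = negative, 1 = positive.
print(force_intervention_labels([0, 1, 0, 0], [0, 0, 1, 1], n_bins=2))  # [0, 1, 1, 1]

# Multi-bin mode with five slices: intervention frames go to bin 4.
print(force_intervention_labels([0, 3, 1, 4], [1, 0, 1, 0], n_bins=5))  # [4, 3, 4, 4]
```

Applying the override after percentile binning keeps the estimator's labels intact for every non-intervention frame, so the same pass could follow either the staged or the non-staged path.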

Also adds --task-text CLI arg for configurable task descriptions in
tasks.jsonl (previously hardcoded to "fold the cloth").
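
A configurable task description could be wired up roughly as follows. Only the --task-text flag and the old "fold the cloth" string come from this PR; keeping that string as the default, the record layout (`task_index`, `task`), and the way the advantage label is appended to the task text are assumptions made for the sketch.

```python
import argparse
import json


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a --task-text option")
    parser.add_argument(
        "--task-text",
        default="fold the cloth",  # assumed default: the previously hardcoded text
        help="Task description written into tasks.jsonl",
    )
    return parser


def task_records(task_text, labels=("negative", "positive")):
    """Build one record per advantage bin (hypothetical record layout)."""
    return [
        {"task_index": i, "task": f"{task_text}, Advantage: {label}"}
        for i, label in enumerate(labels)
    ]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records) + "\n"


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--task-text", "stack the blocks"])
jsonl_text = to_jsonl(task_records(args.task_text))
print(jsonl_text, end="")
```

Omitting the flag falls back to the default, so in this sketch existing invocations would keep producing the same task text as before.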

Includes 5 tests covering binary/n_slices modes for both staged and
non-staged paths, plus backward compatibility when no intervention
column exists.