feat(gsp): filter replay-buffer stores by per-robot force magnitude #17

Merged
jdbloom merged 1 commit into master from feat/gsp-store-force-filter on Apr 13, 2026

@jdbloom jdbloom commented Apr 13, 2026

Summary

Adds `GSP_STORE_FORCE_THRESHOLD` config knob. When set, Main.py only stores a GSP transition for robot `i` if `stats[i][0]` (force magnitude) exceeds the threshold. Default 0.0 preserves legacy behavior.

Why

Live batch diagnostic (DDQN × 6 GSP variants, 2-obstacle, seed 123) showed every variant converging to zero correlation with delta-theta by episode 100-200. Direct linear R² analysis on the captured data revealed the ceiling:

| Filter | Samples | Linear R² |
|---|---|---|
| None (full distribution) | 100% | 7.2% |
| `force_magnitude > p75` | 25% | 24.6% |
| `force_magnitude > p90` | 10% | 27.4% |
| `force_magnitude > p95` | 5% | 30.0% |

90% of timesteps have near-zero force → near-zero label → the network correctly learns "predict the mean" on the full distribution. Filtering concentrates training on the interacting-robots samples, lifting the ceiling 3-4×.
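
As a rough illustration of the kind of percentile-filtered linear R² diagnostic described above, here is a minimal sketch. It is not the script used for this PR: the helper names `linear_r2` and `r2_above_percentile` are hypothetical, and the data is synthetic (the label carries signal only at high force), so the printed values will not match the table.

```python
import numpy as np


def linear_r2(features, labels):
    """R² of an ordinary least-squares fit (with intercept) of labels on features."""
    design = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(design, labels, rcond=None)
    residual = labels - design @ coef
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((labels - labels.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def r2_above_percentile(features, labels, force, pct):
    """Fit only on samples whose force exceeds the pct-th percentile.

    Returns (fraction of samples kept, linear R² on the kept samples).
    pct=None keeps the full distribution.
    """
    if pct is None:
        keep = np.ones(len(force), dtype=bool)
    else:
        keep = force > np.percentile(force, pct)
    return float(keep.mean()), linear_r2(features[keep], labels[keep])


# Synthetic stand-in data (NOT the PR's HDF5 capture): the label carries
# signal only when force is large, and is pure noise otherwise.
rng = np.random.default_rng(0)
n = 20_000
force = rng.exponential(scale=2.0, size=n)
features = rng.normal(size=(n, 3))
signal = features @ np.array([1.0, -0.5, 0.25])
labels = np.where(force > 4.0, signal, 0.0) + rng.normal(scale=0.5, size=n)

for pct in (None, 75, 90, 95):
    frac, r2 = r2_above_percentile(features, labels, force, pct)
    name = "full" if pct is None else f"> p{pct}"
    print(f"{name:>6}: kept {frac:.0%}, linear R² = {r2:.3f}")
```

On this toy data the filtered fits score higher than the full-distribution fit for the same reason given above: the low-force samples contribute label variance that no linear model of the features can explain.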

Changes

  • `run_baseline_experiments.py`: new `GSP_STORE_FORCE_THRESHOLD` config key (default 0.0).
  • `rl_code/Main.py`: adds `force_thr = config.get('GSP_STORE_FORCE_THRESHOLD', 0.0)` and `and stats[i][0] > force_thr` guard on every `store_gsp_transition` branch (plain GSP, GSP-B, GSP-N, R-GSP-N, A-GSP-N). Existing prox-activity guards preserved and ANDed with the new guard.
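
The guard described in the list above can be sketched roughly as follows. Only `config.get('GSP_STORE_FORCE_THRESHOLD', 0.0)` and the `stats[i][0] > force_thr` comparison come from this PR; the function name `should_store_gsp_transition` and the `prox_active` flag are hypothetical stand-ins for the per-variant branches and prox-activity checks in `Main.py`.

```python
def should_store_gsp_transition(stats, i, prox_active, config):
    """Return True if robot i's GSP transition should go to the replay buffer.

    stats[i][0] is robot i's force magnitude (as described in the PR).
    prox_active is a hypothetical stand-in for the existing prox-activity guard.
    """
    force_thr = config.get('GSP_STORE_FORCE_THRESHOLD', 0.0)
    # The new force guard is ANDed with the existing guard.
    return bool(prox_active and stats[i][0] > force_thr)


# Hypothetical per-robot stats rows; only index 0 (force magnitude) is used.
stats = [[0.0], [3.9], [5.2]]
config = {'GSP_STORE_FORCE_THRESHOLD': 4.0}

stored = [i for i in range(len(stats))
          if should_store_gsp_transition(stats, i, True, config)]
print(stored)  # [2]
```

Because the comparison is strict, a sample whose force magnitude is exactly 0.0 is rejected even at the default threshold.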

Test plan

  • Full RL-CT suite: 120/120 pass
  • Main.py syntax check
  • Default threshold 0.0 matches pre-change behavior in practice: the filter condition is `> 0.0`, so any nonzero force passes, and the existing live data contains no prox-activity sample with zero force.

Companion

Stelaris launcher will need a parallel wiring so experiments can set `GSP_STORE_FORCE_THRESHOLD` via the matrix YAML `overrides` section.

🤖 Generated with Claude Code

Adds GSP_STORE_FORCE_THRESHOLD config knob. When set, Main.py only stores
a GSP transition for robot i if stats[i][0] (force_magnitude) exceeds the
threshold. Default 0.0 preserves legacy behavior (all transitions with
prox activity get stored).

Why: the live 6-config DDQN variant batch showed that every GSP variant
(plain GSP, GSP-B, GSP-N, R-GSP-N, A-GSP-N) converges to zero correlation
with the delta-theta label within 100-250 episodes. A direct linear-R²
diagnostic on the captured HDF5 data showed why:

  full distribution    → linear R² ceiling = 7.2%
  force_magnitude >p75 → linear R² ceiling = 24.6%
  force_magnitude >p90 → linear R² ceiling = 27.4%
  force_magnitude >p95 → linear R² ceiling = 30.0%

The 90% of timesteps with near-zero force contain near-zero signal and
drown the 10% of informative samples in the replay buffer. Even a
perfect supervised regressor can't beat var(label) when 90% of training
samples are (uninformative state, noise-dominated label) pairs.

Filtering concentrates the training distribution on the timesteps where
robots are actually interacting with obstacles and the payload is
actually rotating in response — a ~4× lift in ceiling R².

Implementation notes:
- Filter applies uniformly to all GSP variants (plain, GSP-B, GSP-N,
  R-GSP-N, A-GSP-N). The existing prox-activity guards are preserved
  and ANDed with the new force guard.
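- Default threshold 0.0 leaves the filter effectively disabled unless a
  higher value is set via config: the check is a strict > 0.0, so only
  samples with exactly zero force would be dropped. Recommended starting
  value: 4.0 (≈ p75 of force_magnitude in 2-obstacle runs).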
- stats[i][0] is per-robot force magnitude, already parsed from ZMQ
  at the top of the episode step loop — no extra I/O cost.
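
One possible way to pick a threshold for a different scenario is to take a percentile of logged force magnitudes, in the spirit of the p75 recommendation above. This is a sketch under assumptions: `threshold_from_percentile` is a hypothetical helper, and the sample values are made up for illustration.

```python
import numpy as np


def threshold_from_percentile(force_magnitudes, pct=75.0):
    """Return the pct-th percentile of the logged force magnitudes."""
    force = np.asarray(force_magnitudes, dtype=float).ravel()
    if force.size == 0:
        raise ValueError("need at least one force sample")
    return float(np.percentile(force, pct))


# Made-up force magnitudes, not data from the PR.
logged_force = [0.0, 1.0, 2.0, 3.0, 4.0]
thr = threshold_from_percentile(logged_force, pct=75.0)

# Fraction of samples that would pass a strict `> thr` guard.
kept = float(np.mean(np.asarray(logged_force) > thr))
print(thr, kept)
```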

Companion: Stelaris launcher will need a parallel wiring so experiments
can set GSP_STORE_FORCE_THRESHOLD via the matrix YAML.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jdbloom jdbloom merged commit a705dfd into master Apr 13, 2026
3 checks passed
@jdbloom jdbloom deleted the feat/gsp-store-force-filter branch April 13, 2026 17:47