feat(gsp): filter replay-buffer stores by per-robot force magnitude #17
Merged
Adds GSP_STORE_FORCE_THRESHOLD config knob. When set, Main.py only stores a GSP transition for robot i if stats[i][0] (force_magnitude) exceeds the threshold. Default 0.0 preserves legacy behavior (all transitions with prox activity get stored).

Why: the live 6-config DDQN variant batch showed that every GSP variant (plain GSP, GSP-B, GSP-N, R-GSP-N, A-GSP-N) converges to zero correlation with the delta-theta label within 100-250 episodes. A direct linear-R² diagnostic on the captured HDF5 data showed why:

    full distribution     → linear R² ceiling =  7.2%
    force_magnitude >p75  → linear R² ceiling = 24.6%
    force_magnitude >p90  → linear R² ceiling = 27.4%
    force_magnitude >p95  → linear R² ceiling = 30.0%

The 90% of timesteps with near-zero force contain near-zero signal and drown the 10% of informative samples in the replay buffer. Even a perfect supervised regressor can't beat var(label) when 90% of training samples are (uninformative state, noise-dominated label) pairs. Filtering concentrates the training distribution on the timesteps where robots are actually interacting with obstacles and the payload is actually rotating in response: a ~4× lift in ceiling R².

Implementation notes:
- Filter applies uniformly to all GSP variants (plain, GSP-B, GSP-N, R-GSP-N, A-GSP-N). The existing prox-activity guards are preserved and ANDed with the new force guard.
- Default threshold 0.0 means the filter is disabled unless explicitly enabled via config. Recommended starting value: 4.0 (≈ p75 of force_magnitude in 2-obstacle runs).
- stats[i][0] is per-robot force magnitude, already parsed from ZMQ at the top of the episode step loop, so there is no extra I/O cost.

Companion: Stelaris launcher will need a parallel wiring so experiments can set GSP_STORE_FORCE_THRESHOLD via the matrix YAML.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Adds `GSP_STORE_FORCE_THRESHOLD` config knob. When set, Main.py only stores a GSP transition for robot `i` if `stats[i][0]` (force magnitude) exceeds the threshold. Default 0.0 preserves legacy behavior.
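The store guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in Main.py: the function name `should_store_gsp_transition` and the `prox_active` flag are hypothetical, and treating a threshold of 0.0 or below as "filter disabled" is an assumption made here so that the default matches the stated legacy behavior.

```python
from typing import Sequence


def should_store_gsp_transition(
    stats: Sequence[Sequence[float]],
    robot_idx: int,
    prox_active: bool,
    threshold: float = 0.0,
) -> bool:
    """Decide whether robot `robot_idx`'s GSP transition goes into the replay buffer.

    `prox_active` stands in for the existing prox-activity guard (hypothetical
    name). `stats[i][0]` is the per-robot force magnitude, as in the PR text.
    """
    force_magnitude = stats[robot_idx][0]
    # Assumption: a non-positive threshold disables the force filter entirely,
    # so the default of 0.0 stores every transition that passes the prox guard.
    force_ok = threshold <= 0.0 or force_magnitude > threshold
    # The new force guard is ANDed with the existing prox-activity guard.
    return bool(prox_active and force_ok)


# Example: three robots, threshold at the recommended starting value of 4.0.
example_stats = [[5.2, 0.0], [0.3, 0.0], [0.0, 0.0]]
decisions = [should_store_gsp_transition(example_stats, i, True, 4.0) for i in range(3)]
print(decisions)
```

With the threshold left at its default, only the prox-activity guard decides, which is the legacy path.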
Why
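Live batch diagnostic (DDQN × 6 GSP variants, 2-obstacle, seed 123) showed every variant converging to zero correlation with delta-theta by episode 100-200. Direct linear R² analysis on the captured data revealed the ceiling:

| Subset | Linear R² ceiling |
|---|---|
| Full distribution | 7.2% |
| `force_magnitude` > p75 | 24.6% |
| `force_magnitude` > p90 | 27.4% |
| `force_magnitude` > p95 | 30.0% |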
90% of timesteps have near-zero force → near-zero label → the network correctly learns "predict the mean" on the full distribution. Filtering concentrates training on the interacting-robots samples, lifting the ceiling 3-4×.
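To illustrate the kind of diagnostic described above, here is a small sketch that compares the linear R² on a full sample against the subset above a force percentile. It runs on synthetic data, not the PR's captured HDF5 data, and it assumes a single scalar feature whose signal scales with force. The helper names and the data-generating process are illustrative assumptions, so the printed values will not match the figures quoted in this PR.

```python
import random


def linear_r2(xs, ys):
    """R² of the least-squares line y ≈ a*x + b (equals squared Pearson r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def r2_above_percentile(forces, feats, labels, q):
    """Linear R² restricted to samples whose force exceeds the q-quantile.

    q = 0.0 means "use the full distribution".
    """
    cutoff = sorted(forces)[int(q * len(forces))] if q > 0 else float("-inf")
    keep = [i for i, f in enumerate(forces) if f > cutoff]
    return linear_r2([feats[i] for i in keep], [labels[i] for i in keep])


# Synthetic stand-in data (assumption): most forces are small, the label's
# signal is proportional to force, and the noise level is constant, so
# low-force samples are noise-dominated.
rng = random.Random(0)
n = 5000
forces = [rng.expovariate(1.0) for _ in range(n)]
feats = [f * rng.choice((-1.0, 1.0)) for f in forces]
labels = [x + rng.gauss(0.0, 3.0) for x in feats]

r2_full = r2_above_percentile(forces, feats, labels, 0.0)
r2_p75 = r2_above_percentile(forces, feats, labels, 0.75)
print(f"full: {r2_full:.3f}  >p75: {r2_p75:.3f}")
```

On data shaped like this, the filtered subset has a higher linear R² than the full sample, which is the effect the force threshold is meant to exploit.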
Changes
Test plan
Companion
Stelaris launcher will need a parallel wiring so experiments can set `GSP_STORE_FORCE_THRESHOLD` via the matrix YAML `overrides` section.
🤖 Generated with Claude Code