feat(agent): add GSP-B (full-broadcast) variant #16
Merged
GSP-B is the full-broadcast sibling of GSP-N: each agent's GSP input is the concatenation of (prox, prev_gsp) for self first, then every other agent in ascending id order. Total input length = 2 * n_agents.

Plain GSP was limited to raw per-robot proximity values with no previous-prediction feedback loop, which left the predictor signal-starved under the new direct-MSE training path. See research note docs/research/2026-04-13-gsp-information-collapse-analysis.md in Stelaris: Option A revealed that plain GSP's 4-prox-flag input converges to the trivial-mean baseline because there isn't enough signal to beat it. GSP-B provides the same kind of enriched input that GSP-N gives its neighborhood, but broadcast to all agents.

Known limitation (shared with plain GSP): the input size is coupled to n_agents, so a trained GSP-B policy does not transfer across team sizes. GSP-N is the transferrable variant by design; that was the whole reason GSP-N exists.

Changes:
- Agent.__init__ gains a `broadcast: bool = False` param. Mutually exclusive with `neighbors=True` (raises ValueError). When set, gsp_input_size is overridden to 2 * n_agents.
- New property `gsp_broadcast`.
- New method `make_gsp_states_broadcast(agent_prox_values, agent_prev_gsp)` that builds per-agent self-first views. Maintains the gsp_observation ring buffer the same way make_gsp_states does so recurrent/attention variants can layer on top later if wanted.
- `choose_agent_gsp` extended to route broadcast through the same per-agent forward-pass path as neighbors (both are per-agent self-centric predictors with the same inference shape).
- Main.py's GSP prediction and storage branches add a `gsp_broadcast` dispatch that calls `make_gsp_states_broadcast` and stores per-agent transitions analogous to the GSP-N path.

Tests (tests/test_agent/test_gsp_broadcast.py, 8 cases):
- broadcast=True flips gsp_broadcast property True
- gsp_network_input is 2 * n_agents (parameterized over n_agents)
- make_gsp_states_broadcast returns one state per agent
- self-first ordering is correct for each agent
- others appear in ascending id order (skipping self)
- neighbors=True + broadcast=True raises ValueError
- plain GSP (both False) keeps legacy input size

Full RL-CT suite: 120/120 pass.

Companion change required in Stelaris: launcher.py CONDITION_FLAGS must add a "GSP-B" entry and build_config must pass the broadcast flag through to the Agent constructor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
GSP-B is the full-broadcast sibling of GSP-N. Each agent's GSP input is `[self_prox, self_prev_gsp, other_0_prox, other_0_prev_gsp, ...]`, length `2 * n_agents`, self-first. Per-agent predictions. Same forward-pass shape as GSP-N.
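The self-first layout described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `build_broadcast_states` is a hypothetical stand-in for `make_gsp_states_broadcast`, it assumes one scalar proximity value and one scalar previous prediction per agent, and it leaves out the gsp_observation ring buffer entirely.

```python
from typing import List, Sequence


def build_broadcast_states(
    prox: Sequence[float], prev_gsp: Sequence[float]
) -> List[List[float]]:
    """Return one self-first (prox, prev_gsp) view per agent.

    Hypothetical helper for illustration only; assumes one scalar
    prox value and one scalar previous prediction per agent.
    """
    if len(prox) != len(prev_gsp):
        raise ValueError("prox and prev_gsp must have one entry per agent")
    n_agents = len(prox)
    states: List[List[float]] = []
    for self_id in range(n_agents):
        # Self first, then every other agent in ascending id order.
        order = [self_id] + [i for i in range(n_agents) if i != self_id]
        state: List[float] = []
        for i in order:
            state.extend((prox[i], prev_gsp[i]))
        states.append(state)  # length is 2 * n_agents
    return states


if __name__ == "__main__":
    for agent_id, state in enumerate(
        build_broadcast_states([0.1, 0.2, 0.3], [1.0, 2.0, 3.0])
    ):
        print(agent_id, state)
```

Every agent sees the same values, only reordered so that its own pair always occupies the first two slots.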
Why
The Option A (direct-MSE) smoke test revealed that plain GSP is signal-starved: with only 4 prox readings and no previous-prediction feedback loop, the MSE optimum is the constant-mean baseline, so direct supervised training converges there and can't do better. See Stelaris `docs/research/2026-04-13-gsp-information-collapse-analysis.md`.
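The constant-mean claim can be checked numerically. The sketch below is illustrative only and is not taken from the research note: it uses made-up targets and shows that, when a predictor can only output a single constant (which is effectively the case when its input carries no usable signal), the constant that minimizes MSE is the mean of the targets.

```python
from typing import Sequence


def mse(targets: Sequence[float], constant: float) -> float:
    """Mean squared error of predicting `constant` for every target."""
    return sum((t - constant) ** 2 for t in targets) / len(targets)


def best_constant(targets: Sequence[float]) -> float:
    """The MSE-minimizing constant prediction is the target mean."""
    return sum(targets) / len(targets)


if __name__ == "__main__":
    targets = [0.0, 1.0, 1.0, 4.0]  # made-up values for illustration
    c_star = best_constant(targets)
    # Compare the mean against a few other constant predictions.
    for c in (c_star - 0.5, c_star, c_star + 0.5):
        print(f"constant={c:.2f} mse={mse(targets, c):.4f}")
```

Any enriched input only helps if it lets the predictor move away from this constant in a way that reduces the error, which is the motivation for feeding in previous predictions.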
GSP-B gives plain GSP's full-broadcast conceptual model the same kind of enriched input that GSP-N gives its neighborhood — proximity AND previous predictions from every agent. Known limitation (same as plain GSP): input size is coupled to `n_agents`, so trained policies don't transfer across team sizes. That's literally why GSP-N exists, and this PR doesn't change that tradeoff.
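The coupling to `n_agents` can be made concrete with a small sketch of the flag handling. This is an assumption-laden illustration, not the actual `Agent.__init__`: `resolve_gsp_input_size` and `default_input_size` are hypothetical names, and only the broadcast size (`2 * n_agents`) and the mutual exclusion with `neighbors` come from this PR.

```python
def resolve_gsp_input_size(
    n_agents: int,
    default_input_size: int,
    neighbors: bool = False,
    broadcast: bool = False,
) -> int:
    """Hypothetical helper mirroring the flag rules described in this PR."""
    if neighbors and broadcast:
        # The PR states the two modes are mutually exclusive.
        raise ValueError("neighbors and broadcast are mutually exclusive")
    if broadcast:
        # GSP-B: one (prox, prev_gsp) pair per agent.
        return 2 * n_agents
    # Plain GSP and GSP-N sizes are not specified here; pass them through.
    return default_input_size


if __name__ == "__main__":
    # The first-layer width changes with team size, so weights trained
    # for one team size do not fit another.
    for n in (4, 6):
        print(n, resolve_gsp_input_size(n, default_input_size=4, broadcast=True))
```

Because the network's input width is fixed at construction time, a change in team size means a different input layer, which is why a trained GSP-B policy does not carry over.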
Changes
Test plan
Companion
Stelaris launcher PR — `tools/dispatcher/launcher.py` `CONDITION_FLAGS` must add a `"GSP-B"` entry and pass the broadcast flag through `build_config` into the RL-CT `make_config`. Will open shortly.
🤖 Generated with Claude Code