
feat(agent): add GSP-B (full-broadcast) variant#16

Merged
jdbloom merged 1 commit into master from feat/gsp-b-broadcast on Apr 13, 2026

@jdbloom (Collaborator) commented Apr 13, 2026

Summary

GSP-B is the full-broadcast sibling of GSP-N. Each agent's GSP input is `[self_prox, self_prev_gsp, other_0_prox, other_0_prev_gsp, ...]`, length `2 * n_agents`, self-first. Per-agent predictions. Same forward-pass shape as GSP-N.
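
As a minimal sketch of the self-first layout, assuming one scalar proximity value and one scalar previous prediction per agent (which is what the `2 * n_agents` input length implies), the per-agent views can be built as below. The function name mirrors `make_gsp_states_broadcast`, but this is a standalone illustration, not the code merged in this PR.

```python
from typing import List, Sequence


def make_gsp_states_broadcast(
    agent_prox_values: Sequence[float],
    agent_prev_gsp: Sequence[float],
) -> List[List[float]]:
    """Return one self-first GSP-B input vector per agent (illustrative sketch).

    Assumes agent_prox_values[i] and agent_prev_gsp[i] are scalars for agent i.
    """
    n_agents = len(agent_prox_values)
    if len(agent_prev_gsp) != n_agents:
        raise ValueError("prox and prev_gsp must have one entry per agent")

    states = []
    for self_id in range(n_agents):
        # Self first: this agent's own (prox, prev_gsp) pair.
        state = [agent_prox_values[self_id], agent_prev_gsp[self_id]]
        # Then every other agent in ascending id order, skipping self.
        for other_id in range(n_agents):
            if other_id != self_id:
                state.extend([agent_prox_values[other_id], agent_prev_gsp[other_id]])
        states.append(state)  # length is always 2 * n_agents
    return states
```

For three agents, `make_gsp_states_broadcast([0.1, 0.2, 0.3], [1.0, 2.0, 3.0])` gives agent 1 the view `[0.2, 2.0, 0.1, 1.0, 0.3, 3.0]`: its own pair first, then agents 0 and 2.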

Why

The Option A (direct-MSE) smoke test revealed that plain GSP is signal-starved: with only 4 prox readings and no previous-prediction feedback loop, the MSE optimum is the constant-mean baseline, so direct supervised training converges there and can't do better. See Stelaris `docs/research/2026-04-13-gsp-information-collapse-analysis.md`.
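
Why the constant mean is the floor: if the inputs carry no usable information about the target, the best a network can do is emit some constant c, and MSE(c) = Var(y) + (c - mean(y))², which is minimized at c = mean(y). The snippet below checks this numerically; the targets are made up for illustration and are not data from the smoke test.

```python
from statistics import fmean, pvariance


def mse_of_constant(targets, c):
    """Mean squared error of always predicting the constant c."""
    return fmean((t - c) ** 2 for t in targets)


# Illustrative targets only; not taken from the Option A run.
targets = [0.0, 0.0, 1.0, 0.0, 1.0]
mean = fmean(targets)

# The loss is variance plus squared bias, so any constant other than the
# mean scores worse, and the best constant scores exactly the variance.
for c in (0.0, 0.25, mean, 0.75, 1.0):
    bias_sq = (c - mean) ** 2
    print(f"c={c:.2f}  mse={mse_of_constant(targets, c):.4f}  "
          f"var+bias^2={pvariance(targets) + bias_sq:.4f}")
```

Beating this baseline requires inputs that actually correlate with the target, which is what the extra previous-prediction features are meant to supply.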

GSP-B gives plain GSP's full-broadcast conceptual model the same kind of enriched input that GSP-N gives its neighborhood: proximity AND previous predictions from every agent. Known limitation (same as plain GSP): input size is coupled to `n_agents`, so trained policies don't transfer across team sizes. That coupling is precisely why GSP-N exists, and this PR doesn't change that tradeoff.

Changes

  • `Agent.__init__`: new `broadcast: bool = False` param, mutually exclusive with `neighbors=True` (combining them raises `ValueError`). When set, `gsp_input_size` is overridden to `2 * n_agents`. Per-agent `gsp_observation` ring buffer allocation is unified with the `neighbors` path.
  • `Agent.make_gsp_states_broadcast(prox, prev_gsp)`: new state builder, self-first ordering.
  • `Agent.choose_agent_gsp`: extended to route broadcast through the same per-agent forward-pass path as neighbors.
  • `Main.py`: GSP prediction and storage branches add `elif model.gsp_broadcast:` dispatches, parallel to the existing `gsp_neighbors` path. Reads `BROADCAST` from the config dict.
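
The constructor and routing changes can be sketched as follows. `GSPAgentSketch`, `history_len`, the default `gsp_input_size=4`, and `uses_per_agent_forward_pass` are hypothetical; only the `neighbors`/`broadcast` arguments, the `ValueError` on combining them, the `2 * n_agents` override, and the `gsp_neighbors`/`gsp_broadcast`/`gsp_network_input` names come from this PR.

```python
from collections import deque


class GSPAgentSketch:
    """Hypothetical stand-in for the GSP-related parts of Agent.__init__."""

    def __init__(self, n_agents, gsp_input_size=4, neighbors=False,
                 broadcast=False, history_len=5):
        # Validate flags first so an invalid combination never half-configures the agent.
        if neighbors and broadcast:
            raise ValueError("neighbors=True and broadcast=True are mutually exclusive")
        self.n_agents = n_agents
        self._neighbors = neighbors
        self._broadcast = broadcast
        if broadcast:
            # One (prox, prev_gsp) pair for self plus each of the other agents.
            gsp_input_size = 2 * n_agents
        self.gsp_network_input = gsp_input_size
        # Per-agent bounded ring buffers for both per-agent variants (assumed
        # deque-based); plain GSP's history handling is not modeled here.
        self.gsp_observation = (
            [deque(maxlen=history_len) for _ in range(n_agents)]
            if neighbors or broadcast
            else None
        )

    @property
    def gsp_neighbors(self):
        return self._neighbors

    @property
    def gsp_broadcast(self):
        return self._broadcast

    def uses_per_agent_forward_pass(self):
        # Mirrors the described choose_agent_gsp routing: GSP-N and GSP-B both
        # run one forward pass per agent on that agent's own self-centric view.
        return self._neighbors or self._broadcast
```

With `GSPAgentSketch(4, broadcast=True)`, `gsp_network_input` is 8; with `n_agents=8` it is 16, while a plain agent keeps whatever input size it was given.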

Test plan

  • 8 new test cases in `tests/test_agent/test_gsp_broadcast.py` (seven checks; the input-size check runs for both n=4 and n=8):
    • property wiring (`gsp_broadcast`)
    • `gsp_network_input == 2 * n_agents` for n=4 and n=8
    • state builder returns one state per agent
    • self-first ordering
    • others in ascending id order (skipping self)
    • neighbors + broadcast raises ValueError
    • plain GSP (neither) keeps legacy input
  • Full RL-CT suite: 120/120 pass (was 112, +8)

Companion

Stelaris launcher PR — `tools/dispatcher/launcher.py` `CONDITION_FLAGS` must add a `"GSP-B"` entry and pass the broadcast flag through `build_config` into the RL-CT `make_config`. Will open shortly.
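
A hypothetical shape for the companion launcher entry is sketched below. Only the `"GSP-B"` condition name, `CONDITION_FLAGS`, `build_config`, and the `BROADCAST` config key are grounded in this PR; the dict layout, the `NEIGHBORS` key, and the `build_config(condition, base=None)` signature are assumptions for illustration.

```python
# Hypothetical sketch; the real CONDITION_FLAGS layout and build_config
# signature in tools/dispatcher/launcher.py are not shown in this PR.
CONDITION_FLAGS = {
    "GSP": {"NEIGHBORS": False, "BROADCAST": False},
    "GSP-N": {"NEIGHBORS": True, "BROADCAST": False},
    "GSP-B": {"NEIGHBORS": False, "BROADCAST": True},  # new entry
}


def build_config(condition, base=None):
    """Merge a condition's flags into a copy of the base config (illustrative)."""
    if condition not in CONDITION_FLAGS:
        raise KeyError(f"unknown condition: {condition!r}")
    config = dict(base or {})
    config.update(CONDITION_FLAGS[condition])
    if config["NEIGHBORS"] and config["BROADCAST"]:
        # Mirror the Agent-side check so a bad combination fails before dispatch.
        raise ValueError("NEIGHBORS and BROADCAST are mutually exclusive")
    return config
```

Copying `base` rather than mutating it keeps one shared default config reusable across conditions.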

🤖 Generated with Claude Code

GSP-B is the full-broadcast sibling of GSP-N: each agent's GSP input is
the concatenation of (prox, prev_gsp) for self first, then every other
agent in ascending id order. Total input length = 2 * n_agents.

Plain GSP was limited to raw per-robot proximity values with no
previous-prediction feedback loop, which left the predictor
signal-starved under the new direct-MSE training path. Option A showed
that plain GSP's 4-prox-flag input converges to the trivial-mean
baseline because there isn't enough signal to beat it (see research
note docs/research/2026-04-13-gsp-information-collapse-analysis.md in
Stelaris). GSP-B provides the same kind of enriched input that GSP-N
gives its neighborhood, but broadcast to all agents.

Known limitation (shared with plain GSP): the input size is coupled to
n_agents, so a trained GSP-B policy does not transfer across team
sizes. GSP-N is the transferable variant by design; that was the
whole reason GSP-N exists.

Changes:
- Agent.__init__ gains a `broadcast: bool = False` param. Mutually
  exclusive with `neighbors=True` (raises ValueError). When set,
  gsp_input_size is overridden to 2 * n_agents.
- New property `gsp_broadcast`.
- New method `make_gsp_states_broadcast(agent_prox_values, agent_prev_gsp)`
  that builds per-agent self-first views. Maintains the gsp_observation
  ring buffer the same way make_gsp_states does so recurrent/attention
  variants can layer on top later if wanted.
- `choose_agent_gsp` extended to route broadcast through the same
  per-agent forward-pass path as neighbors (both are per-agent
  self-centric predictors with the same inference shape).
- Main.py's GSP prediction and storage branches add a `gsp_broadcast`
  dispatch that calls `make_gsp_states_broadcast` and stores per-agent
  transitions analogous to the GSP-N path.
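
To illustrate what maintaining the gsp_observation ring buffer buys, here is a minimal sketch assuming one bounded `collections.deque` per agent (the real buffer type and length are not stated). `BroadcastObservationSketch`, `history_len`, and `observation_sequence` are hypothetical names.

```python
from collections import deque
from typing import Deque, List, Sequence


class BroadcastObservationSketch:
    """Hypothetical holder for per-agent GSP-B observation ring buffers."""

    def __init__(self, n_agents: int, history_len: int = 3) -> None:
        self.n_agents = n_agents
        # One bounded deque per agent: appending past maxlen drops the oldest state.
        self.gsp_observation: List[Deque[List[float]]] = [
            deque(maxlen=history_len) for _ in range(n_agents)
        ]

    def make_gsp_states_broadcast(
        self, agent_prox_values: Sequence[float], agent_prev_gsp: Sequence[float]
    ) -> List[List[float]]:
        states = []
        for self_id in range(self.n_agents):
            # Self-first view, others in ascending id order.
            state = [agent_prox_values[self_id], agent_prev_gsp[self_id]]
            for other_id in range(self.n_agents):
                if other_id != self_id:
                    state += [agent_prox_values[other_id], agent_prev_gsp[other_id]]
            # Record the view so a sequence model could read recent history later.
            self.gsp_observation[self_id].append(state)
            states.append(state)
        return states

    def observation_sequence(self, agent_id: int) -> List[List[float]]:
        # Oldest-to-newest window a recurrent or attention predictor could consume.
        return list(self.gsp_observation[agent_id])
```

After three steps with `history_len=2`, each agent's window holds only its two most recent views, which is the bounded history a recurrent or attention variant would need.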

Tests (tests/test_agent/test_gsp_broadcast.py, 8 cases):
- broadcast=True flips gsp_broadcast property True
- gsp_network_input is 2 * n_agents (parameterized over n_agents)
- make_gsp_states_broadcast returns one state per agent
- self-first ordering is correct for each agent
- others appear in ascending id order (skipping self)
- neighbors=True + broadcast=True raises ValueError
- plain GSP (both False) keeps legacy input size

Full RL-CT suite: 120/120 pass.

Companion change required in Stelaris: launcher.py CONDITION_FLAGS must
add a "GSP-B" entry and build_config must pass the broadcast flag
through to the Agent constructor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jdbloom merged commit d452013 into master on Apr 13, 2026
3 checks passed
jdbloom deleted the feat/gsp-b-broadcast branch April 13, 2026 14:50