
Add seed-sweep analysis script, plus QoL for evaluation plots. #34

Merged
rbx merged 5 commits into FairRootGroup:master from enlorenz:wip/integration
Mar 13, 2026

Conversation

@enlorenz
Contributor

No description provided.

- write evaluation-mode plots to sessions/<session>/plots-eval/
- group outputs by loaded model checkpoint and weight configuration
- reuse shared analysis naming helpers for consistent directory slugs
- keep training plots in the existing plots/ directory
@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a new seed-occupancy analysis tool that runs train.py per seed in evaluation mode, parses episode- and run-level metrics into a SeedRunStats dataclass, and produces CSV/JSON summaries and plots; adds a model-weight directory naming helper; and introduces eval-months savings utilities and evaluation-plot directory handling in train.py.

Changes

  • Seed Occupancy Analysis (analyze_seed_occupancy.py): New analysis module that orchestrates per-seed train.py executions, captures stdout/stderr, parses episode/run metrics via regex, computes SeedRunStats, provides numeric helpers (safe_divide, finite_mean_std, polyfit_curve, etc.), saves logs, writes summary.csv/summary.json, and generates aggregate and per-seed plots.
  • Analysis Naming Helper (src/analysis_naming.py): Adds build_model_weight_dir_name(model, efficiency_weight, price_weight, idle_weight, job_age_weight) to produce a model-and-weight-based directory slug for evaluation plotting.
  • Training Script, plot-dir handling (train.py): Imports and uses build_model_weight_dir_name to select and ensure evaluation_plots_dir when saved models exist, routes evaluation plotting outputs there, enforces existence/validation during evaluation, and adjusts session/plots_dir assignment.
  • Arrivalscale Analysis, savings refactor (analyze_arrivalscale_occupancy.py): Replaces annual-total fields with evaluation_savings, annualized_savings, evaluation_savings_off, annualized_savings_off; adds an eval_months parameter to make_run_stats; updates CSV/JSON headers, logging, validation, and call sites to use compute_savings_totals.
  • Lambda Analysis, savings refactor (analyze_lambda_occupancy.py): Same savings refactor as arrivalscale: adds eval_months, replaces annual-total fields with the four new savings fields, updates outputs, plots, CSV headers, and validation; uses compute_savings_totals.
  • Savings utilities (src/analysis_reporting.py): Adds validate_eval_months(eval_months: int) -> int and compute_savings_totals(savings, eval_months) -> (evaluation_savings, annualized_savings) to validate the evaluation window and compute evaluation and annualized savings.
  • Manifest / Other (project-wide): Large new analyzer file (~+987 lines) plus smaller edits (~+21 lines each) to naming, reporting, training, and two analysis scripts; public API expanded with SeedRunStats and many helper functions.
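The walkthrough describes compute_savings_totals and validate_eval_months only at the signature level. A minimal sketch of the annualization arithmetic implied by the later review comments (the 1..12 bounds check and exact error behavior are assumptions, not the project's actual code):

```python
def validate_eval_months(eval_months: int) -> int:
    """Reject evaluation windows outside a plausible 1..12-month range (assumed bounds)."""
    if not 1 <= eval_months <= 12:
        raise ValueError(f"eval_months must be in 1..12, got {eval_months}")
    return eval_months


def compute_savings_totals(savings: float, eval_months: int) -> tuple[float, float]:
    """Return (evaluation_savings, annualized_savings).

    evaluation_savings is the raw total over the evaluated window;
    annualized_savings scales it to a 12-month equivalent.
    """
    eval_months = validate_eval_months(eval_months)
    evaluation_savings = savings
    annualized_savings = savings * (12.0 / eval_months)
    return evaluation_savings, annualized_savings
```

With a 3-month window, a raw total of 100 becomes an annualized 400; with a 12-month window the two values coincide, which is why mislabeling raw totals as "annual" only surfaces on shorter sweeps.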

Sequence Diagram

sequenceDiagram
    participant User
    participant Analyzer as analyze_seed_occupancy.py
    participant Train as train.py (subprocess)
    participant Parser as MetricsParser
    participant Stats as StatsComputer
    participant Plotter
    participant FS as FileSystem

    User->>Analyzer: run (args: seeds, model, weights, eval_months, plots)
    Analyzer->>Analyzer: build seed schedule & train commands

    loop per seed
        Analyzer->>Train: execute train.py (seed, eval mode)
        Train->>FS: write plot files / logs
        Train-->>Analyzer: return stdout/stderr
        Analyzer->>Parser: parse episode metrics & run summaries
        Parser-->>Analyzer: arrays & run-level values
        Analyzer->>Stats: compute SeedRunStats (means, stds, deltas, fits)
        Stats-->>Analyzer: SeedRunStats
        Analyzer->>FS: optionally save raw logs for seed
    end

    Analyzer->>FS: write summary CSV/JSON
    Analyzer->>Plotter: generate aggregate multi-panel and per-seed plots
    Plotter->>FS: save plot files
    Analyzer->>User: print final summary and output locations

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check ❓ Inconclusive: No description was provided by the author, making it impossible to assess whether it relates to the changeset. Resolution: add a pull request description explaining the purpose and scope of the seed-sweep analysis script and evaluation-plots improvements.

✅ Passed checks (1 passed)

  • Title check ✅ Passed: The title accurately summarizes the main changes: introducing a seed-sweep analysis script (analyze_seed_occupancy.py) and quality-of-life improvements for evaluation plots (evaluation_plots_dir handling in train.py and related utilities).


@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
analyze_seed_occupancy.py (3)

150-213: Consider using a named tuple or dataclass for the return type.

The function returns a 12-element tuple which is error-prone to unpack and maintain. Consider defining a simple dataclass or NamedTuple to improve readability and reduce risk of positional errors.
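As a sketch of the suggested named structure, using the field names listed in the accompanying AI-agent prompt (types simplified to plain lists here; the analyzer's actual arrays are NumPy):

```python
from typing import NamedTuple


class EpisodeMetrics(NamedTuple):
    """Named container replacing the 12-element positional tuple
    returned by parse_episode_metrics."""
    occupancy: list
    baseline_occupancy: list
    agent_dropped: list
    savings: list
    savings_off: list
    completion_rate: list
    avg_wait: list
    agent_cost_1k: list
    baseline_cost_1k: list
    baseline_off_cost_1k: list
    agent_power: list
    baseline_off_power: list
```

Call sites then switch from fragile positional unpacking to attribute access such as metrics.occupancy, and adding or reordering a field becomes a localized change instead of a sweep over every unpacking site.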

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` around lines 150 - 213, Replace the 12-element
positional tuple returned by parse_episode_metrics with a clear named structure:
define a small dataclass or typing.NamedTuple (e.g., EpisodeMetrics) with fields
matching the current arrays (occupancy, baseline_occupancy, agent_dropped,
savings, savings_off, completion_rate, avg_wait, agent_cost_1k,
baseline_cost_1k, baseline_off_cost_1k, agent_power, baseline_off_power),
construct and return that from parse_episode_metrics instead of the tuple, and
update any call sites that unpack the tuple to use attribute access (e.g.,
metrics.occupancy) from the new EpisodeMetrics type; keep parsing logic
(EPISODE_RE, _to_float, _to_float_or_nan) unchanged.

665-665: Consider adding strict=True to zip() for defensive coding.

The zip() calls at lines 665, 685, and 822 iterate over arrays that should have equal lengths. Adding strict=True (Python 3.10+) would catch any length mismatches early rather than silently truncating.

Example fix for line 665
-        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)):
+        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True)):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` at line 665, Add strict=True to the zip() calls
that iterate over arrays expected to be the same length to fail fast on
mismatches: replace the loop using enumerate(zip(x, y, point_colors)) with
enumerate(zip(x, y, point_colors, strict=True)) (and similarly update the other
two occurrences noted in the review that zip over the same-sized sequences,
e.g., the zip at lines referencing the same x/y/point_colors-like variables and
the one at line 822) so mismatched lengths raise a ValueError instead of
silently truncating; keep the same enumerate(...) wrapper and variable
unpacking.

50-82: Fragile coupling: regex patterns depend on exact train.py output format.

These regex patterns are tightly coupled to the exact string format produced by build_episode_summary_line (in src/evaluation_summary.py) and the print statements in train.py. Any future changes to those output formats will silently break parsing here.

Consider adding a comment near these patterns documenting the dependency, or alternatively, consider having train.py output structured data (e.g., JSON lines) in evaluation mode that would be more robust to parse.
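A minimal sketch of option (B) with the backward-compatible rollout: the helper name parse_episode_line, the JSON keys, and the stand-in regex are all hypothetical; the real EPISODE_RE matches the full summary line emitted by build_episode_summary_line.

```python
import json
import re
from typing import Optional

# Hypothetical stand-in for EPISODE_RE, covering a single field.
_TEXT_RE = re.compile(r"Occupancy:\s*(?P<occupancy>[\d.]+)")


def parse_episode_line(line: str) -> Optional[dict]:
    """Prefer structured JSON lines; fall back to legacy text output."""
    line = line.strip()
    if line.startswith("{"):
        try:
            return json.loads(line)
        except json.JSONDecodeError:
            return None
    match = _TEXT_RE.search(line)
    if match:
        return {"occupancy": float(match.group("occupancy"))}
    return None
```

During rollout, both output formats parse to the same dict shape, so downstream stats code needs no changes while train.py migrates to JSON lines.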

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` around lines 50 - 82, The regexes (EPISODE_RE,
WAIT_SUMMARY_RE, ARRIVALS_SUMMARY_RE, DROPPED_AGENT_SUMMARY_RE,
DROPPED_BASELINE_SUMMARY_RE) are tightly coupled to the exact text output of
build_episode_summary_line and train.py and will break silently if that format
changes; either (A) add a clear comment above these patterns stating the
dependency on build_episode_summary_line and train.py's print format and list
the exact fields expected, or (B) make the output from
train.py/evaluation_summary.py structured (e.g., JSON lines from
build_episode_summary_line) and replace the fragile parsing by reading/parsing
that JSON (update the code that currently applies EPISODE_RE/WAIT_SUMMARY_RE to
use json.loads and access the same keys), ensuring backward compatibility
(accept old text if JSON not present) during rollout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze_seed_occupancy.py`:
- Line 294: The current code creates baseline_off_occupancy by copying
baseline_occupancy (baseline_off_occupancy = baseline_occupancy), producing
duplicate metrics; fix by choosing one of two paths: (A) add a distinct
"Baseline Off Occupancy" value to the episode summary in train.py and update
EPISODE_RE in evaluation_summary.py to capture it so
baseline_off_occupancy_mean/std/samples come from real output, or (B) remove the
redundant baseline_off_occupancy data members and any references to
baseline_off_occupancy_mean, baseline_off_occupancy_std, and
baseline_off_occupancy_samples from the dataclass and analysis code; apply the
same change consistently in analyze_seed_occupancy.py,
analyze_lambda_occupancy.py, and analyze_arrivalscale_occupancy.py so metrics
and parsing stay aligned.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6de70e54-3a0f-4fb2-b24d-8795f6d10a1e

📥 Commits

Reviewing files that changed from the base of the PR and between c580593 and 79adf4b.

📒 Files selected for processing (3)
  • analyze_seed_occupancy.py
  • src/analysis_naming.py
  • train.py

@coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
analyze_seed_occupancy.py (2)

294-295: ⚠️ Potential issue | 🟠 Major

baseline_off_occupancy is currently synthetic, not measured.

baseline_off_occupancy is copied from baseline occupancy, so all baseline_off_occupancy_* metrics are duplicates and misleading. In current output formatting, there is no separate baseline-off occupancy value emitted to parse.

Suggested minimal fix
-    baseline_off_occupancy = baseline_occupancy.copy()
+    # train.py/evaluation summary does not currently emit baseline_off occupancy.
+    # Keep this explicitly unavailable instead of duplicating baseline occupancy.
+    baseline_off_occupancy = np.full_like(baseline_occupancy, np.nan, dtype=float)

Also applies to: 303-305, 335-335

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` around lines 294 - 295, The variable
baseline_off_occupancy is currently set as baseline_occupancy.copy(), producing
duplicate/misleading baseline_off_occupancy_* metrics; replace this synthetic
copy with a real computed off-occupancy (e.g., baseline_off_occupancy =
baseline_occupancy - baseline_on_occupancy or the equivalent measured
subtraction you already compute elsewhere) or stop emitting
baseline_off_occupancy_* fields. Update every place that emits or uses
baseline_off_occupancy (and related baseline_off_occupancy_* metrics) to use the
computed value (or remove those emissions) so the reported baseline-off
occupancy is accurate and not a duplicate of baseline_occupancy.

329-330: ⚠️ Potential issue | 🟠 Major

annual_total_savings* label is inaccurate when --eval-months != 12.

These values are raw totals over the evaluated window, not annualized totals. This produces mislabeled outputs in CSV/JSON for 1/3/6-month sweeps.

Also applies to: 540-542

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` around lines 329 - 330, The fields
annual_total_savings and annual_total_savings_off (and the similar ones around
the other occurrence) are currently raw sums over the evaluation window; update
the code that sets annual_total_savings and annual_total_savings_off to
annualize those totals when the evaluation window isn't 12 months by multiplying
the raw total (np.sum(savings) / np.sum(savings_off)) by (12.0 / eval_months)
(or args.eval_months if that is the variable name), or alternatively rename the
keys to total_savings/total_savings_off to reflect they are not annualized—apply
the same fix to the other occurrence (the variables around lines 540-542) and
ensure the CSV/JSON output uses the corrected values/names.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze_seed_occupancy.py`:
- Around line 892-895: The validation currently unconditionally checks
seed-range flags and calls parser.error for invalid ranges; change it to only
perform the seed_step and min/max checks when explicit seeds were not provided
(i.e., when args.seeds is empty/None). Update the block that calls parser.error
for seed_step, max_seed and min_seed (the lines referencing args.seed_step,
args.max_seed, args.min_seed and parser.error) to be guarded by a condition like
"if not args.seeds:" so that explicit --seeds bypasses the range validation.
- Around line 468-470: The block that sets dropped_jobs_baseline_total currently
defaults it to 0.0 when None, which can create false deltas; change the fallback
to NaN (e.g., math.nan or float('nan')) instead of 0.0, ensure the math module
is imported if using math.nan, and keep the warning print (seed variable) but
update if desired; this preserves unknown baseline semantics for downstream
stats that consume dropped_jobs_baseline_total.
- Around line 452-456: The current fallback sets baseline_avg_wait_hours = 0.0
when baseline_wait_summary is missing, which fabricates data; change the
fallback to leave baseline as missing (e.g., set baseline_avg_wait_hours =
float("nan") or None) and similarly ensure agent_avg_wait_hours uses np.nan if
avg_wait is empty; update downstream computations that compute wait_delta_hours
and effective savings (where they assume numeric) to handle NaN/None (propagate
NaN or skip aggregation) so missing baseline data does not distort results; look
for the symbols agent_wait_summary, baseline_wait_summary, agent_avg_wait_hours,
baseline_avg_wait_hours, avg_wait, wait_delta_hours and effective-savings to
make the adjustments.
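The first inline comment above (seed-range flags) can be sketched as follows, using the flag and attribute names from the prompt; the defaults and overall parser layout are assumptions:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Seed sweep (sketch)")
    # An explicit seed list takes precedence over the generated range.
    parser.add_argument("--seeds", type=int, nargs="*", default=None)
    parser.add_argument("--min-seed", type=int, default=0)
    parser.add_argument("--max-seed", type=int, default=9)
    parser.add_argument("--seed-step", type=int, default=1)
    args = parser.parse_args(argv)
    # Only validate the range flags when no explicit seeds were given,
    # so --seeds bypasses range validation entirely.
    if not args.seeds:
        if args.seed_step <= 0:
            parser.error("--seed-step must be positive")
        if args.max_seed < args.min_seed:
            parser.error("--max-seed must be >= --min-seed")
    return args
```

With this guard, `--seeds 3 7 --seed-step 0` is accepted (the step is simply unused), while the same step value without `--seeds` still aborts with a parser error.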


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5a207639-522c-4b83-a77a-7d459c411d01

📥 Commits

Reviewing files that changed from the base of the PR and between 79adf4b and c10fda2.

📒 Files selected for processing (1)
  • analyze_seed_occupancy.py

… mode for a seed-based sweep with fixed --job-arrival-scale 1.0.
@coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
analyze_seed_occupancy.py (2)

298-310: ⚠️ Potential issue | 🟠 Major

baseline_off_occupancy is still duplicated from baseline occupancy.

Line 298 copies baseline occupancy, so baseline_off_occupancy_* metrics are not independent and can mislead plots/CSV interpretation.

Also applies to: 343-343

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` around lines 298 - 310, The code incorrectly sets
baseline_off_occupancy = baseline_occupancy.copy(), duplicating the baseline
metrics; replace that assignment so baseline_off_occupancy is derived from the
correct "off" array (e.g., occupancy_off or baseline_occupancy_off — whichever
variable in the diff holds the off-run occupancy) and use .copy() if needed,
then recompute the SeedRunStats fields (baseline_off_occupancy_mean/std) from
that corrected array; apply the same fix for the other duplicate at the later
occurrence (around the second instance noted).

477-478: ⚠️ Potential issue | 🟡 Minor

Align warning text with actual fallback behavior.

Line 477 sets NaN, but Line 478 says “defaulting to 0.” This is misleading during debugging.

Proposed fix
-        print(f"[warn] seed={seed}: could not parse run-level baseline dropped total; defaulting to 0.")
+        print(f"[warn] seed={seed}: could not parse run-level baseline dropped total; setting NaN.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` around lines 477 - 478, The warning message is
inconsistent with the fallback value: dropped_jobs_baseline_total is set to
float("nan") but the print says "defaulting to 0."; update the warning text to
reflect the actual fallback (e.g., "defaulting to NaN" or "set to NaN") so the
log matches the variable dropped_jobs_baseline_total and aids debugging — adjust
the print call that follows the assignment to reference the correct fallback
value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze_seed_occupancy.py`:
- Around line 22-24: Update the FAST DEBUG command example at the top of the
file to call the correct script name: replace the reference to
analyze_hourlyjobs_seed_occupancy.py with analyze_seed_occupancy.py in the
command string (the multiline shell invocation shown in the file header) so the
example matches this file's actual script name and works when copy-pasted.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 752f1296-afda-47e6-b649-68e818af87e8

📥 Commits

Reviewing files that changed from the base of the PR and between c10fda2 and c27dab8.

📒 Files selected for processing (4)
  • analyze_arrivalscale_occupancy.py
  • analyze_lambda_occupancy.py
  • analyze_seed_occupancy.py
  • src/analysis_reporting.py

rbx and others added 2 commits March 13, 2026 14:27
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
analyze_seed_occupancy.py (2)

680-695: Consider adding strict=True to zip() for defensive coding.

While the arrays are derived from the same source and should always have matching lengths, adding strict=True would catch any future bugs where array lengths diverge.

Proposed fix
-        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)):
+        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True)):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` around lines 680 - 695, The loop over coordinates
uses zip(x, y, point_colors) and should be made defensive by adding strict=True
so mismatched lengths raise immediately; update the iteration to zip(x, y,
point_colors, strict=True) in the block that calls ax.errorbar (referencing
variables x, y, point_colors and helper _error_at) to catch any future
divergence in array lengths.

169-216: Unused avg_wait array returned but never consumed.

The avg_wait array (per-episode wait times) is extracted and returned but the caller at line 452 discards it, using the run-level wait summary instead. Consider either removing it from the return tuple or utilizing it in downstream analysis for episode-level wait statistics.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` around lines 169 - 216, The avg_wait per-episode
array is collected but unused downstream; remove it to avoid dead data: delete
the avg_wait=list initialization, remove the line that appends
_to_float(match.group("avg_wait")), and remove the corresponding
np.asarray(agent_wait/avg_wait) entry from the returned tuple; then update any
callers/unpacking that expected that extra return value (the caller currently
discards it so adjust unpacking to match the new return shape). Use the avg_wait
symbol and the EPISODE_RE extraction line to locate the code to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze_seed_occupancy.py`:
- Around line 476-478: The printed warning in analyze_seed_occupancy.py is
inconsistent: dropped_jobs_baseline_total is set to float("nan") but the message
says "defaulting to 0"; update the print statement near the
dropped_jobs_baseline_total assignment to accurately state that it's defaulting
to NaN (or otherwise reflect the actual behavior), e.g., change the message to
mention "defaulting to NaN" or the chosen fallback, so the log matches the
variable value.
- Around line 460-463: The agent branch sets agent_avg_wait_hours = 0.0 while
baseline_avg_wait_hours is set to NaN, causing inconsistent and misleading
wait_delta_hours; update the handling in the block that checks
agent_wait_summary and baseline_wait_summary so that agent_avg_wait_hours
mirrors baseline behavior (use NaN) unless episode-level data exists — if
avg_wait episode data is present compute agent_avg_wait_hours =
float(np.mean(avg_wait)) (or equivalent) otherwise set agent_avg_wait_hours =
float("nan") to match baseline_avg_wait_hours; adjust references to
agent_wait_summary, baseline_wait_summary, agent_avg_wait_hours,
baseline_avg_wait_hours, avg_wait and wait_delta_hours accordingly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b1d6d602-b59c-408c-8a12-bc145e109766

📥 Commits

Reviewing files that changed from the base of the PR and between c27dab8 and 7e8a9ad.

📒 Files selected for processing (1)
  • analyze_seed_occupancy.py

Use episode-level avg_wait mean as agent fallback when run-level wait
summary cannot be parsed, instead of fabricating 0.0. Fix warning
message for missing baseline dropped total to say "NaN" not "0".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
analyze_seed_occupancy.py (1)

298-310: ⚠️ Potential issue | 🟠 Major

baseline_off_occupancy is currently fabricated from baseline occupancy.

baseline_off_occupancy = baseline_occupancy.copy() creates duplicate metrics/plots that look independent but are not. Since episode output currently exposes only one baseline node-occupancy value, this should be marked missing (NaN) or removed until a true baseline_off occupancy signal exists.

Proposed minimal fix (avoid misleading duplicate metric)
-    baseline_off_occupancy = baseline_occupancy.copy()
+    # No distinct baseline_off occupancy is emitted in episode summaries yet.
+    baseline_off_occupancy = np.full_like(baseline_occupancy, np.nan, dtype=float)
+    baseline_off_occupancy_mean, baseline_off_occupancy_std = finite_mean_std(baseline_off_occupancy)
@@
-        baseline_off_occupancy_mean=float(np.mean(baseline_off_occupancy)),
-        baseline_off_occupancy_std=float(np.std(baseline_off_occupancy)),
+        baseline_off_occupancy_mean=baseline_off_occupancy_mean,
+        baseline_off_occupancy_std=baseline_off_occupancy_std,

Also applies to: 343-343, 525-527
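The suggested fix leans on finite_mean_std, which the walkthrough lists among the analyzer's numeric helpers. A NaN-aware sketch of what such a helper plausibly looks like (the project's actual implementation may differ, e.g. in using NumPy or sample rather than population std):

```python
import math


def finite_mean_std(values):
    """Mean and population std over finite entries only.

    Returns (nan, nan) when no finite values remain, so all-NaN columns
    such as a missing baseline_off occupancy propagate as NaN instead of
    masquerading as zeros.
    """
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return float("nan"), float("nan")
    mean = sum(finite) / len(finite)
    var = sum((v - mean) ** 2 for v in finite) / len(finite)
    return mean, math.sqrt(var)
```

This is why the diff above swaps np.mean/np.std for the finite-aware helper: plain np.mean over a NaN-filled array yields NaN for the mean, but only a helper like this keeps mixed finite/NaN arrays meaningful.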

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` around lines 298 - 310, The code currently
fabricates baseline_off_occupancy via baseline_off_occupancy =
baseline_occupancy.copy(), which produces misleading duplicate metrics; update
the logic in the function that constructs and returns SeedRunStats so that
baseline_off_occupancy is not copied from baseline_occupancy but instead set to
a missing indicator (e.g., NaN) or omitted from SeedRunStats until a real
baseline_off signal exists—modify the creation of baseline_off_occupancy and the
corresponding fields baseline_off_occupancy_mean and baseline_off_occupancy_std
in the returned SeedRunStats to reflect the missing value (NaN) or remove those
fields consistently (also apply the same change at the other occurrences
mentioned around the function: the lines near the other occurrences referenced
in the comment).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a0733468-09e5-46f5-b770-8fee43b94ae2

📥 Commits

Reviewing files that changed from the base of the PR and between 7e8a9ad and 8cc15f5.

📒 Files selected for processing (1)
  • analyze_seed_occupancy.py

Comment on lines +17 to +19
For each seed, this script runs train.py in evaluation mode for one year
(12 months = 24 episodes), parses per-episode metrics from stdout, computes
mean/std, and fits optional polynomial trend lines.

⚠️ Potential issue | 🟡 Minor

Docstring overstates evaluation duration as always 12 months.

The script supports --eval-months, so this text is inaccurate and can mislead users.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_seed_occupancy.py` around lines 17 - 19, The module docstring in
analyze_seed_occupancy.py incorrectly states a fixed 12-month evaluation; update
it to reflect the configurable duration controlled by the --eval-months flag (or
eval_months variable) so it no longer claims evaluation always runs for 12
months — describe that the script runs train.py in evaluation mode for a
configurable number of months (default 12) resulting in 2*months episodes,
parses per-episode metrics, computes mean/std, and optionally fits polynomial
trend lines.

rbx merged commit abcc2bf into FairRootGroup:master Mar 13, 2026
4 checks passed