Add seed-sweep analysis script, plus QoL for evaluation plots. #34
rbx merged 5 commits into FairRootGroup:master from
Conversation
- write evaluation-mode plots to sessions/<session>/plots-eval/
- group outputs by loaded model checkpoint and weight configuration
- reuse shared analysis naming helpers for consistent directory slugs
- keep training plots in the existing plots/ directory
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough
Adds a new seed-occupancy analysis tool that runs train.py in evaluation mode across a sweep of seeds, parses per-episode metrics, and aggregates statistics and plots.
Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Analyzer as analyze_seed_occupancy.py
    participant Train as train.py (subprocess)
    participant Parser as MetricsParser
    participant Stats as StatsComputer
    participant Plotter
    participant FS as FileSystem
    User->>Analyzer: run (args: seeds, model, weights, eval_months, plots)
    Analyzer->>Analyzer: build seed schedule & train commands
    loop per seed
        Analyzer->>Train: execute train.py (seed, eval mode)
        Train->>FS: write plot files / logs
        Train-->>Analyzer: return stdout/stderr
        Analyzer->>Parser: parse episode metrics & run summaries
        Parser-->>Analyzer: arrays & run-level values
        Analyzer->>Stats: compute SeedRunStats (means, stds, deltas, fits)
        Stats-->>Analyzer: SeedRunStats
        Analyzer->>FS: optionally save raw logs for seed
    end
    Analyzer->>FS: write summary CSV/JSON
    Analyzer->>Plotter: generate aggregate multi-panel and per-seed plots
    Plotter->>FS: save plot files
    Analyzer->>User: print final summary and output locations
```
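The loop in the diagram amounts to shelling out to train.py once per seed and scraping its stdout. A minimal sketch of that driver step, where the CLI flags and the `EPISODE_RE` pattern are illustrative assumptions rather than the project's actual interface:

```python
import re
import subprocess
import sys

# Hypothetical episode-summary pattern; the real script uses a much richer
# regex matching build_episode_summary_line's output.
EPISODE_RE = re.compile(r"Occupancy:\s*(?P<occ>[\d.]+)")

def run_seed(seed: int, eval_months: int = 12) -> list[float]:
    """Run train.py in evaluation mode for one seed and parse occupancy values."""
    cmd = [sys.executable, "train.py", "--seed", str(seed),
           "--eval-months", str(eval_months)]
    # Capture stdout so the per-episode summary lines can be parsed.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [float(m.group("occ")) for m in EPISODE_RE.finditer(result.stdout)]
```

The analyzer then feeds these per-episode arrays into the stats and plotting stages shown above.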
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🧹 Nitpick comments (3)
analyze_seed_occupancy.py (3)
150-213: Consider using a named tuple or dataclass for the return type.
The function returns a 12-element tuple, which is error-prone to unpack and maintain. Consider defining a simple dataclass or `NamedTuple` to improve readability and reduce the risk of positional errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_seed_occupancy.py` around lines 150 - 213, Replace the 12-element positional tuple returned by parse_episode_metrics with a clear named structure: define a small dataclass or typing.NamedTuple (e.g., EpisodeMetrics) with fields matching the current arrays (occupancy, baseline_occupancy, agent_dropped, savings, savings_off, completion_rate, avg_wait, agent_cost_1k, baseline_cost_1k, baseline_off_cost_1k, agent_power, baseline_off_power), construct and return that from parse_episode_metrics instead of the tuple, and update any call sites that unpack the tuple to use attribute access (e.g., metrics.occupancy) from the new EpisodeMetrics type; keep parsing logic (EPISODE_RE, _to_float, _to_float_or_nan) unchanged.
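As a sketch of the suggested refactor, the twelve arrays could be bundled into a `NamedTuple`. Field names are taken from the review comment above; the real `parse_episode_metrics` signature may differ:

```python
from typing import NamedTuple

import numpy as np

# Illustrative structure only; mirrors the fields listed in the review comment.
class EpisodeMetrics(NamedTuple):
    occupancy: np.ndarray
    baseline_occupancy: np.ndarray
    agent_dropped: np.ndarray
    savings: np.ndarray
    savings_off: np.ndarray
    completion_rate: np.ndarray
    avg_wait: np.ndarray
    agent_cost_1k: np.ndarray
    baseline_cost_1k: np.ndarray
    baseline_off_cost_1k: np.ndarray
    agent_power: np.ndarray
    baseline_off_power: np.ndarray

# Callers then use attribute access instead of 12-way positional unpacking:
#   metrics = parse_episode_metrics(stdout)
#   mean_occ = metrics.occupancy.mean()
```

A `NamedTuple` keeps the return value immutable and tuple-compatible, so existing positional call sites would still work during a gradual migration.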
665: Consider adding `strict=True` to `zip()` for defensive coding.
The `zip()` calls at lines 665, 685, and 822 iterate over arrays that should have equal lengths. Adding `strict=True` (Python 3.10+) would catch any length mismatches early rather than silently truncating.
Example fix for line 665
```diff
- for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)):
+ for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True)):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_seed_occupancy.py` at line 665, Add strict=True to the zip() calls that iterate over arrays expected to be the same length to fail fast on mismatches: replace the loop using enumerate(zip(x, y, point_colors)) with enumerate(zip(x, y, point_colors, strict=True)) (and similarly update the other two occurrences noted in the review that zip over the same-sized sequences, e.g., the zip at lines referencing the same x/y/point_colors-like variables and the one at line 822) so mismatched lengths raise a ValueError instead of silently truncating; keep the same enumerate(...) wrapper and variable unpacking.
50-82: Fragile coupling: regex patterns depend on exact train.py output format.
These regex patterns are tightly coupled to the exact string format produced by `build_episode_summary_line` (in `src/evaluation_summary.py`) and the print statements in `train.py`. Any future changes to those output formats will silently break parsing here.
Consider adding a comment near these patterns documenting the dependency, or alternatively, having `train.py` output structured data (e.g., JSON lines) in evaluation mode, which would be more robust to parse.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_seed_occupancy.py` around lines 50 - 82, The regexes (EPISODE_RE, WAIT_SUMMARY_RE, ARRIVALS_SUMMARY_RE, DROPPED_AGENT_SUMMARY_RE, DROPPED_BASELINE_SUMMARY_RE) are tightly coupled to the exact text output of build_episode_summary_line and train.py and will break silently if that format changes; either (A) add a clear comment above these patterns stating the dependency on build_episode_summary_line and train.py's print format and list the exact fields expected, or (B) make the output from train.py/evaluation_summary.py structured (e.g., JSON lines from build_episode_summary_line) and replace the fragile parsing by reading/parsing that JSON (update the code that currently applies EPISODE_RE/WAIT_SUMMARY_RE to use json.loads and access the same keys), ensuring backward compatibility (accept old text if JSON not present) during rollout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze_seed_occupancy.py`:
- Line 294: The current code creates baseline_off_occupancy by copying
baseline_occupancy (baseline_off_occupancy = baseline_occupancy), producing
duplicate metrics; fix by choosing one of two paths: (A) add a distinct
"Baseline Off Occupancy" value to the episode summary in train.py and update
EPISODE_RE in evaluation_summary.py to capture it so
baseline_off_occupancy_mean/std/samples come from real output, or (B) remove the
redundant baseline_off_occupancy data members and any references to
baseline_off_occupancy_mean, baseline_off_occupancy_std, and
baseline_off_occupancy_samples from the dataclass and analysis code; apply the
same change consistently in analyze_seed_occupancy.py,
analyze_lambda_occupancy.py, and analyze_arrivalscale_occupancy.py so metrics
and parsing stay aligned.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6de70e54-3a0f-4fb2-b24d-8795f6d10a1e
📒 Files selected for processing (3)
analyze_seed_occupancy.py, src/analysis_naming.py, train.py
Actionable comments posted: 3
♻️ Duplicate comments (2)
analyze_seed_occupancy.py (2)
294-295: ⚠️ Potential issue | 🟠 Major
`baseline_off_occupancy` is currently synthetic, not measured.
`baseline_off_occupancy` is copied from baseline occupancy, so all `baseline_off_occupancy_*` metrics are duplicates and misleading. In the current output formatting, there is no separate baseline-off occupancy value emitted to parse.
Suggested minimal fix

```diff
- baseline_off_occupancy = baseline_occupancy.copy()
+ # train.py/evaluation summary does not currently emit baseline_off occupancy.
+ # Keep this explicitly unavailable instead of duplicating baseline occupancy.
+ baseline_off_occupancy = np.full_like(baseline_occupancy, np.nan, dtype=float)
```

Also applies to: 303-305, 335-335
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_seed_occupancy.py` around lines 294 - 295, The variable baseline_off_occupancy is currently set as baseline_occupancy.copy(), producing duplicate/misleading baseline_off_occupancy_* metrics; replace this synthetic copy with a real computed off-occupancy (e.g., baseline_off_occupancy = baseline_occupancy - baseline_on_occupancy or the equivalent measured subtraction you already compute elsewhere) or stop emitting baseline_off_occupancy_* fields. Update every place that emits or uses baseline_off_occupancy (and related baseline_off_occupancy_* metrics) to use the computed value (or remove those emissions) so the reported baseline-off occupancy is accurate and not a duplicate of baseline_occupancy.
329-330: ⚠️ Potential issue | 🟠 Major
`annual_total_savings*` label is inaccurate when `--eval-months != 12`.
These values are raw totals over the evaluated window, not annualized totals. This produces mislabeled outputs in CSV/JSON for 1/3/6-month sweeps.
Also applies to: 540-542
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_seed_occupancy.py` around lines 329 - 330, The fields annual_total_savings and annual_total_savings_off (and the similar ones around the other occurrence) are currently raw sums over the evaluation window; update the code that sets annual_total_savings and annual_total_savings_off to annualize those totals when the evaluation window isn't 12 months by multiplying the raw total (np.sum(savings) / np.sum(savings_off)) by (12.0 / eval_months) (or args.eval_months if that is the variable name), or alternatively rename the keys to total_savings/total_savings_off to reflect they are not annualized—apply the same fix to the other occurrence (the variables around lines 540-542) and ensure the CSV/JSON output uses the corrected values/names.
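The annualization itself is a one-line scaling: multiply the window total by 12 divided by the number of evaluated months. A hedged sketch with illustrative names:

```python
# Illustrative helper; the actual variable and field names in
# analyze_seed_occupancy.py may differ.
def annualize(window_total: float, eval_months: int) -> float:
    """Extrapolate a total measured over eval_months to a 12-month figure."""
    return window_total * (12.0 / eval_months)
```

For example, a 3-month window total of 100.0 annualizes to 400.0, while a 12-month window is unchanged. The alternative fix (renaming the fields to `total_savings*`) avoids extrapolation entirely at the cost of a schema change.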
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze_seed_occupancy.py`:
- Around line 892-895: The validation currently unconditionally checks
seed-range flags and calls parser.error for invalid ranges; change it to only
perform the seed_step and min/max checks when explicit seeds were not provided
(i.e., when args.seeds is empty/None). Update the block that calls parser.error
for seed_step, max_seed and min_seed (the lines referencing args.seed_step,
args.max_seed, args.min_seed and parser.error) to be guarded by a condition like
"if not args.seeds:" so that explicit --seeds bypasses the range validation.
- Around line 468-470: The block that sets dropped_jobs_baseline_total currently
defaults it to 0.0 when None, which can create false deltas; change the fallback
to NaN (e.g., math.nan or float('nan')) instead of 0.0, ensure the math module
is imported if using math.nan, and keep the warning print (seed variable) but
update if desired; this preserves unknown baseline semantics for downstream
stats that consume dropped_jobs_baseline_total.
- Around line 452-456: The current fallback sets baseline_avg_wait_hours = 0.0
when baseline_wait_summary is missing, which fabricates data; change the
fallback to leave baseline as missing (e.g., set baseline_avg_wait_hours =
float("nan") or None) and similarly ensure agent_avg_wait_hours uses np.nan if
avg_wait is empty; update downstream computations that compute wait_delta_hours
and effective savings (where they assume numeric) to handle NaN/None (propagate
NaN or skip aggregation) so missing baseline data does not distort results; look
for the symbols agent_wait_summary, baseline_wait_summary, agent_avg_wait_hours,
baseline_avg_wait_hours, avg_wait, wait_delta_hours and effective-savings to
make the adjustments.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5a207639-522c-4b83-a77a-7d459c411d01
📒 Files selected for processing (1)
analyze_seed_occupancy.py
… mode for a seed-based sweep with fixed --job-arrival-scale 1.0.
c10fda2 to c27dab8
Actionable comments posted: 1
♻️ Duplicate comments (2)
analyze_seed_occupancy.py (2)
298-310: ⚠️ Potential issue | 🟠 Major
`baseline_off_occupancy` is still duplicated from baseline occupancy.
Line 298 copies baseline occupancy, so the `baseline_off_occupancy_*` metrics are not independent and can mislead plots/CSV interpretation.
Also applies to: 343-343
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_seed_occupancy.py` around lines 298 - 310, The code incorrectly sets baseline_off_occupancy = baseline_occupancy.copy(), duplicating the baseline metrics; replace that assignment so baseline_off_occupancy is derived from the correct "off" array (e.g., occupancy_off or baseline_occupancy_off — whichever variable in the diff holds the off-run occupancy) and use .copy() if needed, then recompute the SeedRunStats fields (baseline_off_occupancy_mean/std) from that corrected array; apply the same fix for the other duplicate at the later occurrence (around the second instance noted).
477-478: ⚠️ Potential issue | 🟡 Minor
Align warning text with actual fallback behavior.
Line 477 sets NaN, but line 478 says "defaulting to 0." This is misleading during debugging.
Proposed fix

```diff
- print(f"[warn] seed={seed}: could not parse run-level baseline dropped total; defaulting to 0.")
+ print(f"[warn] seed={seed}: could not parse run-level baseline dropped total; setting NaN.")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_seed_occupancy.py` around lines 477 - 478, The warning message is inconsistent with the fallback value: dropped_jobs_baseline_total is set to float("nan") but the print says "defaulting to 0."; update the warning text to reflect the actual fallback (e.g., "defaulting to NaN" or "set to NaN") so the log matches the variable dropped_jobs_baseline_total and aids debugging — adjust the print call that follows the assignment to reference the correct fallback value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze_seed_occupancy.py`:
- Around line 22-24: Update the FAST DEBUG command example at the top of the
file to call the correct script name: replace the reference to
analyze_hourlyjobs_seed_occupancy.py with analyze_seed_occupancy.py in the
command string (the multiline shell invocation shown in the file header) so the
example matches this file's actual script name and works when copy-pasted.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 752f1296-afda-47e6-b649-68e818af87e8
📒 Files selected for processing (4)
analyze_arrivalscale_occupancy.py, analyze_lambda_occupancy.py, analyze_seed_occupancy.py, src/analysis_reporting.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 2
🧹 Nitpick comments (2)
analyze_seed_occupancy.py (2)
680-695: Consider adding `strict=True` to `zip()` for defensive coding.
While the arrays are derived from the same source and should always have matching lengths, adding `strict=True` would catch any future bugs where array lengths diverge.
Proposed fix

```diff
- for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)):
+ for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True)):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_seed_occupancy.py` around lines 680 - 695, The loop over coordinates uses zip(x, y, point_colors) and should be made defensive by adding strict=True so mismatched lengths raise immediately; update the iteration to zip(x, y, point_colors, strict=True) in the block that calls ax.errorbar (referencing variables x, y, point_colors and helper _error_at) to catch any future divergence in array lengths.
169-216: Unused `avg_wait` array returned but never consumed.
The `avg_wait` array (per-episode wait times) is extracted and returned, but the caller at line 452 discards it, using the run-level wait summary instead. Consider either removing it from the return tuple or utilizing it in downstream analysis for episode-level wait statistics.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_seed_occupancy.py` around lines 169 - 216, The avg_wait per-episode array is collected but unused downstream; remove it to avoid dead data: delete the avg_wait=list initialization, remove the line that appends _to_float(match.group("avg_wait")), and remove the corresponding np.asarray(agent_wait/avg_wait) entry from the returned tuple; then update any callers/unpacking that expected that extra return value (the caller currently discards it so adjust unpacking to match the new return shape). Use the avg_wait symbol and the EPISODE_RE extraction line to locate the code to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze_seed_occupancy.py`:
- Around line 476-478: The printed warning in analyze_seed_occupancy.py is
inconsistent: dropped_jobs_baseline_total is set to float("nan") but the message
says "defaulting to 0"; update the print statement near the
dropped_jobs_baseline_total assignment to accurately state that it's defaulting
to NaN (or otherwise reflect the actual behavior), e.g., change the message to
mention "defaulting to NaN" or the chosen fallback, so the log matches the
variable value.
- Around line 460-463: The agent branch sets agent_avg_wait_hours = 0.0 while
baseline_avg_wait_hours is set to NaN, causing inconsistent and misleading
wait_delta_hours; update the handling in the block that checks
agent_wait_summary and baseline_wait_summary so that agent_avg_wait_hours
mirrors baseline behavior (use NaN) unless episode-level data exists — if
avg_wait episode data is present compute agent_avg_wait_hours =
float(np.mean(avg_wait)) (or equivalent) otherwise set agent_avg_wait_hours =
float("nan") to match baseline_avg_wait_hours; adjust references to
agent_wait_summary, baseline_wait_summary, agent_avg_wait_hours,
baseline_avg_wait_hours, avg_wait and wait_delta_hours accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b1d6d602-b59c-408c-8a12-bc145e109766
📒 Files selected for processing (1)
analyze_seed_occupancy.py
Use episode-level avg_wait mean as agent fallback when run-level wait summary cannot be parsed, instead of fabricating 0.0. Fix warning message for missing baseline dropped total to say "NaN" not "0". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
analyze_seed_occupancy.py (1)
298-310: ⚠️ Potential issue | 🟠 Major
`baseline_off_occupancy` is currently fabricated from baseline occupancy.
`baseline_off_occupancy = baseline_occupancy.copy()` creates duplicate metrics/plots that look independent but are not. Since episode output currently exposes only one baseline node-occupancy value, this should be marked missing (NaN) or removed until a true baseline_off occupancy signal exists.
Proposed minimal fix (avoid misleading duplicate metric)

```diff
- baseline_off_occupancy = baseline_occupancy.copy()
+ # No distinct baseline_off occupancy is emitted in episode summaries yet.
+ baseline_off_occupancy = np.full_like(baseline_occupancy, np.nan, dtype=float)
+ baseline_off_occupancy_mean, baseline_off_occupancy_std = finite_mean_std(baseline_off_occupancy)
@@
- baseline_off_occupancy_mean=float(np.mean(baseline_off_occupancy)),
- baseline_off_occupancy_std=float(np.std(baseline_off_occupancy)),
+ baseline_off_occupancy_mean=baseline_off_occupancy_mean,
+ baseline_off_occupancy_std=baseline_off_occupancy_std,
```

Also applies to: 343-343, 525-527
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_seed_occupancy.py` around lines 298 - 310, The code currently fabricates baseline_off_occupancy via baseline_off_occupancy = baseline_occupancy.copy(), which produces misleading duplicate metrics; update the logic in the function that constructs and returns SeedRunStats so that baseline_off_occupancy is not copied from baseline_occupancy but instead set to a missing indicator (e.g., NaN) or omitted from SeedRunStats until a real baseline_off signal exists—modify the creation of baseline_off_occupancy and the corresponding fields baseline_off_occupancy_mean and baseline_off_occupancy_std in the returned SeedRunStats to reflect the missing value (NaN) or remove those fields consistently (also apply the same change at the other occurrences mentioned around the function: the lines near the other occurrences referenced in the comment).
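The suggested diff assumes a `finite_mean_std` helper, which may not exist in the project. A NaN-safe sketch of what such a helper would need to do:

```python
import numpy as np

# Hypothetical helper: mean/std over finite entries only, so an all-NaN
# placeholder column yields (nan, nan) instead of a misleading statistic.
def finite_mean_std(values: np.ndarray) -> tuple[float, float]:
    finite = values[np.isfinite(values)]
    if finite.size == 0:
        return float("nan"), float("nan")
    return float(finite.mean()), float(finite.std())
```

With this, downstream CSV/JSON writers and plots can distinguish "no baseline-off signal yet" (NaN) from a genuine measured zero.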
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze_seed_occupancy.py`:
- Around line 17-19: The module docstring in analyze_seed_occupancy.py
incorrectly states a fixed 12-month evaluation; update it to reflect the
configurable duration controlled by the --eval-months flag (or eval_months
variable) so it no longer claims evaluation always runs for 12 months — describe
that the script runs train.py in evaluation mode for a configurable number of
months (default 12) resulting in 2*months episodes, parses per-episode metrics,
computes mean/std, and optionally fits polynomial trend lines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a0733468-09e5-46f5-b770-8fee43b94ae2
📒 Files selected for processing (1)
analyze_seed_occupancy.py
For each seed, this script runs train.py in evaluation mode for one year
(12 months = 24 episodes), parses per-episode metrics from stdout, computes
mean/std, and fits optional polynomial trend lines.
Docstring overstates evaluation duration as always 12 months.
The script supports --eval-months, so this text is inaccurate and can mislead users.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@analyze_seed_occupancy.py` around lines 17 - 19, The module docstring in
analyze_seed_occupancy.py incorrectly states a fixed 12-month evaluation; update
it to reflect the configurable duration controlled by the --eval-months flag (or
eval_months variable) so it no longer claims evaluation always runs for 12
months — describe that the script runs train.py in evaluation mode for a
configurable number of months (default 12) resulting in 2*months episodes,
parses per-episode metrics, computes mean/std, and optionally fits polynomial
trend lines.
No description provided.