
Pr3 analysis #33

Merged
rbx merged 7 commits into FairRootGroup:master from enlorenz:pr3-analysis
Mar 12, 2026

Conversation

@enlorenz
Contributor

Dependent on Pr1 and Pr2

@coderabbitai
Contributor

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough

Adds a new arrival-scale sweep analysis tool, expands lambda analysis, instruments power/energy metrics across environment/baseline/metrics, validates and normalizes job-arrival-scale inputs, and refactors episode-summary generation and plotting/export of aggregated run statistics.

Changes

  • Arrival-scale analysis — analyze_arrivalscale_occupancy.py
    New large analysis script: runs train.py across scales and modes, parses per-episode stdout metrics, computes ScaleRunStats, exports CSV/JSON, and renders multi-panel plots with optional fits and per-panel outputs.
  • Lambda analysis — analyze_lambda_occupancy.py
    Extended parsing, LambdaRunStats fields, make_run_stats, plotting, and CSV output to include baseline occupancies, arrivals/hour, dropped-job totals/deltas, cost/power deltas, and per-sample histories.
  • Power & reward infra — src/reward_calculation.py, src/baseline.py, src/environment.py, src/metrics_tracker.py
    Added power_consumption_mwh; baseline_step now also returns power metrics; the environment accumulates step/episode power; MetricsTracker tracks power counters and price helpers and emits an expanded episode payload.
  • Arrival-scale validation & plumbing — src/arrival_scale.py, src/sampler_hourly.py, src/workload_generator.py, train_iter.py
    New validate_job_arrival_scale, integrated into the sampler, workload generator, and CLI validation paths.
  • Episode summary & train output — src/evaluation_summary.py, train.py
    Added formatting helpers, mean_occupancy_pct, and build_episode_summary_line; train.py uses these helpers and adds robust summary sections (costs per 1k jobs, dropped jobs per saved euro, power/price, arrivals/hour).
  • Analysis naming & I/O — src/analysis_naming.py, plus plotting helpers in the analysis scripts
    New helpers to build slugified analysis directory names/timestamps; the plotting pipeline supports per-panel output directories and CSV/JSON summary exporters.
  • Misc small updates — train_iter.py, train.py
    Added small helpers (safe_ratio, fmt_optional), replaced inline prints with a structured summary builder, and wired arrival-scale validation into the train invocation.
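The safe_ratio and fmt_optional helpers named above could look roughly like this (the signatures, defaults, and format string are assumptions for illustration, not the repository's actual code):

```python
import math

def safe_ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the ratio is undefined."""
    if (denominator == 0
            or not math.isfinite(denominator)
            or not math.isfinite(numerator)):
        return float("nan")
    return numerator / denominator

def fmt_optional(value, fmt="{:.2f}", fallback="n/a"):
    """Format a number, falling back to 'n/a' for None/NaN."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return fallback
    return fmt.format(value)

print(fmt_optional(safe_ratio(150.0, 60.0)))  # 2.50
print(fmt_optional(safe_ratio(5.0, 0.0)))     # n/a
```

Returning NaN from safe_ratio and translating it to "n/a" at print time matches the "NaN for undefined ratios" convention the commits describe.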

Sequence Diagram(s)

sequenceDiagram
  participant Analyzer as ArrivalScaleScript
  participant Train as train.py
  participant Env as ComputeClusterEnv
  participant Metrics as MetricsTracker
  participant FS as Filesystem/Plots

  Analyzer->>Train: spawn train.py (mode, job_arrival_scale)
  Train->>Env: instantiate env(job_arrival_scale, replay flags)
  Env->>Metrics: emit episode events (costs, occupancy, power, dropped,...)
  Metrics-->>Train: formatted episode summary lines (stdout)
  Train-->>Analyzer: run stdout with episode + run summaries
  Analyzer->>Analyzer: parse stdout -> build ScaleRunStats / aggregates
  Analyzer->>FS: write CSV/JSON and save plots (per‑panel optional)
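The "parse stdout -> build ScaleRunStats" step in the diagram boils down to running regexes over the captured run output. A minimal sketch, assuming a hypothetical line format (the real EPISODE_RE pattern and captured fields live in the analyzer scripts and differ):

```python
import re

# Hypothetical episode-summary format; the repository's EPISODE_RE differs.
EPISODE_RE = re.compile(
    r"Episode\s+(?P<ep>\d+):\s+cost=(?P<cost>[\d.]+)\s+occupancy=(?P<occ>[\d.]+)%"
)

stdout = """\
Episode 1: cost=120.50 occupancy=71.2%
Episode 2: cost=98.75 occupancy=64.8%
"""

# Extract (episode, cost, occupancy) tuples for later aggregation.
episodes = [
    (int(m["ep"]), float(m["cost"]), float(m["occ"]))
    for m in EPISODE_RE.finditer(stdout)
]
print(episodes)  # [(1, 120.5, 71.2), (2, 98.75, 64.8)]
```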

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 19.48%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: The PR title 'Pr3 analysis' is vague and does not clearly describe the main changes in the pull request. Resolution: use a more specific title that summarizes the primary change, such as 'Fix analyzer parsing for per-episode savings output' or 'Correct cumulative savings differencing in occupancy analyzers'.

✅ Passed checks (1 passed)

  • Description check — ✅ Passed: The PR description states the dependency on other PRs and relates to the changeset, though it is very brief and lacks detail about the specific changes.




Contributor

@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
analyze_arrivalscale_occupancy.py (2)

51-64: Consider extracting shared utilities with analyze_lambda_occupancy.py.

The regex patterns (EPISODE_RE, WAIT_SUMMARY_RE, etc.), helper functions (_to_float, _to_float_or_nan, safe_divide, safe_divide_arrays, finite_mean_std), and parsing functions are duplicated between this file and analyze_lambda_occupancy.py. Consider extracting these into a shared module like analyze_utils.py to reduce duplication.

Also applies to: 139-149, 248-261

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_arrivalscale_occupancy.py` around lines 51 - 64, The EPISODE_RE and
other regexes plus helper functions (_to_float, _to_float_or_nan, safe_divide,
safe_divide_arrays, finite_mean_std) and parsing logic are duplicated with
analyze_lambda_occupancy.py; extract these shared utilities into a new module
(e.g., analyze_utils.py) containing the regex constants (EPISODE_RE,
WAIT_SUMMARY_RE, etc.) and the helper functions, update
analyze_arrivalscale_occupancy.py and analyze_lambda_occupancy.py to import
those symbols from analyze_utils instead of redefining them, remove the
duplicate definitions from both files, and run/adjust any unit or integration
tests that reference the moved symbols to ensure imports and names (e.g.,
EPISODE_RE, _to_float, safe_divide_arrays, finite_mean_std) are correct.

715-730: Consider adding strict=True to zip() calls.

Similar to analyze_lambda_occupancy.py, adding strict=True would catch any array length mismatches during development.

Also applies to: 733-764, 869-870

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_arrivalscale_occupancy.py` around lines 715 - 730, The zip calls in
the plotting loops (e.g., zip(x, y, point_colors) inside the loop that calls
ax.errorbar, and the other zip usages around the same area and at 869-870)
should include strict=True to surface any length mismatches during development;
update each zip(...) invocation in those sections to zip(..., strict=True)
(ensuring the runtime uses Python 3.10+) so mismatched array lengths raise an
immediate error rather than silently truncating.
analyze_lambda_occupancy.py (1)

770-802: Consider adding strict=True to zip() calls.

Static analysis flagged zip() calls without explicit strict= parameter. While the arrays should be the same length in practice, adding strict=True would catch any length mismatches during development.

🔧 Proposed fix
-        for i, (xv, c) in enumerate(zip(lambdas, point_colors)):
+        for i, (xv, c) in enumerate(zip(lambdas, point_colors, strict=True)):
...
-    for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs):
+    for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs, strict=True):

Also applies to: 906-908

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_lambda_occupancy.py` around lines 770 - 802, The zip call in
draw_baseline_occupancy_pair that iterates over zip(lambdas, point_colors)
should be made strict to catch length mismatches at runtime — update the call to
use zip(lambdas, point_colors, strict=True); do the same for any other zip calls
that pair the same sequences (the other occurrence that zips lambdas with
point_colors and uses baseline_off/baseline arrays) so all zips that assume
equal-length inputs include strict=True to fail fast on size mismatches.
src/sampler_hourly.py (1)

316-316: Remove unnecessary int() cast.

math.floor() already returns an int in Python 3, so the int() cast is redundant.

🧹 Proposed fix
-            whole = int(math.floor(arrival_scale))
+            whole = math.floor(arrival_scale)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sampler_hourly.py` at line 316, The assignment to variable `whole` uses a
redundant `int()` cast around `math.floor(arrival_scale)`; update the expression
in the `whole` assignment (where `arrival_scale` is used) to remove the outer
`int()` and just use `math.floor(arrival_scale)` so the code reads `whole =
math.floor(arrival_scale)`.
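The whole/fractional split that sampler_hourly apparently performs around this line can be sketched as follows (the function name and docstring interpretation are illustrative, not the repository's code):

```python
import math

def split_arrival_scale(arrival_scale: float) -> tuple[int, float]:
    """Split e.g. 2.3 into (2, 0.3): repeat every arrival twice,
    then emit a third copy with probability ~0.3."""
    whole = math.floor(arrival_scale)  # already an int in Python 3; no int() cast needed
    frac = arrival_scale - whole
    return whole, frac

whole, frac = split_arrival_scale(2.3)
assert whole == 2 and isinstance(whole, int)
assert abs(frac - 0.3) < 1e-9  # frac is ~0.3 up to floating-point error
```

Since math.floor() returns int for finite floats, dropping the redundant int() wrapper changes nothing at runtime.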
train.py (1)

325-356: Remove extraneous f-prefixes from header strings.

Static analysis flagged these f-strings as having no placeholders. While functionally harmless, removing the f prefix keeps the code clean.

🧹 Proposed fix
-                print(f"\n=== JOB PROCESSING METRICS ===")
-                print(f"\nAgent:")
+                print("\n=== JOB PROCESSING METRICS ===")
+                print("\nAgent:")
...
-                print(f"\nBaseline:")
+                print("\nBaseline:")
...
-                print(f"\n=== COST PER 1,000 COMPLETED JOBS ===")
+                print("\n=== COST PER 1,000 COMPLETED JOBS ===")
...
-                print(f"\n=== AGENT DROPPED JOBS PER SAVED EURO ===")
+                print("\n=== AGENT DROPPED JOBS PER SAVED EURO ===")
...
-                print(f"\n=== POWER & PRICE METRICS (TOTAL OVER EVALUATION) ===")
+                print("\n=== POWER & PRICE METRICS (TOTAL OVER EVALUATION) ===")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@train.py` around lines 325 - 356, The header print statements use f-strings
with no placeholders; remove the unnecessary f prefix for those literal-only
prints (e.g., change print(f"\n=== JOB PROCESSING METRICS ===") and other
header/section prints like "\nAgent:", "\nBaseline:", "\n=== COST PER 1,000
COMPLETED JOBS ===", "\n=== AGENT DROPPED JOBS PER SAVED EURO ===", "\n=== POWER
& PRICE METRICS (TOTAL OVER EVALUATION) ===" to plain string prints) while
leaving prints that contain formatted placeholders (the ones with { ... })
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/workload_generator.py`:
- Around line 140-147: The generate_jobs path assumes job_arrival_scale is a
finite number and calls int(np.floor(job_arrival_scale)) which will raise on
nan/inf; add an explicit validation at the start of the block (or inside
generate_jobs) that checks job_arrival_scale is finite and > 0 when used (e.g.,
using math.isfinite or numpy.isfinite), and return the same failure/empty result
or raise a clear ValueError if it is not; update the conditional that references
arrival_scale_applied_in_source and job_arrival_scale so invalid values are
detected before computing whole and frac to fail fast and avoid
int(np.floor(...)) on non-finite inputs.
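A minimal fail-fast guard along the lines the comment asks for might look like this (the helper name is hypothetical, and the review leaves open whether 0 is allowed at this call site, so this sketch rejects it; math.floor is used in place of np.floor to keep the sketch dependency-free):

```python
import math

def checked_arrival_scale(job_arrival_scale: float) -> float:
    """Validate before flooring, so int(floor(...)) never sees NaN/inf."""
    if not math.isfinite(job_arrival_scale) or job_arrival_scale <= 0.0:
        raise ValueError(
            f"job arrival scale must be finite and > 0, got {job_arrival_scale!r}"
        )
    return job_arrival_scale

whole = int(math.floor(checked_arrival_scale(1.5)))  # safe: input is finite
assert whole == 1
```

Without such a guard, int(np.floor(float("nan"))) raises a ValueError at an unrelated point in the code, which is exactly the late failure the comment wants to avoid.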
- Around line 65-88: When doing raw exact replay (jobs_exact_replay path) you
round duration_minutes up to duration_hours but keep nnodes/cores_per_node
unchanged, inflating core-hours for sub-hour jobs; fix by scaling resource
counts to preserve original core-hours: compute scale = (duration_minutes/60) /
duration_hours and apply it to nnodes and/or cores_per_node (e.g., reduce
cores_per_node = max(1, int(round(cores_per_node * scale))) and if needed adjust
nnodes = min(MAX_NODES_PER_JOB, max(MIN_NODES_PER_JOB, int(round(nnodes *
scale))))), then clamp both to MIN_/MAX_ constants; update the block that
creates duration_hours, nnodes, cores_per_node (the for job in raw_jobs loop) so
behavior matches jobs_sampler.aggregate_jobs/convert_to_hourly_jobs semantics
for aggregated sampling.
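The core-hour-preserving rescale the comment describes could be sketched like this (the MIN_/MAX_ bounds and rounding policy are assumptions; the real constants and clamping rules live in the repository):

```python
import math

MIN_NODES_PER_JOB, MAX_NODES_PER_JOB = 1, 64  # hypothetical bounds

def rescale_job(duration_minutes: float, cores_per_node: int, nnodes: int):
    """Round duration up to whole hours, then shrink cores_per_node so
    total core-hours stay close to the original sub-hour job."""
    duration_hours = max(1, math.ceil(duration_minutes / 60))
    scale = (duration_minutes / 60) / duration_hours
    cores_per_node = max(1, round(cores_per_node * scale))
    nnodes = min(MAX_NODES_PER_JOB, max(MIN_NODES_PER_JOB, nnodes))
    return duration_hours, cores_per_node, nnodes

# A 30-minute job on 8 cores/node becomes a 1-hour job on 4 cores/node,
# keeping core-hours per node at 4 either way:
print(rescale_job(30, 8, 2))  # (1, 4, 2)
```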


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: edd4047f-533e-4fe4-a40b-5f253891273e

📥 Commits

Reviewing files that changed from the base of the PR and between 7c585e9 and 1b11cff.

📒 Files selected for processing (11)
  • analyze_arrivalscale_occupancy.py
  • analyze_lambda_occupancy.py
  • src/baseline.py
  • src/environment.py
  • src/metrics_tracker.py
  • src/reward_calculation.py
  • src/sampler_hourly.py
  • src/sampler_jobs.py
  • src/workload_generator.py
  • train.py
  • train_iter.py

Enis Lorenz added 4 commits March 12, 2026 12:29
…de_costs

- add evaluation output for cost per 1,000 completed jobs (agent/baseline/baseline_off)
- add evaluation output for agent dropped jobs per saved euro (vs baseline and baseline_off)
- add safe ratio/format helpers in train.py for robust logging and n/a handling
- store new per-episode metrics in MetricsTracker.record_episode_completion():
  - agent_cost_per_1000_completed_jobs
  - baseline_cost_per_1000_completed_jobs
  - baseline_off_cost_per_1000_completed_jobs
  - agent_dropped_jobs_per_saved_euro
  - agent_dropped_jobs_per_saved_euro_off
- use NaN for undefined ratios (e.g., zero completed jobs or non-positive savings)
- extend lambda sweep parser to read per-episode CostPer1kCompleted and Power triplets
- compute occupancy-based percent deltas:
  - (baseline - agent) / baseline for cost per 1k completed jobs
  - (baseline_off - agent) / baseline_off for cost per 1k completed jobs
  - (baseline_off - agent) / baseline_off for power
- expand trend plot from 2x3 to 3x3 and add the three new panels
- include new delta mean/std fields in run stats and summary.csv
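The occupancy-based percent deltas listed in that commit amount to a vectorized safe divide; a sketch under assumed names (pct_delta is illustrative, not the repository's helper):

```python
import numpy as np

def pct_delta(baseline, agent):
    """(baseline - agent) / baseline as a percentage; NaN where baseline is 0."""
    baseline = np.asarray(baseline, dtype=float)
    agent = np.asarray(agent, dtype=float)
    out = np.full_like(baseline, np.nan)
    mask = baseline != 0
    out[mask] = (baseline[mask] - agent[mask]) / baseline[mask] * 100.0
    return out

# Agent is 10% cheaper in episode 1, equal in episode 2, undefined in episode 3:
deltas = pct_delta([100.0, 80.0, 0.0], [90.0, 80.0, 5.0])
```

The same shape works for the (baseline_off - agent) / baseline_off variants by swapping in the baseline_off series.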
…summary

- add `analyze_arrivalscale_occupancy.py` to sweep `--job-arrival-scale` over two modes:
  - `exact_replay_aggregate` (`--jobs-exact-replay --jobs-exact-replay-aggregate`)
  - `sampling` (`--jobs --job-arrival-scale`, with optional `--seed`)
- keep scale ordering deterministic by always running `scale=1.0` first in both modes
- generate per-mode trend plots plus CSV/JSON outputs (including selected vs executed scales)
- in `train.py` evaluation summary, print job arrivals per hour as `mean ± std`
enlorenz force-pushed the pr3-analysis branch 2 times, most recently from 1389e5e to c1dc4dc on March 12, 2026 12:41
Contributor

@coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (3)
analyze_lambda_occupancy.py (1)

770-772: Consider adding strict=True to zip() calls for defensive programming.

Lines 771 and 907 use zip() without the strict= parameter. While the arrays should always have matching lengths by construction, adding strict=True would catch any future bugs where lengths diverge.

♻️ Proposed fix
-        for i, (xv, c) in enumerate(zip(lambdas, point_colors)):
+        for i, (xv, c) in enumerate(zip(lambdas, point_colors, strict=True)):
-    for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs):
+    for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs, strict=True):

Also applies to: 906-908

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_lambda_occupancy.py` around lines 770 - 772, The zip() calls used to
iterate paired sequences should be made strict to fail fast on length
mismatches; update the zip in draw_baseline_occupancy_pair (iterating lambdas,
point_colors, baseline_occ_mean) to use zip(..., strict=True) and likewise
update the other zip call around the later loop (the one that zips lambdas and
point_colors at the other occurrence) so any future length divergence raises an
error immediately.
analyze_arrivalscale_occupancy.py (2)

1-49: Significant code duplication with analyze_lambda_occupancy.py should be extracted to a shared module.

This file shares approximately 70-80% of its code with analyze_lambda_occupancy.py:

  • Identical regex patterns (lines 51-83 vs 45-77)
  • Identical parsing utilities (_to_float, _to_float_or_nan, _diff_cumulative, parse_episode_metrics, parse_wait_summary, parse_arrivals_summary, parse_dropped_totals_summary)
  • Identical math utilities (safe_divide, safe_divide_arrays, finite_mean_std, polyfit_curve)
  • Nearly identical dataclass (ScaleRunStats vs LambdaRunStats)
  • Nearly identical make_run_stats, write_summary_csv, and make_plot functions

Consider extracting shared code into a common module (e.g., analysis_common.py) to reduce maintenance burden and ensure consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_arrivalscale_occupancy.py` around lines 1 - 49, This file duplicates
~70–80% of analyze_lambda_occupancy.py; extract the shared regexes, parsing
helpers (_to_float, _to_float_or_nan, _diff_cumulative, parse_episode_metrics,
parse_wait_summary, parse_arrivals_summary, parse_dropped_totals_summary), math
utilities (safe_divide, safe_divide_arrays, finite_mean_std, polyfit_curve), and
common dataclass/IO/plot helpers (ScaleRunStats/LambdaRunStats, make_run_stats,
write_summary_csv, make_plot) into a new shared module (e.g.,
analysis_common.py), then import and reuse those symbols from
analyze_arrivalscale_occupancy.py and analyze_lambda_occupancy.py; update
references to the moved symbols (keep public names identical to avoid broad
refactors), remove the duplicated implementations, and run the scripts to
validate behavior remains unchanged.

715-715: Consider adding strict=True to zip() calls.

Same recommendation as for analyze_lambda_occupancy.py - adding strict=True provides defensive protection against length mismatches.

♻️ Proposed fix
-        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)):
+        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True)):
-        for i, (xv, c) in enumerate(zip(scales, point_colors)):
+        for i, (xv, c) in enumerate(zip(scales, point_colors, strict=True)):
-    for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs):
+    for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs, strict=True):

Also applies to: 733-733, 869-869

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_arrivalscale_occupancy.py` at line 715, The zip(...) calls used with
enumerate—specifically the loop that unpacks into (xi, yi, c) via
enumerate(zip(x, y, point_colors)) and the other two zip calls flagged—should
pass strict=True to defensively catch length mismatches; update those zip(...)
invocations to enumerate(zip(x, y, point_colors, strict=True)) (and the
analogous zip(...) calls at the other locations) so a ValueError is raised if
any input sequence lengths differ, leaving the loop unpacking and variable names
(i, xi, yi, c) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze_lambda_occupancy.py`:
- Around line 297-298: The current assignment baseline_off_occupancy =
baseline_occupancy.copy() wrongly duplicates occupancy metrics; update the
pipeline so evaluation_summary.py computes distinct baseline_off occupancy (mean
and std) reflecting the greedy "all nodes on, FIFO" behavior instead of copying
baseline_occupancy, or remove the redundant baseline_off fields; specifically
modify the code that produces baseline_off_occupancy,
baseline_off_occupancy_mean and baseline_off_occupancy_std (and any downstream
usage such as dropped_jobs_delta_total comparisons) to either compute occupancy
traces for the baseline_off scheduler (simulate greedy all-on FIFO occupancy) or
eliminate those fields from the summary output and tests. Ensure references to
baseline_off_* symbols are updated accordingly so outputs no longer mirror
baseline_occupancy.

In `@src/arrival_scale.py`:
- Around line 8-10: The ValueError message is CLI-flag-specific; update the
exception raised when validating the parsed job_arrival_scale (the variable
scale) to a generic message (e.g., "job arrival scale must be finite and >=
0.0") so callers from non-CLI code paths (like workload_generator and
sampler_hourly) get a neutral error; change the message in the raise
ValueError(...) where scale is checked (the validation block that calls
math.isfinite(scale) or scale < 0.0).

In `@train.py`:
- Around line 254-257: The occupancy percentages currently pull only the last
timestep via episode_*[-1]; change them to compute the episode-wide average (or
total-mean) instead: for agent_occupancy_cores_pct and
baseline_occupancy_cores_pct use the mean of env.metrics.episode_used_cores (and
episode_baseline_used_cores) divided by (CORES_PER_NODE * MAX_NODES) * 100, and
for agent_occupancy_nodes_pct and baseline_occupancy_nodes_pct use the mean of
env.metrics.episode_used_nodes (and episode_baseline_used_nodes) divided by
MAX_NODES * 100; update the calculations in train.py so they handle empty lists
the same way (return 0.0) and reference the existing variable names
(agent_occupancy_cores_pct, baseline_occupancy_cores_pct,
agent_occupancy_nodes_pct, baseline_occupancy_nodes_pct).
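Computed as the review asks, mean_occupancy_pct would average the per-timestep samples against capacity and return 0.0 for an empty episode (a sketch; the actual helper lives in src/evaluation_summary.py and the cluster dimensions below are hypothetical):

```python
def mean_occupancy_pct(samples, capacity):
    """Episode-wide mean utilisation as a percentage; 0.0 for an empty episode."""
    if not samples or capacity <= 0:
        return 0.0
    return sum(samples) / len(samples) / capacity * 100.0

CORES_PER_NODE, MAX_NODES = 96, 10    # hypothetical cluster dimensions
episode_used_cores = [480, 576, 384]  # per-timestep samples, mean 480
print(mean_occupancy_pct(episode_used_cores, CORES_PER_NODE * MAX_NODES))  # 50.0
```

Using the mean rather than episode_used_cores[-1] is the point of the fix: the last timestep alone says nothing about utilisation over the whole episode.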
- Around line 326-356: Several print() calls that emit literal section headers
(e.g. print(f"\n=== JOB PROCESSING METRICS ==="), print(f"\nAgent:"),
print(f"\nBaseline:"), print(f"\n=== COST PER 1,000 COMPLETED JOBS ==="),
print(f"\n=== AGENT DROPPED JOBS PER SAVED EURO ==="), print(f"\n=== POWER &
PRICE METRICS (TOTAL OVER EVALUATION) ===")) use f-strings unnecessarily; remove
the leading f from those literal-only print calls so they become plain strings
and eliminate Ruff F541 warnings while leaving all interpolated prints (those
referencing variables like total_agent_cost, total_baseline_cost,
fmt_optional(...), total_jobs_completed, etc.) unchanged.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6440d23f-935f-4343-bcf6-29ceb11f6f34

📥 Commits

Reviewing files that changed from the base of the PR and between 1b11cff and c1dc4dc.

📒 Files selected for processing (12)
  • analyze_arrivalscale_occupancy.py
  • analyze_lambda_occupancy.py
  • src/arrival_scale.py
  • src/baseline.py
  • src/environment.py
  • src/evaluation_summary.py
  • src/metrics_tracker.py
  • src/reward_calculation.py
  • src/sampler_hourly.py
  • src/workload_generator.py
  • train.py
  • train_iter.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/reward_calculation.py
  • train_iter.py

Enis Lorenz added 2 commits March 12, 2026 13:58
- add baseline occupancy and agent dropped parsing to both occupancy sweep analyzers
- parse run-level arrivals/hour mean±std and dropped totals (agent + baseline)
- extend run stats and CSV outputs with baseline/baseline_off occupancy, arrivals stats, and dropped delta (agent - baseline)
- expand figures from 3x3 to 4x3 with new panels for baseline vs baseline_off occupancy, jobs/hour, and dropped delta
- export each panel as its own image in a per-run `plots_individual` subdirectory, each including the colorbar
- print `Total Dropped Jobs (Baseline)` in `train.py` eval summaries so dropped-delta metrics are available to analyzers

Analysis Correct: Use per-episode metrics for evaluation episode summaries

- source episode summary values from recorded episode metrics
- fix mixed cumulative/per-episode savings and cost-per-1k values
- relabel cumulative queue metric as TimelineMaxQueue
- add focused coverage for episode summary formatting
Contributor

@coderabbitai bot left a comment

🧹 Nitpick comments (2)
train.py (1)

28-37: Consider consolidating fmt_optional with _fmt_optional in evaluation_summary.

train.py defines fmt_optional which has nearly identical logic to _fmt_optional in src/evaluation_summary.py. Consider making _fmt_optional public and reusing it here to avoid code duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@train.py` around lines 28 - 37, Replace the duplicated formatter by reusing
the one from evaluation_summary: stop defining fmt_optional in train.py and
import the shared formatter from src.evaluation_summary (make the module expose
it by renaming or exporting _fmt_optional as fmt_optional or adding a public
wrapper). Specifically, remove the local fmt_optional in train.py and import the
public function from evaluation_summary, ensure call sites that use fmt_optional
(and related helpers like safe_ratio if needed) continue to work, and update any
references to _fmt_optional to the new public name to avoid duplication.
analyze_lambda_occupancy.py (1)

754-769: Consider adding strict=True to zip() for safer iteration.

While the arrays are guaranteed to have equal lengths (built from the same ordered list), adding strict=True to zip() calls would make this invariant explicit and catch any future bugs if array construction changes.

-        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)):
+        for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True)):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_lambda_occupancy.py` around lines 754 - 769, Update the zip call in
the loop that iterates over x, y, point_colors inside
analyze_lambda_occupancy.py (the for i, (xi, yi, c) in enumerate(zip(x, y,
point_colors)) loop) to use zip(..., strict=True) so mismatched lengths raise
immediately (i.e., for i, (xi, yi, c) in enumerate(zip(x, y, point_colors,
strict=True))). Ensure the runtime environment uses Python 3.10+ or guard
accordingly if needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0f088d83-28df-4aba-b198-a5153495e00c

📥 Commits

Reviewing files that changed from the base of the PR and between c1dc4dc and d3d273d.

📒 Files selected for processing (9)
  • analyze_arrivalscale_occupancy.py
  • analyze_lambda_occupancy.py
  • src/analysis_naming.py
  • src/arrival_scale.py
  • src/evaluation_summary.py
  • src/sampler_hourly.py
  • src/workload_generator.py
  • train.py
  • train_iter.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/workload_generator.py
  • train_iter.py
  • src/sampler_hourly.py

- stop differencing parsed episode savings in both occupancy analyzers
- treat Savings fields from train.py as already per-episode values
- remove obsolete cumulative-savings parsing logic
- add regression coverage for analyzer episode metric parsing
@rbx rbx merged commit c580593 into FairRootGroup:master Mar 12, 2026
3 of 4 checks passed

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
analyze_lambda_occupancy.py (1)

289-290: ⚠️ Potential issue | 🟠 Major

baseline_off_occupancy is still synthetic.

Line 289 still copies baseline_occupancy, so the newly added baseline_off_* exports and plots cannot represent the greedy all-nodes-on baseline and will always mirror baseline. Please wire a real baseline-off occupancy series through the parser or remove these fields. Based on learnings: Baseline comparison must include greedy scheduler that keeps all nodes on and processes jobs FIFO.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_lambda_occupancy.py` around lines 289 - 290, baseline_off_occupancy
is currently just a copy of baseline_occupancy, so the new baseline_off_*
exports and plots are synthetic and don't reflect the intended "all-nodes-on
greedy FIFO" baseline; either compute a true baseline_off_occupancy in the
parser/simulation or remove the baseline_off_* outputs. Fix by wiring a real
baseline-off series: add/modify the occupancy computation path (e.g., in the
parser/simulation function that produces baseline_occupancy) to simulate the
greedy scheduler that keeps all nodes on and services jobs FIFO and assign that
result to baseline_off_occupancy (and corresponding baseline_off_* exports), or
if you prefer to drop it, remove all uses/exports/plots of
baseline_off_occupancy and baseline_off_* so they no longer appear in
outputs/visualizations.
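For the "wire a real series" option, a toy model of the intended baseline may help: all nodes stay on and jobs are served FIFO. The node count, one-node-per-job shape, and hourly time step are illustrative assumptions, not the project's actual simulation:

```python
from collections import deque

def greedy_all_on_occupancy(arrivals, num_nodes, job_hours):
    """Per-hour occupied-node counts for an all-nodes-on FIFO baseline.

    arrivals: number of jobs arriving at each hour; each job occupies one
    node for `job_hours` hours. Illustrative model only.
    """
    queue = deque()
    busy_until = [0] * num_nodes  # hour at which each node frees up
    occupancy = []
    for hour, n_new in enumerate(arrivals):
        queue.extend([job_hours] * n_new)
        for i in range(num_nodes):
            if busy_until[i] <= hour and queue:
                busy_until[i] = hour + queue.popleft()
        occupancy.append(sum(1 for b in busy_until if b > hour))
    return occupancy
```

The resulting per-hour list is the kind of series `baseline_off_occupancy` would carry instead of a copy of `baseline_occupancy`.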
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@analyze_arrivalscale_occupancy.py`:
- Around line 299-300: The variable baseline_off_occupancy is being fabricated
by copying baseline_occupancy (baseline_off_occupancy =
baseline_occupancy.copy()) which yields synthetic baseline-off series and causes
the new baseline_off_* CSV columns/plots to mirror baseline; either remove all
code that generates/exports baseline_off_* fields and plots, or properly plumb a
real baseline-off occupancy into the parser: update the episode-summary parsing
logic (the code that reads "Baseline Occupancy (Nodes)" from the
episode/evaluation summary) to extract a distinct baseline-off occupancy series
and assign it to baseline_off_occupancy (and then use that real series when
computing dropped_jobs_delta_total and writing CSV/plot outputs), ensuring you
stop creating baseline_off_occupancy by copying baseline_occupancy.
- Around line 400-425: The forwarded file paths args.prices and args.jobs are
passed raw into build_train_command and can become invalid once the child is
spawned from project_root; update build_train_command to normalize/resolve those
paths against the same base directory used for the child (e.g., resolve with
pathlib.Path(project_root) / args.prices or os.path.abspath relative to
project_root) before inserting them into the cmd list, and apply the same
normalization wherever the script spawns the child or performs the existence
check so that path validation and the command use the exact same resolved paths.
- Around line 493-500: The current fallback when parse_wait_summary fails masks
parser errors by copying agent metrics into baseline (agent_wait_summary ->
baseline_avg_wait_hours) and assigning zeros (e.g., dropped_jobs_delta_total),
which produces misleading "valid" deltas; in parse_wait_summary error paths
(check of agent_wait_summary and baseline_wait_summary) change the behavior to
surface failure: set agent_avg_wait_hours and baseline_avg_wait_hours to NaN
(use float("nan")) or raise an exception instead of copying values, and ensure
any dropped-job totals (e.g., dropped_jobs_delta_total) are set to NaN or
propagate the error rather than defaulting to 0.0; update any downstream logic
that assumes numeric values to handle NaN appropriately so parser failures are
visible during analysis.

In `@analyze_lambda_occupancy.py`:
- Around line 457-463: The code currently sets dropped_jobs_baseline_total to
0.0 when parse_dropped_totals_summary returns None, which hides parser failures;
change that behavior in the block handling dropped_jobs_baseline_total so that
when parse_dropped_totals_summary(...) returns None you assign numpy.nan (e.g.
dropped_jobs_baseline_total = np.nan) or raise an exception instead of 0.0, and
keep the existing warning print using lambda_value so downstream computations
like dropped_jobs_delta_total will surface NaNs rather than silently improving
the delta; reference parse_dropped_totals_summary, dropped_jobs_baseline_total,
dropped_jobs_agent_total, agent_dropped and lambda_value when making the change.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2d1476f2-2cd2-4c3f-b0f0-4229541b5237

📥 Commits

Reviewing files that changed from the base of the PR and between d3d273d and 5af5255.

📒 Files selected for processing (2)
  • analyze_arrivalscale_occupancy.py
  • analyze_lambda_occupancy.py

Comment on lines +299 to +300
baseline_off_occupancy = baseline_occupancy.copy()
dropped_jobs_delta_total = dropped_jobs_agent_total - dropped_jobs_baseline_total

⚠️ Potential issue | 🟠 Major

baseline_off_occupancy is still synthetic.

src/evaluation_summary.py only emits Baseline Occupancy (Nodes), so Line 299 fabricates a second occupancy series instead of parsing a real baseline-off value. The new baseline_off_* CSV columns and plots will therefore always mirror baseline and misstate the analysis. Either plumb a real baseline-off occupancy through the episode summary/parser or drop these fields.


Comment on lines +400 to +425
def build_train_command(args: argparse.Namespace, job_arrival_scale: float, replay_mode: str) -> list[str]:
cmd = [
sys.executable,
"./train.py",
"--prices",
args.prices,
"--session",
args.session,
"--efficiency-weight",
str(args.efficiency_weight),
"--price-weight",
str(args.price_weight),
"--idle-weight",
str(args.idle_weight),
"--job-age-weight",
str(args.job_age_weight),
"--drop-weight",
str(args.drop_weight),
"--evaluate-savings",
"--eval-months",
str(args.eval_months),
"--model",
str(args.model),
"--jobs",
args.jobs,
]

⚠️ Potential issue | 🟠 Major

Normalize forwarded file paths before spawning train.py.

Lines 401-425 pass raw args.prices and args.jobs through to train.py, but Line 463 runs the child from project_root. That means relative paths are interpreted differently in the analyzer vs. in the child process; args.jobs can even pass the Line 942-944 existence check and still point somewhere else once cwd changes. Resolve the forwarded paths against the same base directory before building the command.

Also applies to: 461-463, 942-944

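One way to apply the fix, sketched with a hypothetical helper (the name `resolve_against` is not from the codebase):

```python
from pathlib import Path

def resolve_against(project_root, raw_path):
    """Anchor a possibly-relative CLI path at the child's working directory.

    Absolute paths pass through untouched; relative ones are resolved
    against project_root so the analyzer's existence check and the
    spawned train.py see the same file.
    """
    p = Path(raw_path)
    return str(p if p.is_absolute() else Path(project_root) / p)
```

build_train_command would then insert `resolve_against(project_root, args.prices)` and `resolve_against(project_root, args.jobs)` instead of the raw strings, and the existence check would use the same resolved values.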

Comment on lines +493 to +500
agent_wait_summary, baseline_wait_summary = parse_wait_summary(combined_output)
if agent_wait_summary is None or baseline_wait_summary is None:
print(f"[warn] mode={replay_mode}, scale={job_arrival_scale:.6f}: could not parse run-level wait summary; effective savings may be NaN.")
agent_avg_wait_hours = float(np.mean(avg_wait))
baseline_avg_wait_hours = float(np.mean(avg_wait))
else:
agent_avg_wait_hours = float(agent_wait_summary)
baseline_avg_wait_hours = float(baseline_wait_summary)

⚠️ Potential issue | 🟠 Major

These fallback defaults hide parser failures.

Line 497 copies the agent wait mean into the baseline slot, and Line 511 assigns 0.0 to a missing baseline dropped total. That makes effective_savings* and dropped_jobs_delta_total look valid even when the summary parse failed. Prefer NaN or a hard failure for missing baseline aggregates.

Suggested adjustment
     if agent_wait_summary is None or baseline_wait_summary is None:
         print(f"[warn] mode={replay_mode}, scale={job_arrival_scale:.6f}: could not parse run-level wait summary; effective savings may be NaN.")
         agent_avg_wait_hours = float(np.mean(avg_wait))
-        baseline_avg_wait_hours = float(np.mean(avg_wait))
+        baseline_avg_wait_hours = float("nan")
     else:
         agent_avg_wait_hours = float(agent_wait_summary)
         baseline_avg_wait_hours = float(baseline_wait_summary)
@@
     if dropped_jobs_baseline_total is None:
-        dropped_jobs_baseline_total = 0.0
-        print(f"[warn] mode={replay_mode}, scale={job_arrival_scale:.6f}: could not parse run-level baseline dropped total; defaulting to 0.")
+        dropped_jobs_baseline_total = float("nan")
+        print(f"[warn] mode={replay_mode}, scale={job_arrival_scale:.6f}: could not parse run-level baseline dropped total; value set to NaN.")

Also applies to: 506-512
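The reason NaN is the safer sentinel: it poisons every downstream arithmetic result and fails every comparison, so a missing baseline cannot masquerade as a zero delta. A small demonstration with made-up values:

```python
import math

agent_avg_wait_hours = 3.5
baseline_avg_wait_hours = float("nan")  # parse failed; no fabricated value

# Any delta built on the missing baseline is NaN, not a plausible number.
wait_delta = agent_avg_wait_hours - baseline_avg_wait_hours
assert math.isnan(wait_delta)

# NaN compares False both ways, so "did the agent improve?" checks
# cannot silently pass on a failed parse.
assert not (wait_delta < 0) and not (wait_delta > 0)
```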


Comment on lines +457 to +463
dropped_jobs_agent_total, dropped_jobs_baseline_total = parse_dropped_totals_summary(combined_output)
if dropped_jobs_agent_total is None:
dropped_jobs_agent_total = float(np.sum(agent_dropped))
print(f"[warn] lambda={lambda_value}: could not parse run-level agent dropped total; using sum of episode Dropped= values.")
if dropped_jobs_baseline_total is None:
dropped_jobs_baseline_total = 0.0
print(f"[warn] lambda={lambda_value}: could not parse run-level baseline dropped total; defaulting to 0.")

⚠️ Potential issue | 🟠 Major

Don't turn an unparsed baseline dropped total into zero.

If Total Dropped Jobs (Baseline) stops matching the summary format, Line 462 makes dropped_jobs_delta_total look better than it is. Use NaN or fail fast so parser drift stays visible.

Suggested adjustment
     if dropped_jobs_baseline_total is None:
-        dropped_jobs_baseline_total = 0.0
-        print(f"[warn] lambda={lambda_value}: could not parse run-level baseline dropped total; defaulting to 0.")
+        dropped_jobs_baseline_total = float("nan")
+        print(f"[warn] lambda={lambda_value}: could not parse run-level baseline dropped total; value set to NaN.")
