Conversation
📝 Walkthrough

Adds a new arrival-scale sweep analysis tool, expands lambda analysis, instruments power/energy metrics across environment/baseline/metrics, validates and normalizes job-arrival-scale inputs, and refactors episode-summary generation and plotting/export of aggregated run statistics.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Analyzer as ArrivalScaleScript
    participant Train as train.py
    participant Env as ComputeClusterEnv
    participant Metrics as MetricsTracker
    participant FS as Filesystem/Plots
    Analyzer->>Train: spawn train.py (mode, job_arrival_scale)
    Train->>Env: instantiate env(job_arrival_scale, replay flags)
    Env->>Metrics: emit episode events (costs, occupancy, power, dropped, ...)
    Metrics-->>Train: formatted episode summary lines (stdout)
    Train-->>Analyzer: run stdout with episode + run summaries
    Analyzer->>Analyzer: parse stdout -> build ScaleRunStats / aggregates
    Analyzer->>FS: write CSV/JSON and save plots (per-panel optional)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 2
🧹 Nitpick comments (5)
analyze_arrivalscale_occupancy.py (2)
51-64: Consider extracting shared utilities with analyze_lambda_occupancy.py.

The regex patterns (`EPISODE_RE`, `WAIT_SUMMARY_RE`, etc.), helper functions (`_to_float`, `_to_float_or_nan`, `safe_divide`, `safe_divide_arrays`, `finite_mean_std`), and parsing functions are duplicated between this file and analyze_lambda_occupancy.py. Consider extracting these into a shared module like `analyze_utils.py` to reduce duplication.

Also applies to: 139-149, 248-261
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_arrivalscale_occupancy.py` around lines 51 - 64, The EPISODE_RE and other regexes plus helper functions (_to_float, _to_float_or_nan, safe_divide, safe_divide_arrays, finite_mean_std) and parsing logic are duplicated with analyze_lambda_occupancy.py; extract these shared utilities into a new module (e.g., analyze_utils.py) containing the regex constants (EPISODE_RE, WAIT_SUMMARY_RE, etc.) and the helper functions, update analyze_arrivalscale_occupancy.py and analyze_lambda_occupancy.py to import those symbols from analyze_utils instead of redefining them, remove the duplicate definitions from both files, and run/adjust any unit or integration tests that reference the moved symbols to ensure imports and names (e.g., EPISODE_RE, _to_float, safe_divide_arrays, finite_mean_std) are correct.
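To make the extraction suggestion concrete, here is a minimal sketch of what such a shared module could contain; the module name (`analyze_utils.py`) follows the review's suggestion, and the regex pattern and helper bodies below are assumptions for illustration, not the project's actual implementations.

```python
# analyze_utils.py -- hypothetical shared module; bodies are illustrative only.
import math
import re

# Example shared regex constant (the real pattern in the repo is not shown here).
EPISODE_RE = re.compile(r"Episode\s+(\d+)")

def safe_divide(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the division is undefined."""
    if denominator == 0 or not math.isfinite(denominator):
        return float("nan")
    return numerator / denominator

def finite_mean_std(values: list[float]) -> tuple[float, float]:
    """Mean and population std over the finite entries only; NaN if none."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return float("nan"), float("nan")
    mean = sum(finite) / len(finite)
    var = sum((v - mean) ** 2 for v in finite) / len(finite)
    return mean, math.sqrt(var)
```

Both analyzer scripts would then import these symbols instead of redefining them, keeping the public names identical to avoid a broader refactor.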
715-730: Consider adding `strict=True` to `zip()` calls.

Similar to analyze_lambda_occupancy.py, adding `strict=True` would catch any array length mismatches during development.

Also applies to: 733-764, 869-870
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_arrivalscale_occupancy.py` around lines 715 - 730, The zip calls in the plotting loops (e.g., zip(x, y, point_colors) inside the loop that calls ax.errorbar, and the other zip usages around the same area and at 869-870) should include strict=True to surface any length mismatches during development; update each zip(...) invocation in those sections to zip(..., strict=True) (ensuring the runtime uses Python 3.10+, where zip() gained the strict parameter) so mismatched array lengths raise an immediate error rather than silently truncating.

analyze_lambda_occupancy.py (1)
770-802: Consider adding `strict=True` to `zip()` calls.

Static analysis flagged `zip()` calls without an explicit `strict=` parameter. While the arrays should be the same length in practice, adding `strict=True` would catch any length mismatches during development.

🔧 Proposed fix

```diff
- for i, (xv, c) in enumerate(zip(lambdas, point_colors)):
+ for i, (xv, c) in enumerate(zip(lambdas, point_colors, strict=True)):

- for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs):
+ for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs, strict=True):
```

Also applies to: 906-908
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_lambda_occupancy.py` around lines 770 - 802, The zip call in draw_baseline_occupancy_pair that iterates over zip(lambdas, point_colors) should be made strict to catch length mismatches at runtime — update the call to use zip(lambdas, point_colors, strict=True); do the same for any other zip calls that pair the same sequences (the other occurrence that zips lambdas with point_colors and uses baseline_off/baseline arrays) so all zips that assume equal-length inputs include strict=True to fail fast on size mismatches.

src/sampler_hourly.py (1)
316: Remove unnecessary `int()` cast.

`math.floor()` already returns an `int` in Python 3, so the `int()` cast is redundant.

🧹 Proposed fix

```diff
- whole = int(math.floor(arrival_scale))
+ whole = math.floor(arrival_scale)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampler_hourly.py` at line 316, The assignment to variable `whole` uses a redundant `int()` cast around `math.floor(arrival_scale)`; update the expression in the `whole` assignment (where `arrival_scale` is used) to remove the outer `int()` and just use `math.floor(arrival_scale)` so the code reads `whole = math.floor(arrival_scale)`.

train.py (1)
325-356: Remove extraneous f-prefixes from header strings.

Static analysis flagged these f-strings as having no placeholders. While functionally harmless, removing the `f` prefix keeps the code clean.

🧹 Proposed fix

```diff
- print(f"\n=== JOB PROCESSING METRICS ===")
- print(f"\nAgent:")
+ print("\n=== JOB PROCESSING METRICS ===")
+ print("\nAgent:")
...
- print(f"\nBaseline:")
+ print("\nBaseline:")
...
- print(f"\n=== COST PER 1,000 COMPLETED JOBS ===")
+ print("\n=== COST PER 1,000 COMPLETED JOBS ===")
...
- print(f"\n=== AGENT DROPPED JOBS PER SAVED EURO ===")
+ print("\n=== AGENT DROPPED JOBS PER SAVED EURO ===")
...
- print(f"\n=== POWER & PRICE METRICS (TOTAL OVER EVALUATION) ===")
+ print("\n=== POWER & PRICE METRICS (TOTAL OVER EVALUATION) ===")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@train.py` around lines 325 - 356, The header print statements use f-strings with no placeholders; remove the unnecessary f prefix for those literal-only prints (e.g., change print(f"\n=== JOB PROCESSING METRICS ===") and other header/section prints like "\nAgent:", "\nBaseline:", "\n=== COST PER 1,000 COMPLETED JOBS ===", "\n=== AGENT DROPPED JOBS PER SAVED EURO ===", "\n=== POWER & PRICE METRICS (TOTAL OVER EVALUATION) ===" to plain string prints) while leaving prints that contain formatted placeholders (the ones with { ... }) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/workload_generator.py`:
- Around line 140-147: The generate_jobs path assumes job_arrival_scale is a
finite number and calls int(np.floor(job_arrival_scale)) which will raise on
nan/inf; add an explicit validation at the start of the block (or inside
generate_jobs) that checks job_arrival_scale is finite and > 0 when used (e.g.,
using math.isfinite or numpy.isfinite), and return the same failure/empty result
or raise a clear ValueError if it is not; update the conditional that references
arrival_scale_applied_in_source and job_arrival_scale so invalid values are
detected before computing whole and frac to fail fast and avoid
int(np.floor(...)) on non-finite inputs.
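A minimal sketch of the suggested fail-fast validation; the standalone function name is hypothetical, and the error message mirrors the generic wording recommended elsewhere in this review.

```python
# Hypothetical validator sketch; rejects NaN/inf before any np.floor() math.
import math

def validate_job_arrival_scale(scale: float) -> float:
    """Fail fast on non-finite or negative arrival scales."""
    if not math.isfinite(scale) or scale < 0.0:
        raise ValueError("job arrival scale must be finite and >= 0.0")
    return scale
```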
- Around line 65-88: When doing raw exact replay (jobs_exact_replay path) you
round duration_minutes up to duration_hours but keep nnodes/cores_per_node
unchanged, inflating core-hours for sub-hour jobs; fix by scaling resource
counts to preserve original core-hours: compute scale = (duration_minutes/60) /
duration_hours and apply it to nnodes and/or cores_per_node (e.g., reduce
cores_per_node = max(1, int(round(cores_per_node * scale))) and if needed adjust
nnodes = min(MAX_NODES_PER_JOB, max(MIN_NODES_PER_JOB, int(round(nnodes *
scale))))), then clamp both to MIN_/MAX_ constants; update the block that
creates duration_hours, nnodes, cores_per_node (the for job in raw_jobs loop) so
behavior matches jobs_sampler.aggregate_jobs/convert_to_hourly_jobs semantics
for aggregated sampling.
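The rescaling described above can be sketched as follows; the MIN_/MAX_ constant values, the field names, and the exact clamping policy are assumptions taken from the comment text, not the repository's code.

```python
# Illustrative core-hour-preserving rescale for sub-hour jobs that get their
# duration rounded up to whole hours; constants and rounding are assumptions.
MIN_NODES_PER_JOB = 1
MAX_NODES_PER_JOB = 64

def rescale_for_rounded_duration(duration_minutes: float, nnodes: int,
                                 cores_per_node: int) -> tuple[int, int, int]:
    duration_hours = max(1, -(-int(duration_minutes) // 60))  # ceil division
    # Shrink per-node cores so nnodes * cores_per_node * duration_hours stays
    # close to the original core-hours of the raw job.
    scale = (duration_minutes / 60.0) / duration_hours
    cores_per_node = max(1, round(cores_per_node * scale))
    nnodes = min(MAX_NODES_PER_JOB, max(MIN_NODES_PER_JOB, nnodes))
    return duration_hours, nnodes, cores_per_node
```

For example, a 30-minute job on 16 cores per node becomes a 1-hour job on 8 cores per node, preserving 8 core-hours per node instead of inflating them to 16.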
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: edd4047f-533e-4fe4-a40b-5f253891273e
📒 Files selected for processing (11)

- analyze_arrivalscale_occupancy.py
- analyze_lambda_occupancy.py
- src/baseline.py
- src/environment.py
- src/metrics_tracker.py
- src/reward_calculation.py
- src/sampler_hourly.py
- src/sampler_jobs.py
- src/workload_generator.py
- train.py
- train_iter.py
…de_costs

- add evaluation output for cost per 1,000 completed jobs (agent/baseline/baseline_off)
- add evaluation output for agent dropped jobs per saved euro (vs baseline and baseline_off)
- add safe ratio/format helpers in train.py for robust logging and n/a handling
- store new per-episode metrics in MetricsTracker.record_episode_completion():
  - agent_cost_per_1000_completed_jobs
  - baseline_cost_per_1000_completed_jobs
  - baseline_off_cost_per_1000_completed_jobs
  - agent_dropped_jobs_per_saved_euro
  - agent_dropped_jobs_per_saved_euro_off
- use NaN for undefined ratios (e.g., zero completed jobs or non-positive savings)
- extend lambda sweep parser to read per-episode CostPer1kCompleted and Power triplets
- compute occupancy-based percent deltas:
  - (baseline - agent) / baseline for cost per 1k completed jobs
  - (baseline_off - agent) / baseline_off for cost per 1k completed jobs
  - (baseline_off - agent) / baseline_off for power
- expand trend plot from 2x3 to 3x3 and add the three new panels
- include new delta mean/std fields in run stats and summary.csv
…summary

- add `analyze_arrivalscale_occupancy.py` to sweep `--job-arrival-scale` over two modes:
  - `exact_replay_aggregate` (`--jobs-exact-replay --jobs-exact-replay-aggregate`)
  - `sampling` (`--jobs --job-arrival-scale`, with optional `--seed`)
- keep scale ordering deterministic by always running `scale=1.0` first in both modes
- generate per-mode trend plots plus CSV/JSON outputs (including selected vs executed scales)
- in `train.py` evaluation summary, print job arrivals per hour as `mean ± std`
1389e5e to c1dc4dc
Actionable comments posted: 4
🧹 Nitpick comments (3)
analyze_lambda_occupancy.py (1)
770-772: Consider adding `strict=True` to `zip()` calls for defensive programming.

Lines 771 and 907 use `zip()` without the `strict=` parameter. While the arrays should always have matching lengths by construction, adding `strict=True` would catch any future bugs where lengths diverge.

♻️ Proposed fix

```diff
- for i, (xv, c) in enumerate(zip(lambdas, point_colors)):
+ for i, (xv, c) in enumerate(zip(lambdas, point_colors, strict=True)):

- for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs):
+ for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs, strict=True):
```

Also applies to: 906-908
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_lambda_occupancy.py` around lines 770 - 772, The zip() calls used to iterate paired sequences should be made strict to fail fast on length mismatches; update the zip in draw_baseline_occupancy_pair (iterating lambdas, point_colors, baseline_occ_mean) to use zip(..., strict=True) and likewise update the other zip call around the later loop (the one that zips lambdas and point_colors at the other occurrence) so any future length divergence raises an error immediately.

analyze_arrivalscale_occupancy.py (2)
1-49: Significant code duplication with analyze_lambda_occupancy.py should be extracted to a shared module.

This file shares approximately 70-80% of its code with analyze_lambda_occupancy.py:

- Identical regex patterns (lines 51-83 vs 45-77)
- Identical parsing utilities (`_to_float`, `_to_float_or_nan`, `_diff_cumulative`, `parse_episode_metrics`, `parse_wait_summary`, `parse_arrivals_summary`, `parse_dropped_totals_summary`)
- Identical math utilities (`safe_divide`, `safe_divide_arrays`, `finite_mean_std`, `polyfit_curve`)
- Nearly identical dataclass (`ScaleRunStats` vs `LambdaRunStats`)
- Nearly identical `make_run_stats`, `write_summary_csv`, and `make_plot` functions

Consider extracting shared code into a common module (e.g., `analysis_common.py`) to reduce maintenance burden and ensure consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_arrivalscale_occupancy.py` around lines 1 - 49, This file duplicates ~70–80% of analyze_lambda_occupancy.py; extract the shared regexes, parsing helpers (_to_float, _to_float_or_nan, _diff_cumulative, parse_episode_metrics, parse_wait_summary, parse_arrivals_summary, parse_dropped_totals_summary), math utilities (safe_divide, safe_divide_arrays, finite_mean_std, polyfit_curve), and common dataclass/IO/plot helpers (ScaleRunStats/LambdaRunStats, make_run_stats, write_summary_csv, make_plot) into a new shared module (e.g., analysis_common.py), then import and reuse those symbols from analyze_arrivalscale_occupancy.py and analyze_lambda_occupancy.py; update references to the moved symbols (keep public names identical to avoid broad refactors), remove the duplicated implementations, and run the scripts to validate behavior remains unchanged.
715: Consider adding `strict=True` to `zip()` calls.

Same recommendation as for analyze_lambda_occupancy.py - adding `strict=True` provides defensive protection against length mismatches.

♻️ Proposed fix

```diff
- for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)):
+ for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True)):

- for i, (xv, c) in enumerate(zip(scales, point_colors)):
+ for i, (xv, c) in enumerate(zip(scales, point_colors, strict=True)):

- for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs):
+ for ax, (_, draw_fn) in zip(axes.ravel(), panel_specs, strict=True):
```

Also applies to: 733-733, 869-869
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_arrivalscale_occupancy.py` at line 715, The zip(...) calls used with enumerate—specifically the loop that unpacks into (xi, yi, c) via enumerate(zip(x, y, point_colors)) and the other two zip calls flagged—should pass strict=True to defensively catch length mismatches; update those zip(...) invocations to enumerate(zip(x, y, point_colors, strict=True)) (and the analogous zip(...) calls at the other locations) so a ValueError is raised if any input sequence lengths differ, leaving the loop unpacking and variable names (i, xi, yi, c) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze_lambda_occupancy.py`:
- Around line 297-298: The current assignment baseline_off_occupancy =
baseline_occupancy.copy() wrongly duplicates occupancy metrics; update the
pipeline so evaluation_summary.py computes distinct baseline_off occupancy (mean
and std) reflecting the greedy "all nodes on, FIFO" behavior instead of copying
baseline_occupancy, or remove the redundant baseline_off fields; specifically
modify the code that produces baseline_off_occupancy,
baseline_off_occupancy_mean and baseline_off_occupancy_std (and any downstream
usage such as dropped_jobs_delta_total comparisons) to either compute occupancy
traces for the baseline_off scheduler (simulate greedy all-on FIFO occupancy) or
eliminate those fields from the summary output and tests. Ensure references to
baseline_off_* symbols are updated accordingly so outputs no longer mirror
baseline_occupancy.
In `@src/arrival_scale.py`:
- Around line 8-10: The ValueError message is CLI-flag-specific; update the
exception raised when validating the parsed job_arrival_scale (the variable
scale) to a generic message (e.g., "job arrival scale must be finite and >=
0.0") so callers from non-CLI code paths (like workload_generator and
sampler_hourly) get a neutral error; change the message in the raise
ValueError(...) where scale is checked (the validation block that calls
math.isfinite(scale) or scale < 0.0).
In `@train.py`:
- Around line 254-257: The occupancy percentages currently pull only the last
timestep via episode_*[-1]; change them to compute the episode-wide average (or
total-mean) instead: for agent_occupancy_cores_pct and
baseline_occupancy_cores_pct use the mean of env.metrics.episode_used_cores (and
episode_baseline_used_cores) divided by (CORES_PER_NODE * MAX_NODES) * 100, and
for agent_occupancy_nodes_pct and baseline_occupancy_nodes_pct use the mean of
env.metrics.episode_used_nodes (and episode_baseline_used_nodes) divided by
MAX_NODES * 100; update the calculations in train.py so they handle empty lists
the same way (return 0.0) and reference the existing variable names
(agent_occupancy_cores_pct, baseline_occupancy_cores_pct,
agent_occupancy_nodes_pct, baseline_occupancy_nodes_pct).
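The episode-wide averaging the comment asks for can be sketched like this; the cluster constants are placeholder values and the helper name is illustrative, not the project's actual API.

```python
# Average occupancy over the whole episode instead of sampling episode_*[-1].
CORES_PER_NODE = 128  # placeholder value, not the project's constant
MAX_NODES = 10        # placeholder value, not the project's constant

def mean_occupancy_cores_pct(episode_used_cores: list[float]) -> float:
    """Episode-mean core occupancy in percent; 0.0 for an empty trace."""
    if not episode_used_cores:
        return 0.0
    mean_used = sum(episode_used_cores) / len(episode_used_cores)
    return mean_used / (CORES_PER_NODE * MAX_NODES) * 100.0
```

The node-based variant would divide the mean of `episode_used_nodes` by `MAX_NODES` instead.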
- Around line 326-356: Several print() calls that emit literal section headers
(e.g. print(f"\n=== JOB PROCESSING METRICS ==="), print(f"\nAgent:"),
print(f"\nBaseline:"), print(f"\n=== COST PER 1,000 COMPLETED JOBS ==="),
print(f"\n=== AGENT DROPPED JOBS PER SAVED EURO ==="), print(f"\n=== POWER &
PRICE METRICS (TOTAL OVER EVALUATION) ===")) use f-strings unnecessarily; remove
the leading f from those literal-only print calls so they become plain strings
and eliminate Ruff F541 warnings while leaving all interpolated prints (those
referencing variables like total_agent_cost, total_baseline_cost,
fmt_optional(...), total_jobs_completed, etc.) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6440d23f-935f-4343-bcf6-29ceb11f6f34
📒 Files selected for processing (12)

- analyze_arrivalscale_occupancy.py
- analyze_lambda_occupancy.py
- src/arrival_scale.py
- src/baseline.py
- src/environment.py
- src/evaluation_summary.py
- src/metrics_tracker.py
- src/reward_calculation.py
- src/sampler_hourly.py
- src/workload_generator.py
- train.py
- train_iter.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/reward_calculation.py
- train_iter.py
- add baseline occupancy and agent dropped parsing to both occupancy sweep analyzers
- parse run-level arrivals/hour mean±std and dropped totals (agent + baseline)
- extend run stats and CSV outputs with baseline/baseline_off occupancy, arrivals stats, and dropped delta (agent - baseline)
- expand figures from 3x3 to 4x3 with new panels for baseline vs baseline_off occupancy, jobs/hour, and dropped delta
- export each panel as its own image in a per-run `plots_individual` subdirectory, each including the colorbar
- print `Total Dropped Jobs (Baseline)` in `train.py` eval summaries so dropped-delta metrics are available to analyzers

Correct: Use per-episode metrics for evaluation episode summaries

- source episode summary values from recorded episode metrics
- fix mixed cumulative/per-episode savings and cost-per-1k values
- relabel cumulative queue metric as TimelineMaxQueue
- add focused coverage for episode summary formatting
🧹 Nitpick comments (2)
train.py (1)
28-37: Consider consolidating `fmt_optional` with `_fmt_optional` in evaluation_summary.

train.py defines `fmt_optional`, which has nearly identical logic to `_fmt_optional` in src/evaluation_summary.py. Consider making `_fmt_optional` public and reusing it here to avoid code duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@train.py` around lines 28 - 37, Replace the duplicated formatter by reusing the one from evaluation_summary: stop defining fmt_optional in train.py and import the shared formatter from src.evaluation_summary (make the module expose it by renaming or exporting _fmt_optional as fmt_optional or adding a public wrapper). Specifically, remove the local fmt_optional in train.py and import the public function from evaluation_summary, ensure call sites that use fmt_optional (and related helpers like safe_ratio if needed) continue to work, and update any references to _fmt_optional to the new public name to avoid duplication.

analyze_lambda_occupancy.py (1)
754-769: Consider adding `strict=True` to `zip()` for safer iteration.

While the arrays are guaranteed to have equal lengths (built from the same ordered list), adding `strict=True` to `zip()` calls would make this invariant explicit and catch any future bugs if array construction changes.

```diff
- for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)):
+ for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True)):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_lambda_occupancy.py` around lines 754 - 769, Update the zip call in the loop that iterates over x, y, point_colors inside analyze_lambda_occupancy.py (the for i, (xi, yi, c) in enumerate(zip(x, y, point_colors)) loop) to use zip(..., strict=True) so mismatched lengths raise immediately (i.e., for i, (xi, yi, c) in enumerate(zip(x, y, point_colors, strict=True))). Ensure the runtime environment uses Python 3.10+ or guard accordingly if needed.
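Returning to the `fmt_optional` consolidation flagged for train.py above, a hedged sketch of the shared formatter could look like this; the name follows the review, but the body is assumed rather than copied from src/evaluation_summary.py.

```python
# Illustrative sketch of a shared optional-metric formatter; renders "n/a"
# for missing or undefined values (None, NaN, inf) instead of raising.
import math

def fmt_optional(value, fmt: str = "{:.2f}") -> str:
    """Format a metric, falling back to 'n/a' when it is undefined."""
    if value is None or not math.isfinite(value):
        return "n/a"
    return fmt.format(value)
```

Both train.py and the evaluation summary module could then import this single public function.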
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0f088d83-28df-4aba-b198-a5153495e00c
📒 Files selected for processing (9)

- analyze_arrivalscale_occupancy.py
- analyze_lambda_occupancy.py
- src/analysis_naming.py
- src/arrival_scale.py
- src/evaluation_summary.py
- src/sampler_hourly.py
- src/workload_generator.py
- train.py
- train_iter.py
🚧 Files skipped from review as they are similar to previous changes (3)
- src/workload_generator.py
- train_iter.py
- src/sampler_hourly.py
- stop differencing parsed episode savings in both occupancy analyzers
- treat Savings fields from train.py as already per-episode values
- remove obsolete cumulative-savings parsing logic
- add regression coverage for analyzer episode metric parsing
Actionable comments posted: 4
♻️ Duplicate comments (1)
analyze_lambda_occupancy.py (1)
289-290: ⚠️ Potential issue | 🟠 Major

`baseline_off_occupancy` is still synthetic.

Line 289 still copies `baseline_occupancy`, so the newly added `baseline_off_*` exports and plots cannot represent the greedy all-nodes-on baseline and will always mirror baseline. Please wire a real baseline-off occupancy series through the parser or remove these fields. Based on learnings: Baseline comparison must include greedy scheduler that keeps all nodes on and processes jobs FIFO.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@analyze_lambda_occupancy.py` around lines 289 - 290, baseline_off_occupancy is currently just a copy of baseline_occupancy, so the new baseline_off_* exports and plots are synthetic and don't reflect the intended "all-nodes-on greedy FIFO" baseline; either compute a true baseline_off_occupancy in the parser/simulation or remove the baseline_off_* outputs. Fix by wiring a real baseline-off series: add/modify the occupancy computation path (e.g., in the parser/simulation function that produces baseline_occupancy) to simulate the greedy scheduler that keeps all nodes on and services jobs FIFO and assign that result to baseline_off_occupancy (and corresponding baseline_off_* exports), or if you prefer to drop it, remove all uses/exports/plots of baseline_off_occupancy and baseline_off_* so they no longer appear in outputs/visualizations.
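To make the intended baseline concrete, here is a toy sketch of an all-nodes-on, FIFO occupancy trace; the job tuple layout (arrival step, duration in steps, node count), the step granularity, and the function name are all assumptions for illustration, not the project's data model.

```python
# Toy "baseline off" simulation: every node stays powered on and queued jobs
# are started strictly in FIFO order whenever enough nodes are free.
def baseline_off_occupancy_trace(jobs, max_nodes: int, horizon: int) -> list[int]:
    """Per-step node occupancy for a greedy FIFO scheduler with all nodes on."""
    queue: list[tuple[int, int]] = []   # (duration_steps, nodes), FIFO order
    running: list[list[int]] = []       # [remaining_steps, nodes]
    trace = []
    pending = sorted(jobs, key=lambda j: j[0])
    for step in range(horizon):
        while pending and pending[0][0] <= step:
            _, dur, nodes = pending.pop(0)
            queue.append((dur, nodes))
        # Retire finished jobs, then count what is still running.
        running = [[r - 1, n] for r, n in running if r > 1]
        used = sum(n for _, n in running)
        # FIFO: start queued jobs in order while capacity remains.
        while queue and used + queue[0][1] <= max_nodes:
            dur, nodes = queue.pop(0)
            running.append([dur, nodes])
            used += nodes
        trace.append(used)
    return trace
```

A trace like this, fed through the same mean/std aggregation as the agent's occupancy, would give the analyzers a genuinely distinct `baseline_off_*` series instead of a copy.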
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@analyze_arrivalscale_occupancy.py`:
- Around line 299-300: The variable baseline_off_occupancy is being fabricated
by copying baseline_occupancy (baseline_off_occupancy =
baseline_occupancy.copy()) which yields synthetic baseline-off series and causes
the new baseline_off_* CSV columns/plots to mirror baseline; either remove all
code that generates/exports baseline_off_* fields and plots, or properly plumb a
real baseline-off occupancy into the parser: update the episode-summary parsing
logic (the code that reads "Baseline Occupancy (Nodes)" from the
episode/evaluation summary) to extract a distinct baseline-off occupancy series
and assign it to baseline_off_occupancy (and then use that real series when
computing dropped_jobs_delta_total and writing CSV/plot outputs), ensuring you
stop creating baseline_off_occupancy by copying baseline_occupancy.
- Around line 400-425: The forwarded file paths args.prices and args.jobs are
passed raw into build_train_command and can become invalid once the child is
spawned from project_root; update build_train_command to normalize/resolve those
paths against the same base directory used for the child (e.g., resolve with
pathlib.Path(project_root) / args.prices or os.path.abspath relative to
project_root) before inserting them into the cmd list, and apply the same
normalization wherever the script spawns the child or performs the existence
check so that path validation and the command use the exact same resolved paths.
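A small sketch of the suggested normalization, assuming a helper shared by the existence check and the spawn path (the helper name is hypothetical; POSIX-style paths are assumed in the examples).

```python
# Resolve forwarded CLI paths against the same base directory the child
# process is spawned from, so validation and execution agree.
import os

def resolve_against_root(project_root: str, raw_path: str) -> str:
    """Return an absolute, normalized path anchored at project_root."""
    if os.path.isabs(raw_path):
        return os.path.normpath(raw_path)
    return os.path.normpath(os.path.join(project_root, raw_path))
```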
- Around line 493-500: The current fallback when parse_wait_summary fails masks
parser errors by copying agent metrics into baseline (agent_wait_summary ->
baseline_avg_wait_hours) and assigning zeros (e.g., dropped_jobs_delta_total),
which produces misleading "valid" deltas; in parse_wait_summary error paths
(check of agent_wait_summary and baseline_wait_summary) change the behavior to
surface failure: set agent_avg_wait_hours and baseline_avg_wait_hours to NaN
(use float("nan")) or raise an exception instead of copying values, and ensure
any dropped-job totals (e.g., dropped_jobs_delta_total) are set to NaN or
propagate the error rather than defaulting to 0.0; update any downstream logic
that assumes numeric values to handle NaN appropriately so parser failures are
visible during analysis.
In `@analyze_lambda_occupancy.py`:
- Around line 457-463: The code currently sets dropped_jobs_baseline_total to
0.0 when parse_dropped_totals_summary returns None, which hides parser failures;
change that behavior in the block handling dropped_jobs_baseline_total so that
when parse_dropped_totals_summary(...) returns None you assign numpy.nan (e.g.
dropped_jobs_baseline_total = np.nan) or raise an exception instead of 0.0, and
keep the existing warning print using lambda_value so downstream computations
like dropped_jobs_delta_total will surface NaNs rather than silently improving
the delta; reference parse_dropped_totals_summary, dropped_jobs_baseline_total,
dropped_jobs_agent_total, agent_dropped and lambda_value when making the change.
---
Duplicate comments:
In `@analyze_lambda_occupancy.py`:
- Around line 289-290: baseline_off_occupancy is currently just a copy of
baseline_occupancy, so the new baseline_off_* exports and plots are synthetic
and don't reflect the intended "all-nodes-on greedy FIFO" baseline; either
compute a true baseline_off_occupancy in the parser/simulation or remove the
baseline_off_* outputs. Fix by wiring a real baseline-off series: add/modify the
occupancy computation path (e.g., in the parser/simulation function that
produces baseline_occupancy) to simulate the greedy scheduler that keeps all
nodes on and services jobs FIFO and assign that result to baseline_off_occupancy
(and corresponding baseline_off_* exports), or if you prefer to drop it, remove
all uses/exports/plots of baseline_off_occupancy and baseline_off_* so they no
longer appear in outputs/visualizations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2d1476f2-2cd2-4c3f-b0f0-4229541b5237
📒 Files selected for processing (2)
analyze_arrivalscale_occupancy.py
analyze_lambda_occupancy.py
    baseline_off_occupancy = baseline_occupancy.copy()
    dropped_jobs_delta_total = dropped_jobs_agent_total - dropped_jobs_baseline_total
baseline_off_occupancy is still synthetic.
src/evaluation_summary.py only emits Baseline Occupancy (Nodes), so Line 299 fabricates a second occupancy series instead of parsing a real baseline-off value. The new baseline_off_* CSV columns and plots will therefore always mirror baseline and misstate the analysis. Either plumb a real baseline-off occupancy through the episode summary/parser or drop these fields.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@analyze_arrivalscale_occupancy.py` around lines 299 - 300, The variable
baseline_off_occupancy is being fabricated by copying baseline_occupancy
(baseline_off_occupancy = baseline_occupancy.copy()) which yields synthetic
baseline-off series and causes the new baseline_off_* CSV columns/plots to
mirror baseline; either remove all code that generates/exports baseline_off_*
fields and plots, or properly plumb a real baseline-off occupancy into the
parser: update the episode-summary parsing logic (the code that reads "Baseline
Occupancy (Nodes)" from the episode/evaluation summary) to extract a distinct
baseline-off occupancy series and assign it to baseline_off_occupancy (and then
use that real series when computing dropped_jobs_delta_total and writing
CSV/plot outputs), ensuring you stop creating baseline_off_occupancy by copying
baseline_occupancy.
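If the episode summary were extended to emit a distinct baseline-off line, the parser could pick it up separately instead of copying the baseline series. A minimal sketch, assuming a hypothetical "Baseline-Off Occupancy (Nodes)" label (the real label would have to come from src/evaluation_summary.py):

```python
import re

# Hypothetical summary label -- not currently emitted by src/evaluation_summary.py.
BASELINE_OFF_RE = re.compile(r"Baseline-Off Occupancy \(Nodes\):\s*([0-9]+(?:\.[0-9]+)?)")

def parse_baseline_off_occupancy(text: str) -> list[float]:
    """Collect every baseline-off occupancy value found in a run's stdout."""
    return [float(m.group(1)) for m in BASELINE_OFF_RE.finditer(text)]

sample = (
    "Baseline Occupancy (Nodes): 12.5\n"
    "Baseline-Off Occupancy (Nodes): 14.0\n"
)
print(parse_baseline_off_occupancy(sample))  # [14.0]
```

Note the hyphenated pattern deliberately does not match the existing "Baseline Occupancy (Nodes)" line, so the two series stay distinct.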
    def build_train_command(args: argparse.Namespace, job_arrival_scale: float, replay_mode: str) -> list[str]:
        cmd = [
            sys.executable,
            "./train.py",
            "--prices",
            args.prices,
            "--session",
            args.session,
            "--efficiency-weight",
            str(args.efficiency_weight),
            "--price-weight",
            str(args.price_weight),
            "--idle-weight",
            str(args.idle_weight),
            "--job-age-weight",
            str(args.job_age_weight),
            "--drop-weight",
            str(args.drop_weight),
            "--evaluate-savings",
            "--eval-months",
            str(args.eval_months),
            "--model",
            str(args.model),
            "--jobs",
            args.jobs,
        ]
Normalize forwarded file paths before spawning train.py.
Lines 401-425 pass raw args.prices and args.jobs through to train.py, but Line 463 runs the child from project_root. That means relative paths are interpreted differently in the analyzer vs. in the child process; args.jobs can even pass the Line 942-944 existence check and still point somewhere else once cwd changes. Resolve the forwarded paths against the same base directory before building the command.
Also applies to: 461-463, 942-944
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@analyze_arrivalscale_occupancy.py` around lines 400 - 425, The forwarded file
paths args.prices and args.jobs are passed raw into build_train_command and can
become invalid once the child is spawned from project_root; update
build_train_command to normalize/resolve those paths against the same base
directory used for the child (e.g., resolve with pathlib.Path(project_root) /
args.prices or os.path.abspath relative to project_root) before inserting them
into the cmd list, and apply the same normalization wherever the script spawns
the child or performs the existence check so that path validation and the
command use the exact same resolved paths.
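The resolution step the prompt describes can be sketched as a small helper; the names below are illustrative, not the script's actual API:

```python
from pathlib import Path

def resolve_against(project_root: str, path_arg: str) -> str:
    """Resolve a possibly-relative CLI path against the child's working directory,
    so the existence check and the spawned command agree on the same file."""
    p = Path(path_arg)
    return str(p if p.is_absolute() else Path(project_root) / p)

print(resolve_against("/repo", "data/prices.csv"))  # /repo/data/prices.csv
print(resolve_against("/repo", "/tmp/jobs.json"))   # /tmp/jobs.json
```

Applying the same helper to both the pre-flight check and build_train_command keeps the validated path and the forwarded path identical regardless of the analyzer's cwd.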
    agent_wait_summary, baseline_wait_summary = parse_wait_summary(combined_output)
    if agent_wait_summary is None or baseline_wait_summary is None:
        print(f"[warn] mode={replay_mode}, scale={job_arrival_scale:.6f}: could not parse run-level wait summary; effective savings may be NaN.")
        agent_avg_wait_hours = float(np.mean(avg_wait))
        baseline_avg_wait_hours = float(np.mean(avg_wait))
    else:
        agent_avg_wait_hours = float(agent_wait_summary)
        baseline_avg_wait_hours = float(baseline_wait_summary)
These fallback defaults hide parser failures.
Line 497 copies the agent wait mean into the baseline slot, and Line 511 assigns 0.0 to a missing baseline dropped total. That makes effective_savings* and dropped_jobs_delta_total look valid even when the summary parse failed. Prefer NaN or a hard failure for missing baseline aggregates.
Suggested adjustment
 if agent_wait_summary is None or baseline_wait_summary is None:
     print(f"[warn] mode={replay_mode}, scale={job_arrival_scale:.6f}: could not parse run-level wait summary; effective savings may be NaN.")
     agent_avg_wait_hours = float(np.mean(avg_wait))
-    baseline_avg_wait_hours = float(np.mean(avg_wait))
+    baseline_avg_wait_hours = float("nan")
 else:
     agent_avg_wait_hours = float(agent_wait_summary)
     baseline_avg_wait_hours = float(baseline_wait_summary)
@@
 if dropped_jobs_baseline_total is None:
-    dropped_jobs_baseline_total = 0.0
-    print(f"[warn] mode={replay_mode}, scale={job_arrival_scale:.6f}: could not parse run-level baseline dropped total; defaulting to 0.")
+    dropped_jobs_baseline_total = float("nan")
+    print(f"[warn] mode={replay_mode}, scale={job_arrival_scale:.6f}: could not parse run-level baseline dropped total; value set to NaN.")

Also applies to: 506-512
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@analyze_arrivalscale_occupancy.py` around lines 493 - 500, The current
fallback when parse_wait_summary fails masks parser errors by copying agent
metrics into baseline (agent_wait_summary -> baseline_avg_wait_hours) and
assigning zeros (e.g., dropped_jobs_delta_total), which produces misleading
"valid" deltas; in parse_wait_summary error paths (check of agent_wait_summary
and baseline_wait_summary) change the behavior to surface failure: set
agent_avg_wait_hours and baseline_avg_wait_hours to NaN (use float("nan")) or
raise an exception instead of copying values, and ensure any dropped-job totals
(e.g., dropped_jobs_delta_total) are set to NaN or propagate the error rather
than defaulting to 0.0; update any downstream logic that assumes numeric values
to handle NaN appropriately so parser failures are visible during analysis.
    dropped_jobs_agent_total, dropped_jobs_baseline_total = parse_dropped_totals_summary(combined_output)
    if dropped_jobs_agent_total is None:
        dropped_jobs_agent_total = float(np.sum(agent_dropped))
        print(f"[warn] lambda={lambda_value}: could not parse run-level agent dropped total; using sum of episode Dropped= values.")
    if dropped_jobs_baseline_total is None:
        dropped_jobs_baseline_total = 0.0
        print(f"[warn] lambda={lambda_value}: could not parse run-level baseline dropped total; defaulting to 0.")
Don't turn an unparsed baseline dropped total into zero.
If Total Dropped Jobs (Baseline) stops matching the summary format, Line 462 makes dropped_jobs_delta_total look better than it is. Use NaN or fail fast so parser drift stays visible.
Suggested adjustment
 if dropped_jobs_baseline_total is None:
-    dropped_jobs_baseline_total = 0.0
-    print(f"[warn] lambda={lambda_value}: could not parse run-level baseline dropped total; defaulting to 0.")
+    dropped_jobs_baseline_total = float("nan")
+    print(f"[warn] lambda={lambda_value}: could not parse run-level baseline dropped total; value set to NaN.")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@analyze_lambda_occupancy.py` around lines 457 - 463, The code currently sets
dropped_jobs_baseline_total to 0.0 when parse_dropped_totals_summary returns
None, which hides parser failures; change that behavior in the block handling
dropped_jobs_baseline_total so that when parse_dropped_totals_summary(...)
returns None you assign numpy.nan (e.g. dropped_jobs_baseline_total = np.nan) or
raise an exception instead of 0.0, and keep the existing warning print using
lambda_value so downstream computations like dropped_jobs_delta_total will
surface NaNs rather than silently improving the delta; reference
parse_dropped_totals_summary, dropped_jobs_baseline_total,
dropped_jobs_agent_total, agent_dropped and lambda_value when making the change.
Dependent on Pr1 and Pr2