Conversation

⚠️ Caution: Review failed. The pull request was closed or merged during review.

ℹ️ Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.

📝 Walkthrough | 🚥 Pre-merge checks: ❌ 3 failed checks (1 warning, 2 inconclusive)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
train.py (1)
137-151: ⚠️ Potential issue | 🔴 Critical: Remove or reintroduce `carry_over_state` before this ships.

This call now passes `carry_over_state`, but `ComputeClusterEnv.__init__` in `src/environment.py` (lines 63-77) does not accept that keyword. The result is an immediate `TypeError` during startup, before the first `reset()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@train.py` around lines 137-151: the call to ComputeClusterEnv passes carry_over_state, but ComputeClusterEnv.__init__ (in src/environment.py) does not accept that keyword, causing a TypeError. Either remove the carry_over_state argument from the ComputeClusterEnv(...) call in train.py, or add a carry_over_state parameter (with a default, stored on self and used where env state persistence is handled) to ComputeClusterEnv.__init__ so the signature matches its callers, updating any internal logic that preserves/restores state to reference self.carry_over_state.
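If the flag should be kept rather than removed, the constructor change described above might look like the sketch below. Everything here is hypothetical: the class body, the `max_nodes` parameter, and the `reset()` shape are placeholders, not the real `ComputeClusterEnv`.

```python
class ComputeClusterEnv:
    # Sketch only: accept carry_over_state with a default so existing
    # callers that do not pass it keep working.
    def __init__(self, max_nodes=10, carry_over_state=False):
        self.max_nodes = max_nodes
        # Persist the flag; reset() consults it to decide whether to
        # restore the previous episode's state.
        self.carry_over_state = carry_over_state
        self._saved_state = None

    def reset(self):
        if self.carry_over_state and self._saved_state is not None:
            return self._saved_state
        return {"nodes": self.max_nodes}
```

With this shape, `ComputeClusterEnv(carry_over_state=True)` no longer raises a `TypeError`, and callers that omit the keyword get the old behavior.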
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/reward_calculation.py`:
- Around line 50-53: The NEGATIVE_PRICE_OVERDRIVE_* configuration lets computed
price_reward exceed the normalized upper bound; change the logic so price_reward
is always clamped to 1.0 (or disable the above-one path) by updating the branch
that reads NEGATIVE_PRICE_OVERDRIVE_ALLOW_ABOVE_ONE and uses
NEGATIVE_PRICE_OVERDRIVE_MAX_REWARD: ensure any calculation that yields
price_reward uses min(computed_value, 1.0) (and remove or ignore
NEGATIVE_PRICE_OVERDRIVE_MAX_REWARD when allowing above-one is false), and apply
the same clamp/update for the duplicate logic referenced around the other block
(the second occurrence at lines ~164-168) so price_weight remains within the
configured normalized range.
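The clamp this comment asks for can be sketched as a small helper. The configuration names are taken from the comment itself; how they are read from config is assumed.

```python
def clamp_price_reward(computed_value, allow_above_one=False, max_reward=1.0):
    """Keep price_reward within the normalized range.

    Sketch of the suggested behavior: when above-one rewards are not
    allowed, cap the computed value at 1.0; otherwise cap it at the
    configured NEGATIVE_PRICE_OVERDRIVE_MAX_REWARD-style maximum.
    """
    upper = max_reward if allow_above_one else 1.0
    return min(computed_value, upper)
```

The same helper would be applied at both occurrences (around lines 50-53 and ~164-168) so the two code paths cannot drift apart.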
In `@src/sampler_jobs.py`:
- Around line 356-362: The hourly-job branch currently copies raw
agg_job['nnodes'] and ['cores_per_node'] into hourly_job which can exceed
simulation limits; clamp those values to the configured max_nodes_per_job and
cores_per_node before assigning (e.g., replace agg_job['nnodes'] with
min(agg_job['nnodes'], max_nodes_per_job) and agg_job['cores_per_node'] with
min(agg_job['cores_per_node'], cores_per_node)) while keeping duration_hours,
original_job_count and instances as-is so the replayed requests fit the
simulated cluster.
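The clamping described above can be isolated in a helper like the following sketch. Note that a later round of this review additionally asks for a lower bound of 1, so both bounds are shown here; the function name is illustrative.

```python
def clamp_resources(nnodes, cores, max_nodes_per_job, cores_per_node):
    """Clamp replayed job resources to the simulated cluster's limits.

    Caps oversized requests from the trace, and floors them at 1 so
    malformed trace rows cannot produce zero-sized jobs.
    """
    clamped_nodes = max(1, min(nnodes, max_nodes_per_job))
    clamped_cores = max(1, min(cores, cores_per_node))
    return clamped_nodes, clamped_cores
```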
In `@src/workload_generator.py`:
- Around line 140-167: The current fallback clones existing sampled jobs (using
new_jobs_durations/new_jobs_nodes/new_jobs_cores) when job_arrival_scale != 1.0,
which artificially reduces variance and corrupts correlations; instead either
apply job_arrival_scale at the arrival-count sampling stage or generate fresh
samples for any additional arrivals from the original source sampler. Remove the
cloning branch that uses whole/frac to extend scaled_durations/nodes/cores and
replace it with logic that (a) computes the desired scaled_count from
new_jobs_count and job_arrival_scale, and then (b) if scaled_count >
new_jobs_count, calls the same sampling routines used originally to produce
single job attributes (i.e., the source's duration/node/core samplers or the
function that produced new_jobs_durations/new_jobs_nodes/new_jobs_cores) to
create the extra jobs so all sampled tuples are independently drawn; keep
arrival_scale_applied_in_source handling and update
new_jobs_count/new_jobs_durations/new_jobs_nodes/new_jobs_cores accordingly.
- Around line 89-98: The code assumes jobs_sampler.sample_one_hourly(wrap=True)
always returns a dict with "hourly_jobs" but it can return {} causing a
KeyError; to fix, capture the return value from sample_one_hourly into a
variable (e.g., sample = jobs_sampler.sample_one_hourly(wrap=True)), check if
sample and "hourly_jobs" in sample (or use sample.get("hourly_jobs", [])) before
iterating, and only extend
new_jobs_count/new_jobs_durations/new_jobs_nodes/new_jobs_cores when there are
actual jobs from sample["hourly_jobs"]; reference sample_one_hourly,
jobs_sampler, and the "hourly_jobs" key and the new_jobs_* variables when
locating the code to change.
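The independent-resampling alternative described for lines 140-167 can be sketched as follows. The `sample_job` callable is a stand-in for whatever sampler originally produced the `new_jobs_*` lists, and the probabilistic rounding of the fractional scale follows the behavior described elsewhere in this PR; none of this is the actual code.

```python
import math
import random

def scale_arrivals(durations, nodes, cores, scale, sample_job, rng=random):
    """Return scaled job lists without cloning existing samples.

    sample_job must return a fresh (duration, nodes, cores) tuple drawn
    from the original source, so extra jobs stay independently sampled.
    """
    base = len(durations)
    whole, frac = math.floor(scale), scale - math.floor(scale)
    # Probabilistic rounding: expected count equals base * scale.
    target = base * whole + sum(1 for _ in range(base) if rng.random() < frac)
    d, n, c = list(durations), list(nodes), list(cores)
    if target < base:
        # Downscale: keep a random subset instead of truncating.
        keep = rng.sample(range(base), target)
        d = [durations[i] for i in keep]
        n = [nodes[i] for i in keep]
        c = [cores[i] for i in keep]
    else:
        # Upscale: draw the extra jobs independently, never by cloning.
        for _ in range(target - base):
            dj, nj, cj = sample_job()
            d.append(dj)
            n.append(nj)
            c.append(cj)
    return d, n, c
```

Because extras come from `sample_job()` rather than from copies of the existing tuples, the variance and correlation structure of the workload is preserved.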
In `@train_iter.py`:
- Around line 242-247: The validation in train_iter.py should mirror train.py's
path normalization so whitespace-only --jobs values are rejected; import or copy
the same norm_path() logic used in train.py (see lines referenced around
norm_path() usage) and apply it to args.jobs before the checks for
args.jobs_exact_replay and args.jobs_exact_replay_aggregate, then use the
normalized value in the existing conditional checks (e.g., check normalized_jobs
rather than args.jobs) so blank/whitespace paths are treated as empty and
rejected consistently.
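A minimal version of the shared normalization could look like this. `norm_path()` here is a guess at the shape of train.py's helper, and the injectable `error` callable stands in for `parser.error` (which would normally raise `SystemExit`); both are assumptions for illustration.

```python
import os

def norm_path(value):
    """Treat None or whitespace-only paths as empty; normalize the rest.

    Hypothetical stand-in for the norm_path() helper the comment says
    train.py already uses.
    """
    value = (value or "").strip()
    return os.path.normpath(value) if value else ""

def validate_jobs_arg(jobs, jobs_exact_replay, error):
    """Reject blank --jobs values when --jobs-exact-replay is set."""
    jobs = norm_path(jobs)
    if jobs_exact_replay and not jobs:
        error("--jobs-exact-replay requires a non-empty --jobs")
    return jobs
```

Using the normalized value in both `train.py` and `train_iter.py` means a value like `--jobs "   "` is rejected consistently before any child process starts.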
---
Outside diff comments:
In `@train.py`:
- Around line 137-151: The call to ComputeClusterEnv in train.py passes
carry_over_state but ComputeClusterEnv.__init__ (in src/environment.py) does not
accept that keyword, causing a TypeError; either remove the carry_over_state
argument from the ComputeClusterEnv(...) call in train.py or add a
carry_over_state parameter (with a default, store it on self and use it where
env state persistence is handled) to ComputeClusterEnv.__init__ so the signature
matches callers (update any internal logic that relies on preserving/restoring
state to reference self.carry_over_state).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 051899fe-651e-48ae-ac60-760fa2540328
📒 Files selected for processing (7)
src/environment.py, src/reward_calculation.py, src/sampler_hourly.py, src/sampler_jobs.py, src/workload_generator.py, train.py, train_iter.py
… replay/scaling modes

- add core-hour-aware template semantics in sampler conversion:
  - sub-hour aggregated bins emit one hourly-equivalent template (`instances=1`)
  - hour+ bins preserve multiplicity via `instances=count`
- update jobs replay to honor `instances` when expanding sampled templates
- add global `job_arrival_scale` multiplier in job generation (supports fractional scaling with probabilistic rounding)
- introduce exact timeline replay for `--jobs` via `--jobs-exact-replay`
- add optional per-step aggregation in exact replay via `--jobs-exact-replay-aggregate`
- ensure exact+aggregate path also uses `instances` (same conserving behavior as template replay)
- wire new controls through `ComputeClusterEnv`, `train.py`, and `train_iter.py`
- add CLI validation and startup diagnostics for replay mode and arrival scale
- report `max_new_jobs_per_hour` based on expanded instance counts
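The "fractional scaling with probabilistic rounding" mentioned in the commit notes can be sketched as below; the function name is illustrative, not from the codebase.

```python
import math
import random

def scaled_count(count, scale, rng=random):
    """Scale an integer arrival count by a possibly fractional factor.

    The fractional part is rounded probabilistically, so the expected
    result equals count * scale even for non-integer scales.
    """
    exact = count * scale
    whole = math.floor(exact)
    frac = exact - whole
    return whole + (1 if rng.random() < frac else 0)
```

For example, scaling 10 arrivals by 1.55 yields 15 or 16 jobs per draw, averaging 15.5 over many hours.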
---
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/workload_generator.py (1)
89-98: ⚠️ Potential issue | 🟡 Minor: Guard against empty return from `sample_one_hourly()`.

`sample_one_hourly()` returns `{}` when no periods are loaded, causing a `KeyError` on the direct `["hourly_jobs"]` access.

Suggested fix

```diff
 else:
     # Use pre-aggregated hourly templates for pattern-based replay.
-    jobs = jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
-    if len(jobs) > 0:
+    sampled = jobs_sampler.sample_one_hourly(wrap=True)
+    jobs = sampled.get("hourly_jobs", [])
+    if jobs:
         for job in jobs:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/workload_generator.py` around lines 89 - 98, Guard against sample_one_hourly() returning an empty dict by avoiding direct indexing into ["hourly_jobs"]; call jobs_sampler.sample_one_hourly(wrap=True) into a variable (e.g., sample = jobs_sampler.sample_one_hourly(...)) and then read sample.get("hourly_jobs", []) or check "if sample and 'hourly_jobs' in sample" before iterating; update the block that currently uses jobs = jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"] so it safely handles {} and proceeds only when hourly_jobs is a non-empty list (affecting jobs_sampler.sample_one_hourly, jobs variable, and the subsequent for loop).
🧹 Nitpick comments (3)
src/workload_generator.py (2)
158-158: Consider adding `strict=True` to `zip()` for defensive coding.

While the lists are constructed together and should always have the same length, adding `strict=True` would catch any future bugs where they diverge.

Suggested fix

```diff
-    for d, n, c in zip(new_jobs_durations, new_jobs_nodes, new_jobs_cores):
+    for d, n, c in zip(new_jobs_durations, new_jobs_nodes, new_jobs_cores, strict=True):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/workload_generator.py` at line 158, The loop over new_jobs_durations, new_jobs_nodes, new_jobs_cores uses zip(...) without strict checking; update the iteration in the function that contains the line with for d, n, c in zip(new_jobs_durations, new_jobs_nodes, new_jobs_cores): to call zip(..., strict=True) so Python raises if the three lists ever differ in length (defensive coding). Ensure the runtime environment is Python 3.10+ or guard accordingly if compatibility is required.
140-167: Arrival scaling via job cloning alters workload variance.

The global fallback scales by duplicating existing sampled jobs. This preserves the expected count but changes the variance and correlation structure of the workload: identical jobs are cloned rather than independently sampled. For synthetic workloads (`workload_gen` or the legacy randomizer), consider applying scaling at the arrival-count sampling stage or freshly sampling additional jobs.

This may be acceptable for exact-replay paths but is worth noting for statistical sampling sources.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/workload_generator.py` around lines 140-167: the current global fallback (when arrival_scale_applied_in_source is False) clones sampled jobs by repeating new_jobs_durations/new_jobs_nodes/new_jobs_cores, which changes variance and correlations. Instead, for synthetic/legacy sources (e.g., workload_gen/randomizer), apply scaling at the arrival-count sampling step or freshly sample additional independent jobs rather than duplicating entries: detect when job_arrival_scale > 1.0 and, for the whole/fractional parts, either increase the sampled arrival count before invoking the sampler or call the sampler anew to generate independent extra jobs, using the same seeding/prng (np_random) to preserve reproducibility. Update the code paths around job_arrival_scale, new_jobs_count, and the arrays new_jobs_durations/new_jobs_nodes/new_jobs_cores so they contain independently sampled jobs rather than clones.

src/sampler_hourly.py (1)
316-316: Remove redundant `int()` cast.

`math.floor()` already returns an `int` in Python 3, making the outer `int()` call unnecessary.

Suggested fix

```diff
-    whole = int(math.floor(arrival_scale))
+    whole = math.floor(arrival_scale)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampler_hourly.py` at line 316, Remove the redundant outer int() cast when computing the integer part of arrival_scale; replace the expression that assigns to the variable whole (currently using int(math.floor(arrival_scale))) with a single call to math.floor(arrival_scale) so you rely on Python 3's floor returning an int (look for the assignment to whole in src/sampler_hourly.py where arrival_scale is used).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@train.py`:
- Line 148: The call to ComputeClusterEnv (or where carry_over_state is passed)
references args.carry_over_state which doesn't exist and ComputeClusterEnv has
no such parameter; remove the carry_over_state=args.carry_over_state argument
from the call site (the line containing carry_over_state=args.carry_over_state)
and ensure no other references to args.carry_over_state remain (or if
persistence behavior is required, add a properly defined CLI flag and a matching
parameter to ComputeClusterEnv such as carry_over_state and wire them up
consistently).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0d032729-9180-40ae-ad55-c1117a9261ba
📒 Files selected for processing (6)
src/environment.py, src/sampler_hourly.py, src/sampler_jobs.py, src/workload_generator.py, train.py, train_iter.py
---
♻️ Duplicate comments (4)
src/workload_generator.py (2)
90-91: ⚠️ Potential issue | 🟡 Minor: Guard empty hourly samples before subscripting.

`sample_one_hourly()` can return `{}` when no periods are loaded, so line 91 raises `KeyError` instead of cleanly producing no arrivals.

Suggested fix

```diff
-    jobs = jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
+    sampled = jobs_sampler.sample_one_hourly(wrap=True)
+    jobs = sampled.get("hourly_jobs", [])
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/workload_generator.py` around lines 90 - 91, sample_one_hourly() can return an empty dict so subscripting ["hourly_jobs"] raises KeyError; update the code where jobs = jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"] to first capture the result (e.g., hourly = jobs_sampler.sample_one_hourly(wrap=True)) and guard it (if not hourly or "hourly_jobs" not in hourly) then set jobs to an empty list or skip processing; reference sample_one_hourly, jobs_sampler, and the "hourly_jobs" key when making the change.
140-167: ⚠️ Potential issue | 🟠 Major: Don't scale the fallback randomizer by cloning sampled jobs.

This block repeats already-sampled `(duration, nodes, cores)` tuples. `train_iter.py` lines 248-254 only disable scaling for `--workload-gen`, so the legacy no-source path still reaches this code and loses the independent sampling behavior of arrivals vs. job attributes. Either resample extra jobs from the original source or reject `--job-arrival-scale` for unsupported sources.

Based on learnings, workload generator configuration uses four core attributes: arrivals (jobs/hour), duration (hours), nodes, and cores-per-node, each independently sampled.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/workload_generator.py` around lines 140-167: the current scaling logic clones already-sampled (duration, nodes, cores) tuples (using scaled_durations.extend(new_jobs_durations * whole) etc.), which breaks independent sampling of arrivals vs. job attributes. Instead, when arrival_scale_applied_in_source is False and job_arrival_scale != 1.0, resample attributes for each additional arrival from the original source/distribution rather than duplicating the existing lists (i.e., replace the extend/append-by-cloning strategy for whole and frac with calls to the source sampling routine used elsewhere to draw durations, nodes, and cores independently for each scaled job), or, if the source cannot support resampling, explicitly reject/raise an error when --job-arrival-scale is used for that source (make the check use arrival_scale_applied_in_source and the CLI flag --job-arrival-scale to refuse unsupported combinations).

train_iter.py (1)
248-254: ⚠️ Potential issue | 🟠 Major: Fail fast instead of silently rewriting `--job-arrival-scale`.

Resetting the requested scale to `1.0` can burn an entire sweep on settings different from what the caller asked for. `parser.error(...)` is safer here and makes the unsupported combination explicit before any child processes start.

Suggested fix

```diff
-    if args.workload_gen and args.job_arrival_scale != 1.0:
-        print(
-            "Warning: --job-arrival-scale is not allowed with --workload-gen; "
-            "resetting it to 1.0. Use workload generator arrival settings instead.",
-            file=sys.stderr,
-        )
-        args.job_arrival_scale = 1.0
+    if args.workload_gen and args.job_arrival_scale != 1.0:
+        parser.error("--job-arrival-scale is not supported with --workload-gen")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@train_iter.py` around lines 248-254: replace the silent reset of args.job_arrival_scale with a fail-fast parser error. When args.workload_gen is true and args.job_arrival_scale != 1.0, call parser.error("...") with a clear message that --job-arrival-scale is incompatible with --workload-gen (mention using workload generator arrival settings) instead of setting args.job_arrival_scale = 1.0; update the code around the existing check (which references args.workload_gen and args.job_arrival_scale) to raise parser.error so the program exits immediately with an explicit message.

src/sampler_jobs.py (1)
355-357: ⚠️ Potential issue | 🟠 Major: Clamp replayed resources on the lower bound too.

This branch now caps oversized values, but it can still emit `0` nodes or `0` `cores_per_node` from malformed traces because `parse_jobs()` stores `0` when `nnodes == 0` or `ncpus < nnodes`. That creates impossible jobs for the simulator; mirror the raw replay path and use `max(1, min(...))` here.

Suggested fix

```diff
-    nnodes = min(agg_job['nnodes'], max_nodes_per_job)
-    cores_per_node_needed = min(agg_job['cores_per_node'], cores_per_node)
+    nnodes = max(1, min(agg_job['nnodes'], max_nodes_per_job))
+    cores_per_node_needed = max(1, min(agg_job['cores_per_node'], cores_per_node))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampler_jobs.py` around lines 355 - 357, The clamping currently only caps oversized values and can produce zero-sized jobs; update the calculations for nnodes and cores_per_node_needed to enforce a lower bound of 1 by wrapping the existing min(...) with max(1, ...). Specifically modify the usage of agg_job['nnodes'] and agg_job['cores_per_node'] so nnodes = max(1, min(agg_job['nnodes'], max_nodes_per_job)) and cores_per_node_needed = max(1, min(agg_job['cores_per_node'], cores_per_node)) to match the raw replay behavior and avoid zero-valued jobs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5bbfa476-dca5-4fb6-a504-88867c55c17e
📒 Files selected for processing (5)
src/sampler_hourly.py, src/sampler_jobs.py, src/workload_generator.py, train.py, train_iter.py
🚧 Files skipped from review as they are similar to previous changes (1)
- train.py
---
♻️ Duplicate comments (2)
train_iter.py (1)
244-245: ⚠️ Potential issue | 🟡 Minor: Whitespace-only `--jobs` values still pass validation.

The past review comment about path normalization remains unaddressed. A whitespace-only `--jobs` value (e.g., `--jobs " "`) is truthy in Python, so the check `not args.jobs` won't catch it. The sweep can then spawn child processes that fail immediately in `train.py`.

Suggested fix

```diff
-    if args.jobs_exact_replay and not args.jobs:
+    if args.jobs_exact_replay and not (args.jobs and args.jobs.strip()):
         parser.error("--jobs-exact-replay requires --jobs")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@train_iter.py` around lines 244-245: the current check, `if args.jobs_exact_replay and not args.jobs`, misses whitespace-only values. Update the validation for args.jobs (when args.jobs_exact_replay is set) to normalize and trim the value (call .strip() and optionally os.path.normpath) and then check emptiness; treat a jobs value that is "" after stripping as invalid and call parser.error("--jobs-exact-replay requires a non-empty --jobs"). Reference args.jobs_exact_replay, args.jobs, and parser.error to locate and change the check so child processes in train.py are not spawned with an empty/whitespace jobs path.

src/workload_generator.py (1)
89-98: ⚠️ Potential issue | 🟡 Minor: Guard against empty hourly replay sample (unaddressed from prior review).

`sample_one_hourly()` returns `{}` when no periods are loaded, so the direct `["hourly_jobs"]` subscript raises `KeyError`. The past review comment was marked as addressed but the code still uses direct subscript access.

Suggested fix

```diff
 else:
     # Use pre-aggregated hourly templates for pattern-based replay.
-    jobs = jobs_sampler.sample_one_hourly(wrap=True)["hourly_jobs"]
-    if len(jobs) > 0:
+    sampled = jobs_sampler.sample_one_hourly(wrap=True)
+    jobs = sampled.get("hourly_jobs", [])
+    if jobs:
         for job in jobs:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/workload_generator.py` around lines 89 - 98, The code assumes jobs_sampler.sample_one_hourly() always returns a dict with "hourly_jobs" and will KeyError when it returns {}, so update the block in workload_generator.py to safely handle empty responses from sample_one_hourly(): call jobs_sampler.sample_one_hourly() into a variable (e.g., sample = jobs_sampler.sample_one_hourly(wrap=True)), then extract jobs = sample.get("hourly_jobs", []) or guard with if sample and "hourly_jobs" in sample before iterating; keep the existing loop that expands instances into new_jobs_count/new_jobs_durations/new_jobs_nodes/new_jobs_cores unchanged.
🧹 Nitpick comments (3)
src/workload_generator.py (2)
158-158: Add `strict=True` to `zip()` for defensive programming.

While the lists are constructed in parallel and should always have the same length, adding `strict=True` would catch any future bugs that cause length mismatches.

Suggested fix

```diff
-    for d, n, c in zip(new_jobs_durations, new_jobs_nodes, new_jobs_cores):
+    for d, n, c in zip(new_jobs_durations, new_jobs_nodes, new_jobs_cores, strict=True):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/workload_generator.py` at line 158, The loop using zip(new_jobs_durations, new_jobs_nodes, new_jobs_cores) should use strict=True to catch any length mismatches defensively; update the loop in the code that iterates with "for d, n, c in zip(new_jobs_durations, new_jobs_nodes, new_jobs_cores)" to call zip with strict=True so Python will raise a ValueError if the lists differ in length, keeping the unpacking of d, n, c and subsequent logic unchanged.
82-82: Redundant outer `int()` on `math.ceil()` result.

`math.ceil()` returns an `int` in Python 3, so the outer `int()` call is unnecessary. The inner `int(job['duration_minutes'])` is appropriate for defensive parsing.

Suggested fix

```diff
-    duration_hours = max(1, int(math.ceil(int(job['duration_minutes']) / 60)))
+    duration_hours = max(1, math.ceil(int(job['duration_minutes']) / 60))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/workload_generator.py` at line 82: the expression computing duration_hours redundantly wraps math.ceil(...) with int(). Update the calculation so it keeps the defensive int(job['duration_minutes']) parse and assigns duration_hours = max(1, math.ceil(int(job['duration_minutes']) / 60)), i.e., remove the outer int() call around math.ceil so duration_hours remains an int without the redundant conversion.

src/sampler_hourly.py (1)
316-316: Remove redundant `int()` cast.

`math.floor()` returns an `int` in Python 3, so the outer `int()` call is unnecessary.

Suggested fix

```diff
-    whole = int(math.floor(arrival_scale))
+    whole = math.floor(arrival_scale)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/sampler_hourly.py` at line 316, The assignment to variable `whole` wraps math.floor with an unnecessary int() call; replace `whole = int(math.floor(arrival_scale))` with a direct call to `math.floor(arrival_scale)` (e.g., `whole = math.floor(arrival_scale)`) in the function/block where `whole` is defined so the value remains an int without the redundant cast; verify any type hints or downstream code expecting an int still work (no further changes needed).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 26a78fcb-32ec-4b61-8b61-0364ff0f374d
📒 Files selected for processing (5)
src/sampler_hourly.py, src/sampler_jobs.py, src/workload_generator.py, train.py, train_iter.py
🚧 Files skipped from review as they are similar to previous changes (1)
- train.py
… mode

Apply `--job-arrival-scale` to the `--hourly-jobs` path so true statistical sampling can be up/down-scaled directly at the sampled arrival-count level.

Key changes:
- `HourlySampler.sample_aggregated(..., arrival_scale=...)` now scales sampled counts
- `generate_jobs()` forwards `job_arrival_scale` to the hourly sampler
- avoid double-scaling by skipping global post-scaling when the hourly path already applied it
- sampling mode: `--hourly-jobs`
Dependent on PR 1
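The double-scaling guard described in the bullets above might be wired roughly as follows. The function shape and the `arrival_scale_applied_in_source` flag follow the review's wording, not necessarily the actual code.

```python
def generate_hourly_arrivals(base_count, job_arrival_scale, hourly_sampler=None):
    """Apply job_arrival_scale exactly once.

    Sketch: if the hourly sampler applied the scale itself (the
    --hourly-jobs path), the global post-scaling step is skipped;
    otherwise the scale is applied here.
    """
    if hourly_sampler is not None:
        # The sampler scales the count at the arrival-count level.
        count = hourly_sampler(base_count, job_arrival_scale)
        arrival_scale_applied_in_source = True
    else:
        count = base_count
        arrival_scale_applied_in_source = False
    # Global fallback: only scale if the source has not already done so.
    if not arrival_scale_applied_in_source and job_arrival_scale != 1.0:
        count = round(count * job_arrival_scale)
    return count
```

Either way a given hour's count is multiplied by the scale exactly once, which is the invariant the "avoid double-scaling" bullet is protecting.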