
deduplicate exact jobs replay (aggregated) #45

Merged
rbx merged 2 commits into master from deduplicate-sampler
Mar 26, 2026
Conversation

rbx (Member) commented Mar 25, 2026

No description provided.


coderabbitai bot commented Mar 25, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Removes the jobs_exact_replay_aggregate option and associated aggregation/expansion path; exact replay now maps each raw job one-to-one. Adds a new test module exercising raw/template replay modes and updates CI to run it. Also removes the CLI flag and plumbing for the aggregation option.
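The one-to-one mapping the walkthrough describes can be sketched as follows. This is an illustrative reconstruction, not the repo's actual code: the function name, dict keys, and bound constants are assumptions based on the identifiers mentioned in the review.

```python
import math

# Hypothetical cluster bounds mirroring the MIN/MAX constants named in the review.
MIN_NODES_PER_JOB, MAX_NODES_PER_JOB = 1, 64
MIN_CORES_PER_NODE, MAX_CORES_PER_NODE = 1, 128

def replay_exact(raw_jobs):
    """Emit exactly one bounded hourly job per raw log entry (no aggregation)."""
    jobs = []
    for job in raw_jobs:
        # Round the duration up to whole hours, with a floor of one hour.
        duration_hours = max(1, math.ceil(job["duration_minutes"] / 60))
        # Clamp requested resources to the configured cluster bounds.
        nnodes = min(max(job["nnodes"], MIN_NODES_PER_JOB), MAX_NODES_PER_JOB)
        cores = min(max(job["cores_per_node"], MIN_CORES_PER_NODE),
                    MAX_CORES_PER_NODE)
        jobs.append({"duration_hours": duration_hours,
                     "nnodes": nnodes,
                     "cores_per_node": cores})
    return jobs
```

The key property, per the walkthrough, is that the output job count always equals the raw entry count, which is what the new test module asserts per hourly bin.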

Changes

  • CI / Tests (.github/workflows/tests.yml, test/run_all.py, test/test_jobs_replay_modes.py): Adds a CI step to run python -m test.test_jobs_replay_modes, adds a new test module that validates raw vs. template job-replay behavior, and registers it in the test runner.
  • Core job replay logic (src/environment.py, src/workload_generator.py): Removes jobs_exact_replay_aggregate from ComputeClusterEnv.__init__ and from the generate_jobs signature; deletes the aggregation/convert-and-expand branch so exact replay iterates over raw jobs and emits one bounded hourly job per raw entry.
  • CLI / Launch helpers (train.py, train_iter.py): Removes the --jobs-exact-replay-aggregate CLI flag and all validation/forwarding/parameter plumbing that propagated the flag into the environment and parallel launch helpers.
  • Misc (tests runner) (test/run_all.py): Inserts python -m test.test_jobs_replay_modes into the sequential TESTS list so its return code contributes to the overall test summary.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 17.65%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check (❓ Inconclusive): no pull request description was provided by the author, making it impossible to evaluate relevance to the changeset. Resolution: add a description explaining why the jobs_exact_replay_aggregate parameter was removed and how this simplifies the exact job replay functionality.

✅ Passed checks (1 passed)
  • Title check (✅ Passed): the title 'deduplicate exact jobs replay (aggregated)' directly relates to the main change: removing the jobs_exact_replay_aggregate parameter and simplifying exact job replay behavior across multiple files.




coderabbitai bot left a comment

🧹 Nitpick comments (3)
test/test_jobs_replay_modes.py (2)

76-89: Consider cleaning up temporary files.

Each test function creates a temporary file with delete=False but never removes it. While this works for CI (ephemeral environments), it leaves artifacts on local runs.

♻️ Optional: Add cleanup using try/finally or context manager
+import os
+
 def test_raw_replay_job_counts():
     with tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False) as f:
         f.write(LOG_CONTENT)
         path = f.name
 
-    sampler = _make_sampler(path, template=False)
-
-    count1, durations1, nodes1, cores1 = _call_generate(sampler, hour=1, exact_replay=True)
-    assert count1 == 3, f"bin 1 raw: expected 3 jobs, got {count1}"
-
-    count2, durations2, nodes2, cores2 = _call_generate(sampler, hour=2, exact_replay=True)
-    assert count2 == 2, f"bin 2 raw: expected 2 jobs, got {count2}"
-
-    print(f"PASS: raw replay job counts ({count1}, {count2})")
+    try:
+        sampler = _make_sampler(path, template=False)
+
+        count1, _, _, _ = _call_generate(sampler, hour=1, exact_replay=True)
+        assert count1 == 3, f"bin 1 raw: expected 3 jobs, got {count1}"
+
+        count2, _, _, _ = _call_generate(sampler, hour=2, exact_replay=True)
+        assert count2 == 2, f"bin 2 raw: expected 2 jobs, got {count2}"
+
+        print(f"PASS: raw replay job counts ({count1}, {count2})")
+    finally:
+        os.unlink(path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_jobs_replay_modes.py` around lines 76 - 89, The
test_raw_replay_job_counts test creates a temp file with
tempfile.NamedTemporaryFile(delete=False) but never removes it; modify the test
to ensure the temporary file is cleaned up by wrapping the use in a try/finally
(or switch to delete=True and use the file inside the with-block) and call
os.remove(path) in the finally block if the file still exists; locate the
tempfile usage in test_raw_replay_job_counts (references:
tempfile.NamedTemporaryFile, LOG_CONTENT, _make_sampler, _call_generate) and add
the cleanup so local runs do not leave artifacts.
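As an alternative to the try/finally pattern suggested above, tempfile.TemporaryDirectory removes the file automatically when the context exits, so no explicit os.unlink is needed. A generic sketch; the fixture content and helper name here are illustrative, not the test module's actual code:

```python
import os
import tempfile

# Stand-in for the LOG_CONTENT fixture referenced in the review.
LOG_CONTENT = "2026-03-25 nnodes=2 cores=8 minutes=30\n"

def run_with_temp_log():
    """Write the log inside a TemporaryDirectory so cleanup is automatic."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "jobs.log")
        with open(path, "w") as f:
            f.write(LOG_CONTENT)
        existed_inside = os.path.exists(path)
        # ... build the sampler from `path` and run assertions here ...
    # On exit, the directory and everything in it are gone.
    return existed_inside, os.path.exists(path)
```

This keeps each test free of cleanup bookkeeping even when an assertion raises, since the context manager unwinds on exceptions too.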

83-88: Unused variables flagged by static analysis.

The unpacked variables durations1, nodes1, cores1, etc. are not used in this test. Using underscores for unused values improves clarity.

♻️ Optional: Prefix unused variables with underscore
-    count1, durations1, nodes1, cores1 = _call_generate(sampler, hour=1, exact_replay=True)
+    count1, _durations1, _nodes1, _cores1 = _call_generate(sampler, hour=1, exact_replay=True)
     assert count1 == 3, f"bin 1 raw: expected 3 jobs, got {count1}"
 
-    count2, durations2, nodes2, cores2 = _call_generate(sampler, hour=2, exact_replay=True)
+    count2, _durations2, _nodes2, _cores2 = _call_generate(sampler, hour=2, exact_replay=True)
     assert count2 == 2, f"bin 2 raw: expected 2 jobs, got {count2}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/test_jobs_replay_modes.py` around lines 83 - 88, Test unpacks from
_call_generate into variables (durations1, nodes1, cores1, durations2, nodes2,
cores2) that are never used; change those unused names to start with an
underscore (e.g., _durations1, _nodes1, _cores1, _durations2, _nodes2, _cores2)
or use single underscores to indicate intentionally unused values when calling
_call_generate so static analysis warnings are removed while leaving the
assertions on count1 and count2 unchanged.
src/workload_generator.py (1)

67-74: Simplified raw exact replay logic looks correct.

The implementation correctly:

  • Converts duration from minutes to hours with ceiling (minimum 1 hour)
  • Clamps nnodes and cores_per_node to configured bounds

One minor cleanup from static analysis: the inner int() cast on line 68 is redundant since job['duration_minutes'] is already an integer from parse_jobs().

♻️ Optional: Remove redundant int() cast
 for job in raw_jobs:
-    duration_hours = max(1, int(math.ceil(int(job['duration_minutes']) / 60)))
+    duration_hours = max(1, int(math.ceil(job['duration_minutes'] / 60)))
     nnodes = min(max(int(job['nnodes']), MIN_NODES_PER_JOB), MAX_NODES_PER_JOB)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/workload_generator.py` around lines 67 - 74, The duration conversion uses
an unnecessary int() cast around job['duration_minutes']; update the calculation
of duration_hours to remove the redundant int() so it reads duration_hours =
max(1, int(math.ceil(job['duration_minutes'] / 60))). Locate the loop iterating
over raw_jobs (variables raw_jobs, duration_hours, job['duration_minutes']) and
adjust that expression; no other logic changes needed and keep parse_jobs() as
the producer of integer minutes.
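The minutes-to-hours conversion discussed here has two edge cases worth pinning down: a zero-minute entry still occupies one hourly bin, and 61 minutes rounds up to two hours. A standalone sketch (the helper name is illustrative; in Python 3, math.ceil already returns an int, which is why the outer cast is redundant):

```python
import math

def duration_hours(minutes: int) -> int:
    # Ceiling division to whole hours, never less than one hour.
    return max(1, math.ceil(minutes / 60))
```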

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f328ffb9-320b-42dd-bb45-51c3544b0bd7

📥 Commits

Reviewing files that changed from the base of the PR and between 8225f99 and 01f3157.

📒 Files selected for processing (7)
  • .github/workflows/tests.yml
  • src/environment.py
  • src/workload_generator.py
  • test/run_all.py
  • test/test_jobs_replay_modes.py
  • train.py
  • train_iter.py

rbx force-pushed the deduplicate-sampler branch from 01f3157 to 74e9937 on March 25, 2026 13:20
coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/workload_generator.py`:
- Line 68: The duration_hours calculation uses an unnecessary int() around
math.ceil; update the expression in workload_generator.py (the duration_hours
assignment) to remove the redundant outer int() so it becomes max(1,
math.ceil(int(job['duration_minutes']) / 60)); keep the inner
int(job['duration_minutes']) conversion if needed to handle string inputs.

In `@test/test_jobs_replay_modes.py`:
- Around line 77-80: The tests create temporary files with
tempfile.NamedTemporaryFile(mode="w", suffix=".log", delete=False) and bind the
filename to path but never remove them; update each test that follows this
pattern (the instances that set path and write LOG_CONTENT) to unlink the file
after the test (e.g., import os and call os.unlink(path) in a finally block or
use pytest tmp_path/TemporaryDirectory), or revert to using delete=True/using
the file object directly so the file is removed automatically; ensure cleanup is
added for the occurrences that set path (the blocks around the
NamedTemporaryFile usages at the noted locations) so no temp files remain on
disk.
- Line 100: The tuple unpacking from _call_generate currently assigns unused
variables (e.g., count, durations, nodes, cores) causing RUF059; update those
call sites so unused elements are replaced with _ placeholders (for example use
count, _, _, _ = _call_generate(...) or _, durations, _, _ = _call_generate(...)
depending on which values are actually used) — locate every call to
_call_generate where not all four returned values are consumed (including the
occurrences noted around the replay-mode tests) and adjust the left-hand tuple
to use _ for any ignored positions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fc9a1b4b-1c2c-4ed0-ae5b-ffe7652f4221

📥 Commits

Reviewing files that changed from the base of the PR and between 01f3157 and 74e9937.

📒 Files selected for processing (7)
  • .github/workflows/tests.yml
  • src/environment.py
  • src/workload_generator.py
  • test/run_all.py
  • test/test_jobs_replay_modes.py
  • train.py
  • train_iter.py
✅ Files skipped from review due to trivial changes (1)
  • test/run_all.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • .github/workflows/tests.yml
  • src/environment.py
  • train.py

rbx force-pushed the deduplicate-sampler branch from 74e9937 to a6e297d on March 25, 2026 13:28
rbx merged commit 8b05b7b into master Mar 26, 2026
3 of 4 checks passed
