Conversation
- Now both price and efficiency rewards use `num_used_nodes` as a proxy for work done, allowing a more direct comparison: efficiency = work quality, price = price-aware work performance.
- Simplify the function and tweak constants to allow stronger responsiveness.
- Price overdrive is now bounded in [0, 1].
- The efficiency signal is shifted to lie in [-1, 1]; at 50% efficiency the reward is 0.
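The two revised signals described above can be sketched as simple linear maps. This is a minimal illustration only; the function names and the linear shaping are assumptions based on the PR description, not the actual implementation:

```python
def reward_price_overdrive(overdrive: float) -> float:
    """Price overdrive signal, clamped to [0, 1] (illustrative)."""
    return max(0.0, min(1.0, overdrive))


def reward_efficiency(efficiency: float) -> float:
    """Map efficiency in [0, 1] to a signal in [-1, 1] (illustrative).

    At 50% efficiency the reward is exactly 0; below that it is
    negative, above it positive.
    """
    return 2.0 * efficiency - 1.0
```

Under this shaping, `reward_efficiency(0.5)` returns `0.0`, matching the stated "at 50% efficiency reward is 0" behavior.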
🧹 Nitpick comments (1)
src/reward_calculation.py (1)
36-53: Clarify implications of rewards exceeding 1.0.

The uncapped mode (`NEGATIVE_PRICE_OVERDRIVE_ALLOW_ABOVE_ONE = True`) allows `_reward_price` to return values up to 1.5. This diverges from the normalized `[-1, 1]` range used by other reward components (e.g., efficiency, idle penalty). Downstream consumers or weighted aggregation logic may assume bounded `[-1, 1]` inputs. Consider documenting whether this asymmetry is intentional and how it interacts with the weighted sum in `calculate()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/reward_calculation.py` around lines 36 - 53, The uncapped overdrive flag NEGATIVE_PRICE_OVERDRIVE_ALLOW_ABOVE_ONE lets _reward_price produce values up to NEGATIVE_PRICE_OVERDRIVE_MAX_REWARD (1.5), which breaks the implicit [-1,1] normalization expected by other components and calculate(); either document this behavior clearly and its intended interaction with the weighted aggregation (mention NEGATIVE_PRICE_OVERDRIVE_ALLOW_ABOVE_ONE, NEGATIVE_PRICE_OVERDRIVE_MAX_REWARD, _reward_price and calculate), or change calculate() to explicitly handle the asymmetry (e.g., clamp/scale _reward_price to [-1,1] before summing or adjust weights/normalization to accept >1) and add a short inline comment explaining why the chosen approach preserves expected reward semantics.
📒 Files selected for processing (1)
src/reward_calculation.py
Looks good overall, but I would address CodeRabbit's nitpick comment and document that the reward value can intentionally exceed 1.0 (up to 1.5) in that case. Just a short comment in the code stating that it is indeed intentional (if it is).
I mean... There is a comment just above