Adding algorithm for GCP CloudRun WorkerPools and similar compute providers #59
Open
02strich wants to merge 3 commits into
Skip waiting on the long-running operation returned by UpdateWorkerPool so the activity returns sooner. The operation completing would not verify that the workers connect correctly anyway.
Swap the silent noopLogger for a fallbackLogger that writes to stderr via the standard library so logs from outside an activity context (panic) or from interceptors returning nil are still visible. Add tests covering keyvals formatting, level routing, normalizeSDKLogger branches, and the non-activity-context fallback path.
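As a rough illustration of the fallback described in this commit, here is a minimal sketch of a logger that writes to stderr through the standard library. The type name `fallbackLogger` comes from the commit message, but everything else (`newFallbackLogger`, `formatKeyvals`, the `key=value` layout, the `MISSING` marker for an odd trailing key) is an assumption made for the example and may differ from the PR's code.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strings"
)

// fallbackLogger is a hypothetical stand-in for the logger named in the commit.
// It writes every record to stderr via the standard library logger.
type fallbackLogger struct {
	out *log.Logger
}

func newFallbackLogger() *fallbackLogger {
	return &fallbackLogger{out: log.New(os.Stderr, "", log.LstdFlags)}
}

// formatKeyvals renders alternating key/value pairs as "k=v k=v".
// Assumption: a trailing key without a value is rendered as "k=MISSING".
func formatKeyvals(keyvals ...any) string {
	var b strings.Builder
	for i := 0; i < len(keyvals); i += 2 {
		if i > 0 {
			b.WriteByte(' ')
		}
		if i+1 < len(keyvals) {
			fmt.Fprintf(&b, "%v=%v", keyvals[i], keyvals[i+1])
		} else {
			fmt.Fprintf(&b, "%v=MISSING", keyvals[i])
		}
	}
	return b.String()
}

// logf prefixes the message with its level and appends formatted keyvals.
func (l *fallbackLogger) logf(level, msg string, keyvals ...any) {
	line := level + " " + msg
	if kv := formatKeyvals(keyvals...); kv != "" {
		line += " " + kv
	}
	l.out.Println(line)
}

func (l *fallbackLogger) Debug(msg string, keyvals ...any) { l.logf("DEBUG", msg, keyvals...) }
func (l *fallbackLogger) Info(msg string, keyvals ...any)  { l.logf("INFO", msg, keyvals...) }
func (l *fallbackLogger) Warn(msg string, keyvals ...any)  { l.logf("WARN", msg, keyvals...) }
func (l *fallbackLogger) Error(msg string, keyvals ...any) { l.logf("ERROR", msg, keyvals...) }

func main() {
	logger := newFallbackLogger()
	logger.Warn("no activity context; using fallback logger", "component", "scaler", "attempt", 2)
}
```

The four methods take a message plus variadic key/value pairs, which is the shape the commit's "keyvals formatting" tests suggest; level routing here is just a text prefix.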
Introduce a rate-based scaling algorithm that sizes the worker pool from EWMA-smoothed arrival, dispatch, and per-consumer capacity estimates. Desired worker count is the maximum of a utilization-target model and a Halfin-Whitt square-root staffing model, bounded by min/max count, capped per scale-up step, and gated by asymmetric scale-up/scale-down cooldowns plus a no-sync-match quiet period. Includes unit tests covering the sizing and scaling decisions. Also reject Inf and NaN values in validateFloat64InMap so float config keys must be real numbers.
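The EWMA smoothing mentioned in this commit can be sketched as follows. This is only an illustration of the general technique: the `ewma` type, its `alpha` field, and the choice to seed the average with the first sample are assumptions for the example, not details taken from the PR.

```go
package main

import "fmt"

// ewma is a hypothetical exponentially weighted moving average.
type ewma struct {
	alpha  float64 // smoothing factor in (0, 1]; larger values react faster
	value  float64 // current smoothed estimate
	primed bool    // false until the first sample has been observed
}

// Observe folds one sample into the average and returns the new estimate.
// Assumption: the first sample seeds the average directly.
func (e *ewma) Observe(sample float64) float64 {
	if !e.primed {
		e.value = sample
		e.primed = true
		return e.value
	}
	e.value = e.alpha*sample + (1-e.alpha)*e.value
	return e.value
}

func main() {
	arrival := &ewma{alpha: 0.5}
	for _, s := range []float64{10, 20, 40} {
		fmt.Println(arrival.Observe(s)) // prints 10, then 15, then 27.5
	}
}
```

A smaller `alpha` damps short bursts in the arrival or dispatch rate at the cost of reacting more slowly to a sustained change.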
gcristea-temporal
approved these changes
Jun 18, 2026
gcristea-temporal
left a comment
Contributor
Looks good, only found one nit.
Comment on lines +119 to +121
    if math.IsInf(val, 0) || math.IsNaN(val) {
        return serviceerror.NewInvalidArgumentf("%s must be a real float", key)
    }
Contributor
Nit: I think this check should happen before the check against minValue. Converting to float64 first and then checking might also make this easier to read.
Collaborator
Author
Out of curiosity, why do you think it matters? (b/c of the -inf case?)
Contributor
I was thinking about the case where val is NaN. For that situation the first check (line 116) will return the error.
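For context on the ordering question in this thread: in Go, every ordered comparison involving NaN evaluates to false, so which check fires for NaN or -Inf depends on how the range check is written, and that code is not shown here. The sketch below puts the finiteness check first, as the nit suggests. `validateFinite` is a hypothetical helper written for this example, not the PR's `validateFloat64InMap`, and it uses `fmt.Errorf` in place of `serviceerror` to stay self-contained.

```go
package main

import (
	"fmt"
	"math"
)

// validateFinite is a hypothetical helper illustrating the suggested order.
func validateFinite(key string, val, minValue float64) error {
	// Reject NaN and +/-Inf first. NaN compares false against everything,
	// so a check such as val < minValue would not catch it on its own.
	if math.IsInf(val, 0) || math.IsNaN(val) {
		return fmt.Errorf("%s must be a real float", key)
	}
	// Only finite values reach the range check.
	if val < minValue {
		return fmt.Errorf("%s must be >= %v", key, minValue)
	}
	return nil
}

func main() {
	// "targetUtilization" is an illustrative key name, not one from the PR.
	for _, v := range []float64{math.NaN(), math.Inf(-1), 0.5, 2} {
		fmt.Println(v, validateFinite("targetUtilization", v, 1))
	}
}
```

With this order, -Inf reports "must be a real float" rather than a range error, and NaN is rejected regardless of how the range comparison is phrased.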
What was changed
Introduce a rate-based scaling algorithm that sizes the worker pool from EWMA-smoothed arrival, dispatch, and per-consumer capacity estimates. Desired worker count is the maximum of a utilization-target model and a Halfin-Whitt square-root staffing model, bounded by min/max count, capped per scale-up step, and gated by asymmetric scale-up/scale-down cooldowns plus a no-sync-match quiet period. Includes unit tests covering the sizing and scaling decisions.
This is an implementation of the algorithm in #54.
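To make the sizing rule concrete, here is a minimal sketch under stated assumptions. It takes the offered load as arrival rate divided by per-worker capacity, computes a utilization-target count and a square-root staffing count (load plus a multiple of its square root), takes the larger, then applies the per-step cap and the min/max bounds. The names `sizingParams` and `desiredWorkers`, the `Beta` coefficient, and the order of cap and clamp are all hypothetical; cooldowns, the dispatch-rate input, and the no-sync-match quiet period are left out.

```go
package main

import (
	"fmt"
	"math"
)

// sizingParams holds hypothetical tuning knobs for the sketch.
type sizingParams struct {
	TargetUtilization float64 // desired busy fraction per worker, in (0, 1]
	Beta              float64 // square-root staffing safety coefficient
	MinCount          int
	MaxCount          int
	MaxScaleUpStep    int // largest allowed increase in one decision
}

// desiredWorkers returns the worker count suggested by the two models.
func desiredWorkers(arrivalRate, perWorkerRate float64, current int, p sizingParams) int {
	want := current
	if perWorkerRate > 0 {
		load := arrivalRate / perWorkerRate // offered load in "workers' worth" of work
		byUtil := math.Ceil(load / p.TargetUtilization)
		bySqrt := math.Ceil(load + p.Beta*math.Sqrt(load))
		want = int(math.Max(byUtil, bySqrt))
	}
	// Cap how far a single decision may scale up.
	if want > current+p.MaxScaleUpStep {
		want = current + p.MaxScaleUpStep
	}
	// Apply the configured bounds last so they always hold.
	if want < p.MinCount {
		want = p.MinCount
	}
	if want > p.MaxCount {
		want = p.MaxCount
	}
	return want
}

func main() {
	p := sizingParams{TargetUtilization: 0.75, Beta: 2, MinCount: 1, MaxCount: 50, MaxScaleUpStep: 3}
	// load = 9: utilization model gives 12, square-root model gives 15,
	// and the step cap limits the result to 10 + 3.
	fmt.Println(desiredWorkers(90, 10, 10, p)) // prints 13
}
```

At low load the square-root term dominates and adds proportionally more headroom, while at high load the utilization target tends to set the count, which is one reason to take the maximum of the two.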
Why?
The existing scaling algorithm makes a set of assumptions that hold for Lambda but not for other compute providers such as GCP CloudRun or AWS ECS. This adds a new scaling algorithm better suited to those providers.