feat (auto scale) : add StackableScaler CRD rollout and admission webhook #411

Open
soenkeliebau wants to merge 5 commits into main from feat/autoscale

Conversation

@soenkeliebau
Member

Summary

Rolls out the StackableScaler CRD (defined in operator-rs) and adds a validation webhook
that rejects spec.replicas changes on StackableScaler resources while a scaling operation
is in progress.

fixes stackabletech/issues#667

  • CRD rollout -- StackableScaler CRD definition added to extra/crds.yaml with full
    schema (spec: replicas, clusterRef, role, roleGroup; status: currentState with
    stage enum, replicas, desiredReplicas, selector) and /scale + /status
    subresources
  • Admission webhook (scaler-admission.stackable.tech) -- targets UPDATE operations on
    stackablescalers.autoscaling.stackable.tech/v1alpha1. On spec.replicas change, fetches
    the live object to inspect status.current_state.stage (Kubernetes strips .status from
    admission review oldObject for CRDs with a status subresource). Denies the update if
    scaling is in progress (any stage other than Idle or Failed). Failure policy is Fail.
  • Conversion webhook registration -- StackableScaler added to the existing conversion
    webhook for future multi-version support
  • CRD schema output -- StackableScaler::merged_crd() added to the crd subcommand
  • CLI flag -- --disable-scaler-admission-webhook to skip the webhook (matches existing
    --disable-restarter-mutating-webhook pattern)
  • RBAC -- grants get, list, watch on stackablescalers in
    autoscaling.stackable.tech for the webhook's live-object fetch

Motivation

The StackableScaler state machine assumes spec.replicas remains stable while a scaling
operation is in progress. If the HPA wrote a new value mid-flight, the operator would see
conflicting desired replica counts and the state machine's previous_replicas /
desired_replicas bookkeeping would break down. The admission webhook enforces this
invariant at the API server level, before the write reaches etcd.

The webhook is validation-only (no mutations) but is implemented as a MutatingWebhook
because Kubernetes evaluates mutating webhooks before validating webhooks -- this ensures
the check runs before any other validating webhook that might depend on the replica count.
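
To illustrate what "validation-only" means for a handler registered as a mutating webhook, here is a rough sketch. The `AdmissionResponse` struct below is a simplified, hypothetical type made up for this example; the actual handler would use the response type provided by its Kubernetes client library. The point is only that neither constructor ever sets a patch.

```rust
/// Simplified, hypothetical admission response used for illustration only.
#[derive(Debug, PartialEq, Eq)]
struct AdmissionResponse {
    allowed: bool,
    /// Human-readable reason, set on denial.
    message: Option<String>,
    /// A mutating webhook may return a JSON patch here.
    /// A validation-only handler always leaves it empty.
    patch: Option<String>,
}

impl AdmissionResponse {
    /// Admit the request unchanged.
    fn allow() -> Self {
        AdmissionResponse {
            allowed: true,
            message: None,
            patch: None,
        }
    }

    /// Reject the request with a reason; still no patch.
    fn deny(reason: impl Into<String>) -> Self {
        AdmissionResponse {
            allowed: false,
            message: Some(reason.into()),
            patch: None,
        }
    }
}
```

Because the object is never modified, the handler behaves like a validating webhook even though it is registered in the mutating phase.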

Dependencies

Test plan

  • cargo test --all-features passes
  • cargo clippy --all-targets --all-features -- -D warnings clean
  • Webhook correctly denies spec.replicas update when scaler is in PreScaling/Scaling/
    PostScaling stage
  • Webhook allows spec.replicas update when scaler is Idle or Failed
  • Webhook allows updates that don't change spec.replicas regardless of stage
  • --disable-scaler-admission-webhook flag prevents webhook registration
  • CRD schema output includes StackableScaler definition
  • Integration: HPA updates are blocked during active scaling, unblocked after completion

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and deployed operator works
  • Integration tests passed (for non-trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

soenkeliebau and others added 5 commits March 11, 2026 09:39
Add a mutating admission webhook that guards StackableScaler replicas
changes during active scaling operations, preventing state machine
corruption from concurrent HPA updates.

Includes CRD YAML, RBAC roles, webhook registration, and generated
Cargo/Nix files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace the inline pattern match against individual ScalerStage variants
with the new is_scaling_in_progress() method from operator-rs. This
removes the ScalerStage import and ensures the webhook stays correct
if new stages are added to the state machine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove cluster-kind label injection (no longer needed with .owns()).
Narrow to UPDATE operations only. The handler returns allow/deny
without patches, making it functionally a validating webhook.

The MutatingWebhook framework is retained because stackable-webhook
does not yet provide a ValidatingWebhook type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Part of the ReplicasConfig rewrite: the label-injection webhook is no
longer needed (replaced by owner references and .owns()), so the
webhook description and code are updated to reflect validation-only
scope.

- Update CLI help text: remove label-injection reference, describe
  webhook as rejecting spec.replicas changes during active scaling.

- Simplify deny logic in scaler_admission_handler: use `if let` with
  `filter()` instead of `is_some_and()` + separate `stage_str`
  variable, removing the unnecessary "unknown" fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
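
The deny-logic simplification in the last commit message can be illustrated roughly as follows. This is a sketch under assumptions, not the code in `scaler_admission_handler`: the reduced `ScalerStage` enum, the message text, and both function names are made up for the comparison. Only the shape of the refactor (`is_some_and()` plus a separate `stage_str` with an "unknown" fallback, replaced by `if let` with `filter()`) follows the commit message.

```rust
/// Reduced stand-in for the operator-rs stage enum (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ScalerStage {
    Idle,
    Scaling,
    Failed,
}

impl ScalerStage {
    fn is_scaling_in_progress(&self) -> bool {
        !matches!(self, ScalerStage::Idle | ScalerStage::Failed)
    }
}

/// Before: a boolean check plus a separately derived display string.
/// The "unknown" fallback can never appear in a denial, because a
/// missing stage is never "in progress".
fn deny_reason_before(stage: Option<ScalerStage>) -> Option<String> {
    let in_progress = stage.is_some_and(|s| s.is_scaling_in_progress());
    let stage_str = stage
        .map(|s| format!("{s:?}"))
        .unwrap_or_else(|| "unknown".to_string());
    if in_progress {
        Some(format!("scaling in progress (stage: {stage_str})"))
    } else {
        None
    }
}

/// After: `filter()` keeps the stage only if it is in progress, so the
/// bound value is always available when a denial is built.
fn deny_reason_after(stage: Option<ScalerStage>) -> Option<String> {
    if let Some(s) = stage.filter(|s| s.is_scaling_in_progress()) {
        Some(format!("scaling in progress (stage: {s:?})"))
    } else {
        None
    }
}
```

Both versions return the same result for every input; the second simply has no code path in which the stage is both required and absent.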
Development

Successfully merging this pull request may close these issues.

Implement Shared AutoScaling Hook Functionality