Skip to content

profile-compaction: CollapseConfig CRD + projection overlay + user-ma…#808

Open
entlein wants to merge 2 commits into
kubescape:mainfrom
k8sstormcenter:upstream-pr/sbob-crd-compaction
Open

profile-compaction: CollapseConfig CRD + projection overlay + user-ma…#808
entlein wants to merge 2 commits into
kubescape:mainfrom
k8sstormcenter:upstream-pr/sbob-crd-compaction

Conversation

@entlein
Copy link
Copy Markdown

@entlein entlein commented May 15, 2026

here things got confusing during the rebase, this is the sister PR to storage and need the signature PR (from matthyx)

Summary by CodeRabbit

  • New Features

    • Added tamper detection for signed user-defined overlays with R1016 tamper alerts
    • Added overlay identity tracking for intelligent cache invalidation
    • Added NetworkNeighborhood resource support via pod labels
    • Added execution arguments tracking for enhanced wildcard pattern matching
  • Bug Fixes

    • Fixed cache notification blocking when subscribers are slow or unresponsive
    • Improved context propagation for Kubernetes list operations
  • Tests

    • Added tamper detection and alert deduplication tests
    • Added user overlay projection end-to-end tests
    • Added non-blocking notification fan-out tests

Review Change Stack

…naged lifecycle

Signed-off-by: entlein <einentlein@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

Warning

Rate limit exceeded

@entlein has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 59 minutes and 15 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

βŒ› How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
βš™οΈ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9442e7e5-e8aa-4db6-bc87-a5f8d7f00138

πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between 16a52f4 and 40960d5.

πŸ“’ Files selected for processing (3)
  • pkg/objectcache/containerprofilecache/containerprofilecache.go
  • pkg/rulebindingmanager/cache/cache.go
  • pkg/rulebindingmanager/cache/cache_test.go
πŸ“ Walkthrough

Walkthrough

This PR extends container profile caching with tamper detection for user-supplied overlays, refactors projection to classify dynamic/wildcard patterns and support exec-arguments matching, adds overlay-identity stamping into cache checksums, and refactors rule-binding cache to dispatch notifications without holding locks.

Changes

Container Profile Cache: Tamper Detection, Overlay Identity, and Projection

Layer / File(s) Summary
Tamper detection state and verification entry points
pkg/objectcache/containerprofilecache/containerprofilecache.go
Adds tamperAlertExporter and tamperEmitted dedup map to ContainerProfileCacheImpl; imports exporter package; wires calls to verifyUserApplicationProfile and verifyUserNetworkNeighborhood after fetching overlays.
Verification methods and tamper-alert logic
pkg/objectcache/containerprofilecache/tamper_alert.go
Implements signature re-verification for ApplicationProfile and NetworkNeighborhood with per-tamper-key deduplication; emits R1016 alerts only on ErrSignatureMismatch; clears dedup state on successful re-verification; wires optional exporter via SetTamperAlertExporter.
Overlay identity stamping into SyncChecksum
pkg/objectcache/containerprofilecache/projection.go
Updates projectUserProfiles to stamp overlay namespace/name/resourceVersion into SyncChecksumMetadataKey; preserves baseline checksum prefix and ensures idempotent suffixes via stampOverlayIdentity helper.
Projection refactoring for dynamic/wildcard classification
pkg/objectcache/containerprofilecache/projection_apply.go
Refactors projectField to accept isDynamic callback; introduces isNetworkIPWildcard and isNetworkDNSWildcard for network pattern classification; adds extractExecsByPath for path→argv mapping; updates address extraction to use plural IPAddresses fields.
Data model extensions for ExecsByPath and UserDefinedNetwork
pkg/objectcache/projection_types.go, pkg/objectcache/shared_container_data.go
Adds ExecsByPath map[string][]string to ProjectedContainerProfile for exec-args matching; introduces UserDefinedNetworkMetadataKey constant and field to WatchedContainerData; extends SetContainerInfo to read and persist network label.
Tamper verification and alert dedup tests
pkg/objectcache/containerprofilecache/tamper_alert_test.go
Tests tamper-key deduplication, distinguishes tamper from operational errors, validates ErrSignatureMismatch sentinel, exercises end-to-end verification with exporter wiring and R1016 alert emission.
End-to-end projection and SyncChecksum contract tests
pkg/objectcache/containerprofilecache/test32_projection_test.go
Validates user overlay execs reach projected values; asserts idempotent overlay-identity stamping; tests multi-stage SyncChecksum reflection across overlay add/update/remove scenarios; includes baseline preservation check.
Mock implementation updates for new projection logic
pkg/objectcache/v1/mock.go
Updates mock projection to classify exec paths (dynamic vs literal) and populate ExecsByPath; refactors network projection to split wildcard/pattern vs exact-value entries; introduces helpers for lazy Values allocation and wildcard classification.

Rule Binding Cache: Non-Blocking Notification Dispatch

Layer / File(s) Summary
Non-blocking notifier dispatch and context propagation
pkg/rulebindingmanager/cache/cache.go
Refactors handlers to compute notifications under lock, snapshot notifiers, unlock, then dispatch non-blockingly; updates RefreshRuleBindingsRules for single coalesced refresh; changes Kubernetes List calls to use provided context instead of Background; adds snapshotNotifiersLocked, dispatchNonBlocking, and indexOfNotifier helpers.
Non-blocking dispatch and context contract tests
pkg/rulebindingmanager/cache/cache_test.go
Validates dispatchNonBlocking drops notifications on full channel; asserts RefreshRuleBindingsRules remains non-blocking when subscriber saturated; enforces via reflection that helper methods accept context.Context as first parameter; adjusts existing tests to pass explicit context.

Safety Fixes

Layer / File(s) Summary
Function cache ObjectCache nil check
pkg/rulemanager/cel/libraries/cache/function_cache.go
Adds explicit oc == nil guard in HashForContainerProfile early-return condition to prevent use of nil ObjectCache.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • kubescape/node-agent#799: Overlapping projection refactoring work on container profile model with dynamic/pattern handling introduced in that PR.
  • kubescape/node-agent#788: Both PRs extend pkg/objectcache/containerprofilecache CP-cache codepath; main PR builds on the CP-cache replacement foundation.

Suggested labels

release

Suggested reviewers

  • matthyx
  • slashben

Poem

🐰 A rabbit hops through overlay code,
Stamping checksums down the road,
Tampering checked, projections clean,
Patterns sorted, cache pristineβ€”
Non-blocking notes now pass and flow! πŸš€

πŸš₯ Pre-merge checks | βœ… 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.95% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title is truncated/incomplete ('...') and too vague to clearly convey the main change; it mentions multiple unrelated concepts (CollapseConfig, projection overlay, user-ma) without clarity. Complete the title and clarify the primary change; consider 'Add tamper detection for user-defined overlay profiles' or similar once the full intent is clear.
βœ… Passed checks (3 passed)
Check name Status Explanation
Description Check βœ… Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check βœ… Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check βœ… Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
πŸ§ͺ Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❀️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

πŸ€– Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/objectcache/containerprofilecache/containerprofilecache.go`:
- Around line 412-414: The verification calls to
c.verifyUserApplicationProfile(userAP, sharedData.Wlid) currently ignore its
boolean result; update both call sites (the one using userAP at lines ~412 and
the similar one around ~428-430) to check the returned bool and, when it is
false and the cache is running in strict verification mode (e.g., the instance
flag controlling strict verification such as c.strictVerify or equivalent),
avoid projecting/merging the overlay (skip the merge/projectOverlay path), log
the verification failure with context (Wlid and which overlay), and return or
surface an error instead of continuing; if not in strict mode, continue but log
a warning. Ensure you reference and use c.verifyUserApplicationProfile(...) and
the strict-mode flag when implementing the conditional.
πŸͺ„ Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
βš™οΈ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: acff329b-9e89-4538-a272-7877e0ad1e70

πŸ“₯ Commits

Reviewing files that changed from the base of the PR and between cc59fa0 and 16a52f4.

πŸ“’ Files selected for processing (12)
  • pkg/objectcache/containerprofilecache/containerprofilecache.go
  • pkg/objectcache/containerprofilecache/projection.go
  • pkg/objectcache/containerprofilecache/projection_apply.go
  • pkg/objectcache/containerprofilecache/tamper_alert.go
  • pkg/objectcache/containerprofilecache/tamper_alert_test.go
  • pkg/objectcache/containerprofilecache/test32_projection_test.go
  • pkg/objectcache/projection_types.go
  • pkg/objectcache/shared_container_data.go
  • pkg/objectcache/v1/mock.go
  • pkg/rulebindingmanager/cache/cache.go
  • pkg/rulebindingmanager/cache/cache_test.go
  • pkg/rulemanager/cel/libraries/cache/function_cache.go

Comment thread pkg/objectcache/containerprofilecache/containerprofilecache.go
entlein pushed a commit to k8sstormcenter/node-agent that referenced this pull request May 16, 2026
…verlay

CodeRabbit upstream PR kubescape#808 / containerprofilecache.go:414 (Major).
The verifyUserApplicationProfile and verifyUserNetworkNeighborhood
methods already return a boolean reflecting verification outcome β€”
true when the overlay is unsigned OR when verification succeeded OR
in permissive mode (EnableSignatureVerification=false); false only
in strict mode on actual tamper.

The two call sites in projection-load were discarding that return,
so tampered overlays in strict mode silently merged anyway. The
R1016 alert was emitted but the protection was advisory only.

Now: when verify returns false (strict mode + tamper detected) the
overlay is nilled out before the merge step so the cache never
projects a known-tampered profile. Permissive mode is unchanged β€”
verify always returns true, the overlay still merges, R1016 still
fires.

New tests:

  - TestVerifyAP_StrictMode_ReturnsFalseOnTamper β€” sign + tamper an
    ApplicationProfile, construct a cache with
    EnableSignatureVerification=true, and assert
    verifyUserApplicationProfile returns false (caller drops overlay).
  - TestVerifyNN_StrictMode_ReturnsFalseOnTamper β€” symmetric pin for
    the NetworkNeighborhood path.

The existing legacy-permissive tamper test
(TestVerifyAP_TamperedProfile_PopulatesDedupMap) continues to pass
unchanged β€” that path still returns true with the R1016 emitted.
…c1 final state

Signed-off-by: entlein <einentlein@gmail.com>
@entlein
Copy link
Copy Markdown
Author

entlein commented May 16, 2026

@coderabbitai review

Collective re-review request after applying your atomic-split advisory + rabbit-feedback fixes across the full SBOB upstream PR set. Each PR is now scoped to a single tier per your dependency map:

Tier 0 (opens):

Tier 2 (execs):

Tier 3 (network):

Tier 4 (tamper/sign):

Umbrellas:

Please confirm atomicity per your earlier advisory and flag any remaining cross-PR coupling we missed.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 16, 2026

@entlein β€” triggering the review now. In parallel, here's my atomicity assessment for PR #808 specifically, based on what's in the AI summary and the tier map you've provided:


PR #808 Atomicity Assessment

Appropriate Tier 4 content βœ…

  • tamper_alert.go / tamper_alert_test.go β€” core tamper detection, SetTamperAlertExporter, R1016 dedup logic
  • containerprofilecache.go β€” wiring tamperAlertExporter + tamperEmitted
  • projection.go β€” stampOverlayIdentity / projectUserProfiles overlay identity β†’ SyncChecksum
  • shared_container_data.go β€” UserDefinedNetworkMetadataKey / UserDefinedNetwork for NN resource

Cross-tier coupling flags ⚠️

File Issue Belongs to
projection_types.go Defines ExecsByPath map[string][]string on ProjectedContainerProfile Tier 2 Β· #812 ("ExecsByPath composite-key surface")
projection_apply.go Consumes ExecsByPath, containsDynamicSegment, isNetworkIPWildcard, isNetworkDNSWildcard Tier 2 (#812) for exec surface; Tier 3 (#806/#324) for network wildcards
v1/mock.go Updated to populate ExecsByPath + wildcard network patterns Tier 2 + Tier 3
rulebindingmanager/cache/cache.go + test Non-blocking fanout refactor β€” purely infrastructure, no tamper/sign semantics Pre-Tier 4 (Tier 0 or standalone)

Specific recommendations

  1. ExecsByPath type definition should either live in #812 (Tier 2) with #808 taking a compile dependency on it, or you should explicitly note that #808 is the defining PR and #812 merely consumes it. Right now the tier map implies the reverse, which is confusing.

  2. isNetworkIPWildcard / isNetworkDNSWildcard in projection_apply.go duplicate classifier logic that Tier 3 (#806, #324) is supposed to own. Either import from the Tier 3 package or note these are intentionally local shims with a // TODO: consolidate with pkg/objectcache/networkmatch comment.

  3. Non-blocking fanout (rulebindingmanager/cache) is fully orthogonal to tamper/sign. If this was included because it was required to prevent deadlocks when emitting tamper alerts under load, that's a valid reason β€” but it should be called out explicitly in the PR description as a required infrastructure prerequisite, not just bundled silently.


Bottom line: The tamper/sign core is clean. The main atomicity leaks are the ExecsByPath surface definition crossing from Tier 4 into what the map says is Tier 2, and the fanout refactor having no declared relationship to the Tier 4 scope. If you can clarify the ExecsByPath ownership direction and add a sentence to the PR description explaining why the fanout refactor is here, the remaining coupling is defensible.

βœ… Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

entlein added a commit to k8sstormcenter/node-agent that referenced this pull request May 16, 2026
…e#807, kubescape#808) (#44)

* fix(containerprofilecache): honor verify result before merging user overlay

CodeRabbit upstream PR kubescape#808 / containerprofilecache.go:414 (Major).
The verifyUserApplicationProfile and verifyUserNetworkNeighborhood
methods already return a boolean reflecting verification outcome β€”
true when the overlay is unsigned OR when verification succeeded OR
in permissive mode (EnableSignatureVerification=false); false only
in strict mode on actual tamper.

The two call sites in projection-load were discarding that return,
so tampered overlays in strict mode silently merged anyway. The
R1016 alert was emitted but the protection was advisory only.

Now: when verify returns false (strict mode + tamper detected) the
overlay is nilled out before the merge step so the cache never
projects a known-tampered profile. Permissive mode is unchanged β€”
verify always returns true, the overlay still merges, R1016 still
fires.

New tests:

  - TestVerifyAP_StrictMode_ReturnsFalseOnTamper β€” sign + tamper an
    ApplicationProfile, construct a cache with
    EnableSignatureVerification=true, and assert
    verifyUserApplicationProfile returns false (caller drops overlay).
  - TestVerifyNN_StrictMode_ReturnsFalseOnTamper β€” symmetric pin for
    the NetworkNeighborhood path.

The existing legacy-permissive tamper test
(TestVerifyAP_TamperedProfile_PopulatesDedupMap) continues to pass
unchanged β€” that path still returns true with the R1016 emitted.

* docs(applicationprofile): rule-author contracts on opens-with-suffix/prefix + exec-args

Two doc-only fixes for CodeRabbit cross-PR advisory:

#7 (NA kubescape#807): ap.was_path_opened_with_suffix / _prefix β€” explicitly
document the false-negative gap when the projection is in pass-through
mode (cp.Opens.All=true). Wildcard Patterns are skipped from the
suffix/prefix scan because their token-bearing text doesn't safely
answer suffix questions. Rule authors who need wildcard-aware coverage
should either declare an Opens-projection slice (so SuffixHits/PrefixHits
become authoritative for the literals they care about) or use
ap.was_path_opened (which runs CompareDynamic over Patterns).

#8 (NA kubescape#807): wasExecutedWithArgs β€” document the three states of
ExecsByPath:
  1. Path absent from Execs.Values β†’ exec not allowed, fall through.
  2. Path in Values, ABSENT from ExecsByPath β†’ legacy back-compat
     "no argv constraint", match.
  3. Path in Values, PRESENT with empty arg list [] β†’ explicit
     "ran with no args" constraint, NOT a wildcard.
The distinction is load-bearing for profile authors: an entry of
{Path: ..., Args: []} is a constraint, not a free pass.

No behavioural change. Tests pass unchanged.

* deps: pin stereoscope v0.1.9 + runtime-spec v1.2.1 (compat with kubescape/syft fork)

Storage rc1 bumped to syft v1.42.4 (CVE-2026-33481), which transitively
requires stereoscope v0.1.22 + runtime-spec v1.3.0. Those versions use
the new moby/moby/client submodule API, which is incompatible with
inspektor-gadget's moby/moby umbrella requirement on the node-agent
side (ambiguous-import wall β€” see issue #45).

Node-agent stays on kubescape/syft v1.32.0-ks.2 via the existing
replace, but transitive resolution from storage's go.mod pulls the
newer stereoscope into the build, breaking the build with:

  undefined: client.New
  undefined: client.PingOptions

Adds two replace directives to force the older transitive chain that
matches kubescape/syft v1.32.0-ks.2's expectations:

  github.com/anchore/stereoscope     => v0.1.9-0.20250826202322-...
  github.com/opencontainers/runtime-spec => v1.2.1

This is the minimum set needed for node-agent to build cleanly against
storage rc1 (with syft v1.42.4) while still using kubescape/syft on
its own side. Storage's CVE fix remains in effect at the storage
binary; node-agent's syft surface is unchanged.

Verified locally:
  go build ./...          ok
  go test ./pkg/objectcache/... ./pkg/rulemanager/... -count=1
                          β†’ 30+ packages ok

---------

Co-authored-by: Entlein <eineintlein@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants