
feat(otel): OpenTelemetry traces, logs, drop counters, and OTEL metrics (Phase 1 + 2) #818

Open
matthyx wants to merge 32 commits into main from feat/otel-instrumentation-phase1

Conversation

@matthyx
Contributor

@matthyx matthyx commented May 19, 2026

Summary

Phase 1 — Traces, logs, drop counters

  • New `pkg/otelsetup` package: `InitProviders` wires up TracerProvider, LoggerProvider, and MeterProvider over OTLP gRPC; injects ARMO `X-API-Key` / `X-Customer-GUID` auth headers when the endpoint matches `otel.armosec.io`; returns no-op providers when no endpoint is configured
  • Container profile lifecycle tracing: `ProfileLifecycleTracker` maintains one long-running span per container learning period (bounded at 10k entries with LRU eviction), recording `profile.entry.saved`, `learning.completed`, `learning.terminated`, and eviction events
  • Alert log records: `EmitAlertLogRecord` emits structured OTEL log records for every fired rule and malware detection; includes 60s/1000-entry dedup LRU to avoid flooding on hot rules
  • eBPF drop counters: `node_agent.ebpf.events_dropped.total` incremented in container watcher and event handler factory drop paths, labelled by `reason`
  • Slow-eval spans: rule evaluations exceeding `OTEL_SLOW_EVAL_THRESHOLD_MS` emit a `rule.evaluate` span
  • Ring-buffer log processor: 7500-entry ring buffer retains recent log records; flush endpoint activates automatically when `KS_LOGGER_LEVEL=debug`
  • sbommanager: attaches `otelgrpc.NewClientHandler()` for automatic trace propagation
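
The 60s/1000-entry dedup mentioned in the alert log bullet could look roughly like the sketch below. It uses only the standard library; `AlertDeduper`, `ShouldEmit`, and the `rule/container` key format are hypothetical names chosen for illustration, and the PR's actual implementation may differ.

```go
package main

import (
	"container/list"
	"fmt"
	"time"
)

// dedupEntry records when a key was last emitted.
type dedupEntry struct {
	key  string
	seen time.Time
}

// AlertDeduper is a hypothetical TTL-bounded LRU (not the PR's actual type).
// It is not safe for concurrent use; a real caller would guard it with a mutex.
type AlertDeduper struct {
	ttl      time.Duration
	capacity int
	order    *list.List               // front = most recently emitted
	items    map[string]*list.Element // key -> element in order
}

func NewAlertDeduper(ttl time.Duration, capacity int) *AlertDeduper {
	return &AlertDeduper{ttl: ttl, capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// ShouldEmit reports whether a record for key should be emitted at time now.
func (d *AlertDeduper) ShouldEmit(key string, now time.Time) bool {
	if el, ok := d.items[key]; ok {
		e := el.Value.(*dedupEntry)
		if now.Sub(e.seen) < d.ttl {
			return false // still inside the dedup window: suppress
		}
		e.seen = now
		d.order.MoveToFront(el)
		return true
	}
	if d.order.Len() >= d.capacity {
		// At capacity: drop the least recently emitted key.
		oldest := d.order.Back()
		delete(d.items, oldest.Value.(*dedupEntry).key)
		d.order.Remove(oldest)
	}
	d.items[key] = d.order.PushFront(&dedupEntry{key: key, seen: now})
	return true
}

// demo emits the same key at t=0s, t=10s, and t=61s.
func demo() []bool {
	d := NewAlertDeduper(60*time.Second, 1000)
	t0 := time.Unix(0, 0)
	return []bool{
		d.ShouldEmit("R1001/c1", t0),
		d.ShouldEmit("R1001/c1", t0.Add(10*time.Second)),
		d.ShouldEmit("R1001/c1", t0.Add(61*time.Second)),
	}
}

func main() {
	fmt.Println(demo())
}
```

Passing the timestamp in explicitly keeps the sketch deterministic under test; production code would more likely read the clock internally.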

Phase 2 — Replace Prometheus metrics with OTEL SDK

  • New `pkg/metricsmanager/otel/`: full `MetricsManager` interface backed by OTEL SDK; attribute-set caching on all hot paths (2× faster, 10× less memory vs Prometheus on the histogram path)
  • Collapsed eBPF counters: 17 individual per-event-type counters → single `node_agent.ebpf.events.total{event_type}`
  • Prometheus scrape mode: `OTEL_METRICS_EXPORTER=prometheus` installs an OTEL→Prometheus bridge and starts `:8080/metrics` listener
  • `rule.ID` standardisation: all metric call sites now use the stable rule ID (e.g. `R1001`) instead of the display name; malware alerts use constant `"malware"` to bound cardinality
  • `docs/metrics-migration.md`: full mapping of old Prometheus names → new OTEL names with dashboard update checklist
  • A/B benchmarks: hard gate passes — OTEL allocs/op ≤ Prometheus allocs/op, ns/op ≤ 1.1× Prometheus on `BenchmarkReportRuleEvaluationTime`

New env vars

| Variable | Default | Purpose |
|---|---|---|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | | Base OTLP gRPC endpoint |
| `OTEL_METRICS_EXPORTER` | | Set to `prometheus` to enable scrape endpoint on `:8080/metrics` |
| `OTEL_SLOW_EVAL_THRESHOLD_MS` | 0 (disabled) | Threshold for slow-eval spans |
| `OTEL_DEBUG_PORT` | 6060 | Debug listener port |

`OTEL_COLLECTOR_SVC` is now deprecated (superseded by `OTEL_EXPORTER_OTLP_ENDPOINT`).
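
As an illustration of the "0 (disabled)" default for `OTEL_SLOW_EVAL_THRESHOLD_MS`, the following sketch parses the variable into a duration and decides whether an evaluation counts as slow. The helper names `parseSlowEvalThreshold` and `isSlow` are hypothetical, and treating empty, invalid, or negative values as disabled is an assumption rather than confirmed behavior of the PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// parseSlowEvalThreshold converts the raw env value (milliseconds) into a
// duration. Assumption: empty, invalid, or non-positive input means disabled (0).
func parseSlowEvalThreshold(raw string) time.Duration {
	ms, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || ms <= 0 {
		return 0
	}
	return time.Duration(ms) * time.Millisecond
}

// isSlow reports whether an evaluation that took elapsed should emit a span.
// A zero threshold disables slow-eval spans entirely.
func isSlow(elapsed, threshold time.Duration) bool {
	return threshold > 0 && elapsed > threshold
}

func main() {
	threshold := parseSlowEvalThreshold(os.Getenv("OTEL_SLOW_EVAL_THRESHOLD_MS"))
	fmt.Println("threshold:", threshold)
	fmt.Println("300ms evaluation is slow:", isSlow(300*time.Millisecond, threshold))
}
```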

Breaking change

Metric names changed. See `docs/metrics-migration.md` for the full mapping and dashboard update checklist.

Test plan

  • `go build ./...` — passes
  • `go test ./pkg/otelsetup/... ./pkg/metricsmanager/...` — all pass
  • A/B benchmark: OTEL `ReportRuleEvaluationTime` ~95 ns/op / 32 B / 2 allocs vs Prometheus ~200 ns/op / 336 B / 2 allocs — gate passes
  • `ProfileLifecycleTracker` and `RingBufferLogProcessor` unit tests pass
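
The hard gate from the benchmark bullets (OTEL allocs/op ≤ Prometheus allocs/op, and ns/op ≤ 1.1× Prometheus) can be written as a small predicate. This is only a sketch of the stated rule; `benchResult` and `gatePasses` are hypothetical names, and the figures in `main` are the approximate ones quoted in the test plan.

```go
package main

import "fmt"

// benchResult holds the two figures the gate compares.
type benchResult struct {
	NsPerOp     float64
	AllocsPerOp int64
}

// gatePasses applies the rule as stated in the PR description:
// no more allocations than Prometheus, and at most 10% slower.
func gatePasses(otel, prom benchResult) bool {
	return otel.AllocsPerOp <= prom.AllocsPerOp && otel.NsPerOp <= 1.1*prom.NsPerOp
}

func main() {
	otel := benchResult{NsPerOp: 95, AllocsPerOp: 2}
	prom := benchResult{NsPerOp: 200, AllocsPerOp: 2}
	fmt.Println("gate passes:", gatePasses(otel, prom))
}
```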

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Provider-based OpenTelemetry init, OTEL-backed metrics manager replacing prior Prometheus path; expanded metrics (events, rules, SBOM, alerts), gRPC instrumentation, profile lifecycle spans, alert deduplication and suppression reporting
  • Documentation

    • Expanded OTEL configuration reference, runtime notes, and Prometheus→OTEL migration guide
  • Tests

    • New unit tests and benchmarks for OTEL, lifecycle tracking, thresholds, and metrics


@coderabbitai

coderabbitai Bot commented May 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Replaces legacy OTEL wiring with provider-based InitProviders, adds an OTEL-backed metrics manager, implements ProfileLifecycleTracker, instruments rules/profiles/ebpf/malware/SBOM/gRPC, renames a metrics config flag, updates docs/go.mod, and adds tests and benchmarks.

Changes

OpenTelemetry Setup and Integration

| Layer | File(s) | Summary |
|---|---|---|
| Provider init, wrapper, and alert logging | `pkg/otelsetup/setup.go`, `pkg/otelsetup/otelsetup_test.go` | Adds ProviderConfig alias, SlowEvalThreshold backing, global Tracer/Logger/Meter accessors, InitProviders (including Prometheus meter mode), and EmitAlertLogRecord plus test scaffolding. |
| ProfileLifecycleTracker | `pkg/otelsetup/lifecycle.go`, `pkg/otelsetup/otelsetup_test.go` | Implements ProfileLifecycleTracker with per-container learning spans, LearningSpanID/Traceparent accessors, OnEntrySaved snapshot throttling, OnLearningEnded, and cap eviction with unit tests. |
| Main startup and metrics wiring | `cmd/main.go`, `cmd/sbom-scanner/main.go` | Replaces old OTEL init with otelsetup.InitProviders, builds ProviderConfig from config/cluster/credentials, defers shutdown with 5s timeout, creates otelmetrics.NewOTELMetricsManager, and threads metrics manager into components. |
| OTEL metrics manager and benchmarks | `pkg/metricsmanager/otel/otel_metrics_manager.go`, `pkg/metricsmanager/otel/bench_test.go`, `pkg/metricsmanager/prometheus/bench_test.go` | Adds OTEL-backed OTELMetricsManager (instruments, caches, reporting API, Start/Destroy), SBOM and alert-suppression metrics, and benchmark suites for OTEL and Prometheus metric paths. |
| Container profile lifecycle wiring | `pkg/containerprofilemanager/v1/containerprofile_manager.go`, `pkg/containerprofilemanager/v1/monitoring.go` | Adds lifecycleTracker field, initializes it, calls OnLearningStarted/OnEntrySaved/OnLearningEnded at lifecycle checkpoints and error paths, and annotates saved profiles with OTEL trace/span metadata. |
| eBPF dropped-event metrics | `pkg/containerwatcher/v2/container_watcher.go`, `pkg/containerwatcher/v2/event_handler_factory.go` | Adds node_agent.ebpf.events_dropped.total Int64Counter via otelsetup.Meter() to ContainerWatcher and EventHandlerFactory, increments on dropped events with event_type and reason attributes. |
| RuleManager spans and alert dedupe | `pkg/rulemanager/rule_manager.go` | Adds an expirable LRU + mutex to deduplicate SecurityAlert log records per (rule,container), emits rule.evaluate spans when evaluation exceeds SlowEvalThreshold, and switches some metrics to use rule.ID. |
| Malware alert OTEL log emission | `pkg/malwaremanager/v1/malware_manager.go` | Emits structured SecurityAlert OTEL log records for malware alerts with rule/runtime/malware attributes and uses malware constant for metrics. |
| SBOM gRPC client/server instrumentation | `pkg/sbommanager/v1/sbom_manager.go`, `cmd/sbom-scanner/main.go`, `pkg/sbomscanner/v1/client.go`, `pkg/sbomscanner/v1/server.go` | Adds otelgrpc client/server handlers with health-method filtering, injects MetricsManager into SbomManager, wraps SBOM creation in sbom.scan spans and records heap alloc attrs, and conditionally starts Go runtime metrics. |
| Exporters and HTTP exporter metrics injection | `pkg/exporters/exporters_bus.go`, `pkg/exporters/http_exporter.go`, `pkg/exporters/alert_bulk_manager.go`, `pkg/exporters/alert_manager.go` | InitExporters now accepts a MetricsManager, HTTPExporter stores & uses metrics client (noop fallback), reports alert suppression and uses context-aware logging across exporters and bulk manager. |
| Metrics API, mocks, noop, and Prometheus impl updates | `pkg/metricsmanager/*` | Adds context-aware ReportRuleEvaluationTime(ctx, ...), SBOM scan helpers and ReportAlertSuppressed to interface; updates mocks, noop, Prometheus impls, and registers SBOM/alert-suppression metrics. |
| Config, docs, and dependency updates | `pkg/config/*`, `docs/CONFIGURATION.md`, `docs/metrics-migration.md`, `go.mod` | Renames EnablePrometheusExporter → EnableMetricsExporter (same mapstructure key), updates TestLoadConfig, expands environment docs and OTEL Notes, adds metrics-migration guide, and reshuffles/bumps OpenTelemetry and Prometheus-related dependencies. |
| Context-aware logging fixes | `pkg/malwaremanager/v1/clamav/*`, various exporters | Replaces several non-context logger calls with logger.L().Ctx(...) variants across exporters and ClamAV flows. |

Sequence Diagram(s): omitted (the changes span multiple feature areas, and no single sequential flow would add more than the summary above).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

release

Suggested reviewers

  • YakirOren
  • slashben

"🐰 A hop for traces, a twitch for logs,
Counters tallying dropped eBPF clogs,
Profiles learn with spans held near,
Alerts now whisper, clean and clear,
Hooray — telemetry in my paws!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: OpenTelemetry integration with traces, logs, drop counters, and metrics (Phases 1 & 2). It accurately reflects the substantial refactoring from Prometheus-only to OTEL-based instrumentation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@matthyx matthyx force-pushed the feat/otel-instrumentation-phase1 branch from bd33437 to dcb83de Compare May 19, 2026 19:24
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.196 | 0.185 | -5.7% |
| Peak CPU (cores) | 0.211 | 0.208 | -1.2% |
| Avg Memory (MiB) | 311.043 | 270.446 | -13.1% |
| Peak Memory (MiB) | 312.699 | 276.422 | -11.6% |

Dedup Effectiveness (AFTER only)

| Event Type | Passed | Deduped | Ratio |
|---|---|---|---|
| capabilities | 1 | 0 | 0.0% |
| hardlink | 6000 | 0 | 0.0% |
| http | 1766 | 119394 | 98.5% |
| network | 904 | 77996 | 98.9% |
| open | 33777 | 622476 | 94.9% |
| symlink | 6000 | 0 | 0.0% |
| syscall | 979 | 1886 | 65.8% |

Event Counters

| Metric | BEFORE | AFTER |
|---|---|---|
| capability_counter | 8 | 9 |
| dns_counter | 1446 | 1434 |
| exec_counter | 7231 | 7174 |
| network_counter | 95086 | 94343 |
| open_counter | 791819 | 785344 |
| syscall_counter | 3622 | 3602 |
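
For readers checking the figures, the Delta and Ratio columns appear consistent with a relative change, (AFTER − BEFORE) / BEFORE, and a dedup share, Deduped / (Passed + Deduped). That reading is an assumption inferred from the numbers, not something the benchmark bot states. The sketch below reproduces a few rows; `deltaPercent`, `dedupRatio`, and `round1` are hypothetical helper names. Some rows, such as Avg CPU, may differ in the last digit, presumably because the bot computes from unrounded inputs.

```go
package main

import (
	"fmt"
	"math"
)

// deltaPercent returns the relative change from before to after, in percent.
func deltaPercent(before, after float64) float64 {
	return (after - before) / before * 100
}

// dedupRatio returns the share of events that were deduplicated, in percent.
// ok is false when no events were seen (the tables show N/A in that case).
func dedupRatio(passed, deduped int64) (ratio float64, ok bool) {
	total := passed + deduped
	if total == 0 {
		return 0, false
	}
	return float64(deduped) / float64(total) * 100, true
}

// round1 rounds to one decimal place, matching the table precision.
func round1(x float64) float64 {
	return math.Round(x*10) / 10
}

func main() {
	fmt.Println("Avg Memory delta:", round1(deltaPercent(311.043, 270.446)))
	if r, ok := dedupRatio(1766, 119394); ok {
		fmt.Println("http dedup ratio:", round1(r))
	}
	if _, ok := dedupRatio(0, 0); !ok {
		fmt.Println("no events: N/A")
	}
}
```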


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/containerwatcher/v2/container_watcher.go (1)

490-501: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Release dropped events on the non-blocking drop path.

On Line 490, dropped events skip the worker callback (where enrichedEvent.Event.Release() normally happens), so they are never released.

💡 Proposed fix
 		} else {
 			logger.L().Warning("ContainerWatcher - Worker channel full, dropping event",
 				helpers.String("eventType", string(entry.EventType)),
 				helpers.String("containerID", entry.ContainerID))
 			cw.ebpfDropCounter.Add(context.Background(),
 				1,
 				metric.WithAttributes(
 					attribute.String("event_type", string(entry.EventType)),
 					attribute.String("reason", "worker_channel_full"),
 				),
 			)
+			enrichedEvent.Event.Release()
 		}
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/containerwatcher/v2/container_watcher.go` around lines 490 - 501, The
non-blocking "worker channel full" branch drops events without releasing their
resources; update the else branch so that before logging/incrementing
cw.ebpfDropCounter you call the event release used in the worker path (e.g. call
enrichedEvent.Event.Release() or the appropriate Release() on the event stored
in entry), guarding with a nil-check if necessary, so dropped events are
properly released just like in the worker callback.
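
The ownership rule behind this finding can be shown in isolation: when a non-blocking send fails, the sender still owns the event and has to release it. The sketch below uses a hypothetical `event` type and `trySubmit` helper; it does not reproduce the actual ContainerWatcher code or its pooling mechanism.

```go
package main

import "fmt"

// event stands in for a pooled event; Release would return it to its pool.
type event struct{ released bool }

func (e *event) Release() { e.released = true }

// trySubmit attempts a non-blocking send. On success the worker owns the
// event and is expected to release it; on drop the sender releases it here.
func trySubmit(ch chan<- *event, e *event, dropped *int) bool {
	select {
	case ch <- e:
		return true
	default:
		*dropped++
		e.Release() // without this, dropped events are never released
		return false
	}
}

func main() {
	ch := make(chan *event, 1) // capacity 1, so the second send is dropped
	dropped := 0
	first, second := &event{}, &event{}
	fmt.Println(trySubmit(ch, first, &dropped), trySubmit(ch, second, &dropped))
	fmt.Println("dropped:", dropped, "second released:", second.released)
}
```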
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/main.go`:
- Around line 116-122: The deferred OTEL shutdown in main is skipped because
main calls os.Exit; move to the safer pattern: create a run() function that
contains the existing setup and the defer block that calls otelShutdown using
context.WithTimeout (and defers cancel), keep the defer exactly as in the diff,
replace all internal os.Exit(...) calls inside run() with returns of the
corresponding exit code, and change main to call os.Exit(run()); ensure you
reference the existing otelShutdown variable and the defer with
context.WithTimeout so the shutdown runs on SIGINT/SIGTERM paths.
- Around line 103-112: The OTEL provider call currently passes raw os.Getenv
values for NodeName, PodName, Namespace and ClusterName which bypass the
resolved config and clusterData; update the otelsetup.InitProviders
ProviderConfig construction (the call site where ProviderConfig is built) to use
the resolved config and cluster values (e.g., cfg.NodeName / cfg.PodName /
cfg.Namespace / cfg.ClusterName and existing clusterData fields) with optional
fallbacks to the env vars if needed so telemetry uses the final config values
instead of the raw environment.

In `@docs/CONFIGURATION.md`:
- Around line 112-115: Add a new environment variable row documenting
KS_LOGGER_NAME to the environment variables table, noting its default value and
valid options (e.g., "slog" vs "prettylogger"/"zaplogger") and that setting
KS_LOGGER_NAME=slog activates the ring buffer used by the retroactive log
export; also reference that ENABLE_DEBUG_LISTENER must be true to keep the last
7,500 log records in memory and that they can be re-emitted via a POST to
/debug/flush-ring-buffer.

In `@pkg/otelsetup/otelsetup_test.go`:
- Around line 156-161: Test TestSlowEvalThreshold_Default modifies the
package-global atomic slowEvalThresholdNs and does not restore it; update the
test to capture the current value of slowEvalThresholdNs before calling
slowEvalThresholdNs.Store(...), and defer restoring the saved value (using
slowEvalThresholdNs.Store(old)) so the global state is returned to its prior
value after the test; reference the slowEvalThresholdNs symbol and the
TestSlowEvalThreshold_Default test and keep the assertion on SlowEvalThreshold()
unchanged.

In `@pkg/otelsetup/setup.go`:
- Around line 185-187: The exporter option construction uses
otlptracegrpc.WithEndpoint (and likewise otlpmetricgrpc.WithEndpoint and
otlplogsgrpc.WithEndpoint) which requires host:port; update the logic that
builds traceOpts/metricOpts/logOpts to detect URL-style endpoints (use the
existing isARMOEndpoint() utility or check for "://") and call
WithEndpointURL(endpoint) when a scheme is present, otherwise keep
WithEndpoint(endpoint); apply the same change for the metric and log exporter
option blocks referenced around the other diffs so all three exporters handle
both host:port and full URL endpoints correctly.
- Around line 287-289: The shutdown closure currently derives shutdownCtx from
the incoming c which can be already cancelled; change it to create a fresh
timeout context using context.WithTimeout(context.Background(), 5*time.Second)
(instead of context.WithTimeout(c, ...)) so provider shutdown/flush calls are
not short-circuited by the caller's cancelled context; update the shutdown
function (the shutdown variable in setup.go) to use that fresh context and still
call defer cancel() as before.
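
The difference between the two shutdown variants is easy to see with a simulated flush. In this sketch, `flush`, `shutdownDerived`, and `shutdownFresh` are hypothetical stand-ins for the provider shutdown path; only the context handling mirrors the comment above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// flush simulates a provider flush that takes d and honours cancellation.
func flush(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// shutdownDerived derives its timeout from the caller's context, so an
// already-cancelled caller context aborts the flush immediately.
func shutdownDerived(c context.Context) error {
	ctx, cancel := context.WithTimeout(c, 5*time.Second)
	defer cancel()
	return flush(ctx, 10*time.Millisecond)
}

// shutdownFresh uses a fresh background context, so the flush gets its
// full timeout regardless of the caller's cancellation state.
func shutdownFresh(_ context.Context) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return flush(ctx, 10*time.Millisecond)
}

// demo calls both variants with a context that is already cancelled.
func demo() (derivedErr, freshErr error) {
	c, cancel := context.WithCancel(context.Background())
	cancel()
	return shutdownDerived(c), shutdownFresh(c)
}

func main() {
	derivedErr, freshErr := demo()
	fmt.Println("derived:", derivedErr)
	fmt.Println("fresh:", freshErr)
}
```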

---

Outside diff comments:
In `@pkg/containerwatcher/v2/container_watcher.go`:
- Around line 490-501: The non-blocking "worker channel full" branch drops
events without releasing their resources; update the else branch so that before
logging/incrementing cw.ebpfDropCounter you call the event release used in the
worker path (e.g. call enrichedEvent.Event.Release() or the appropriate
Release() on the event stored in entry), guarding with a nil-check if necessary,
so dropped events are properly released just like in the worker callback.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1c5cf97b-b20d-4d20-93f8-9fcf98af12b5

📥 Commits

Reviewing files that changed from the base of the PR and between bf71679 and 515f7bb.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (16)
  • cmd/main.go
  • docs/CONFIGURATION.md
  • go.mod
  • pkg/config/config.go
  • pkg/config/config_test.go
  • pkg/containerprofilemanager/v1/containerprofile_manager.go
  • pkg/containerprofilemanager/v1/monitoring.go
  • pkg/containerwatcher/v2/container_watcher.go
  • pkg/containerwatcher/v2/event_handler_factory.go
  • pkg/containerwatcher/v2/tracers/top.go
  • pkg/malwaremanager/v1/malware_manager.go
  • pkg/otelsetup/lifecycle.go
  • pkg/otelsetup/otelsetup_test.go
  • pkg/otelsetup/setup.go
  • pkg/rulemanager/rule_manager.go
  • pkg/sbommanager/v1/sbom_manager.go

@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.215 | 0.221 | +2.8% |
| Peak CPU (cores) | 0.223 | 0.235 | +5.4% |
| Avg Memory (MiB) | 343.940 | 267.842 | -22.1% |
| Peak Memory (MiB) | 348.543 | 274.531 | -21.2% |

Dedup Effectiveness (AFTER only)

| Event Type | Passed | Deduped | Ratio |
|---|---|---|---|
| capabilities | 0 | 0 | N/A |
| hardlink | 6000 | 0 | 0.0% |
| http | 1708 | 119455 | 98.6% |
| network | 903 | 78000 | 98.9% |
| open | 35202 | 621052 | 94.6% |
| symlink | 6000 | 0 | 0.0% |
| syscall | 988 | 1904 | 65.8% |

Event Counters

| Metric | BEFORE | AFTER |
|---|---|---|
| capability_counter | 10 | 8 |
| dns_counter | 1430 | 1449 |
| exec_counter | 7153 | 7250 |
| network_counter | 94110 | 95376 |
| open_counter | 783483 | 794469 |
| syscall_counter | 3635 | 3517 |


@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/otelsetup/setup.go (1)

171-177: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

ProviderConfig still drops cluster/version metadata.

cfg.ClusterName and cfg.ServiceVersion are passed into InitProviders, but they never make it into the shared OTEL resource. That means all three signals still lose cluster identity and agent version even after cmd/main.go was updated to resolve those values first.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/otelsetup/setup.go` around lines 171 - 177, The OTEL resource built in
resource.Merge (inside InitProviders / setup.go) omits cluster and service
version; update the resource.NewWithAttributes call to include cfg.ClusterName
and cfg.ServiceVersion by adding the appropriate semantic attributes (e.g.
semconv.K8SClusterName(cfg.ClusterName) and
semconv.ServiceVersion(cfg.ServiceVersion)) alongside semconv.ServiceName,
semconv.K8SNodeName, semconv.K8SPodName, and semconv.K8SNamespaceName so the
shared resource carries cluster identity and agent version.
♻️ Duplicate comments (1)
cmd/main.go (1)

116-122: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Deferred OTEL shutdown is still skipped on hard-exit paths.

The defer at Lines 116-122 never runs after os.Exit(...), and the later Fatal(...) paths terminate immediately as well. Normal SIGTERM/SIGINT shutdown can still bypass provider flush and drop batched telemetry.

Also applies to: 131-139, 186-187, 495-528

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/main.go` around lines 116 - 122, The deferred otelShutdown in the
anonymous defer block will be skipped on hard exits (os.Exit and log.Fatal
paths) so ensure otelShutdown(shutdownCtx) is invoked before any immediate exit:
replace direct os.Exit / log.Fatal usage or any hard-exit paths that occur near
the locations mentioned with a small exit helper (e.g., callExitWithShutdown)
that calls otelShutdown with a context timeout (same 5s pattern used in the
current defer), waits for completion (or logs error), then performs the actual
os.Exit; update all occurrences that terminate immediately (references:
otelShutdown, the anonymous defer block, and places that call os.Exit / Fatal)
to use that helper so provider flush runs on hard-exit paths.
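
An alternative to a per-call-site exit helper is the `run()` pattern suggested in the earlier inline comment on `cmd/main.go`: the setup code returns an exit code instead of calling `os.Exit`, so deferred shutdown always runs. The sketch below is illustrative only; the `fail` parameter and the `shutdown` callback are hypothetical stand-ins for the real startup logic and `otelShutdown`.

```go
package main

import (
	"fmt"
	"os"
)

// run holds what would otherwise live in main. It returns an exit code
// rather than calling os.Exit, so the deferred shutdown runs on every path.
func run(fail bool, shutdown func()) int {
	defer shutdown()
	if fail {
		return 1 // previously an os.Exit(1) that skipped the defer
	}
	return 0
}

func main() {
	code := run(false, func() { fmt.Println("telemetry flushed") })
	// Returning normally from main already exits with status 0,
	// so os.Exit is only needed for the non-zero case.
	if code != 0 {
		os.Exit(code)
	}
}
```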
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@pkg/otelsetup/setup.go`:
- Around line 171-177: The OTEL resource built in resource.Merge (inside
InitProviders / setup.go) omits cluster and service version; update the
resource.NewWithAttributes call to include cfg.ClusterName and
cfg.ServiceVersion by adding the appropriate semantic attributes (e.g.
semconv.K8SClusterName(cfg.ClusterName) and
semconv.ServiceVersion(cfg.ServiceVersion)) alongside semconv.ServiceName,
semconv.K8SNodeName, semconv.K8SPodName, and semconv.K8SNamespaceName so the
shared resource carries cluster identity and agent version.

---

Duplicate comments:
In `@cmd/main.go`:
- Around line 116-122: The deferred otelShutdown in the anonymous defer block
will be skipped on hard exits (os.Exit and log.Fatal paths) so ensure
otelShutdown(shutdownCtx) is invoked before any immediate exit: replace direct
os.Exit / log.Fatal usage or any hard-exit paths that occur near the locations
mentioned with a small exit helper (e.g., callExitWithShutdown) that calls
otelShutdown with a context timeout (same 5s pattern used in the current defer),
waits for completion (or logs error), then performs the actual os.Exit; update
all occurrences that terminate immediately (references: otelShutdown, the
anonymous defer block, and places that call os.Exit / Fatal) to use that helper
so provider flush runs on hard-exit paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aa4254e2-e99b-47b8-af17-b2fe32adc73c

📥 Commits

Reviewing files that changed from the base of the PR and between 515f7bb and da9e60f.

📒 Files selected for processing (4)
  • cmd/main.go
  • pkg/containerwatcher/v2/container_watcher.go
  • pkg/otelsetup/otelsetup_test.go
  • pkg/otelsetup/setup.go

@matthyx
Contributor Author

matthyx commented May 20, 2026

@coderabbitai review again

@coderabbitai

coderabbitai Bot commented May 20, 2026

@matthyx: Sure, I'll re-review the changes in this PR right away!

✅ Actions performed

Full review triggered.

@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.199 | 0.197 | -1.0% |
| Peak CPU (cores) | 0.213 | 0.205 | -3.6% |
| Avg Memory (MiB) | 330.575 | 275.331 | -16.7% |
| Peak Memory (MiB) | 333.004 | 290.117 | -12.9% |

Dedup Effectiveness (AFTER only)

| Event Type | Passed | Deduped | Ratio |
|---|---|---|---|
| capabilities | 0 | 0 | N/A |
| hardlink | 6000 | 0 | 0.0% |
| http | 1703 | 119457 | 98.6% |
| network | 905 | 77828 | 98.9% |
| open | 36346 | 619901 | 94.5% |
| symlink | 6000 | 0 | 0.0% |
| syscall | 981 | 1897 | 65.9% |

Event Counters

| Metric | BEFORE | AFTER |
|---|---|---|
| capability_counter | 11 | 8 |
| dns_counter | 1441 | 1383 |
| exec_counter | 7250 | 6995 |
| network_counter | 95295 | 91840 |
| open_counter | 794354 | 767514 |
| syscall_counter | 3515 | 3365 |

@matthyx matthyx force-pushed the feat/otel-instrumentation-phase1 branch from da9e60f to 2e19259 Compare May 20, 2026 15:29
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.214 | 0.216 | +1.1% |
| Peak CPU (cores) | 0.224 | 0.228 | +1.8% |
| Avg Memory (MiB) | 331.498 | 269.725 | -18.6% |
| Peak Memory (MiB) | 339.590 | 275.367 | -18.9% |

Dedup Effectiveness (AFTER only)

| Event Type | Passed | Deduped | Ratio |
|---|---|---|---|
| capabilities | 0 | 0 | N/A |
| hardlink | 5999 | 0 | 0.0% |
| http | 1705 | 119458 | 98.6% |
| network | 900 | 77791 | 98.9% |
| open | 36263 | 619957 | 94.5% |
| symlink | 5999 | 0 | 0.0% |
| syscall | 981 | 1885 | 65.8% |

Event Counters

| Metric | BEFORE | AFTER |
|---|---|---|
| capability_counter | 11 | 8 |
| dns_counter | 1454 | 1414 |
| exec_counter | 7316 | 7246 |
| network_counter | 96129 | 94987 |
| open_counter | 801893 | 795185 |
| syscall_counter | 3672 | 3631 |

@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.198 | 0.192 | -3.3% |
| Peak CPU (cores) | 0.209 | 0.203 | -2.9% |
| Avg Memory (MiB) | 330.410 | 271.648 | -17.8% |
| Peak Memory (MiB) | 333.457 | 281.020 | -15.7% |

Dedup Effectiveness (AFTER only)

| Event Type | Passed | Deduped | Ratio |
|---|---|---|---|
| capabilities | 0 | 0 | N/A |
| hardlink | 5999 | 0 | 0.0% |
| http | 1767 | 119392 | 98.5% |
| network | 905 | 77983 | 98.9% |
| open | 36400 | 619834 | 94.5% |
| symlink | 5999 | 0 | 0.0% |
| syscall | 980 | 1877 | 65.7% |

Event Counters

| Metric | BEFORE | AFTER |
|---|---|---|
| capability_counter | 11 | 9 |
| dns_counter | 1429 | 1444 |
| exec_counter | 7199 | 7225 |
| network_counter | 94605 | 95022 |
| open_counter | 789856 | 792502 |
| syscall_counter | 3520 | 3599 |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/otelsetup/lifecycle.go (1)

39-45: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid evicting another container when replacing an existing tracked one.

At capacity, eviction runs before the replace check. If containerID already exists, this can evict an unrelated active lifecycle span and reduce tracked coverage unnecessarily.

Suggested patch
 func (t *ProfileLifecycleTracker) OnLearningStarted(containerID, namespace, pod, image string) {
 	t.mu.Lock()
 	defer t.mu.Unlock()
-	if len(t.spans) >= maxTrackedProfiles {
-		t.evictOldest()
-	}
 	if existing, ok := t.spans[containerID]; ok {
 		existing.AddEvent("learning.replaced")
 		existing.End()
+	} else if len(t.spans) >= maxTrackedProfiles {
+		t.evictOldest()
 	}
 	spanCtx, span := Tracer().Start(context.Background(), "container.profile.learning",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/otelsetup/lifecycle.go` around lines 39 - 45, The current logic evicts
oldest span before checking for an existing span, which can remove an unrelated
active lifecycle span when containerID is being replaced; change the flow in the
lifecycle tracking function so you first check if containerID already exists in
t.spans (the existing := t.spans[containerID] path and calls to
existing.AddEvent("learning.replaced") / existing.End()), perform the
replacement without calling t.evictOldest, and only if the containerID is not
already present enforce capacity by calling t.evictOldest when len(t.spans) >=
maxTrackedProfiles; this preserves existing spans and only evicts when adding a
new tracked container.
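
The ordering issue can be reproduced with a stripped-down tracker. The sketch below replaces spans with logical start times and uses a cap of 2 instead of the 10k described in the PR; `tracker`, `onLearningStarted`, and the `evicted` slice are hypothetical and exist only to make the behavior observable.

```go
package main

import "fmt"

const maxTracked = 2 // tiny cap for the demo; the PR describes a 10k bound

// tracker is a simplified stand-in for ProfileLifecycleTracker.
type tracker struct {
	started map[string]int // containerID -> logical start time
	clock   int
	evicted []string
}

func newTracker() *tracker { return &tracker{started: map[string]int{}} }

// evictOldest removes the entry with the smallest start time.
func (t *tracker) evictOldest() {
	oldestID, oldest := "", 0
	for id, ts := range t.started {
		if oldestID == "" || ts < oldest {
			oldestID, oldest = id, ts
		}
	}
	if oldestID != "" {
		delete(t.started, oldestID)
		t.evicted = append(t.evicted, oldestID)
	}
}

// onLearningStarted evicts only when a genuinely new ID would exceed the cap.
// Replacing an existing ID reuses its slot and never evicts another entry.
func (t *tracker) onLearningStarted(id string) {
	if _, exists := t.started[id]; !exists && len(t.started) >= maxTracked {
		t.evictOldest()
	}
	t.clock++
	t.started[id] = t.clock
}

func main() {
	tr := newTracker()
	tr.onLearningStarted("a")
	tr.onLearningStarted("b")
	tr.onLearningStarted("a") // replace at capacity: nothing is evicted
	fmt.Println("evicted after replace:", tr.evicted)
	tr.onLearningStarted("c") // new ID at capacity: the oldest entry goes
	fmt.Println("evicted after new ID:", tr.evicted)
}
```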
pkg/containerwatcher/v2/container_watcher.go (1)

179-183: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle the Int64Counter creation error instead of discarding it.

In pkg/containerwatcher/v2/container_watcher.go (lines 179-183), the code ignores the err returned by otelsetup.Meter().Int64Counter(...). OpenTelemetry-Go is designed to return a usable, non-nil instrument even when err != nil, but the error still signals an instrument registration or naming problem (e.g., ErrInstrumentName) that can lead to incorrect or conflicting metric streams. Log or propagate err instead of dropping it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/containerwatcher/v2/container_watcher.go` around lines 179 - 183, The
Int64Counter creation call using otelsetup.Meter().Int64Counter assigned to
ebpfDropCounter currently discards its returned error; capture the error (e.g.,
ebpfDropCounter, err := otelsetup.Meter().Int64Counter(...)) and handle it—log
it via the existing logger (processLogger or similar) or return it from the
containing function so instrument registration problems (ErrInstrumentName,
etc.) are visible; ensure the handling occurs where ebpfDropCounter is declared
so downstream code either gets a valid instrument or the startup fails/alerts
appropriately.
🧹 Nitpick comments (1)
pkg/otelsetup/otelsetup_test.go (1)

116-121: ⚡ Quick win

Make eviction ordering deterministic in the test (avoid time.Sleep).

Using time.Sleep for ordering can cause intermittent flakes. Setting startTimes explicitly under lock keeps this test deterministic.

Suggested patch
 	tracker := NewProfileLifecycleTracker()
 	tracker.OnLearningStarted("old", "ns", "pod", "")
-	time.Sleep(time.Millisecond)
 	tracker.OnLearningStarted("new", "ns", "pod", "")

 	tracker.mu.Lock()
+	tracker.startTimes["old"] = time.Unix(0, 1)
+	tracker.startTimes["new"] = time.Unix(0, 2)
 	tracker.evictOldest()
 	tracker.mu.Unlock()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/otelsetup/otelsetup_test.go` around lines 116 - 121, The test relies on
time.Sleep to create ordering for eviction; instead make ordering deterministic
by acquiring tracker.mu, setting tracker.startTimes for the relevant keys to
explicit timestamps (older for the entry you expect evicted, newer for the one
to keep), release the lock, then call
tracker.OnLearningStarted("new","ns","pod","") and invoke tracker.mu.Lock();
tracker.evictOldest(); tracker.mu.Unlock(); this replaces the non-deterministic
time.Sleep and ensures evictOldest() sees the intended startTimes ordering.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/otelsetup/lifecycle.go`:
- Around line 97-103: OnEntrySaved currently increments t.counts[containerID]
before verifying the container is tracked, which can grow counts for unknown
IDs; fix by checking the map lookup result (ctx, ok := t.ctxs[containerID])
while holding t.mu and only incrementing t.counts[containerID] (and reading
count) if ok is true, moving the increment into the guarded branch and returning
early without touching t.counts when ok is false; ensure the lock/unlock
semantics around t.mu remain correct.
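The guard described above could look roughly like the following sketch. `entryCounter`, `tracked`, and `onEntrySaved` are hypothetical names that stand in for the tracker, `t.ctxs`, and `OnEntrySaved`; the real types are not shown in this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// entryCounter is a hypothetical reduction of the lifecycle tracker.
type entryCounter struct {
	mu      sync.Mutex
	tracked map[string]struct{} // stands in for t.ctxs
	counts  map[string]int      // stands in for t.counts
}

// onEntrySaved increments the save count only for tracked containers.
// It returns the new count and whether the container was tracked.
func (c *entryCounter) onEntrySaved(id string) (int, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.tracked[id]; !ok {
		// Unknown ID: return before touching counts so the map cannot grow.
		return 0, false
	}
	c.counts[id]++
	return c.counts[id], true
}

func main() {
	c := &entryCounter{
		tracked: map[string]struct{}{"known": {}},
		counts:  map[string]int{},
	}
	c.onEntrySaved("unknown")
	n, ok := c.onEntrySaved("known")
	fmt.Println(n, ok, len(c.counts))
}
```

The single deferred unlock keeps the lock semantics unchanged on both the early-return and the increment path.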

---

Outside diff comments:
In `@pkg/containerwatcher/v2/container_watcher.go`:
- Around line 179-183: The Int64Counter creation call using
otelsetup.Meter().Int64Counter assigned to ebpfDropCounter currently discards
its returned error; capture the error (e.g., ebpfDropCounter, err :=
otelsetup.Meter().Int64Counter(...)) and handle it—log it via the existing
logger (processLogger or similar) or return it from the containing function so
instrument registration problems (ErrInstrumentName, etc.) are visible; ensure
the handling occurs where ebpfDropCounter is declared so downstream code either
gets a valid instrument or the startup fails/alerts appropriately.

In `@pkg/otelsetup/lifecycle.go`:
- Around line 39-45: The current logic evicts oldest span before checking for an
existing span, which can remove an unrelated active lifecycle span when
containerID is being replaced; change the flow in the lifecycle tracking
function so you first check if containerID already exists in t.spans (the
existing := t.spans[containerID] path and calls to
existing.AddEvent("learning.replaced") / existing.End()), perform the
replacement without calling t.evictOldest, and only if the containerID is not
already present enforce capacity by calling t.evictOldest when len(t.spans) >=
maxTrackedProfiles; this preserves existing spans and only evicts when adding a
new tracked container.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 226ea8a5-cbde-422e-82d6-c29aa8dcd6c0

📥 Commits

Reviewing files that changed from the base of the PR and between da9e60f and ea28040.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
  • cmd/main.go
  • go.mod
  • pkg/containerprofilemanager/v1/monitoring.go
  • pkg/containerwatcher/v2/container_watcher.go
  • pkg/otelsetup/lifecycle.go
  • pkg/otelsetup/otelsetup_test.go
  • pkg/otelsetup/setup.go

Comment thread pkg/otelsetup/lifecycle.go
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.000 | 0.000 | N/A |
| Peak CPU (cores) | 0.000 | 0.000 | N/A |
| Avg Memory (MiB) | 0.000 | 0.000 | N/A |
| Peak Memory (MiB) | 0.000 | 0.000 | N/A |

Dedup Effectiveness

No data available.

@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.206 | 0.214 | +3.8% |
| Peak CPU (cores) | 0.214 | 0.225 | +5.3% |
| Avg Memory (MiB) | 350.030 | 266.701 | -23.8% |
| Peak Memory (MiB) | 355.156 | 273.195 | -23.1% |

Dedup Effectiveness (AFTER only)

| Event Type | Passed | Deduped | Ratio |
|---|---|---|---|
| capabilities | 1 | 0 | 0.0% |
| hardlink | 6000 | 0 | 0.0% |
| http | 1764 | 119394 | 98.5% |
| network | 901 | 77998 | 98.9% |
| open | 34143 | 622102 | 94.8% |
| symlink | 6000 | 0 | 0.0% |
| syscall | 978 | 1893 | 65.9% |

Event Counters

| Metric | BEFORE | AFTER |
|---|---|---|
| capability_counter | 10 | 9 |
| dns_counter | 1442 | 1417 |
| exec_counter | 7236 | 7087 |
| network_counter | 95143 | 93217 |
| open_counter | 792738 | 777535 |
| syscall_counter | 3659 | 3516 |


@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/rulemanager/rule_manager.go (1)

372-385: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Slow-eval span gate is always true with the documented default.

Line 374 uses evaluationTime >= otelsetup.SlowEvalThreshold(). If OTEL_SLOW_EVAL_THRESHOLD_MS defaults to 0, this records spans for every rule evaluation, not only slow ones.

Suggested fix
-		if evaluationTime >= otelsetup.SlowEvalThreshold() {
+		if threshold := otelsetup.SlowEvalThreshold(); threshold > 0 && evaluationTime >= threshold {
 			_, span := otelsetup.Tracer().Start(rm.ctx, "rule.evaluate",
 				trace.WithAttributes(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulemanager/rule_manager.go` around lines 372 - 385, The slow-eval span
gate currently uses >= which with the default OTEL_SLOW_EVAL_THRESHOLD_MS == 0
will record every evaluation; change the condition in rule_manager.go from
evaluationTime >= otelsetup.SlowEvalThreshold() to evaluationTime >
otelsetup.SlowEvalThreshold() (ensuring both sides are the same unit, e.g.,
time.Duration) so only truly slower-than-threshold evaluations create the
"rule.evaluate" span; if otelsetup.SlowEvalThreshold() can be zero by design,
also consider ensuring its default is a positive duration instead of 0.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0cfa1ecd-a736-45e0-baea-57fe5d4815f4

📥 Commits

Reviewing files that changed from the base of the PR and between ea28040 and 4525e89.

📒 Files selected for processing (10)
  • pkg/containerprofilemanager/v1/monitoring.go
  • pkg/containerwatcher/v2/container_watcher.go
  • pkg/exporters/alert_bulk_manager.go
  • pkg/exporters/alert_manager.go
  • pkg/exporters/http_exporter.go
  • pkg/malwaremanager/v1/clamav/clamav.go
  • pkg/malwaremanager/v1/clamav/exec.go
  • pkg/malwaremanager/v1/clamav/open.go
  • pkg/malwaremanager/v1/malware_manager.go
  • pkg/rulemanager/rule_manager.go
✅ Files skipped from review due to trivial changes (2)
  • pkg/exporters/http_exporter.go
  • pkg/exporters/alert_bulk_manager.go

@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.218 | 0.210 | -3.8% |
| Peak CPU (cores) | 0.230 | 0.220 | -4.3% |
| Avg Memory (MiB) | 332.306 | 269.525 | -18.9% |
| Peak Memory (MiB) | 334.609 | 275.129 | -17.8% |

Dedup Effectiveness (AFTER only)

| Event Type | Passed | Deduped | Ratio |
|---|---|---|---|
| capabilities | 1 | 0 | 0.0% |
| hardlink | 6000 | 0 | 0.0% |
| http | 1767 | 119394 | 98.5% |
| network | 902 | 77918 | 98.9% |
| open | 36198 | 620067 | 94.5% |
| symlink | 6000 | 0 | 0.0% |
| syscall | 988 | 1906 | 65.9% |

Event Counters

| Metric | BEFORE | AFTER |
|---|---|---|
| capability_counter | 9 | 8 |
| dns_counter | 1449 | 1449 |
| exec_counter | 7249 | 7288 |
| network_counter | 95369 | 95812 |
| open_counter | 793322 | 798400 |
| syscall_counter | 3539 | 3626 |

matthyx and others added 9 commits May 21, 2026 21:20
Introduces OTEL instrumentation across node-agent without touching existing
Prometheus metrics:

- pkg/otelsetup: new package with InitProviders (TracerProvider + LoggerProvider +
  MeterProvider via OTLP gRPC), ARMO auth header injection, ring-buffer log
  processor (7500-entry), ProfileLifecycleTracker (one span per container
  learning period, cap 10k), SlowEvalThreshold, EmitAlertLogRecord with
  60s/1000-entry dedup LRU, and debug HTTP listener
- pkg/rulemanager: emit alert OTEL log records per fired rule; add slow-path
  span for evaluations exceeding SlowEvalThreshold
- pkg/malwaremanager: emit alert OTEL log records for malware detections
- pkg/containerwatcher: count dropped eBPF events via Int64Counter
  (node_agent.ebpf.events_dropped.total) with reason label
- pkg/containerprofilemanager: wire ProfileLifecycleTracker lifecycle hooks
  (OnLearningStarted/OnEntrySaved/OnLearningEnded)
- pkg/sbommanager: attach otelgrpc stats handler to gRPC dial
- docs/CONFIGURATION.md: document new OTEL env vars; mark OTEL_COLLECTOR_SVC
  as deprecated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
InitProviders now guards each exporter behind its own non-empty endpoint
check, so a traces-only config no longer instantiates log/metric exporters
against empty targets (avoiding retry loops). The ARMO no-credentials guard
is extended to cover metrics-only ARMO configs. Each provider pointer is
nil-checked in the combined shutdown. Debug listener is gated on logProvider
being non-nil.

Alert-log dedup in rule_manager replaces the racy Contains+Add pair with a
mutex-protected check-and-set, closing the TOCTOU window under the ants
worker pool.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- cmd/main.go: use resolved cfg/clusterData values for OTEL resource
  attributes instead of raw os.Getenv (NodeName, PodName, NamespaceName,
  ClusterName)
- setup.go: detect URL-scheme endpoints and route to WithEndpointURL;
  host:port paths continue using WithEndpoint for all three exporters
- setup.go: shutdown closure derives fresh context.Background() timeout
  so provider flush is not short-circuited by an already-cancelled caller
  context
- otelsetup_test.go: restore slowEvalThresholdNs global in t.Cleanup to
  prevent order-dependent test failures
- container_watcher.go: call Release() on dropped events in the
  worker_channel_full path to prevent resource leak

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
The shared resource was missing K8SClusterName and ServiceVersion
attributes, so all three signals were losing cluster identity and
agent version even after cmd/main.go was updated to resolve these
values from clusterData and ProviderConfig.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
resource.Default() in otel v1.43.0 sets schema URL 1.40.0 internally,
while resource.NewWithAttributes(semconv.SchemaURL, ...) sets 1.26.0 from
our semconv/v1.26.0 import. resource.Merge rejects two non-empty
conflicting schema URLs, causing InitProviders to fail at startup.

Switch to resource.NewSchemaless so our custom attributes carry no schema
URL; Merge then adopts the default resource's schema without conflict.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The OTLP gRPC exporters default to TLS, which fails against plaintext
collectors (e.g. local SigNoz on port 4317). Add WithInsecure() for all
three exporters when the endpoint does not start with https://.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
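Taken together with the earlier endpoint-routing commit, the decision could be sketched as a small pure function. `planEndpoint` and `endpointPlan` are hypothetical, and detecting a scheme by looking for `://` is an assumption; only the `https://` prefix rule is stated in the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// endpointPlan records which exporter options an endpoint string calls for.
type endpointPlan struct {
	useURL   bool // true: pass via WithEndpointURL; false: pass via WithEndpoint
	insecure bool // true: also add WithInsecure
}

// planEndpoint classifies an endpoint. Scheme detection via "://" is an
// assumption made for this sketch.
func planEndpoint(endpoint string) endpointPlan {
	return endpointPlan{
		useURL:   strings.Contains(endpoint, "://"),
		insecure: !strings.HasPrefix(endpoint, "https://"),
	}
}

func main() {
	for _, ep := range []string{
		"localhost:4317",
		"http://collector:4317",
		"https://otel.armosec.io",
	} {
		fmt.Printf("%-26s %+v\n", ep, planEndpoint(ep))
	}
}
```

One consequence of the prefix rule is that a bare `host:port` endpoint is always treated as plaintext, so a TLS collector has to be configured with an explicit `https://` URL.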
…ing buffer severity gate

- ProfileLifecycleTracker: one container.profile.learning span per container with LearningSpanID() and LearningTraceparent() for W3C propagation
- OnEntrySaved: emit container.profile.cp.saved child spans with M2 throttle (count==1, %10, or hasDropped)
- ContainerProfile annotations: OtelSpanIDMetadataKey + OtelTraceparentMetadataKey (k8s-interface v0.0.213)
- otelsetup: thin wrapper delegating to go-logger/otelsetup (v0.0.29) which includes ring buffer with severity≥Info gate
- Bump k8s-interface v0.0.212→v0.0.213 for OtelTraceparentMetadataKey

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
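The throttle on `container.profile.cp.saved` child spans can be read as a simple predicate. The sketch below assumes "%10" means "every tenth save" (`count%10 == 0`); `shouldEmitSavedSpan` is a hypothetical name.

```go
package main

import "fmt"

// shouldEmitSavedSpan decides whether a profile save produces a child span:
// the first save, every tenth save, or any save that reports dropped events.
func shouldEmitSavedSpan(count int, hasDropped bool) bool {
	return count == 1 || count%10 == 0 || hasDropped
}

func main() {
	emitted := 0
	for count := 1; count <= 100; count++ {
		if shouldEmitSavedSpan(count, false) {
			emitted++
		}
	}
	// Saves 1, 10, 20, ..., 100 qualify when nothing is dropped.
	fmt.Println("spans for 100 saves:", emitted)
}
```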
…amAV, drop events

Wires OTEL log context on the call sites the design doc marks as Tier 1
(direct customer impact) and Tier 2 (operational health):

- Alert delivery failures: alert_manager.go SendRuleAlert/SendMalwareAlert,
  alert_bulk_manager.go bulk-send max retries / queue-full / drain timeout,
  http_exporter.go SendRuleAlert / SendFimAlerts / alert-limit
- Rule eval failures: rule_manager.go ReportEnrichedEvent / EvaluatePolicyRules /
  getUniqueIdAndMessage — use rm.ctx for trace correlation
- ClamAV health: clamav.go ping failure, exec.go/open.go scan failures
- Drop events: container_watcher.go worker-channel-full lines (Tier 2)
- Profile save failures: containerprofilemanager monitoring.go — use cpm.ctx

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… to exec malware path

- rule_manager.go: set span.SetStatus(codes.Error) when CEL evaluation returns an
  error so failed evaluations are distinguishable from successful non-alerts in traces
- malware_manager.go: mirror reportFileOpen OTEL emission in reportFileExec — adds
  metrics.ReportRuleAlert + EmitAlertLogRecord to the exec malware detection path so
  exec-path detections appear in telemetry alongside open-path ones

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@matthyx matthyx force-pushed the feat/otel-instrumentation-phase1 branch from 4525e89 to b41f213 Compare May 21, 2026 19:21
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.225 | 0.217 | -3.7% |
| Peak CPU (cores) | 0.234 | 0.223 | -4.5% |
| Avg Memory (MiB) | 336.682 | 269.727 | -19.9% |
| Peak Memory (MiB) | 340.227 | 276.770 | -18.7% |

Dedup Effectiveness (AFTER only)

| Event Type | Passed | Deduped | Ratio |
|---|---|---|---|
| capabilities | 1 | 0 | 0.0% |
| hardlink | 6000 | 0 | 0.0% |
| http | 1764 | 119396 | 98.5% |
| network | 901 | 77999 | 98.9% |
| open | 35620 | 620629 | 94.6% |
| symlink | 6000 | 0 | 0.0% |
| syscall | 979 | 1885 | 65.8% |

Event Counters

| Metric | BEFORE | AFTER |
|---|---|---|
| capability_counter | 9 | 9 |
| dns_counter | 1427 | 1422 |
| exec_counter | 7180 | 7116 |
| network_counter | 94353 | 93617 |
| open_counter | 785912 | 779420 |
| syscall_counter | 3527 | 3454 |

- Add pkg/metricsmanager/otel/ with full MetricsManager implementation
  backed by OTEL SDK; attribute-set caching on all hot paths eliminates
  per-call allocations (2× faster, 10× less memory vs Prometheus on the
  histogram path per A/B benchmark)
- Wire OTEL metrics in cmd/main.go; drop prometheus package import
- Add Prometheus scrape mode: OTEL_METRICS_EXPORTER=prometheus installs
  an OTEL→Prometheus bridge and starts :8080/metrics listener
- Standardise rule.ID at all metric call sites (was rule.Name); malware
  alerts use constant "malware" to bound cardinality
- Add docs/metrics-migration.md mapping every old Prometheus name to its
  new OTEL name (breaking rename — dashboards must be updated)
- Add A/B benchmarks in otel/ and prometheus/ packages; hard gate passes:
  OTEL allocs/op ≤ Prometheus allocs/op, ns/op ≤ 1.1× Prometheus

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
matthyx and others added 2 commits May 22, 2026 08:51
Starts go.opentelemetry.io/contrib/instrumentation/runtime in the sidecar
so heap_alloc, GC pause, and goroutine counts flow to SigNoz as OTEL metrics
sampled every 30s.

Records heap.alloc.before_mb / after_mb / delta_mb as attributes on every
sbom.scan span so OOM-prone images are directly identifiable in trace views.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…vability

Adds node_agent.alert.suppressed.total{rule_id, reason} counter so
operators can debug "alert didn't fire" cases from SigNoz alone.

Suppression reasons instrumented:
- no_rules_for_pod: no rule bindings match the pod
- profile_incomplete: rule requires profile but none exists yet
- policy: rule policy check suppressed the alert
- eval_error: CEL evaluation returned an error
- cooldown: alert deduplicated within the cooldown window
- rate_limit: HTTP exporter rate limit reached (rule + malware alerts)

MetricsManager is now threaded into HTTPExporter (variadic, defaults to
noop for backward-compatible test callers) and InitExporters gains a
required MetricsManager parameter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.226 | 0.222 | -1.8% |
| Peak CPU (cores) | 0.244 | 0.234 | -3.9% |
| Avg Memory (MiB) | 344.741 | 270.924 | -21.4% |
| Peak Memory (MiB) | 348.930 | 275.879 | -20.9% |

Dedup Effectiveness

No data available.

Event Counters

| Metric | BEFORE | AFTER |
|---|---|---|
| capability_counter | 12 | 0 |
| dns_counter | 1450 | 0 |
| exec_counter | 7290 | 0 |
| network_counter | 95807 | 0 |
| open_counter | 799323 | 0 |
| syscall_counter | 3562 | 0 |

matthyx and others added 2 commits May 22, 2026 09:32
Health checks fire every 5s (scannerReadinessCheckInterval), producing
720 spans/hr — ~125 KB/hr compressed, which blows the entire per-agent
span budget (target: ~1 KB/hr for control-plane spans with mitigations).

Add otelgrpc.WithFilter on both client and server handlers to skip the
Health RPC. CreateSBOM spans (low frequency, ~5–20/hr) are retained.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Avoids ~2-3 KB/hr of metric volume for deployments without a telemetry
endpoint configured. Consistent with how the main agent guards OTEL init.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@matthyx
Contributor Author

matthyx commented May 22, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

ReportAlertSuppressed is called on the hot path (no_rules_for_pod fires
per event, profile_incomplete fires per-rule per event during learning).
metric.WithAttributes allocates a []attribute.KeyValue slice on every
call; replace with the same sync.Map option cache used by all other
hot-path methods to eliminate per-call heap allocation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
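The caching idea in this commit can be sketched with the standard library alone. `cachedAttrs` below is a hypothetical stand-in for a pre-built measurement option, and `attrCache` and `suppressKey` are illustrative names; the sketch shows the build-once lookup pattern and makes no claim about the exact allocation profile of the PR's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// suppressKey identifies one (rule_id, reason) label combination.
type suppressKey struct {
	ruleID string
	reason string
}

// cachedAttrs stands in for a pre-built attribute set / measurement option.
type cachedAttrs struct {
	pairs []string
}

// attrCache builds each label combination once and reuses it afterwards.
type attrCache struct {
	m sync.Map // suppressKey -> *cachedAttrs
}

func (c *attrCache) get(ruleID, reason string) *cachedAttrs {
	key := suppressKey{ruleID: ruleID, reason: reason}
	if v, ok := c.m.Load(key); ok {
		return v.(*cachedAttrs) // hot path: no attribute slice is rebuilt
	}
	built := &cachedAttrs{pairs: []string{"rule_id", ruleID, "reason", reason}}
	// LoadOrStore keeps a single winner if two goroutines race on a new key.
	actual, _ := c.m.LoadOrStore(key, built)
	return actual.(*cachedAttrs)
}

func main() {
	var cache attrCache
	a := cache.get("R0001", "cooldown")
	b := cache.get("R0001", "cooldown")
	fmt.Println("same cached value:", a == b)
}
```

This only stays cheap while the number of distinct label combinations is bounded, which is the same cardinality concern behind using `rule.ID` and the constant `"malware"` elsewhere in the PR.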
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.000 | 0.000 | N/A |
| Peak CPU (cores) | 0.000 | 0.000 | N/A |
| Avg Memory (MiB) | 0.000 | 0.000 | N/A |
| Peak Memory (MiB) | 0.000 | 0.000 | N/A |

Dedup Effectiveness

No data available.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
pkg/malwaremanager/v1/malware_manager.go (1)

173-191: ⚡ Quick win

Extract shared malware alert telemetry emission into one helper.

The span + EmitAlertLogRecord block is duplicated in two paths; consolidating it will prevent drift in attributes/fields over time.

♻️ Suggested refactor
 func (mm *MalwareManager) reportFileExec(event utils.ExecEvent) {
@@
-			alertCtx, alertSpan := otelsetup.Tracer().Start(context.Background(), "malware.alert",
-				trace.WithAttributes(
-					attribute.String("container.id", containerID),
-					attribute.String("k8s.namespace.name", result.GetRuntimeAlertK8sDetails().Namespace),
-					attribute.String("k8s.pod.name", result.GetRuntimeAlertK8sDetails().PodName),
-					attribute.String("malware.signature", result.GetBasicRuntimeAlert().AlertName),
-				))
-			otelsetup.EmitAlertLogRecord(alertCtx, otelsetup.AlertLogAttrs{
-				RuleID:           "malware",
-				AlertType:        result.GetBasicRuntimeAlert().AlertName,
-				ContainerID:      containerID,
-				ContainerName:    result.GetRuntimeAlertK8sDetails().ContainerName,
-				Namespace:        result.GetRuntimeAlertK8sDetails().Namespace,
-				PodName:          result.GetRuntimeAlertK8sDetails().PodName,
-				Image:            result.GetRuntimeAlertK8sDetails().Image,
-				EventType:        "malware",
-				MalwareSignature: result.GetBasicRuntimeAlert().AlertName,
-			})
-			alertSpan.End()
+			mm.emitMalwareTelemetry(containerID, result)
@@
 func (mm *MalwareManager) reportFileOpen(event utils.OpenEvent) {
@@
-			alertCtx, alertSpan := otelsetup.Tracer().Start(context.Background(), "malware.alert",
-				trace.WithAttributes(
-					attribute.String("container.id", containerID),
-					attribute.String("k8s.namespace.name", result.GetRuntimeAlertK8sDetails().Namespace),
-					attribute.String("k8s.pod.name", result.GetRuntimeAlertK8sDetails().PodName),
-					attribute.String("malware.signature", result.GetBasicRuntimeAlert().AlertName),
-				))
-			otelsetup.EmitAlertLogRecord(alertCtx, otelsetup.AlertLogAttrs{
-				RuleID:           "malware",
-				AlertType:        result.GetBasicRuntimeAlert().AlertName,
-				ContainerID:      containerID,
-				ContainerName:    result.GetRuntimeAlertK8sDetails().ContainerName,
-				Namespace:        result.GetRuntimeAlertK8sDetails().Namespace,
-				PodName:          result.GetRuntimeAlertK8sDetails().PodName,
-				Image:            result.GetRuntimeAlertK8sDetails().Image,
-				EventType:        "malware",
-				MalwareSignature: result.GetBasicRuntimeAlert().AlertName,
-			})
-			alertSpan.End()
+			mm.emitMalwareTelemetry(containerID, result)
 		}
 	}
 }
+
+func (mm *MalwareManager) emitMalwareTelemetry(containerID string, result malwaremanager.MalwareResult) {
+	alertCtx, alertSpan := otelsetup.Tracer().Start(context.Background(), "malware.alert",
+		trace.WithAttributes(
+			attribute.String("container.id", containerID),
+			attribute.String("k8s.namespace.name", result.GetRuntimeAlertK8sDetails().Namespace),
+			attribute.String("k8s.pod.name", result.GetRuntimeAlertK8sDetails().PodName),
+			attribute.String("malware.signature", result.GetBasicRuntimeAlert().AlertName),
+		))
+	defer alertSpan.End()
+
+	otelsetup.EmitAlertLogRecord(alertCtx, otelsetup.AlertLogAttrs{
+		RuleID:           "malware",
+		AlertType:        result.GetBasicRuntimeAlert().AlertName,
+		ContainerID:      containerID,
+		ContainerName:    result.GetRuntimeAlertK8sDetails().ContainerName,
+		Namespace:        result.GetRuntimeAlertK8sDetails().Namespace,
+		PodName:          result.GetRuntimeAlertK8sDetails().PodName,
+		Image:            result.GetRuntimeAlertK8sDetails().Image,
+		EventType:        "malware",
+		MalwareSignature: result.GetBasicRuntimeAlert().AlertName,
+	})
+}

Also applies to: 237-255

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/malwaremanager/v1/malware_manager.go` around lines 173 - 191, Duplicate
telemetry emission (span + otelsetup.EmitAlertLogRecord) for malware alerts
should be extracted into a single helper to avoid drift; create a function
(e.g., emitMalwareAlertTelemetry or EmitMalwareAlert) that accepts context,
containerID and the alert result (or the derived values from
result.GetBasicRuntimeAlert() and result.GetRuntimeAlertK8sDetails()), starts
the otel tracer span named "malware.alert" with the same attributes, calls
otelsetup.EmitAlertLogRecord with the same AlertLogAttrs (RuleID "malware",
AlertType/MalwareSignature from result.GetBasicRuntimeAlert().AlertName,
ContainerName/Namespace/PodName/Image from result.GetRuntimeAlertK8sDetails(),
EventType "malware"), ends the span, and replace the duplicated blocks (the
block using alertCtx/alertSpan and otelsetup.EmitAlertLogRecord at lines
~173-191 and ~237-255) with calls to that helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/sbom-scanner/main.go`:
- Around line 49-55: The current gating uses only OTEL_EXPORTER_OTLP_ENDPOINT
before calling goruntime.Start, which misses other valid metrics configurations;
update the condition around goruntime.Start (the block that calls
goruntime.Start with goruntime.WithMinimumReadMemStatsInterval) to also permit
startup when OTEL_EXPORTER_OTLP_METRICS_ENDPOINT is set or when
OTEL_METRICS_EXPORTER equals "prometheus" (i.e., check
os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT") != "" ||
os.Getenv("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT") != "" ||
strings.EqualFold(os.Getenv("OTEL_METRICS_EXPORTER"), "prometheus")), keeping
the existing error handling that logs via logger.L().Warning if goruntime.Start
returns an error.
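The widened condition can be factored into a small predicate that takes an environment lookup function, which also makes it testable. `runtimeMetricsEnabled` is a hypothetical helper; in the scanner it would presumably be called with `os.Getenv`.

```go
package main

import (
	"fmt"
	"strings"
)

// runtimeMetricsEnabled reports whether any supported metrics export path is
// configured: a generic OTLP endpoint, a metrics-specific OTLP endpoint, or
// Prometheus scrape mode.
func runtimeMetricsEnabled(getenv func(string) string) bool {
	return getenv("OTEL_EXPORTER_OTLP_ENDPOINT") != "" ||
		getenv("OTEL_EXPORTER_OTLP_METRICS_ENDPOINT") != "" ||
		strings.EqualFold(getenv("OTEL_METRICS_EXPORTER"), "prometheus")
}

func main() {
	// A map-backed lookup stands in for os.Getenv in this demo.
	env := map[string]string{"OTEL_METRICS_EXPORTER": "Prometheus"}
	lookup := func(k string) string { return env[k] }
	fmt.Println(runtimeMetricsEnabled(lookup))
}
```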

In `@pkg/metricsmanager/prometheus/prometheus.go`:
- Around line 361-383: Destroy() currently unregisters legacy metrics but misses
the metrics created with promauto: p.sbomScanCounter, p.alertSuppressedCounter,
p.sbomScanDuration, p.sbomRestarts, and p.sbomReady; update the
PrometheusMetric.Destroy() implementation to call prometheus.Unregister(...) for
each of those identifiers (ensuring you pass the same metric variables:
sbomScanCounter, alertSuppressedCounter, sbomScanDuration, sbomRestarts,
sbomReady) so they are removed from the default registry during teardown and
avoid stale/default-registry leftovers.

In `@pkg/otelsetup/setup.go`:
- Around line 144-147: The returned shutdown closure currently ignores the error
from srv.Shutdown(ctx); update the closure returned by the function so it
captures the error from srv.Shutdown(ctx) and does not discard it: call
srv.Shutdown(ctx), store its error (e.g., srvErr), then call mp.Shutdown(ctx)
and return an appropriate combined result (for example return mpErr if mpErr !=
nil else srvErr, or wrap both errors) instead of always returning only
mp.Shutdown's error; reference the closure that currently calls
srv.Shutdown(ctx) and mp.Shutdown(ctx) and modify its return behavior
accordingly.

In `@pkg/sbommanager/v1/sbom_manager.go`:
- Around line 100-101: Create a nil-check for the incoming metrics parameter in
CreateSbomManager and any other SbomManager constructors/initializers that store
it (e.g., where metrics is assigned inside the function), and if metrics == nil
replace it with a no-op implementation that satisfies
metricsmanager.MetricsManager (or return an explicit error if you prefer);
ensure the code assigns this safe non-nil instance before storing it on the
SbomManager struct so downstream dereferences cannot panic during scan
processing.
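A minimal sketch of the nil guard, using hypothetical reduced types (`metricsManager`, `noopMetrics`, `sbomManager`) in place of the real `metricsmanager.MetricsManager` interface and `SbomManager` struct:

```go
package main

import "fmt"

// metricsManager is a hypothetical one-method reduction of the real interface.
type metricsManager interface {
	reportScan(result string)
}

// noopMetrics satisfies metricsManager and discards everything.
type noopMetrics struct{}

func (noopMetrics) reportScan(string) {}

// countingMetrics is a test double that counts reported scans.
type countingMetrics struct{ n int }

func (c *countingMetrics) reportScan(string) { c.n++ }

type sbomManager struct {
	metrics metricsManager
}

// newSbomManager substitutes a no-op implementation when metrics is nil, so
// later calls on s.metrics cannot panic.
func newSbomManager(m metricsManager) *sbomManager {
	if m == nil {
		m = noopMetrics{}
	}
	return &sbomManager{metrics: m}
}

func (s *sbomManager) scan() {
	s.metrics.reportScan("success")
}

func main() {
	newSbomManager(nil).scan() // safe: falls back to the no-op

	c := &countingMetrics{}
	newSbomManager(c).scan()
	fmt.Println("scans reported:", c.n)
}
```

The `m == nil` check only catches a nil interface value; a typed nil pointer wrapped in the interface would pass the check and still need care inside the implementation.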


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8738e41f-2174-4fbb-ad9e-b3fc16ead7c0

📥 Commits

Reviewing files that changed from the base of the PR and between 4525e89 and c477e36.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (35)
  • cmd/main.go
  • cmd/sbom-scanner/main.go
  • docs/CONFIGURATION.md
  • docs/metrics-migration.md
  • go.mod
  • pkg/config/config.go
  • pkg/config/config_test.go
  • pkg/containerprofilemanager/v1/containerprofile_manager.go
  • pkg/containerprofilemanager/v1/monitoring.go
  • pkg/containerwatcher/v2/container_watcher.go
  • pkg/containerwatcher/v2/event_handler_factory.go
  • pkg/containerwatcher/v2/tracers/top.go
  • pkg/exporters/alert_bulk_manager.go
  • pkg/exporters/alert_manager.go
  • pkg/exporters/exporters_bus.go
  • pkg/exporters/http_exporter.go
  • pkg/malwaremanager/v1/clamav/clamav.go
  • pkg/malwaremanager/v1/clamav/exec.go
  • pkg/malwaremanager/v1/clamav/open.go
  • pkg/malwaremanager/v1/malware_manager.go
  • pkg/metricsmanager/metrics_manager_interface.go
  • pkg/metricsmanager/metrics_manager_mock.go
  • pkg/metricsmanager/metrics_manager_noop.go
  • pkg/metricsmanager/otel/bench_test.go
  • pkg/metricsmanager/otel/otel_metrics_manager.go
  • pkg/metricsmanager/prometheus/bench_test.go
  • pkg/metricsmanager/prometheus/prometheus.go
  • pkg/otelsetup/lifecycle.go
  • pkg/otelsetup/otelsetup_test.go
  • pkg/otelsetup/setup.go
  • pkg/rulemanager/rule_manager.go
  • pkg/sbommanager/v1/metrics.go
  • pkg/sbommanager/v1/sbom_manager.go
  • pkg/sbomscanner/v1/client.go
  • pkg/sbomscanner/v1/server.go
💤 Files with no reviewable changes (1)
  • pkg/sbommanager/v1/metrics.go
✅ Files skipped from review due to trivial changes (5)
  • pkg/malwaremanager/v1/clamav/open.go
  • pkg/config/config_test.go
  • docs/CONFIGURATION.md
  • docs/metrics-migration.md
  • pkg/exporters/alert_bulk_manager.go

Comment thread cmd/sbom-scanner/main.go Outdated
Comment thread pkg/metricsmanager/prometheus/prometheus.go
Comment thread pkg/otelsetup/setup.go
Comment thread pkg/sbommanager/v1/sbom_manager.go
…etheus port conflict

Issue 1: sidecar OTEL was a no-op in ARMO deployments because credentials
come from /etc/credentials (not env vars). Load from file first, fall back
to ACCOUNT_ID/ACCESS_KEY env vars for non-ARMO deployments.

Issue 2: InitProviders binds :8080 in Prometheus mode; main agent already
owns that port, causing initPrometheusMeterProvider to return an error and
tear down all base providers (traces + logs lost too). Changed to a soft
failure: log a warning and continue with traces/logs when the Prometheus
listener can't bind. Sidecar retains full OTLP telemetry in all modes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.000 | 0.000 | N/A |
| Peak CPU (cores) | 0.000 | 0.000 | N/A |
| Avg Memory (MiB) | 0.000 | 0.000 | N/A |
| Peak Memory (MiB) | 0.000 | 0.000 | N/A |

Dedup Effectiveness

No data available.

Picks up buildAuthHeaders refactor and AC8/AC9 unit tests for ARMO auth
header injection (X-API-Key, X-Customer-GUID) across trace/log/metric
exporter option builders.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
|---|---|---|---|
| Avg CPU (cores) | 0.000 | 0.000 | N/A |
| Peak CPU (cores) | 0.000 | 0.000 | N/A |
| Avg Memory (MiB) | 0.000 | 0.000 | N/A |
| Peak Memory (MiB) | 0.000 | 0.000 | N/A |

Dedup Effectiveness

No data available.

@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
| --- | --- | --- | --- |
| Avg CPU (cores) | 0.204 | 0.203 | -0.9% |
| Peak CPU (cores) | 0.215 | 0.209 | -2.7% |
| Avg Memory (MiB) | 321.076 | 265.948 | -17.2% |
| Peak Memory (MiB) | 323.840 | 270.332 | -16.5% |

Dedup Effectiveness

No data available.

Event Counters

| Metric | BEFORE | AFTER |
| --- | --- | --- |
| capability_counter | 8 | 0 |
| dns_counter | 1463 | 0 |
| exec_counter | 7322 | 0 |
| network_counter | 96293 | 0 |
| open_counter | 802807 | 0 |
| syscall_counter | 3704 | 0 |

matthyx and others added 3 commits May 22, 2026 11:06
otel.SetMeterProvider(mp) was called before net.Listen(":8080"), so a
port-conflict error (sidecar when main agent owns :8080) left a leaked
Prometheus-backed provider as the global, silently blackholing sidecar
metrics. Reorder to attempt net.Listen first; only install the provider
after the listener succeeds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
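
The reordering in this commit, combined with the earlier soft-failure change, can be sketched as follows. This is a minimal stand-in rather than the PR's code: `startMetricsListener` is a hypothetical helper, and the `install` callback takes the place of `otel.SetMeterProvider(mp)` so the example needs only the standard library.

```go
package main

import (
	"fmt"
	"net"
)

// startMetricsListener binds addr first and calls install only on success.
// install stands in for otel.SetMeterProvider(mp) in the real code.
// On a bind failure it returns an error without calling install, so the
// caller can log a warning and keep traces and logs running.
func startMetricsListener(addr string, install func()) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("prometheus listener on %s: %w", addr, err)
	}
	install()
	return ln, nil
}

func main() {
	// The first bind takes a free port; it plays the main agent.
	first, err := startMetricsListener("127.0.0.1:0", func() {})
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	defer first.Close()

	// The second bind on the same address plays the sidecar.
	installed := false
	_, err = startMetricsListener(first.Addr().String(), func() { installed = true })
	if err != nil {
		fmt.Println("warning: metrics endpoint disabled; traces and logs continue")
	}
	fmt.Println("provider installed:", installed)
}
```

Because `install` runs only after `net.Listen` succeeds, a port conflict cannot leave a half-configured provider registered as the global.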
…reateSbomManager

Prevents a runtime panic if CreateSbomManager is ever called with a nil
MetricsManager (e.g. in tests). Substitutes the no-op implementation
rather than returning an error, consistent with how other managers in
this codebase treat optional metric dependencies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
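
A nil guard of this kind might look like the sketch below. The one-method `MetricsManager` interface, the `noopMetrics` type, and the trimmed `SbomManager` struct are hypothetical stand-ins; only the `CreateSbomManager` name and the substitute-a-no-op behaviour come from the commit message.

```go
package main

import "fmt"

// MetricsManager is a reduced, hypothetical stand-in for the real interface.
type MetricsManager interface {
	ReportScan(image string)
}

// noopMetrics discards every report.
type noopMetrics struct{}

func (noopMetrics) ReportScan(string) {}

// SbomManager is a hypothetical, trimmed-down manager.
type SbomManager struct {
	metrics MetricsManager
}

// CreateSbomManager substitutes a no-op when mm is nil instead of
// returning an error.
func CreateSbomManager(mm MetricsManager) *SbomManager {
	if mm == nil {
		mm = noopMetrics{}
	}
	return &SbomManager{metrics: mm}
}

// Scan reports the image to the metrics backend.
func (s *SbomManager) Scan(image string) {
	s.metrics.ReportScan(image) // safe even when the caller passed nil
}

func main() {
	m := CreateSbomManager(nil)
	m.Scan("example/image:latest")
	fmt.Println("scan completed without a metrics backend")
}
```

Note that `mm == nil` catches only an untyped nil interface value; a nil pointer of a concrete type wrapped in the interface would still pass the check.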
…e values

HeapAlloc is a live snapshot that can decrease when GC runs mid-scan,
making heap.alloc.delta_mb negative or understated. Switch to TotalAlloc
(monotonically increasing cumulative bytes) so the delta always reflects
actual allocations made during the scan. Rename span attributes from
heap.alloc.* to alloc.total.* to avoid implying a live-heap measurement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
| --- | --- | --- | --- |
| Avg CPU (cores) | 0.000 | 0.000 | N/A |
| Peak CPU (cores) | 0.000 | 0.000 | N/A |
| Avg Memory (MiB) | 0.000 | 0.000 | N/A |
| Peak Memory (MiB) | 0.000 | 0.000 | N/A |

Dedup Effectiveness

No data available.

@matthyx
Contributor Author

matthyx commented May 22, 2026

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 22, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
| --- | --- | --- | --- |
| Avg CPU (cores) | 0.204 | 0.196 | -3.8% |
| Peak CPU (cores) | 0.213 | 0.204 | -4.1% |
| Avg Memory (MiB) | 312.036 | 270.980 | -13.2% |
| Peak Memory (MiB) | 313.836 | 279.043 | -11.1% |

Dedup Effectiveness

No data available.

Event Counters

| Metric | BEFORE | AFTER |
| --- | --- | --- |
| capability_counter | 11 | 0 |
| dns_counter | 1429 | 0 |
| exec_counter | 7147 | 0 |
| network_counter | 94013 | 0 |
| open_counter | 782194 | 0 |
| syscall_counter | 3542 | 0 |

matthyx and others added 3 commits May 22, 2026 12:03
Picks up credential-presence auth header gate — drops isARMOEndpoint()
hostname matching in favour of accessKey != "" check, consistent with
the SBOM scan-failure reporter and HTTP exporter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
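
A credential-presence gate like the one described here might be sketched as below. The function name `buildAuthHeaders` and the two header names appear in this PR, but the signature, the nil return, and the mapping of the access key to `X-API-Key` and the account ID to `X-Customer-GUID` are assumptions made for illustration.

```go
package main

import "fmt"

// buildAuthHeaders returns auth headers only when an access key is present.
// The signature and the key-to-header mapping are assumptions.
func buildAuthHeaders(accessKey, accountID string) map[string]string {
	if accessKey == "" {
		return nil // no credentials: send no auth headers, whatever the endpoint
	}
	return map[string]string{
		"X-API-Key":       accessKey,
		"X-Customer-GUID": accountID,
	}
}

func main() {
	fmt.Println("headers without key:", len(buildAuthHeaders("", "acct")))
	fmt.Println("headers with key:", len(buildAuthHeaders("key", "acct")))
}
```

Gating on the credential rather than on the hostname means the decision no longer depends on how the endpoint URL is spelled.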
…tial-presence model

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…LE_DEBUG_LISTENER

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
| --- | --- | --- | --- |
| Avg CPU (cores) | 0.000 | 0.000 | N/A |
| Peak CPU (cores) | 0.000 | 0.000 | N/A |
| Avg Memory (MiB) | 0.000 | 0.000 | N/A |
| Peak Memory (MiB) | 0.000 | 0.000 | N/A |

Dedup Effectiveness

No data available.

Picks up debug listener activation from KS_LOGGER_LEVEL=debug directly
in the library — no ENABLE_DEBUG_LISTENER env var or DebugListener config
field needed in callers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
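
Deriving the listener switch from the log level could be as small as the sketch below. `debugListenerEnabled` is a hypothetical helper, and the case-insensitive, whitespace-tolerant comparison is an assumption; the commit message states only that `KS_LOGGER_LEVEL=debug` activates the listener.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// debugListenerEnabled reports whether level selects debug logging.
// Case-insensitive matching and trimming are assumptions.
func debugListenerEnabled(level string) bool {
	return strings.EqualFold(strings.TrimSpace(level), "debug")
}

func main() {
	enabled := debugListenerEnabled(os.Getenv("KS_LOGGER_LEVEL"))
	fmt.Println("debug listener enabled:", enabled)
}
```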
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
| --- | --- | --- | --- |
| Avg CPU (cores) | 0.216 | 0.220 | +1.6% |
| Peak CPU (cores) | 0.228 | 0.230 | +1.0% |
| Avg Memory (MiB) | 330.949 | 265.483 | -19.8% |
| Peak Memory (MiB) | 333.180 | 270.480 | -18.8% |

Dedup Effectiveness

No data available.

Event Counters

| Metric | BEFORE | AFTER |
| --- | --- | --- |
| capability_counter | 9 | 0 |
| dns_counter | 1453 | 0 |
| exec_counter | 7293 | 0 |
| network_counter | 95876 | 0 |
| open_counter | 800201 | 0 |
| syscall_counter | 3668 | 0 |

- Add KS_LOGGER_LEVEL and KS_LOGGER_NAME to CONFIGURATION.md env vars table
- Gate runtime metrics on all metric-exporter env vars (OTEL_METRICS_EXPORTER,
  OTEL_EXPORTER_OTLP_METRICS_ENDPOINT) not just the base endpoint
- Unregister sbom/alert counters in PrometheusMetric.Destroy()
- Propagate prometheus HTTP server shutdown error via errors.Join

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage

| Metric | BEFORE | AFTER | Delta |
| --- | --- | --- | --- |
| Avg CPU (cores) | 0.195 | 0.191 | -2.4% |
| Peak CPU (cores) | 0.203 | 0.198 | -2.1% |
| Avg Memory (MiB) | 330.452 | 265.105 | -19.8% |
| Peak Memory (MiB) | 332.152 | 267.922 | -19.3% |

Dedup Effectiveness

No data available.

Event Counters

| Metric | BEFORE | AFTER |
| --- | --- | --- |
| capability_counter | 10 | 0 |
| dns_counter | 1427 | 0 |
| exec_counter | 7194 | 0 |
| network_counter | 94499 | 0 |
| open_counter | 788400 | 0 |
| syscall_counter | 3544 | 0 |
