
Add worker observability metrics #62

Open
ianhodge wants to merge 2 commits into main from ih/worker-observability-metrics

ianhodge commented May 1, 2026

Summary

  • add bounded OpenTelemetry metrics for task assignment age, startup latency, failure classification, backend operations, cleanup failures, cancellations, and cancelled task results
  • emit opt-in per-task traces and lifecycle events when OTEL_TRACES_EXPORTER is configured
  • classify Docker, Kubernetes, and direct backend failures with stable phase and reason labels
  • update README monitoring docs with tracing setup, the metric catalog, and sample dashboard queries
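
To illustrate the "stable phase and reason labels" idea from the summary, here is a minimal Go sketch. It is not the PR's code: the `Phase` and `Reason` types, their values, the `ErrImagePull` sentinel, and the `classify` function are all hypothetical names chosen for this example. It shows only the general pattern of mapping arbitrary backend errors onto a small closed set of label values so that metric cardinality stays bounded.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Phase and Reason are closed sets of label values (hypothetical names).
// Keeping them as fixed constants bounds the number of metric series.
type Phase string
type Reason string

const (
	PhaseStartup Phase = "startup"
	PhaseRun     Phase = "run"
	PhaseCleanup Phase = "cleanup"

	ReasonTimeout   Reason = "timeout"
	ReasonCancelled Reason = "cancelled"
	ReasonImagePull Reason = "image_pull"
	ReasonUnknown   Reason = "unknown"
)

// ErrImagePull is a hypothetical sentinel that a backend might wrap.
var ErrImagePull = errors.New("image pull failed")

// classify maps an arbitrary error to one of the fixed reasons.
// The raw error text is never used as a label value.
func classify(err error) Reason {
	switch {
	case errors.Is(err, context.DeadlineExceeded):
		return ReasonTimeout
	case errors.Is(err, context.Canceled):
		return ReasonCancelled
	case errors.Is(err, ErrImagePull):
		return ReasonImagePull
	default:
		return ReasonUnknown
	}
}

func main() {
	// A wrapped backend error still resolves to a stable label.
	err := fmt.Errorf("docker: starting task: %w", ErrImagePull)
	fmt.Println(PhaseStartup, classify(err))
}
```

Anything that does not match a known case falls through to `unknown`, so a new error message from a backend cannot create a new label value on its own.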

Validation

  • go test ./internal/metrics ./internal/worker
  • go test ./...
  • helm template oz-agent-worker charts/oz-agent-worker --set image.tag=test --set worker.workerId=test-worker --set warp.apiKeySecret.create=true --set warp.apiKeySecret.value=dummy --set metrics.enabled=true
  • git diff --check

Co-Authored-By: Oz <oz-agent@warp.dev>

ianhodge commented May 1, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

Co-Authored-By: Oz <oz-agent@warp.dev>
ianhodge force-pushed the ih/worker-observability-metrics branch from d76db19 to 7b988b5 May 1, 2026 22:40
ianhodge changed the title Add worker observability metrics Co-Authored-By: Oz <oz-agent@warp.dev> Add worker observability metrics May 1, 2026
ianhodge changed the title Add worker observability metrics Add worker observability metrics May 1, 2026
ianhodge commented May 1, 2026

ianhodge requested review from a team and bnavetta May 1, 2026 23:05
bnavetta left a comment

Nice!

connected metric.Int64Gauge
tasksActive metric.Int64UpDownCounter
tasksMaxConcurrent metric.Int64Gauge
tasksClaimed metric.Int64Counter
This one seemed useful?
