fix(observability): replace O(N*sqrt(N)) histogram join with O(N) width_bucket by mmabrouk · Pull Request #3867 · Agenta-AI/agenta

mmabrouk · 2026-03-01T13:20:18Z

Summary

Replace the nested-loop histogram join in build_numeric_continuous_blocks() with PostgreSQL's width_bucket() function, reducing complexity from O(N × √N) to O(N)
Fix the edge case where all values are identical (vmin == vmax), which would otherwise make width_bucket fail with "lower bound cannot equal upper bound"

Problem

The analytics histogram query used generate_series to create bin intervals, then joined every data row against every bin using range predicates (value >= start AND value < end). For large datasets this caused O(N × √N) comparisons — e.g., 332K rows with 577 bins produced 191M comparisons, consistently exceeding the 15-second statement timeout for any query spanning 30+ days.
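The comparison count quoted above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: the helper name `nested_join_comparisons` is hypothetical, and the observation that 577 sits just above √332K is an inference from the stated O(N × √N) bound, not something taken from the code.

```python
import math


def nested_join_comparisons(n_rows: int, n_bins: int) -> int:
    """Range-predicate evaluations when every row is tested against every bin."""
    return n_rows * n_bins


n_rows = 332_000
n_bins = 577

# 332,000 rows x 577 bins, i.e. the ~191M comparisons cited in the description.
total = nested_join_comparisons(n_rows, n_bins)
print(total)  # 191564000

# The bin count is close to sqrt(N), which is where the sqrt(N) factor comes from.
print(math.isqrt(n_rows))  # 576
```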

Solution

width_bucket(value, lo, hi, bins) computes the bin index in O(1) per row via arithmetic. The count step becomes a simple GROUP BY. generate_series is still used for the output (to include empty bins), but only for the small bin-series-to-counts left join.
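A minimal Python sketch of the same idea, for readers who want to see the arithmetic outside SQL. It emulates PostgreSQL's width_bucket for the lo < hi case only. The `histogram` helper and its output tuple layout are hypothetical and are not the structure produced by build_numeric_continuous_blocks().

```python
from collections import Counter


def width_bucket(value: float, lo: float, hi: float, bins: int) -> int:
    """Emulate PostgreSQL width_bucket for lo < hi.

    Returns 0 below lo, bins + 1 at or above hi, otherwise a bucket in 1..bins.
    """
    if lo >= hi:
        raise ValueError("this sketch requires lo < hi")
    if value < lo:
        return 0
    if value >= hi:
        return bins + 1
    # O(1) arithmetic per value; min() guards against float rounding at the top edge.
    return min(int((value - lo) / (hi - lo) * bins), bins - 1) + 1


def histogram(values, lo: float, hi: float, bins: int):
    """Count per bucket (the GROUP BY), then emit every bin (the left join)."""
    counts = Counter(width_bucket(v, lo, hi, bins) for v in values)
    width = (hi - lo) / bins
    return [
        (b, lo + (b - 1) * width, lo + b * width, counts.get(b, 0))
        for b in range(1, bins + 1)
    ]


rows = histogram([0.5, 1.5, 1.7, 3.2], lo=0.0, hi=4.0, bins=4)
print([count for _, _, _, count in rows])  # [1, 2, 0, 1]
```

The third bin is empty but still appears in the output, which is the role generate_series keeps in the rewritten query.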

The vmin == vmax edge case (all values identical) is handled by using GREATEST(edge_width * 1e-9, 1e-9) as the epsilon floor, ensuring width_bucket always receives lo < hi.
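The epsilon floor can be sketched in Python as follows. Two assumptions are made here and are not confirmed by the description: that edge_width is (vmax - vmin) / bins, and that the epsilon is added to the upper bound passed to width_bucket. The function names are hypothetical.

```python
from collections import Counter


def widened_upper_bound(vmin: float, vmax: float, bins: int) -> float:
    """Return an upper bound strictly above vmin, even when vmin == vmax."""
    edge_width = (vmax - vmin) / bins          # assumed definition of edge_width
    epsilon = max(edge_width * 1e-9, 1e-9)     # GREATEST(edge_width * 1e-9, 1e-9)
    return vmax + epsilon


def width_bucket(value: float, lo: float, hi: float, bins: int) -> int:
    """Simplified emulation of PostgreSQL width_bucket; rejects lo == hi like PostgreSQL does."""
    if lo == hi:
        raise ValueError("lower bound cannot equal upper bound")
    if value < lo:
        return 0
    if value >= hi:
        return bins + 1
    return min(int((value - lo) / (hi - lo) * bins), bins - 1) + 1


values = [5.0] * 50                      # all values identical, so vmin == vmax
vmin, vmax, bins = min(values), max(values), 10
hi = widened_upper_bound(vmin, vmax, bins)
buckets = Counter(width_bucket(v, vmin, hi, bins) for v in values)
print(dict(buckets))  # {1: 50}
```

Without the widening, the call would be width_bucket(5.0, 5.0, 5.0, 10), which this sketch rejects in the same way PostgreSQL does.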

Results (964K rows, 30-day window)

Query	Before	After
Duration histogram, 7 days	2.97s	0.47s
Duration histogram, 30 days	timeout (>15s)	2.6s
Multi-metric (3 specs), 30 days	timeout (>15s)	5.1s
Full dashboard (6 specs), 30 days	timeout (>15s)	8.1s

Verification

Histogram bin count sums match metric count exactly (no rows lost or double-counted)
Output format (bin/count/interval structure) preserved — downstream normalize_hist() unchanged
vmin == vmax edge case tested: 50 identical values correctly land in 1 bin
Confirmed PostgreSQL raises InvalidArgumentForWidthBucketFunctionError without the epsilon fix
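The first check in this list can also be approximated outside the database. The sketch below compares a nested range-predicate count against arithmetic bucketing on synthetic integer data. It is not the PR's test suite. The function names, the seed, and the 0..63 value range are all made up, with the range chosen so that bin edges are exact in floating point.

```python
import random


def range_join_counts(values, lo: float, hi: float, bins: int):
    """Old approach: test every value against every [start, end) interval."""
    width = (hi - lo) / bins
    edges = [(lo + i * width, lo + (i + 1) * width) for i in range(bins)]
    counts = [0] * bins
    for v in values:
        for i, (start, end) in enumerate(edges):
            if start <= v < end:
                counts[i] += 1
    return counts


def arithmetic_counts(values, lo: float, hi: float, bins: int):
    """New approach: compute the bin index directly, one step per value."""
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / (hi - lo) * bins)] += 1
    return counts


rng = random.Random(0)
values = [rng.randrange(0, 64) for _ in range(1000)]

old = range_join_counts(values, 0.0, 64.0, 8)
new = arithmetic_counts(values, 0.0, 64.0, 8)

assert old == new                 # both methods agree bin by bin
assert sum(new) == len(values)    # no rows lost or double-counted
```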

Changed files

api/oss/src/dbs/postgres/tracing/utils.py — Steps 7-9 of build_numeric_continuous_blocks()

…(N) width_bucket

The analytics histogram query (build_numeric_continuous_blocks) used generate_series to create bin intervals, then joined every data row against every bin using range predicates. For large datasets this caused O(N * sqrt(N)) comparisons: 332K rows with 577 bins produced 191M comparisons, exceeding the 15-second statement timeout.

Replace with PostgreSQL's width_bucket(value, lo, hi, bins), which computes the bin index in O(1) per row via arithmetic. The count step becomes a simple GROUP BY. generate_series is still used for the output (to include empty bins) but only for the small bin-series-to-counts left join.

Also handles the edge case where all values are identical (vmin == vmax) by using GREATEST(edge_width * 1e-9, 1e-9) as the epsilon, ensuring width_bucket always receives lo < hi.

Before/after on 964K rows (30 days):
- Before: >15s (statement timeout)
- After: ~2.6s

vercel · 2026-03-01T13:20:23Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Mar 2, 2026 8:40am

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

github-actions · 2026-03-01T13:39:28Z

Railway Preview Environment


Preview URL	https://gateway-production-2af4.up.railway.app/w
Project	`agenta-oss-pr-3867`
Image tag	`pr-3867-e45295b`
Status	Deployed

Updated at 2026-03-02T08:54:24.688Z

…n-widening

Epsilon-widening wb_hi shifted all internal bin boundaries, causing values at exact edges (common with integer metrics like token counts) to land in the previous bin instead of the next.

Fix: only add epsilon when vmin == vmax (the degenerate case, where width_bucket requires lo < hi), and clamp the overflow bucket (bins+1) to the last bin via LEAST. This preserves the original [start, end) half-open interval semantics.
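The boundary shift described in this commit message can be reproduced with a small Python sketch. It assumes, as above, that edge_width is (vmax - vmin) / bins and that wb_hi is vmax plus the epsilon. The function names and the 1e-9 epsilon in the degenerate branch are illustrative, not taken from the diff.

```python
def width_bucket(value: float, lo: float, hi: float, bins: int) -> int:
    """Simplified emulation of PostgreSQL width_bucket for lo < hi."""
    if value < lo:
        return 0
    if value >= hi:
        return bins + 1
    return min(int((value - lo) / (hi - lo) * bins), bins - 1) + 1


def bucket_always_widened(value: float, vmin: float, vmax: float, bins: int) -> int:
    """First version: the upper bound is always widened by epsilon."""
    edge_width = (vmax - vmin) / bins
    wb_hi = vmax + max(edge_width * 1e-9, 1e-9)
    return width_bucket(value, vmin, wb_hi, bins)


def bucket_fixed(value: float, vmin: float, vmax: float, bins: int) -> int:
    """Follow-up: widen only when vmin == vmax, then clamp bins + 1 to bins."""
    wb_hi = vmax + 1e-9 if vmin == vmax else vmax
    return min(width_bucket(value, vmin, wb_hi, bins), bins)  # LEAST(bucket, bins)


# Integer metric in [0, 4] with 4 bins: the value 1 sits exactly on a bin edge.
print(bucket_always_widened(1.0, 0.0, 4.0, 4))  # 1, pushed into the previous bin
print(bucket_fixed(1.0, 0.0, 4.0, 4))           # 2, the half-open bin [1, 2)
print(bucket_fixed(4.0, 0.0, 4.0, 4))           # 4, overflow bucket clamped to the last bin
```

Widening the upper bound makes every bin slightly wider, so an exact edge value divides to just under a whole number and truncates downward. Leaving the bounds untouched keeps edges exact, and the clamp handles the single value equal to vmax.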

CLAassistant · 2026-03-02T08:38:59Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ junaway
✅ mmabrouk
❌ github-actions[bot]
You have signed the CLA already but the status is still pending? Let us recheck it.

github-actions bot and others added 3 commits February 28, 2026 06:09

docs(api): update API reference from production OpenAPI spec

0c4265e

Merge pull request #3865 from Agenta-AI/automated/update-api-docs

3a362d0

[docs] Update API reference docs

vercel bot deployed to Preview March 1, 2026 13:21 View deployment

dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. Backend Improvement labels Mar 1, 2026

devin-ai-integration bot reviewed Mar 1, 2026

mmabrouk requested a review from jp-agenta March 1, 2026 13:30

refactor: deduplicate bin_width formula, guard against bins <= 0

cafbb84

- Reuse edge_width/center_width in step 9 instead of recomputing
- Wrap chosen_bins in GREATEST(..., 1) to prevent division-by-zero if a user passes bins=0 via MetricSpec (pre-existing gap)
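The guard in the second item amounts to flooring the bin count at 1 before dividing. A minimal Python sketch follows. The function name `bin_width` is hypothetical, and the formula is assumed to be range divided by bin count.

```python
def bin_width(vmin: float, vmax: float, chosen_bins: int) -> float:
    """Width of one bin, with the bin count floored at 1 to avoid dividing by zero."""
    safe_bins = max(chosen_bins, 1)  # GREATEST(chosen_bins, 1)
    return (vmax - vmin) / safe_bins


print(bin_width(0.0, 10.0, 5))   # 2.0
print(bin_width(0.0, 10.0, 0))   # 10.0, bins=0 is treated as a single bin
print(bin_width(0.0, 10.0, -3))  # 10.0, negative counts are floored as well
```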

vercel bot deployed to Preview March 1, 2026 13:35 View deployment

vercel bot deployed to Preview March 1, 2026 13:50 View deployment

junaway approved these changes Mar 2, 2026

junaway changed the base branch from main to release/v0.87.2 March 2, 2026 08:38

Merge branch 'release/v0.87.2' into fix/histogram-width-bucket

df80af8

junaway merged commit 82ff22a into release/v0.87.2 Mar 2, 2026
4 of 5 checks passed

vercel bot deployed to Preview March 2, 2026 08:40 View deployment

junaway merged 6 commits into release/v0.87.2 from fix/histogram-width-bucket

3 participants
