fix(observability): replace O(N*sqrt(N)) histogram join with O(N) width_bucket #3867

Merged
junaway merged 6 commits into release/v0.87.2 from fix/histogram-width-bucket
Mar 2, 2026

@mmabrouk mmabrouk commented Mar 1, 2026

Summary

  • Replace the nested-loop histogram join in build_numeric_continuous_blocks() with PostgreSQL's width_bucket() function, reducing complexity from O(N × √N) to O(N)
  • Handle the edge case where all values are identical (vmin == vmax), which would otherwise make width_bucket fail with "lower bound cannot equal upper bound"

Problem

The analytics histogram query used generate_series to create bin intervals, then joined every data row against every bin using range predicates (value >= start AND value < end). For large datasets this caused O(N × √N) comparisons — e.g., 332K rows with 577 bins produced 191M comparisons, consistently exceeding the 15-second statement timeout for any query spanning 30+ days.
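
The comparison count quoted above follows from multiplying rows by bins. A quick arithmetic check in Python; the function name is hypothetical, and the idea that the bin count grows roughly as √N is inferred from the O(N × √N) figure, not shown in the PR:

```python
import math


def range_join_comparisons(rows: int, bins: int) -> int:
    """Row-by-bin comparisons performed by a nested-loop range join."""
    return rows * bins


rows, bins = 332_000, 577
print(range_join_comparisons(rows, bins))  # 191564000, i.e. about 191M
print(math.sqrt(rows))                     # roughly 576, close to the 577 bins used
```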

Solution

width_bucket(value, lo, hi, bins) computes the bin index in O(1) per row via arithmetic. The count step becomes a simple GROUP BY. generate_series is still used for the output (to include empty bins), but only for the small bin-series-to-counts left join.
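
To make the arithmetic concrete, here is a minimal Python sketch of the same idea. It approximates the documented behavior of PostgreSQL's width_bucket for lo < hi only; the helper names are hypothetical and this is not the SQL emitted by build_numeric_continuous_blocks():

```python
from collections import Counter


def width_bucket(value: float, lo: float, hi: float, bins: int) -> int:
    """Approximate PostgreSQL width_bucket semantics (assumes lo < hi)."""
    if bins <= 0:
        raise ValueError("count must be greater than zero")
    if lo == hi:
        raise ValueError("lower bound cannot equal upper bound")
    if value < lo:
        return 0             # underflow bucket
    if value >= hi:
        return bins + 1      # overflow bucket
    # O(1) arithmetic per row instead of a comparison against every bin
    return int((value - lo) / (hi - lo) * bins) + 1


def histogram(values, lo, hi, bins):
    # GROUP BY analogue: count rows per bucket index
    counts = Counter(width_bucket(v, lo, hi, bins) for v in values)
    # generate_series analogue: emit every bin, including empty ones
    return [(b, counts.get(b, 0)) for b in range(1, bins + 1)]


print(histogram([0, 1, 2.5, 9.9], lo=0, hi=10, bins=4))
# [(1, 2), (2, 1), (3, 0), (4, 1)]
```

In this sketch, values in the underflow or overflow bucket (0 or bins + 1) are simply not reported; how the real query treats them is not covered here.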

The vmin == vmax edge case (all values identical) is handled by using GREATEST(edge_width * 1e-9, 1e-9) as the epsilon floor, ensuring width_bucket always receives lo < hi.
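
A small sketch of that epsilon floor, with max() standing in for GREATEST. The definition of edge_width as (vmax - vmin) / bins and the choice to widen only the upper bound are assumptions made for illustration:

```python
def widened_bounds(vmin: float, vmax: float, bins: int):
    """Return (lo, hi) with hi nudged upward so that lo < hi."""
    edge_width = (vmax - vmin) / bins       # assumed definition of edge_width
    epsilon = max(edge_width * 1e-9, 1e-9)  # GREATEST(edge_width * 1e-9, 1e-9)
    return vmin, vmax + epsilon


lo, hi = widened_bounds(5.0, 5.0, bins=10)
print(lo < hi)  # True: the degenerate range is no longer empty
```

With Python floats, an absolute 1e-9 is absorbed by rounding once values reach roughly 1e8 or more; whether that matters in the real query depends on the column type in use.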

Results (964K rows, 30-day window)

| Query | Before | After |
|---|---|---|
| Duration histogram, 7 days | 2.97s | 0.47s |
| Duration histogram, 30 days | timeout (>15s) | 2.6s |
| Multi-metric (3 specs), 30 days | timeout (>15s) | 5.1s |
| Full dashboard (6 specs), 30 days | timeout (>15s) | 8.1s |

Verification

  • Histogram bin count sums match metric count exactly (no rows lost or double-counted)
  • Output format (bin/count/interval structure) preserved — downstream normalize_hist() unchanged
  • vmin == vmax edge case tested: 50 identical values correctly land in 1 bin
  • Confirmed PostgreSQL raises InvalidArgumentForWidthBucketFunctionError without the epsilon fix
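
The first and third checks can be reproduced in miniature. The sketch below compares a nested-loop range join against the arithmetic approach on a small sample; the function names and sample data are hypothetical, and it is not the project's test suite:

```python
def range_join_counts(values, lo, hi, bins):
    """Old approach: test every value against every [start, end) interval."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        for b in range(bins):
            if lo + b * width <= v < lo + (b + 1) * width:
                counts[b] += 1
    return counts


def arithmetic_counts(values, lo, hi, bins):
    """New approach: compute the bin index directly, one step per value."""
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            counts[int((v - lo) / (hi - lo) * bins)] += 1
    return counts


sample = [0, 1, 2, 3.5, 4, 6, 7.999]
old = range_join_counts(sample, 0, 8, 4)
new = arithmetic_counts(sample, 0, 8, 4)
assert old == new                 # same bins either way
assert sum(new) == len(sample)    # no rows lost or double-counted

# Degenerate case: 50 identical values with hi widened by a small epsilon
same = arithmetic_counts([5.0] * 50, 5.0, 5.0 + 1e-9, 4)
assert same == [50, 0, 0, 0]      # all values land in one bin
```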

Changed files

  • api/oss/src/dbs/postgres/tracing/utils.py — Steps 7-9 of build_numeric_continuous_blocks()

github-actions bot and others added 3 commits February 28, 2026 06:09
…(N) width_bucket

The analytics histogram query (build_numeric_continuous_blocks) used
generate_series to create bin intervals, then joined every data row
against every bin using range predicates. For large datasets this
caused O(N * sqrt(N)) comparisons — 332K rows with 577 bins produced
191M comparisons, exceeding the 15-second statement timeout.

Replace with PostgreSQL's width_bucket(value, lo, hi, bins) which
computes the bin index in O(1) per row via arithmetic. The count step
becomes a simple GROUP BY. generate_series is still used for the output
(to include empty bins) but only for the small bin-series-to-counts
left join.

Also handles the edge case where all values are identical (vmin == vmax)
by using GREATEST(edge_width * 1e-9, 1e-9) as the epsilon, ensuring
width_bucket always receives lo < hi.

Before/after on 964K rows (30 days):
- Before: >15s (statement timeout)
- After: ~2.6s

vercel bot commented Mar 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Mar 2, 2026 8:40am

@dosubot dosubot bot added labels on Mar 1, 2026: size:L (This PR changes 100-499 lines, ignoring generated files), Backend Improvement

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

@mmabrouk mmabrouk requested a review from jp-agenta March 1, 2026 13:30
devin-ai-integration[bot]

This comment was marked as resolved.

- Reuse edge_width/center_width in step 9 instead of recomputing
- Wrap chosen_bins in GREATEST(..., 1) to prevent division-by-zero
  if a user passes bins=0 via MetricSpec (pre-existing gap)
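
The bins=0 guard can be sketched in Python, with max() standing in for GREATEST. The function names and the edge_width formula are assumptions for illustration; the real expression is built as SQL:

```python
def safe_bin_count(chosen_bins: int) -> int:
    """Equivalent of GREATEST(chosen_bins, 1): never fewer than one bin."""
    return max(chosen_bins, 1)


def edge_width(vmin: float, vmax: float, chosen_bins: int) -> float:
    # Without the guard, chosen_bins == 0 would raise ZeroDivisionError here
    return (vmax - vmin) / safe_bin_count(chosen_bins)


print(edge_width(0.0, 10.0, 0))  # 10.0: a single bin spans the whole range
print(edge_width(0.0, 10.0, 4))  # 2.5
```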

github-actions bot commented Mar 1, 2026

Railway Preview Environment

Preview URL https://gateway-production-2af4.up.railway.app/w
Project agenta-oss-pr-3867
Image tag pr-3867-e45295b
Status Deployed

Updated at 2026-03-02T08:54:24.688Z

…n-widening

Epsilon-widening wb_hi shifted all internal bin boundaries, causing
values at exact edges (common with integer metrics like token counts)
to land in the previous bin instead of the next.

Fix: only add epsilon when vmin == vmax (degenerate case where
width_bucket requires lo < hi), and clamp the overflow bucket
(bins+1) to the last bin via LEAST. This preserves the original
[start, end) half-open interval semantics.
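
The boundary shift is easy to reproduce outside the database. The following sketch uses a Python approximation of width_bucket (lo < hi only) with hypothetical helper names; it illustrates the behavior described in this commit message and is not the SQL from the PR:

```python
def width_bucket(value, lo, hi, bins):
    """Approximate PostgreSQL width_bucket semantics (assumes lo < hi)."""
    if value < lo:
        return 0
    if value >= hi:
        return bins + 1
    return int((value - lo) / (hi - lo) * bins) + 1


def bin_always_widened(value, vmin, vmax, bins):
    # Earlier approach: always widen the upper bound by an epsilon
    eps = max((vmax - vmin) / bins * 1e-9, 1e-9)
    return width_bucket(value, vmin, vmax + eps, bins)


def bin_fixed(value, vmin, vmax, bins):
    # Fix: widen only in the degenerate case, then clamp the overflow bucket
    hi = vmax + 1e-9 if vmin == vmax else vmax
    return min(width_bucket(value, vmin, hi, bins), bins)  # LEAST(..., bins)


# Range 0..8 with 4 bins: the value 2 sits exactly on the edge between bins 1 and 2
print(bin_always_widened(2, 0, 8, 4))  # 1: pushed into the previous bin
print(bin_fixed(2, 0, 8, 4))           # 2: [start, end) semantics preserved
print(bin_fixed(8, 0, 8, 4))           # 4: overflow bucket clamped to the last bin
print(bin_fixed(5, 5, 5, 4))           # 1: identical values still get a valid bin
```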
@junaway junaway changed the base branch from main to release/v0.87.2 March 2, 2026 08:38
@junaway junaway merged commit 82ff22a into release/v0.87.2 Mar 2, 2026
4 of 5 checks passed

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ junaway
✅ mmabrouk
❌ github-actions[bot]