fix(observability): replace O(N*sqrt(N)) histogram join with O(N) width_bucket#3867
Merged
junaway merged 6 commits into release/v0.87.2 on Mar 2, 2026
Conversation
[docs] Update API reference docs
…(N) width_bucket

The analytics histogram query (build_numeric_continuous_blocks) used generate_series to create bin intervals, then joined every data row against every bin using range predicates. For large datasets this caused O(N * sqrt(N)) comparisons — 332K rows with 577 bins produced 191M comparisons, exceeding the 15-second statement timeout.

Replace with PostgreSQL's width_bucket(value, lo, hi, bins), which computes the bin index in O(1) per row via arithmetic. The count step becomes a simple GROUP BY. generate_series is still used for the output (to include empty bins), but only for the small bin-series-to-counts left join.

Also handles the edge case where all values are identical (vmin == vmax) by using GREATEST(edge_width * 1e-9, 1e-9) as the epsilon, ensuring width_bucket always receives lo < hi.

Before/after on 964K rows (30 days):
- Before: >15s (statement timeout)
- After: ~2.6s
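The arithmetic behind width_bucket can be sketched in a few lines of Python. This is a model of the function's semantics (lower bound inclusive, upper bound exclusive, 0 and bins+1 for out-of-range values), not PostgreSQL's actual implementation; the Counter stands in for the GROUP BY:

```python
from collections import Counter

def width_bucket(value: float, lo: float, hi: float, bins: int) -> int:
    """Model of PostgreSQL's width_bucket(value, lo, hi, bins).

    Returns 0 for value < lo, bins + 1 for value >= hi, otherwise a
    1-based bin index computed in O(1) with plain arithmetic.
    """
    if value < lo:
        return 0
    if value >= hi:
        return bins + 1
    return int((value - lo) / (hi - lo) * bins) + 1

# The count step then reduces to a simple GROUP BY, modeled here by Counter.
values = [0.0, 2.5, 5.0, 7.4, 9.99]
counts = Counter(width_bucket(v, 0.0, 10.0, 4) for v in values)
# counts == Counter({3: 2, 1: 1, 2: 1, 4: 1})
```

Each row costs one arithmetic bucket computation instead of being compared against every bin's range predicates, which is where the O(N * sqrt(N)) to O(N) reduction comes from.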
- Reuse edge_width/center_width in step 9 instead of recomputing
- Wrap chosen_bins in GREATEST(..., 1) to prevent division-by-zero if a user passes bins=0 via MetricSpec (pre-existing gap)
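The GREATEST(..., 1) guard is equivalent to a one-line clamp. Sketched in Python (clamp_bins is an illustrative name, not code from this PR): without it, bins=0 would make the bin-width division divide by zero.

```python
def clamp_bins(requested_bins: int) -> int:
    # Mirror SQL's GREATEST(requested_bins, 1): a zero or negative bin
    # count from the caller would otherwise reach the bin-width division
    # (vmax - vmin) / bins and divide by zero.
    return max(requested_bins, 1)
```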
…n-widening

Epsilon-widening wb_hi shifted all internal bin boundaries, causing values at exact edges (common with integer metrics like token counts) to land in the previous bin instead of the next.

Fix: only add epsilon when vmin == vmax (the degenerate case where width_bucket requires lo < hi), and clamp the overflow bucket (bins+1) to the last bin via LEAST. This preserves the original [start, end) half-open interval semantics.
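A minimal Python model of the boundary problem and the fix (bucket and safe_bucket are illustrative names, not code from this PR). Unconditionally widening hi moves every interior boundary slightly, so an edge value like 50 falls back into the previous bin; widening only in the degenerate case keeps exact edges, and the min(..., bins) mirrors the LEAST clamp:

```python
def bucket(value, lo, hi, bins):
    # width_bucket semantics: 1-based index, hi exclusive for in-range values
    if value < lo:
        return 0
    if value >= hi:
        return bins + 1
    return int((value - lo) / (hi - lo) * bins) + 1

def safe_bucket(value, vmin, vmax, bins):
    # Widen hi only when the range is degenerate (vmin == vmax), since
    # width_bucket requires lo < hi; otherwise keep exact edges so the
    # [start, end) half-open semantics hold at integer boundaries.
    hi = vmax if vmax > vmin else vmax + 1e-9
    # LEAST: fold the overflow bucket (value == vmax) into the last bin.
    return min(bucket(value, vmin, hi, bins), bins)

# Integer metric at an exact edge: vmin=0, vmax=100, 10 bins of width 10.
assert safe_bucket(50, 0, 100, 10) == 6        # lands in [50, 60)
assert bucket(50, 0, 100 + 1e-9, 10) == 5      # widened hi: wrongly in [40, 50)
```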
junaway approved these changes on Mar 2, 2026
Summary
- Replaces the range-predicate bin join in build_numeric_continuous_blocks() with PostgreSQL's width_bucket() function, reducing complexity from O(N × √N) to O(N)
- Handles the degenerate range (vmin == vmax) that would crash width_bucket with "lower bound cannot equal upper bound"

Problem
The analytics histogram query used generate_series to create bin intervals, then joined every data row against every bin using range predicates (value >= start AND value < end). For large datasets this caused O(N × √N) comparisons — e.g., 332K rows with 577 bins produced 191M comparisons, consistently exceeding the 15-second statement timeout for any query spanning 30+ days.

Solution
width_bucket(value, lo, hi, bins) computes the bin index in O(1) per row via arithmetic. The count step becomes a simple GROUP BY. generate_series is still used for the output (to include empty bins), but only for the small bin-series-to-counts left join.

The vmin == vmax edge case (all values identical) is handled by using GREATEST(edge_width * 1e-9, 1e-9) as the epsilon floor, ensuring width_bucket always receives lo < hi.

Results (964K rows, 30-day window)

- Before: >15s (statement timeout)
- After: ~2.6s
Verification
- normalize_hist() output unchanged
- vmin == vmax edge case tested: 50 identical values correctly land in 1 bin
- The degenerate input raises InvalidArgumentForWidthBucketFunctionError without the epsilon fix

Changed files
api/oss/src/dbs/postgres/tracing/utils.py — Steps 7-9 of build_numeric_continuous_blocks()
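The end-to-end shape of the rewritten steps can be condensed into a short Python model (a hypothetical simplification, not the SQL in utils.py): bucket each row in O(N), count via a GROUP BY, then left-join against the full bin series so empty bins still appear.

```python
from collections import Counter

def histogram(values, bins):
    # Hypothetical condensed model of steps 7-9 described above.
    vmin, vmax = min(values), max(values)
    edge_width = (vmax - vmin) / bins
    # Degenerate range: widen hi only when vmin == vmax, mirroring
    # GREATEST(edge_width * 1e-9, 1e-9); otherwise keep exact edges.
    hi = vmax if vmax > vmin else vmax + max(edge_width * 1e-9, 1e-9)

    counts = Counter()
    for v in values:  # one arithmetic bucket computation per row: O(N)
        idx = int((v - vmin) / (hi - vmin) * bins) + 1 if v < hi else bins + 1
        counts[min(idx, bins)] += 1  # LEAST: fold overflow into the last bin

    # generate_series(1, bins) LEFT JOIN counts: empty bins report 0
    return [(i, counts.get(i, 0)) for i in range(1, bins + 1)]
```

On degenerate input this model reproduces the verified behavior: 50 identical values all land in bin 1, with the remaining bins reported as empty.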