feat: add Browser RUM dashboard template #2413

Open
teeohhem wants to merge 9 commits into main from teeohhem/hackathon-ideas

@teeohhem teeohhem commented Jun 3, 2026

Summary

Adds a Browser RUM template to the dashboards gallery (Dashboards → Templates) for browser sessions instrumented with the HyperDX Browser SDK — or any OpenTelemetry browser instrumentation that emits a rum.sessionId resource attribute. It fills a gap: HyperDX ships a browser SDK but had no out-of-the-box RUM dashboard. The template is purely declarative JSON validated by the existing dashboardTemplates schema test; the only code change is registering it in dashboardTemplates/index.ts, plus a changeset.

The dashboard is organized into three sections: Performance Overview (page-view/session/error KPIs, Core Web Vitals LCP/INP/CLS p75, median/p75/p90 page-load percentiles, long tasks), Page Views Breakdown (traffic by URL, browser, country, and device size derived from screen.xy), and a tabbed Errors section (overview, JS exceptions by message and by page, failing API calls). It also defines five dashboard-level filters: Service, Environment, Service Version, Page URL, and Country.

Screenshots or video

[Three screenshots and a screen recording (Tab-1780519086079.webm) attached]

How to test locally or on Vercel

  1. Dashboards → Templates → Browser RUM → Import, then map each tile/filter to your Traces source (auto-maps if a source named "Traces" exists).
  2. Point a browser app instrumented with @hyperdx/browser at your collector (or seed webvitals / documentLoad / fetch+xhr / exception spans carrying rum.sessionId). Verify the KPIs, Core Web Vitals, breakdown tables, and Errors tabs populate, and that the five filters apply.
  3. Top Browsers / Top Countries tiles and the Browser / Country filters only populate when the collector's useragent and geoip processors are enabled (noted in the tile titles + dashboard description).

References

Adds a "Browser RUM" template to the dashboards gallery for browser
sessions instrumented with the HyperDX Browser SDK (or any OTel
browser instrumentation emitting a rum.sessionId resource attribute):

- Performance Overview: page-view/session/error KPIs, Core Web Vitals
  (LCP/INP/CLS) p75, median/p75/p90 page-load percentiles, long tasks
- Page Views Breakdown: traffic by URL, browser, country, device size
  (derived from screen.xy)
- Errors section with tabs (overview, JS exceptions by message and by
  page, failing API calls)
- Six dashboard filters: Service, Environment, Service Version, Page
  URL, Browser, Country

Top Browsers / Top Countries tiles and the Browser/Country filters
populate when the collector's useragent and geoip processors are on.

changeset-bot Bot commented Jun 3, 2026

🦋 Changeset detected

Latest commit: 0f640f0

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
| Name | Type |
| --- | --- |
| @hyperdx/app | Minor |
| @hyperdx/api | Minor |
| @hyperdx/otel-collector | Minor |

vercel Bot commented Jun 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| hyperdx-oss | Ready | Preview, Comment | Jun 5, 2026 4:14pm |
| hyperdx-storybook | Ready | Preview, Comment | Jun 5, 2026 4:14pm |

@github-actions github-actions Bot added the review/tier-3 Standard — full human review required label Jun 3, 2026

github-actions Bot commented Jun 3, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Large diff: 1026 production lines changed (threshold: 1000)

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Production files changed: 2
  • Production lines changed: 1026
  • Branch: teeohhem/hackathon-ideas
  • Author: teeohhem

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.


github-actions Bot commented Jun 3, 2026

Deep Review

✅ No critical issues found.

This PR is a declarative dashboard template (browser-rum.json), a one-line registration in index.ts, and a changeset. The existing schema test (dashboardTemplates.test.ts) already covers structural validity and description presence. No P0/P1 issues. One headline finding from the correctness pass — that StatusCode = 'STATUS_CODE_ERROR' never matches stored data — was verified as a false positive and dropped: the stored value is STATUS_CODE_ERROR (per the e2e RUM seed and waterfall.ts), and lucene StatusCode:error renders to ILIKE '%error%', so both forms match the same error rows.

🟡 P2 -- recommended

  • packages/app/src/dashboardTemplates/browser-rum.json:737 -- The "AJAX Errors" time-series (rum-010) counts only StatusCode:error, while the "AJAX Errors" KPI (rum-008, line 703) and "Top Failing API Calls" (rum-013, line 853) also count toUInt16OrZero(SpanAttributes['http.status_code']) >= 400, so a 4xx/5xx fetch whose span status is not error appears in the KPI but is missing from the like-named chart.
    • Fix: Add the http.status_code >= 400 branch to the rum-010 AJAX aggCondition so all three tiles share one AJAX-error definition.
    • correctness, maintainability
  • packages/app/src/__tests__/dashboardTemplates.test.ts:16 -- The template test only runs DashboardTemplateSchema.safeParse, which does not validate tile→container/tab references (that check lives only in the external-API buildDashboardBodySchema), so a typo in a containerId/tabId in this tab-heavy template would pass CI and break only at render.
    • Fix: Add a per-template assertion that every tile.containerId resolves to a declared container and every tabId resolves to a tab within that container.
    • testing
  • .changeset/browser-rum-dashboard-template.md:2 -- The changeset declares a minor bump for @hyperdx/app, but the fixed group in .changeset/config.json (@hyperdx/api, @hyperdx/app, @hyperdx/otel-collector) propagates that minor bump to two packages this PR does not touch.
    • Fix: Use patch for an additive template-only change to keep the fixed-group bump at patch level.
    • project-standards
🔵 P3 nitpicks (5)
  • packages/app/src/dashboardTemplates/browser-rum.json:679 -- The "JS Errors" KPI (rum-007) is scoped to ResourceAttributes.rum.sessionId:*, but the "JS Errors" series in rum-010 (line 730) is not, so the two can diverge if non-RUM spans match.
    • Fix: Add ResourceAttributes.rum.sessionId:* to the rum-010 JS aggCondition to match the KPI.
  • packages/app/src/dashboardTemplates/browser-rum.json:99 -- Several page-view/page-load/long-task tiles (rum-006, rum-017, rum-018, rum-020, rum-021, rum-024, rum-014, rum-011) filter only on SpanName, unlike the session-scoped tiles, so non-RUM spans sharing those names would be included.
    • Fix: Prepend ResourceAttributes.rum.sessionId:* to these where clauses for consistency with the rest of the dashboard.
  • packages/app/src/dashboardTemplates/browser-rum.json:86 -- Tile ids use a rum-NNN numbering that is gapped (004, 009, 022, 023, 025 absent) and out of visual order, implying a sequence that does not exist; sibling templates use opaque ids in render order.
    • Fix: Renumber contiguously in render order or switch to opaque/semantic ids.
  • packages/app/src/dashboardTemplates/browser-rum.json:454 -- The URL coalesce(nullif(...)) grouping expression is repeated verbatim across rum-014, rum-011, and rum-015 with no guard against drift if the attribute fallback list changes.
    • Fix: Add a test asserting these grouping expressions stay identical, or accept the duplication as inherent to the format.
  • packages/app/src/dashboardTemplates/browser-rum.json:616 -- "Top Errored Sessions" (rum-016) groups by per-session id and filters errors via a post-aggregation having Errors > 0, aggregating all sessions before trimming.
    • Fix: Pre-filter error rows in the where clause so only errored sessions enter the aggregation.

Reviewers (6): correctness, maintainability, testing, project-standards, performance, learnings-researcher.

Testing gaps:

  • No test verifies that like-named KPI and time-series tiles (Page Views, JS Errors, AJAX Errors) compute equivalent sets — the AJAX divergence above is not caught by CI.
  • No test exercises orphaned containerId/tabId references for templates.
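
The containerId/tabId check suggested above could be sketched roughly as follows. The `Template`, `Container`, `Tab`, and `Tile` shapes and the `findOrphanedReferences` helper are hypothetical simplifications for illustration; the real DashboardTemplateSchema types may be shaped differently.

```typescript
// Hypothetical, simplified shapes: the real DashboardTemplateSchema may differ.
interface Tab { id: string }
interface Container { id: string; tabs?: Tab[] }
interface Tile { id: string; containerId?: string; tabId?: string }
interface Template { containers?: Container[]; tiles: Tile[] }

// Returns one message per tile whose containerId or tabId does not resolve.
function findOrphanedReferences(template: Template): string[] {
  // Index the declared tab ids by container id.
  const tabsByContainer = new Map<string, Set<string>>();
  for (const container of template.containers ?? []) {
    const tabIds = (container.tabs ?? []).map((tab) => tab.id);
    tabsByContainer.set(container.id, new Set(tabIds));
  }

  const problems: string[] = [];
  for (const tile of template.tiles) {
    if (tile.containerId === undefined) {
      // A tab reference cannot be resolved without a container.
      if (tile.tabId !== undefined) {
        problems.push(`${tile.id}: tabId "${tile.tabId}" has no containerId`);
      }
      continue;
    }
    const tabs = tabsByContainer.get(tile.containerId);
    if (tabs === undefined) {
      problems.push(`${tile.id}: unknown containerId "${tile.containerId}"`);
    } else if (tile.tabId !== undefined && !tabs.has(tile.tabId)) {
      problems.push(`${tile.id}: unknown tabId "${tile.tabId}"`);
    }
  }
  return problems;
}
```

A per-template test could then assert that this list is empty for every registered template.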


github-actions Bot commented Jun 3, 2026

E2E Test Results

All tests passed • 197 passed • 3 skipped • 1299s

| Status | Count |
| --- | --- |
| ✅ Passed | 197 |
| ❌ Failed | 0 |
| ⚠️ Flaky | 4 |
| ⏭️ Skipped | 3 |

Tests ran across 4 shards in parallel.

View full report →

Trim the dashboard description to a single sentence to match the
length and style of the existing runtime-metrics templates.

The browser is already captured out of the box: the OTel
document-load instrumentation sets http.user_agent (navigator.userAgent)
on documentLoad spans. The template was instead grouping on
user_agent.name / user_agent.original, which require collector-side
enrichment that isn't present by default, so Top Browsers came up empty
against real data.

- Top Browsers now parses the browser name from SpanAttributes
  ['http.user_agent'] in SQL (Edge/Opera/Firefox/Chrome/Safari/Other),
  scoped to spans carrying the UA. Works with no SDK or collector change.
- Removed the dashboard-level Browser filter: http.user_agent only
  exists on documentLoad spans, so a cross-tile filter keyed on it would
  zero out every non-documentLoad tile. It can return once the UA is
  promoted to a resource attribute (present on every span).

Country tile/filter still depend on the collector geoip processor, since
the browser cannot determine the user's country.
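
The ordered browser classification described above can be illustrated outside SQL. This is only a sketch: the `classifyBrowser` name and the exact substrings are assumptions (the template's actual SQL expression is not shown here), but it shows why the order matters, since Edge and Opera user agents also contain the Chrome and Safari tokens.

```typescript
// Sketch of an ordered user-agent classification (tokens are assumed, not
// copied from the template). More specific tokens must be tested first:
// Edge and Opera UAs also contain "Chrome/" and "Safari/".
function classifyBrowser(userAgent: string): string {
  if (userAgent.includes('Edg/')) return 'Edge';
  if (userAgent.includes('OPR/')) return 'Opera';
  if (userAgent.includes('Firefox/')) return 'Firefox';
  if (userAgent.includes('Chrome/')) return 'Chrome';
  if (userAgent.includes('Safari/')) return 'Safari';
  return 'Other';
}
```

Checking Chrome before Edge in this chain would label every Edge session as Chrome, which is the main thing an ordered conditional has to get right.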

The chart builder editor only renders a WHERE input bound to the
per-series aggCondition (ChartSeriesEditor); the top-level `where`
input renders solely for Search-type tiles (ChartEditorControls.tsx:148
vs :334). So builder tiles that stored their filter in top-level `where`
showed an empty WHERE box even though the filter applied correctly in
SQL (renderChartConfig reads config.where directly). This affected
nearly every tile, not just Page Views; the earlier OR-vs-AND theory
was a red herring.

Move each tile's filter from top-level `where` into the aggCondition of
every select (clearing `where`). renderChartConfig promotes an
all-selects aggCondition back into a real WHERE clause
(renderChartConfig.ts:944,1019), so for a single shared condition the
rendered query is result-identical (count() WHERE c == countIf(c)
WHERE c, etc.) while the condition now shows in the editor.

Left unchanged: Errors over Time and Top Errored Sessions, which already
use per-series aggConditions (their meaningful conditions already
display; their top-level where is only the broad rum.sessionId scope).

Verified: dashboardTemplates schema test + app ci:lint pass; SQL
result-equivalence confirmed by reading renderChartConfig's aggCondition
promotion. Live editor click-through deferred (dev stack down).
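
The result-equivalence claimed above (count() WHERE c versus countIf(c) WHERE c) can be checked on in-memory rows. This is a sketch that models the two query shapes with array operations; it does not call renderChartConfig, and the `Span` shape and helper names are hypothetical.

```typescript
// Hypothetical row shape used only for this illustration.
interface Span { spanName: string }

const isDocumentLoad = (span: Span): boolean => span.spanName === 'documentLoad';

// Models: SELECT count() FROM spans WHERE c
function countWhere<T>(rows: T[], c: (row: T) => boolean): number {
  return rows.filter(c).length;
}

// Models: SELECT countIf(c) FROM spans (no WHERE clause)
function countIf<T>(rows: T[], c: (row: T) => boolean): number {
  return rows.reduce((total, row) => total + (c(row) ? 1 : 0), 0);
}

// Models: SELECT countIf(c) FROM spans WHERE c (condition promoted to WHERE)
function countIfWhere<T>(rows: T[], c: (row: T) => boolean): number {
  return countIf(rows.filter(c), c);
}
```

For a single shared condition all three helpers return the same number; the promoted form mainly lets the database skip non-matching rows before aggregating.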

Wire up the table onClick row-action (SavedChartConfig.onClick, type
'search') on the tables whose grouped value reverses cleanly into a
search filter:

- Top Errored Sessions -> opens the session's spans
  (rum.sessionId:"{{Session}}") — the client-side tracing drilldown
- Top URLs / Slowest Pages -> page views / doc loads for that URL
- Errors per Page -> errors for that URL
- Top JS Errors -> spans for that exception message

Each targets the Traces source by name ({ mode: 'id', id: 'Traces' });
the import flow auto-matches that to the user's mapped source and
rewrites it to the concrete ID (DBDashboardImportPage onClick mapping +
convertToDashboardDocument), so it stays portable. whereTemplate uses
Handlebars row-column variables. Skipped tiles whose group key can't be
reversed (Top Failing API Calls concat, Top Browsers/Countries/Device
derived buckets).
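
To make the row-column substitution concrete, here is a minimal sketch of filling a whereTemplate from a clicked table row. It deliberately uses a plain regular expression rather than the Handlebars library, and `renderWhereTemplate` and `Row` are hypothetical names; the app's real rendering path may escape values or handle missing columns differently.

```typescript
// A clicked table row, keyed by column alias (hypothetical shape).
type Row = Record<string, string>;

// Replaces each {{Column}} placeholder with the row's value for that column.
// Simplified stand-in for Handlebars: no helpers, no escaping.
function renderWhereTemplate(template: string, row: Row): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, column: string) => {
    const value = row[column];
    if (value === undefined) {
      throw new Error(`Row has no column "${column}"`);
    }
    return value;
  });
}
```

For example, rendering `ResourceAttributes.rum.sessionId:"{{Session}}"` against a row whose Session column is `abc123` yields `ResourceAttributes.rum.sessionId:"abc123"`.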

Builder tables without an onClick fall back to buildTableRowSearchUrl,
which derives the drilldown from config.where — now empty (filters moved
to aggCondition), so those drilldowns lost their scope. And the derived
group keys (browser/device/concat) don't reverse into a filter. There's
no template-level way to disable a builder-table row action, so give the
remaining tables a correct onClick instead:

- Top JS Errors: match the coalesced group value across exception.message
  / message / SpanName (it previously only matched exception.message, so
  e.g. an "unhandledrejection" row returned nothing).
- Top Browsers: substring-match the parsed name against http.user_agent.
- Top Countries: exact geo.country.name match.
- Top Failing API Calls: regroup by http.url so the row reverses; drill
  into fetch/xhr calls to that endpoint.
- Top Device Sizes: regroup by raw screen.xy so the row reverses; drill
  into documentLoad spans at that resolution.

Every table now has a working, scoped row action; the scope-less legacy
fallback no longer fires.
@github-actions github-actions Bot removed the review/tier-3 Standard — full human review required label Jun 5, 2026
@github-actions github-actions Bot added the review/tier-4 Critical — deep review + domain expert sign-off label Jun 5, 2026

greptile-apps Bot commented Jun 5, 2026

Greptile Summary

This PR adds a new Browser RUM dashboard template to the dashboards gallery, covering Core Web Vitals, page-load percentiles, traffic breakdowns, and an errors section with JS exceptions and AJAX failure tracking. The change is purely declarative — a new JSON file registered in index.ts alongside the existing templates.

  • Performance Overview (KPIs, LCP/INP/CLS p75, page-load percentile chart, long tasks) and Page Views Breakdown (top URLs, browsers, countries, device sizes, slowest pages) sections are well-structured.
  • Errors section introduces a real-world-usable tabbed layout; the AJAX filtering logic uses a ResourceAttributes['rum.sessionId'] != '' SQL guard to scope to browser sessions.
  • The Top Browsers tile derives browser names from UA tokens via multiIf SQL, but the onClick.whereTemplate substitutes back human-readable names (e.g. 'Edge', 'Opera') rather than the original tokens (Edg/, OPR/), causing broken drill-through for those rows.

Confidence Score: 4/5

Safe to merge with a minor fix recommended: the Top Browsers click-through produces empty results for Edge and Opera users.

The template is declarative JSON with no runtime logic changes. The one concrete defect is in the Top Browsers onClick.whereTemplate: it substitutes human-readable display names that do not appear verbatim in modern Edge or Opera user-agent strings, so clicking those rows returns no results. Chrome, Firefox, and Safari click-throughs work correctly. The rest of the dashboard logic and session scoping is sound.

The onClick.whereTemplate for rum-026 (Top Browsers) in browser-rum.json needs the Edge/Opera token mismatch addressed.
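
One possible shape for the fix is to translate the display name back into the token that actually occurs in the user-agent string before building the filter. The sketch below is an assumption-laden illustration: `uaSearchToken` is a hypothetical helper, the Edg/ and OPR/ tokens come from the review text above, and the remaining tokens are guesses.

```typescript
// Display name -> substring expected in the raw user agent.
// Edg/ and OPR/ are taken from the review; the other tokens are assumed.
const UA_TOKEN_BY_BROWSER = new Map<string, string>([
  ['Edge', 'Edg/'],
  ['Opera', 'OPR/'],
  ['Firefox', 'Firefox/'],
  ['Chrome', 'Chrome/'],
  ['Safari', 'Safari/'],
]);

// Returns the token to search for, or undefined for rows such as "Other"
// that have no single token to match on.
function uaSearchToken(displayName: string): string | undefined {
  return UA_TOKEN_BY_BROWSER.get(displayName);
}
```

A modern Edge user agent ends in something like `Edg/120.0.0.0`, so a substring search for `Edge` finds nothing while a search for `Edg/` matches.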

Important Files Changed

Filename Overview
packages/app/src/dashboardTemplates/browser-rum.json New 1024-line Browser RUM dashboard template. The onClick.whereTemplate for the Top Browsers tile uses human-readable browser display names that don't appear verbatim in user-agent strings, causing broken click-through for Edge and Opera rows.
packages/app/src/dashboardTemplates/index.ts Registers the new browser-rum template following the same pattern as all other templates. No issues.
.changeset/browser-rum-dashboard-template.md Changeset correctly marked as minor with accurate description of the five dashboard-level filters.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Browser RUM Template] --> B[Performance Overview]
    A --> C[Page Views Breakdown]
    A --> D[Errors Section]
    B --> B1[KPIs: Page Views, Sessions, Load percentiles, Core Web Vitals]
    B --> B2[Page Load Chart median p75 p90]
    B --> B3[Page Views over Time + Long Tasks]
    C --> C1[Top URLs]
    C --> C2[Top Browsers — onClick broken for Edge and Opera]
    C --> C3[Top Countries — requires geoip]
    C --> C4[Top Device Sizes]
    C --> C5[Slowest Pages p75]
    C --> C6[Top Errored Sessions]
    D --> D1[Overview: JS Errors KPI, AJAX Errors KPI, Errors over Time]
    D --> D2[JS Exceptions: by message, by page]
    D --> D3[API Failures: Top Failing API Calls]
    A --> F[5 Filters: Service, Environment, Version, Page URL, Country]

Reviews (2): Last reviewed commit: "fix: scope AJAX error tiles to RUM sessi..." | Re-trigger Greptile

Comment thread packages/app/src/dashboardTemplates/browser-rum.json
Comment thread packages/app/src/dashboardTemplates/browser-rum.json
Comment thread .changeset/browser-rum-dashboard-template.md
…ition

Code-review fixes for the Errors section:

1. AJAX Errors KPI (rum-008) and Top Failing API Calls (rum-013) had no
   rum.sessionId guard, so server-side fetch/xhr spans could inflate the
   counts relative to the rest of the dashboard. Add the SQL equivalent
   of the lucene rum.sessionId:* guard the sibling tiles use
   (ResourceAttributes['rum.sessionId'] != '').

2. The AJAX Errors KPI counted status>=400 OR error span status, while
   the "Errors over Time" AJAX series only counted error span status —
   so a 404 with no error status hit the KPI but not the chart. Align
   the chart's AJAX series to the same (more complete) definition so the
   KPI total and the chart line measure the identical event set.
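
The shared AJAX-error definition described above can be written as a single predicate. This is a sketch under assumptions: the `FetchSpan` shape is hypothetical, the input is assumed to be already restricted to fetch/xhr spans, and `toUInt16OrZero` here only approximates the ClickHouse function of the same name.

```typescript
// Hypothetical flattened view of a fetch/xhr span.
interface FetchSpan {
  statusCode: string;     // span status, e.g. 'STATUS_CODE_ERROR'
  httpStatusCode: string; // SpanAttributes['http.status_code'], stored as text
  rumSessionId: string;   // ResourceAttributes['rum.sessionId'], '' when absent
}

// Rough stand-in for ClickHouse toUInt16OrZero: 0 unless the text is a
// plain decimal number within the UInt16 range.
function toUInt16OrZero(text: string): number {
  if (!/^\d+$/.test(text)) return 0;
  const parsed = Number(text);
  return parsed <= 65535 ? parsed : 0;
}

// One definition for the KPI, the chart series, and the table:
// in a RUM session AND (error span status OR HTTP status >= 400).
function isAjaxError(span: FetchSpan): boolean {
  const inRumSession = span.rumSessionId !== '';
  const errorStatus = span.statusCode.toLowerCase().includes('error');
  const httpFailure = toUInt16OrZero(span.httpStatusCode) >= 400;
  return inRumSession && (errorStatus || httpFailure);
}
```

With this definition, a 404 response whose span status was never set to error still counts, and a server-side fetch span without a rum.sessionId does not.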

@pulpdrew pulpdrew left a comment

Cool stuff, nice to see some newer dashboard features being exercised! Couple of suggestions

Comment on lines +122 to +143
"config": {
"name": "Median Page Load (ms)",
"source": "Traces",
"displayType": "number",
"granularity": "auto",
"alignDateRangeToGranularity": true,
"select": [
{
"aggFn": "quantile",
"level": 0.5,
"valueExpression": "Duration / 1000000",
"aggCondition": "SpanName:\"documentLoad\"",
"aggConditionLanguage": "lucene"
}
],
"where": "",
"whereLanguage": "lucene",
"numberFormat": {
"output": "number",
"mantissa": 0,
"thousandSeparated": true
}

Is there a chance this will be the wrong precision for duration? Same thing elsewhere.

Instead, we could remove the divisor and numberFormat, so that duration format with the correct precision will be inferred:

Suggested change
- "config": {
- "name": "Median Page Load (ms)",
- "source": "Traces",
- "displayType": "number",
- "granularity": "auto",
- "alignDateRangeToGranularity": true,
- "select": [
- {
- "aggFn": "quantile",
- "level": 0.5,
- "valueExpression": "Duration / 1000000",
- "aggCondition": "SpanName:\"documentLoad\"",
- "aggConditionLanguage": "lucene"
- }
- ],
- "where": "",
- "whereLanguage": "lucene",
- "numberFormat": {
- "output": "number",
- "mantissa": 0,
- "thousandSeparated": true
- }
+ "config": {
+ "name": "Median Page Load",
+ "source": "Traces",
+ "displayType": "number",
+ "granularity": "auto",
+ "alignDateRangeToGranularity": true,
+ "select": [
+ {
+ "aggFn": "quantile",
+ "level": 0.5,
+ "valueExpression": "Duration",
+ "aggCondition": "SpanName:\"documentLoad\"",
+ "aggConditionLanguage": "lucene"
+ }
+ ],
+ "where": "",
+ "whereLanguage": "lucene"

Comment on lines +184 to +193
"config": {
"name": "LCP p75 (ms)",
"source": "Traces",
"displayType": "number",
"granularity": "auto",
"alignDateRangeToGranularity": true,
"select": [
{
"aggFn": "quantile",
"level": 0.75,

This is interesting, we don't support p75 through the app, so this renders as an empty aggFn.

Do we need p75 or could we do p95? If p75, maybe we should try a custom aggregation to populate the dropdown correctly.

Sidenote, we should probably add a validation so we don't accept this during import, or expand support to custom quantile levels.

Comment on lines +471 to +476
"groupBy": [
{
"valueExpression": "coalesce(nullif(SpanAttributes['http.url'], ''), nullif(SpanAttributes['page.url'], ''), nullif(SpanAttributes['location.href'], ''))",
"alias": "URL"
}
],

Probably another place where we need validation or transformation during import, but this is unrendered because we don't support this groupBy being an array in dashboards (it should be a string)

"w": 24,
"h": 8,
"config": {
"name": "Top Errored Sessions",

A few of these tables could probably benefit from setting groupByColumnsOnLeft so they read more naturally

"mode": "id",
"id": "Traces"
},
"whereTemplate": "ResourceAttributes.rum.sessionId:\"{{Session}}\"",

Nice to see this getting used!

Comment on lines +662 to +676
{
"aggFn": "quantile",
"level": 0.75,
"valueExpression": "Duration / 1000000",
"aggCondition": "SpanName:\"documentLoad\"",
"aggConditionLanguage": "lucene",
"alias": "Page Load p75 (ms)"
},
{
"aggFn": "count",
"valueExpression": "",
"alias": "Views",
"aggCondition": "SpanName:\"documentLoad\"",
"aggConditionLanguage": "lucene"
}

We could add per-series numberFormats here to render the p75 as a duration and the count as a number
