feat(mcp): add denoise option to hyperdx_search tool by brandon-pereira · Pull Request #2371 · hyperdxio/hyperdx

brandon-pereira · 2026-05-29T15:42:53Z

What

Adds a denoise boolean parameter to the MCP hyperdx_search tool that automatically filters out high-frequency repetitive event patterns from search results, mirroring the web app's "Denoise Results" checkbox.

Why

When investigating issues via MCP, LLM agents get back raw search results that are often dominated by repetitive log noise. This forces agents to either sift through noisy results or make a separate hyperdx_event_patterns call and then manually filter. The denoise option makes this a single-call workflow.

How it works

When denoise=true:

Runs the normal search query to get result rows
In parallel, samples 10k random events from the same source/time range and counts total events
Mines patterns using the shared Drain algorithm (common-utils)
Identifies "noisy" patterns — those accounting for >10% of sampled events (same threshold as the web app)
Rebuilds a TemplateMiner, matches each search result row against learned patterns
Filters out rows matching noisy patterns
Returns filtered rows plus a denoised metadata block listing removed patterns, original row count, and filtered row count

Graceful degradation: if source/connection/body column can't be resolved, or if pattern sampling fails, the original results are returned unmodified.

Changes

File	Change
`packages/api/src/mcp/tools/query/denoise.ts`	New — Core `denoiseSearchResults()` function
`packages/api/src/mcp/tools/query/search.ts`	Added `denoise` schema param + post-processing logic
`packages/api/src/mcp/tools/query/helpers.ts`	Extracted shared `resolveBodyExpression()` + `SAFE_BODY_EXPR_CHARS`
`packages/api/src/mcp/tools/query/runEventPatterns.ts`	Imports from helpers instead of local definitions

Example response (with denoise=true)

{
  "result": { "data": [...filtered rows...] },
  "denoised": {
    "removedPatterns": [
      { "pattern": "GET /health <*> <*>", "estimatedCount": 45000, "sampleCount": 4500 }
    ],
    "originalRowCount": 50,
    "filteredRowCount": 12
  }
}

Performance

Adds ~1-2s latency when denoise=true due to the pattern sampling queries. No impact when denoise=false (the default).

Closes HDX-4346

Add a `denoise` boolean parameter to the MCP `hyperdx_search` tool that automatically filters out high-frequency repetitive event patterns from search results, mirroring the web app's "Denoise Results" feature. When `denoise=true`: - Samples 10k random events from the same source/time range - Mines patterns using the shared Drain algorithm (common-utils) - Identifies "noisy" patterns (>10% of sampled events) - Matches each search result row against learned patterns - Filters out rows matching noisy patterns - Returns filtered rows plus metadata listing removed patterns Also extracts `resolveBodyExpression()` and `SAFE_BODY_EXPR_CHARS` from runEventPatterns.ts into helpers.ts for reuse.

changeset-bot · 2026-05-29T15:42:58Z

🦋 Changeset detected

Latest commit: 10b0748

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages

Name	Type
@hyperdx/api	Patch
@hyperdx/app	Patch
@hyperdx/common-utils	Patch
@hyperdx/otel-collector	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

vercel · 2026-05-29T15:42:59Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
hyperdx-oss	Ready	Preview, Comment	Jun 3, 2026 10:43pm
hyperdx-storybook	Ready	Preview, Comment	Jun 3, 2026 10:43pm

Resolve conflicts in helpers.ts and runEventPatterns.ts: - helpers.ts: keep both our resolveBodyExpression/SAFE_BODY_EXPR_CHARS exports and main's mergeWhereIntoSelectItems/clickHouseErrorResult - runEventPatterns.ts: import resolveBodyExpression, SAFE_BODY_EXPR_CHARS, and clickHouseErrorResult from helpers - search.ts: update trimToolResponse usage to new { data, isTrimmed } API Add changeset for the denoise feature (patch).

github-actions · 2026-05-29T17:30:54Z

🟡 Tier 3 — Standard

Introduces new logic, modifies core functionality, or touches areas with non-trivial risk.

Why this tier:

Diff size: 518 production lines changed (Tier 2 max: < 250)
Cross-layer change: touches frontend (packages/app) + backend (packages/api) + shared utils (packages/common-utils)

Review process: Full human review — logic, architecture, edge cases.
SLA: First-pass feedback within 1 business day.

Stats

Production files changed: 7
Production lines changed: 518 (+ 104 in test files, excluded from tier calculation)
Branch: brandon/denoising
Author: brandon-pereira

To override this classification, remove the review/tier-3 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

github-actions · 2026-05-29T17:36:11Z

Deep Review

✅ No critical issues found. No P0/P1: there is no data loss, auth bypass, or injection introduced here (the SQL-interpolated body column is operator-configured, not end-user input), and the graceful-degradation contract holds on every path — a denoise failure is caught and the original search result is always preserved. The findings below are recommended improvements and nits.

🟡 P2 — recommended

packages/api/src/mcp/tools/query/search.ts:202 — filteredRowCount is set to denoised.rows.length before the second trimToolResponse(denoisedResult) runs, so when the filtered payload still exceeds the size budget the trim drops rows from result.data and the metadata count overstates the rows actually returned (reachable at maxResults up to 200 with wide/multiline bodies).
- Fix: Derive filteredRowCount from the trimmed array length after trimToolResponse, and apply the same correction on the denoise_failed branch.
- _{correctness, adversarial, api-contract}
packages/api/src/mcp/tools/query/denoise.ts:264 — the sampleConfig/countConfig builder blocks (~12 fields each) are duplicated nearly verbatim from runEventPatterns.ts, so any future ChartConfigWithDateRange field must be edited in two places to stay in sync.
- Fix: Extract a shared buildPatternSampleConfigs(source, bodyColumn, tsExpr, options, limit) helper and call it from both denoise.ts and runEventPatterns.ts.
- _{maintainability}
packages/api/src/mcp/tools/query/denoise.ts:383 — minePatterns already trains a TemplateMiner over the sample, then denoiseSearchResults constructs and re-trains a second miner over the identical sampleRows, doubling synchronous Drain CPU (up to ~20k addLogMessage calls) on the event loop per denoised request.
- Fix: Return the trained miner from minePatterns and reuse it for row matching instead of rebuilding it.
- _{performance, maintainability}
packages/api/src/mcp/tools/query/denoise.ts:279 — the sample query uses orderBy: rand() over the full time range, forcing a full scan-and-sort that can exhaust the 30s max_execution_time on large tables and silently skip denoising (skipped: 'sampling_failed').
- Fix: Use ClickHouse's native SAMPLE clause (or a bounded pre-filter before randomizing) so sampling does not scan and sort the entire range.
- _{performance, reliability}
packages/api/src/mcp/tools/query/search.ts:33 — denoise filters only the rows in the maxResults-limited page (default 50), not the full event corpus, but neither the denoise schema description nor the response says so, so an LLM consumer can misread a near-empty filtered page as "few interesting events exist."
- Fix: State in the denoise field description (and/or response) that filtering applies to the returned page, and emit a human-readable reason alongside the opaque skipped slug.
- _{agent-native, api-contract}
packages/api/src/mcp/__tests__/queryTool.test.ts:51 — the empty-results test comments that the denoised block should be absent but only asserts the presence of result, so it would pass even if the denoise path were broken.
- Fix: Add expect(output).not.toHaveProperty('denoised') to make the assertion meaningful.
- _testing
packages/api/src/mcp/tools/query/denoise.ts:210 — new graceful-degradation behavior is largely untested: the denoise_failed catch path, the skip reasons (source_not_found, no_body_column, connection_not_found, body_column_not_in_results, sampling_failed), and the all-rows-removed hint branch have no coverage.
- Fix: Add tests that force a thrown denoiseSearchResults, each skip reason, and the 100%-noisy-rows case, asserting rows are preserved and the expected metadata is emitted.
- _{testing, reliability}

🔵 P3 nitpicks (9)

packages/api/src/mcp/tools/query/search.ts:125 — parsed.result is already typed, but it is re-cast to Record<string, unknown> and then .data re-cast again, erasing the declared shape and violating the repo's documented "avoid as casts" guidance.
- Fix: Replace the double cast with const rows = parsed.result?.data; and rely on the existing Array.isArray guard.
- _{kieran-typescript, project-standards}
packages/api/src/mcp/tools/query/search.ts:132 — let denoised; is declared without a type annotation, so use-before-assignment safety relies entirely on the catch block always returning.
- Fix: Annotate as let denoised: DenoiseResult; importing the type from ./denoise.
packages/api/src/mcp/tools/query/denoise.ts:196 — the skipped reason values are untyped string literals duplicated across denoise.ts and search.ts.
- Fix: Define a DenoiseSkipReason string-union/const and type the skipped field with it.
- _{maintainability, api-contract}
packages/api/src/mcp/tools/query/search.ts:97 — when denoise=true but the search returns zero rows, the handler early-returns the raw result with no denoised block, so a caller that requested denoise must still guard for its absence.
- Fix: Emit a denoised block with skipped: 'no_rows' for consistency.
- _api-contract
packages/api/src/mcp/tools/query/denoise.ts:236 — denoiseSearchResults re-fetches the source and connection already resolved by the upstream runConfigTile, adding a redundant round-trip and a TOCTOU window between the search and sample queries.
- Fix: Pass the already-resolved source/client into the function rather than re-fetching by id.
packages/api/src/mcp/tools/query/denoise.ts:349 — new Date(String(tsRaw)).getTime() returns NaN (not null) for unparseable timestamps, bypassing the ?? startDate fallback; harmless in the denoise path today since the trend is unused, but fragile.
- Fix: Return Number.isFinite(ms) ? ms : null.
packages/api/src/mcp/tools/query/denoise.ts:268 — the source-resolved bodyColumn is interpolated into the SQL select without the SAFE_BODY_EXPR_CHARS check that the caller-supplied path in runEventPatterns.ts applies; the value is operator-configured (not exploitable by an end user) but the two mining paths now differ in posture.
- Fix: Apply SAFE_BODY_EXPR_CHARS to the resolved expression for defense-in-depth parity, or document why it is unnecessary here.
- _{security, adversarial}
packages/api/src/mcp/tools/query/denoise.ts:391 — findBodyColumnKey matches result-row keys only by exact/case-insensitive equality, so for bracket/map body expressions or searches whose columns omit the body, denoise silently no-ops via body_column_not_in_results.
- Fix: Alias the body column under a known key in the sample/result select and match on that alias.
- _correctness
packages/api/src/mcp/tools/query/denoise.ts:344 — body strings are fed to Drain via String(raw) with no length cap, so 10k multi-KB bodies can create significant transient heap pressure.
- Fix: Truncate body text to a bounded length before mining/matching.

Reviewers (12): correctness, security, adversarial, testing, maintainability, performance, api-contract, reliability, kieran-typescript, project-standards, agent-native, learnings-researcher.

Testing gaps:

Graceful-degradation paths for the new feature (thrown denoiseSearchResults → denoise_failed; each skipped reason) are uncovered.
The all-rows-removed hint branch and the metadata-vs-trimmed-data consistency under >50KB payloads are untested.
resolveBodyExpression is untested for SourceKind.Trace and the multi-expression split branch; findBodyColumnKey's case-insensitive fallback is untested.
No docs/solutions/ learnings exist for this area yet — worth capturing Drain/ClickHouse-sampling/MCP graceful-degradation patterns after this lands.

github-actions · 2026-05-29T17:39:09Z

E2E Test Results

✅ All tests passed • 194 passed • 3 skipped • 1211s

Status	Count
✅ Passed	194
❌ Failed	0
⚠️ Flaky	5
⏭️ Skipped	3

Tests ran across 4 shards in parallel.

View full report →

- Key noisy patterns by template string (p.pattern / match.getTemplate()) instead of per-instance cluster ID, eliminating fragile coupling to minePatterns() internal auto-increment ordering - Always emit a 'denoised' metadata block when denoise=true, including a 'skipped' field with the reason when denoising cannot proceed (source_not_found, no_body_column, body_column_not_in_results, connection_not_found, sampling_failed, no_sample_data, no_rows) - Rename originalRowCount to returnedRowCountBeforeDenoise to make the post-trim semantics explicit - Fix misleading maxSamples:0 comment (minePatterns always keeps at least one sample per cluster); use maxSamples:1 instead - Add integration tests for denoise=true: schema exposure, empty results handling, noisy pattern filtering with seeded data, denoised metadata shape assertions, and denoise=false control case

- Resolve conflict in search.ts: keep clickstack_ prefix + denoise description - Rename hyperdx_ to clickstack_ in denoise tests - Extract DENOISE_SAMPLE_SIZE and DENOISE_NOISE_THRESHOLD to common-utils/drain (shared between FE DBRowTable and BE MCP denoise)

Wrap the denoiseSearchResults call in a try/catch so that a failure in the denoise post-processing step (getSource, getConnectionById, pattern mining, etc.) does not discard an already-successful search result. On catch, return the original rows with a denoised metadata block containing skipped: 'denoise_failed', following the same graceful- degradation convention used by other failure paths in denoise.ts.

vercel Bot deployed to Preview – hyperdx-oss May 29, 2026 15:46 View deployment

github-actions Bot added the review/tier-3 Standard — full human review required label May 29, 2026

vercel Bot deployed to Preview – hyperdx-storybook May 29, 2026 17:32 View deployment

vercel Bot deployed to Preview – hyperdx-oss May 29, 2026 17:34 View deployment

vercel Bot deployed to Preview – hyperdx-storybook May 29, 2026 17:47 View deployment

vercel Bot deployed to Preview – hyperdx-oss May 29, 2026 17:48 View deployment

brandon-pereira requested review from a team and teeohhem and removed request for a team May 29, 2026 17:58

teeohhem reviewed Jun 2, 2026

View reviewed changes

Comment thread packages/api/src/mcp/tools/query/denoise.ts Outdated

teeohhem previously approved these changes Jun 2, 2026

View reviewed changes

brandon-pereira dismissed teeohhem’s stale review via 343003d June 3, 2026 21:13

vercel Bot deployed to Preview – hyperdx-storybook June 3, 2026 21:15 View deployment

brandon-pereira added 2 commits June 3, 2026 15:17

chore: update changeset for clickstack_ rename and shared constants

acbb053

vercel Bot deployed to Preview – hyperdx-storybook June 3, 2026 22:42 View deployment

vercel Bot deployed to Preview – hyperdx-oss June 3, 2026 22:43 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(mcp): add denoise option to hyperdx_search tool#2371

feat(mcp): add denoise option to hyperdx_search tool#2371
brandon-pereira wants to merge 6 commits into
mainfrom
brandon/denoising

brandon-pereira commented May 29, 2026 •

edited

Loading

Uh oh!

changeset-bot Bot commented May 29, 2026 •

edited

Loading

Uh oh!

vercel Bot commented May 29, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 29, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 29, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 29, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

brandon-pereira commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Why

How it works

Changes

Example response (with denoise=true)

Performance

Uh oh!

changeset-bot Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

vercel Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🟡 Tier 3 — Standard

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deep Review

🟡 P2 — recommended

Uh oh!

github-actions Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Test Results

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

brandon-pereira commented May 29, 2026 •

edited

Loading

changeset-bot Bot commented May 29, 2026 •

edited

Loading

vercel Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading

github-actions Bot commented May 29, 2026 •

edited

Loading