feat(mcp): add denoise option to hyperdx_search tool#2371
brandon-pereira wants to merge 6 commits into
Add a `denoise` boolean parameter to the MCP `hyperdx_search` tool that automatically filters out high-frequency repetitive event patterns from search results, mirroring the web app's "Denoise Results" feature.
When `denoise=true`:
- Samples 10k random events from the same source/time range
- Mines patterns using the shared Drain algorithm (common-utils)
- Identifies "noisy" patterns (>10% of sampled events)
- Matches each search result row against learned patterns
- Filters out rows matching noisy patterns
- Returns filtered rows plus metadata listing removed patterns
Also extracts `resolveBodyExpression()` and `SAFE_BODY_EXPR_CHARS` from runEventPatterns.ts into helpers.ts for reuse.
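The PR moves `resolveBodyExpression()` and `SAFE_BODY_EXPR_CHARS` into helpers.ts without showing their bodies. The sketch below assumes they act as an allow-list check on the operator-configured body expression before it is interpolated into the sampling SQL (the review notes that this column is SQL-interpolated). The `SourceLike` type, the function signature, and the exact character set are all hypothetical.

```typescript
// Hypothetical allow-list: letters, digits, underscores, dots, commas,
// parentheses and spaces. The real SAFE_BODY_EXPR_CHARS in helpers.ts may differ.
const SAFE_BODY_EXPR_CHARS = /^[A-Za-z0-9_.,() ]+$/;

// Hypothetical source shape: only the field this sketch reads.
interface SourceLike {
  bodyExpression?: string;
}

// Returns the trimmed body expression if every character is on the allow-list,
// otherwise undefined so the caller can skip denoising instead of building SQL.
function resolveBodyExpression(source: SourceLike): string | undefined {
  const expr = source.bodyExpression?.trim();
  if (!expr) return undefined;
  return SAFE_BODY_EXPR_CHARS.test(expr) ? expr : undefined;
}
```

Rejecting rather than escaping keeps the check simple: anything containing quotes, semicolons, or comment markers never reaches the query string.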
🦋 Changeset detected. Latest commit: 10b0748. The changes in this PR will be included in the next version bump. This PR includes changesets to release 4 packages.
Resolve conflicts in helpers.ts and runEventPatterns.ts:
- helpers.ts: keep both our resolveBodyExpression/SAFE_BODY_EXPR_CHARS
exports and main's mergeWhereIntoSelectItems/clickHouseErrorResult
- runEventPatterns.ts: import resolveBodyExpression, SAFE_BODY_EXPR_CHARS,
and clickHouseErrorResult from helpers
- search.ts: update trimToolResponse usage to new { data, isTrimmed } API
Add changeset for the denoise feature (patch).
🟡 Tier 3: Standard. Introduces new logic, modifies core functionality, or touches areas with non-trivial risk.
Review process: full human review of logic, architecture, and edge cases.
Deep Review
✅ No critical issues found. No P0/P1: there is no data loss, auth bypass, or injection introduced here (the SQL-interpolated body column is operator-configured, not end-user input), and the graceful-degradation contract holds on every path: a denoise failure is caught and the original search result is always preserved. The findings below are recommended improvements and nits.
🟡 P2: recommended
🔵 P3 nitpicks (9)
Reviewers (12): correctness, security, adversarial, testing, maintainability, performance, api-contract, reliability, kieran-typescript, project-standards, agent-native, learnings-researcher.
E2E Test Results
✅ All tests passed: 194 passed, 3 skipped, 1211s.
Tests ran across 4 shards in parallel.
- Key noisy patterns by template string (`p.pattern` / `match.getTemplate()`) instead of per-instance cluster ID, eliminating fragile coupling to the internal auto-increment ordering of `minePatterns()`
- Always emit a `denoised` metadata block when `denoise=true`, including a `skipped` field with the reason when denoising cannot proceed (`source_not_found`, `no_body_column`, `body_column_not_in_results`, `connection_not_found`, `sampling_failed`, `no_sample_data`, `no_rows`)
- Rename `originalRowCount` to `returnedRowCountBeforeDenoise` to make the post-trim semantics explicit
- Fix misleading `maxSamples: 0` comment (`minePatterns()` always keeps at least one sample per cluster); use `maxSamples: 1` instead
- Add integration tests for `denoise=true`: schema exposure, empty-results handling, noisy-pattern filtering with seeded data, `denoised` metadata shape assertions, and a `denoise=false` control case
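A minimal sketch of the `denoised` metadata block this commit describes. The skip reasons are copied from the commit message (plus `denoise_failed`, added by the later try/catch commit). The interface layout, the `skippedMetadata()` helper, and the assumption that a skipped block reports equal before/after counts are hypothetical.

```typescript
// Skip reasons from the commit message; "denoise_failed" comes from the
// later commit that wraps denoising in a try/catch.
type DenoiseSkipReason =
  | "source_not_found"
  | "no_body_column"
  | "body_column_not_in_results"
  | "connection_not_found"
  | "sampling_failed"
  | "no_sample_data"
  | "no_rows"
  | "denoise_failed";

// One entry per noisy template that was filtered out.
interface RemovedPattern {
  pattern: string; // template string, e.g. "GET /health <*> <*>"
  estimatedCount: number;
  sampleCount: number;
}

// Hypothetical layout of the `denoised` block emitted whenever denoise=true.
interface DenoisedMetadata {
  removedPatterns: RemovedPattern[];
  returnedRowCountBeforeDenoise: number; // rows returned after trimming, before denoising
  filteredRowCount: number;
  skipped?: DenoiseSkipReason; // set only when denoising could not proceed
}

// Hypothetical helper for the skip paths: rows pass through unchanged,
// so nothing is reported as removed and both counts are equal.
function skippedMetadata(rowCount: number, reason: DenoiseSkipReason): DenoisedMetadata {
  return {
    removedPatterns: [],
    returnedRowCountBeforeDenoise: rowCount,
    filteredRowCount: rowCount,
    skipped: reason,
  };
}
```

Typing `skipped` as a string-literal union lets callers (and tests) check for a specific reason without string typos compiling.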
- Resolve conflict in search.ts: keep `clickstack_` prefix + denoise description
- Rename `hyperdx_` to `clickstack_` in denoise tests
- Extract `DENOISE_SAMPLE_SIZE` and `DENOISE_NOISE_THRESHOLD` to common-utils/drain (shared between FE `DBRowTable` and BE MCP denoise)
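Sharing the threshold means the web app and the MCP tool classify the same patterns as noisy. Below is a minimal sketch of that classification, keyed by template string as in the previous commit. The value 0.1 is taken from the PR's ">10% of sampled events", and the `MinedPattern` shape and `findNoisyTemplates()` are hypothetical; check common-utils/drain for the real exports.

```typescript
// Assumed value, from the PR description (">10% of sampled events").
const DENOISE_NOISE_THRESHOLD = 0.1;

// Hypothetical shape of one mined pattern.
interface MinedPattern {
  pattern: string; // template string, e.g. "GET /health <*> <*>"
  count: number; // sampled events that fell into this pattern
}

// Returns the set of noisy template strings. Keying by template string
// (not cluster ID) keeps the result independent of mining order.
function findNoisyTemplates(patterns: MinedPattern[], sampledEvents: number): Set<string> {
  const noisy = new Set<string>();
  if (sampledEvents <= 0) return noisy; // no sample data: nothing can be classified
  for (const p of patterns) {
    // Strictly greater than the threshold, matching ">10%".
    if (p.count / sampledEvents > DENOISE_NOISE_THRESHOLD) noisy.add(p.pattern);
  }
  return noisy;
}
```

This sketch divides by the number of events actually sampled rather than by the configured sample size, so a small time range that yields fewer than 10k events is not under-counted.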
Wrap the `denoiseSearchResults` call in a try/catch so that a failure in the denoise post-processing step (`getSource`, `getConnectionById`, pattern mining, etc.) does not discard an already-successful search result. On catch, return the original rows with a `denoised` metadata block containing `skipped: 'denoise_failed'`, following the same graceful-degradation convention used by other failure paths in denoise.ts.
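The fallback pattern from this commit, sketched as a generic wrapper. `denoiseOrPassThrough()` and its `denoise` callback are hypothetical stand-ins for the call site in search.ts and for `denoiseSearchResults()`, whose real signatures are not shown in the PR.

```typescript
type Row = Record<string, unknown>;

interface SearchResult {
  data: Row[];
}

// Loose shape for the metadata block; only `skipped` matters here.
interface DenoisedBlock {
  skipped?: string;
  [key: string]: unknown;
}

interface DenoiseOutcome {
  result: SearchResult;
  denoised: DenoisedBlock;
}

// Hypothetical wrapper: any exception thrown while denoising falls back
// to the original, already-successful search result.
async function denoiseOrPassThrough(
  result: SearchResult,
  denoise: (r: SearchResult) => Promise<DenoiseOutcome>,
): Promise<DenoiseOutcome> {
  try {
    return await denoise(result);
  } catch {
    // Keep the original rows and report why nothing was removed.
    return { result, denoised: { skipped: "denoise_failed" } };
  }
}
```

The `await` inside `try` matters: returning the promise without awaiting it would let a rejection escape the `catch`.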
What
Adds a `denoise` boolean parameter to the MCP `hyperdx_search` tool that automatically filters out high-frequency repetitive event patterns from search results, mirroring the web app's "Denoise Results" checkbox.
Why
When investigating issues via MCP, LLM agents get back raw search results that are often dominated by repetitive log noise. This forces agents either to sift through noisy results or to make a separate `hyperdx_event_patterns` call and then filter manually. The `denoise` option makes this a single-call workflow.
How it works
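When `denoise=true`:
- Samples 10k random events from the same source/time range
- Mines patterns using the shared Drain algorithm (`common-utils`)
- Identifies "noisy" patterns (>10% of sampled events)
- Matches each search result row against the learned patterns using `TemplateMiner`
- Filters out rows matching noisy patterns
- Returns the filtered rows plus a `denoised` metadata block listing removed patterns, the original row count, and the filtered row count
Graceful degradation: if the source, connection, or body column can't be resolved, or if pattern sampling fails, the original results are returned unmodified.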
Changes
- `packages/api/src/mcp/tools/query/denoise.ts`: `denoiseSearchResults()` function
- `packages/api/src/mcp/tools/query/search.ts`: `denoise` schema param + post-processing logic
- `packages/api/src/mcp/tools/query/helpers.ts`: `resolveBodyExpression()` + `SAFE_BODY_EXPR_CHARS`
- `packages/api/src/mcp/tools/query/runEventPatterns.ts`: imports `resolveBodyExpression()` and `SAFE_BODY_EXPR_CHARS` from helpers instead of defining them
Example response (with `denoise=true`)
```json
{
  "result": { "data": [...filtered rows...] },
  "denoised": {
    "removedPatterns": [
      { "pattern": "GET /health <*> <*>", "estimatedCount": 45000, "sampleCount": 4500 }
    ],
    "returnedRowCountBeforeDenoise": 50,
    "filteredRowCount": 12
  }
}
```
Performance
Adds ~1-2s latency when `denoise=true` due to the pattern sampling queries. No impact when `denoise=false` (the default).
Closes HDX-4346