Skip to content

feat: use text index to power filters and autocomplete#2376

Open
knudtty wants to merge 11 commits into
mainfrom
aaron/fts-index-as-aggregate-view
Open

feat: use text index to power filters and autocomplete#2376
knudtty wants to merge 11 commits into
mainfrom
aaron/fts-index-as-aggregate-view

Conversation

@knudtty
Copy link
Copy Markdown
Contributor

@knudtty knudtty commented May 29, 2026

Summary

The new text indexes can power filters and autocomplete and ease the metadataMVs. So let's do it!

References

  • Linear Issue: Closes HDX-4229

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 29, 2026

⚠️ No Changeset found

Latest commit: c5a96e2

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment Jun 1, 2026 9:10pm
hyperdx-storybook Ready Ready Preview, Comment Jun 1, 2026 9:10pm

Request Review

@github-actions github-actions Bot added the review/tier-4 Critical — deep review + domain expert sign-off label May 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

  • Large diff: 1174 production lines changed (threshold: 1000)

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats
  • Production files changed: 4
  • Production lines changed: 1174 (+ 981 in test files, excluded from tier calculation)
  • Branch: aaron/fts-index-as-aggregate-view
  • Author: knudtty

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

E2E Test Results

All tests passed • 191 passed • 3 skipped • 1310s

Status Count
✅ Passed 191
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

Deep Review

🔴 P0/P1 — must fix

  • packages/common-utils/src/core/metadata.ts:498getMapKeys text-index queries have no try/catch, so a transient error from mergeTreeTextIndex propagates out instead of falling through to the MV/scan tiers as the sibling getMapValues cascade does.
    • Fix: Wrap the keysIndex block (lines ~498-518) and the itemsIndex block (lines ~519-539) in try/catch with console.warn, mirroring the pattern in getMapValues at lines 819-865.
    • correctness, reliability
  • packages/common-utils/src/core/metadata.ts:955getMapValues main-table-scan fallback uses Promise.all(keys.map(...)); one rejected per-key query discards every successful sibling result, concentrating blast radius vs. the prior per-key API.
    • Fix: Replace Promise.all with Promise.allSettled (or wrap each per-key body in try/catch returning []) and aggregate the successful entries.
    • adversarial, performance, reliability

🟡 P2 — recommended

  • packages/common-utils/src/core/metadata.ts:786getMapValues no longer wraps the query path in cache.getOrFetch; every call re-runs the full cascade and the fetchPerKeyFallback paths at lines 1670/1815 issue uncached ClickHouse queries on repeat.
    • Fix: Restore a result-level cache.getOrFetch keyed on (connectionId, db, table, column, sorted-keys, dateRange, timestampValueExpression), or cache inside the fallback paths.
    • correctness, kieran-typescript, performance, reliability
  • packages/common-utils/src/core/metadata.ts:847 — Text-index queries (this site and getMapKeys at lines ~505 and ~526) pass this.getClickHouseSettings() with no max_execution_time/timeout_overflow_mode, while the MV branch bounds execution at 15s; a slow mergeTreeTextIndex scan stalls to the server default before cascading.
    • Fix: Apply {...this.getClickHouseSettings(), timeout_overflow_mode: 'break', max_execution_time: 15, max_rows_to_read: '0'} consistent with the MV branch.
    • performance, reliability
  • packages/common-utils/src/core/metadata.ts:422partsOverlapFilter returns chSql\1`whentimestampValueExpressionis absent; theuseMetadata.tsx -> getAllFieldsAndValues -> getMapValueschain never threadstimestampValueExpression, so text-index queries scan ALL parts even when the caller passed a narrow dateRange`.
    • Fix: Thread timestampValueExpression through getAllFieldsAndValues -> getMapValues, or have partsOverlapFilter resolve it from source metadata when not supplied.
    • adversarial, correctness
  • packages/common-utils/src/core/metadata.ts:835 — Text-index split uses position(token, sep), which returns the FIRST occurrence; a map whose KEY contains the separator literal (e.g., key a=b with separator =) parses as key a / value b=c while the OR-chain startsWith still matches the row, returning corrupted pairs only on the text-index path.
    • Fix: Compute the split offset from each prefixed key's known length (the OR-chain already constrains the prefix), or validate at index-discovery time that map keys cannot contain the separator.
    • adversarial, correctness
  • packages/common-utils/src/core/metadata.ts:515getMapKeys text-index and rollup branches use cache.get + manual cache.set instead of cache.getOrFetch, losing the pendingQueries dedup that previously joined concurrent identical fetches; two simultaneous autocomplete callers now fire identical mergeTreeTextIndex queries.
    • Fix: Wrap each branch's query in await this.cache.getOrFetch(cacheKey, async () => {...}).
    • performance
  • packages/common-utils/src/core/metadata.ts:863 — New catch blocks at lines 578, 863, 922 blanket-swallow ALL errors as console.warn, including non-recoverable failures (auth, syntax, abort); the user-facing error eventually surfaces from the final tier and the root cause is invisible, doubling time-to-error.
    • Fix: Re-throw on AbortError and on non-transient ClickHouse error codes (ACCESS_DENIED, SYNTAX_ERROR), or attach an aggregated tier-failure diagnostic to the final thrown error.
    • reliability
🔵 P3 nitpicks (19)
  • packages/common-utils/src/queryParser.ts:19 — Re-export statements interleaved between two import blocks trip the project's simple-import-sort/imports: 'error' lint rule and the pre-commit hook will reformat.
    • Fix: Move export type { KvItemsInfo, KvItemsLookup } and export { parseKvItems... } below all imports.
    • maintainability, project-standards
  • packages/common-utils/src/core/metadata.ts:440getMapKeys does not accept or thread AbortSignal, while getMapValues was updated to accept one; aborts from autocomplete UI leave getMapKeys queries running server-side.
    • Fix: Add signal?: AbortSignal to getMapKeys and thread it into all three clickhouseClient.query calls.
    • reliability
  • packages/common-utils/src/core/metadata.ts:1720 — The new fetchPerKeyFallback repeats the spread-after-defaults clickhouse_settings pattern, so user settings silently override the in-function safety caps (same shape as the pre-existing site at line 951).
    • Fix: Put ...this.getClickHouseSettings() first, then the explicit max_rows_to_read and read_overflow_mode overrides last.
    • correctness
  • packages/api/src/tasks/checkAlerts/__tests__/renderAlertTemplate.test.ts:36 — Mock getMapValues: jest.fn().mockResolvedValue([]) returns a bare array, but the new production signature returns Map<string, string[]>; latent landmine for any future test that exercises a path consuming .get().
    • Fix: Change both renderAlertTemplate.test.ts:36 and checkAlerts.test.ts:1109 to mockResolvedValue(new Map()).
    • adversarial, api-contract, correctness, kieran-typescript, testing
  • packages/common-utils/src/core/kvItems.ts:241 — Two parallel dispatch shapes (KV_ITEMS_STRATEGIES array iterated inline in queryParser.ts, plus the parseKvItemsDefaultExpression dispatcher used by metadata.ts) duplicate the same strategy loop.
    • Fix: Drop the inline loop in queryParser.ts and call parseKvItemsDefaultExpression, or stop exporting one of the two shapes.
    • maintainability
  • packages/common-utils/src/core/metadata.ts:375 — Local norm = (s) => s.replace(/\s+/g, '').replace(/\/g, '')is byte-identical tonormalizeChExpressioninqueryParser.ts`; both expressions must stay in sync forever.
    • Fix: Hoist a single normalizeClickHouseExpression into core/utils.ts and import from both modules.
    • maintainability
  • packages/common-utils/src/core/metadata.ts:460getMapKeys cache key dropped the alignedDateRange MV branch; two callers in the same MV bucket no longer share cache entries.
    • Fix: Restore the aligned-bucket variant for the MV path or document the deliberate change.
    • correctness, maintainability
  • packages/common-utils/src/core/metadata.ts:582 — When getMapKeys text-index returns empty and the MV path succeeds, the MV value is cached under a shared key; a later call after the text index catches up still returns the cached MV result.
    • Fix: Distinguish cache entries by source path, or shorten TTL for fallback-tier results.
  • packages/common-utils/src/core/metadata.ts:1895 — Text-index cache key in getAllFieldsAndValues uses raw dateRange[0]/[1] while the MV cache key uses alignedDateRange; equivalent callers may double-cache identical results.
    • Fix: Use alignedDateRange for the text-index cache key, or document the divergence.
    • correctness, maintainability
  • packages/common-utils/src/core/metadata.ts:951settings spread ...this.getClickHouseSettings() after explicit max_rows_to_read/read_overflow_mode overrides the safety caps (pre-existing pattern, restated for parity with the new sibling at 1720).
    • Fix: Put the spread first, then the explicit overrides last.
    • correctness, kieran-typescript
  • packages/common-utils/src/core/metadata.ts:397getMapColumnTextIndexes silently drops a skip index when an alias matches by expression but its default_expression isn't a recognized KV-items shape.
    • Fix: Log a debug warning, or record a keys-only entry as a fallback.
    • correctness
  • packages/common-utils/src/core/metadata.ts:1947excludedIds.sort() mutates the array in place inside a cache-key template literal with no explanatory comment.
    • Fix: Use [...excludedIds].sort() and add // sort for deterministic cache key.
    • maintainability
  • packages/common-utils/src/core/metadata.ts:355getMapColumnTextIndexes is exposed publicly (no private modifier); if only intended for internal use plus tests, narrow the surface.
    • Fix: Mark private; tests can still spy via jest.spyOn(Metadata.prototype as any, 'getMapColumnTextIndexes').
    • api-contract
  • packages/common-utils/src/core/metadata.ts:134MapColumnTextIndexes allows the {} empty state because both fields are optional.
    • Fix: Tighten to a discriminated union requiring at least one of keysIndex or itemsIndex.
    • kieran-typescript, maintainability
  • packages/common-utils/src/core/kvItems.ts:71 — Tokenizer rejects backtick-quoted ClickHouse identifiers (e.g. `LogAttributes`), silently disabling text-index detection for those schemas.
    • Fix: Add a backtick-identifier branch to tokenizeExpression that reads `...` as a single identifier token and strips the backticks.
    • adversarial
  • packages/common-utils/src/core/metadata.ts:499 — Text-index keys-only path has no ORDER BY; the MV path ranks by sum(count) DESC, so autocomplete shows arbitrary (not popular) keys after this PR.
    • Fix: Accept the documented divergence or prefer the MV path when both are available for ranking quality.
    • adversarial, performance
  • packages/common-utils/src/__tests__/metadata.test.ts:1580 — Tests named "no cache poisoning" only verify a single invocation; a second-call probe is needed to prove the cache wasn't actually written.
    • Fix: After the first await, issue an identical call and assert that the underlying ClickHouse query fires again.
    • testing
  • packages/common-utils/src/__tests__/metadata.test.ts:2348 — Items-index .each table only covers default_type === 'ALIAS'; the MATERIALIZED predicate branch is uncovered.
    • Fix: Add a MATERIALIZED row to the parameterized table.
    • testing
  • packages/common-utils/src/__tests__/metadata.test.ts:2265(prototype.method as jest.Mock).mockRestore() + finally re-spy pattern leaks if finally doesn't run; couples test ordering to a global beforeAll spy shape.
    • Fix: Extract a withRealGetMapColumnTextIndexes(md, fn) helper, or scope the un-stubbing via beforeEach/afterEach inside the describe block.
    • maintainability, testing

Reviewers (7): adversarial, api-contract, correctness, kieran-typescript, maintainability, performance, project-standards, reliability, testing

Testing gaps:

  • No test covers a text-index tier failure in getMapKeys falling through to the MV/scan tiers (the cascade contract is not enforced by tests today).
  • No test exercises getMapValues Promise.all partial-failure semantics — one per-key query failing alongside successes.
  • No test covers getMapValues text-index with a multi-character separator (position(...) + separator.length arithmetic is untested for non-1 lengths).
  • No test for getMapValues({ keys: [] }) early-return or for getMapColumnTextIndexes engine === 'Distributed' early-return.
  • No test asserts AbortSignal propagation through getMapKeys (also blocked on threading the parameter through).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-4 Critical — deep review + domain expert sign-off

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant