Skip to content

fix: unknown lucene field falls through in search#2422

Draft
karl-power wants to merge 1 commit into
mainfrom
lucene-unknown-field-falls-through-as-a-raw-sql-identifier
Draft

fix: unknown lucene field falls through in search#2422
karl-power wants to merge 1 commit into
mainfrom
lucene-unknown-field-falls-through-as-a-raw-sql-identifier

Conversation

@karl-power
Copy link
Copy Markdown
Contributor

Summary

Why

When a Lucene search field couldn't be resolved to a real column, the CustomSchemaSQLSerializerV2 fall-through emitted it verbatim as a raw SQL identifier (queryParser.ts, the old // It might be an alias, let's just try the column branch). That had two problems:

  • Typos / non-existent fields produced SQL like WHERE myTypo = '...', which ClickHouse rejects with a confusing Unknown identifier error instead of simply returning no rows.
  • The fall-through was the only thing making legitimate SELECT-alias references work (ClickHouse resolves SELECT aliases in WHERE), but it couldn't tell an alias apart from a typo — both were emitted blindly.

This change makes that distinction explicit: a field that matches a known SELECT alias is emitted as a bare identifier; anything genuinely unknown resolves to the no-match predicate (1 = 0).

What changed

  • Gate the fall-through on known aliases (queryParser.ts): CustomSchemaConfig / CustomSchemaSQLSerializerV2 gain an optional selectAliases: Set<string>. The resolver now returns the bare identifier only when selectAliases.has(field); otherwise it returns found: false, which renders as (1 = 0).
  • Collect aliases from the chart config (renderChartConfig.ts): new extractSelectAliases(...) helper gathers aliases from both select and groupBy, for both list shapes — array form ({ valueExpression, alias }[]) reads alias directly, and string form (raw SQL, e.g. a source's defaultTableSelectExpression such as 'Timestamp, ServiceName as service, Body') is parsed via the existing chSqlToAliasMap() helper to recover declared aliases. This matters because the default search/events view uses string-form selects, so without it the default view would lose alias resolution. The set is threaded through renderWhereExpressionStr into the serializer.
  • Null-safety fix (queryParser.ts): the exact-match branch now treats a null/undefined materialized-columns lookup the same as the existing catch path (?? new Map()), so a missing lookup proceeds with no materialized columns instead of throwing on .entries().

Behavior change

Lucene field Before After
Real column (ServiceName:foo) resolves resolves (unchanged)
SELECT alias (Content:foo where Body AS Content) raw identifier (worked by luck) resolves explicitly
Unknown / typo (myTypo:foo) raw identifier → ClickHouse error (1 = 0) → no rows

Example that now works end-to-end: SELECT Body AS Content … WHERE Content = '…'

How to test on Vercel preview

Preview routes: /search

Steps:

  1. Go to the search page.
  2. Try some of the search examples from the Behavior change section above.

References

  • Linear Issue: Closes HDX-4367

@karl-power karl-power changed the title fix: unknown lucene field falls through fix: unknown lucene field falls through in search Jun 5, 2026
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Jun 5, 2026

🦋 Changeset detected

Latest commit: a6ed715

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@hyperdx/common-utils Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented Jun 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment Jun 5, 2026 11:51am
hyperdx-storybook Ready Ready Preview, Comment Jun 5, 2026 11:51am

Request Review

@github-actions github-actions Bot added the review/tier-2 Low risk — AI review + quick human skim label Jun 5, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

🔵 Tier 2 — Low Risk

Small, isolated change with no API route or data model modifications.

Why this tier:

  • Standard feature/fix — introduces new logic or modifies core functionality

Review process: AI review + quick human skim (target: 5–15 min). Reviewer validates AI assessment and checks for domain-specific concerns.
SLA: Resolve within 4 business hours.

Stats
  • Production files changed: 2
  • Production lines changed: 65 (+ 166 in test files, excluded from tier calculation)
  • Branch: lucene-unknown-field-falls-through-as-a-raw-sql-identifier
  • Author: karl-power

To override this classification, remove the review/tier-2 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@karl-power karl-power force-pushed the lucene-unknown-field-falls-through-as-a-raw-sql-identifier branch from 09e5e96 to a6ed715 Compare June 5, 2026 11:44
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

E2E Test Results

All tests passed • 197 passed • 3 skipped • 1261s

Status Count
✅ Passed 197
❌ Failed 0
⚠️ Flaky 4
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 5, 2026

Deep Review

Gating the unknown-field fall-through is the right call, and the implementation is sound: every getColumnForField consumer short-circuits on !found to the (1 = 0) predicate, the alias path is strictly narrower than the prior unconditional verbatim emission (no new injection surface), and the ?? new Map() addition fixes a real latent .entries() crash that the null-returning test mock now exercises. No ship-blockers. The findings below are a coverage gap in alias collection plus test/clarity recommendations.

✅ No critical issues found.

🟡 P2 -- recommended

  • packages/common-utils/src/core/renderChartConfig.ts:1112 -- extractSelectAliases collects aliases only from select and groupBy, so a Lucene WHERE referencing a WITH expr AS name expression-alias (isSubquery: false) now resolves to (1 = 0) and silently returns zero rows, where the prior fall-through emitted the bare identifier that ClickHouse resolved.
    • Fix: Also feed expression-alias with entries (isSubquery === false) into the alias set, excluding subquery CTEs which are table names rather than field identifiers.
    • correctness
  • packages/common-utils/src/core/renderChartConfig.ts:1112 -- the groupBy argument to extractSelectAliases is collected but no test exercises a Lucene WHERE that references a groupBy-declared alias, so that branch is unverified.
    • Fix: Add a renderChartConfig test with a groupBy alias referenced from a Lucene WHERE, asserting it resolves to a bare identifier.
    • testing
  • packages/common-utils/src/queryParser.ts:1602 -- a negated unknown field (-NotAColumn:foo) returns (1 = 0) before isNegatedField is applied, so an excluded unknown field matches nothing instead of everything, and no test pins this newly-reachable behavior.
    • Fix: Add a test for -NotAColumn:foo pinning the emitted predicate, and confirm whether an excluded unknown field should match all rows or none.
    • testing, correctness
🔵 P3 nitpicks (4)
  • packages/common-utils/src/queryParser.ts:1602 -- the found: false branch still returns columnExpression: field and columnType: 'Unknown', which no caller reads, making the not-found result look like a usable column.
    • Fix: Convert the return type to a discriminated union ({ found: false } | { found: true; columnExpression; columnType; ... }) so the not-found path carries no phantom fields.
    • kieran-typescript, maintainability
  • packages/common-utils/src/core/renderChartConfig.ts:996 -- the comment claims the scaffold mirrors a SELECT * FROM `t` form, but the code emits SELECT ${...} FROM t (no backticks, no *) and the referenced scaffold differs.
    • Fix: Align the scaffold to the backtick form used elsewhere or drop the inaccurate cross-reference.
    • maintainability
  • packages/common-utils/src/core/renderChartConfig.ts:993 -- the comment states chSqlToAliasMap swallows parse failures and returns {}, but it logs via console.error and returns whatever aliases it parsed before the throw.
    • Fix: Reword the comment to reflect that it logs and returns the partial alias map.
    • kieran-typescript, maintainability
  • packages/common-utils/src/__tests__/renderChartConfig.test.ts:2117 -- the new alias-resolution tests assert with toContain rather than exact toBe, so a regression that leaks extra raw SQL around the alias could still pass.
    • Fix: Compare the full parameterizedQueryToSql output with toBe, matching the strong assertions used in queryParser.test.ts.
    • testing

Reviewers (6): correctness, testing, maintainability, kieran-typescript, security, performance.

Testing gaps:

  • No test exercises getMaterializedColumnsLookupTable returning null to confirm the ?? new Map() guard (the new alias tests take the fall-through path, not exactMatch).
  • No test merges aliases across select + groupBy simultaneously, nor recovers a bracket/array-index alias (e.g. ResourceAttributes['x'] as y) from a string-form select.

@karl-power karl-power marked this pull request as draft June 5, 2026 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-2 Low risk — AI review + quick human skim

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant