feat(core): support field-level and range filters in where clause#1064
Conversation
The loader's `where` option previously only processed taxonomy-based keys (via JOIN). Non-taxonomy field names were silently discarded, forcing sites with large content libraries to load entire collections into memory and filter in JavaScript. This adds two capabilities: - Exact/multi-value match on content table columns (AND col = ? / IN) - Range comparisons (gt, gte, lt, lte) for date and string fields Both are executed at the SQL layer with parameterized queries and validated identifiers. Taxonomy filtering remains unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🦋 Changeset detectedLatest commit: ba6368c The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/auth-atproto
@emdash-cms/blocks
@emdash-cms/cloudflare
@emdash-cms/contentful-to-portable-text
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/plugin-cli
@emdash-cms/plugin-types
@emdash-cms/registry-client
@emdash-cms/registry-lexicons
@emdash-cms/sandbox-workerd
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-field-kit
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
commit: |
|
/review |
There was a problem hiding this comment.
Nice feature — the SQL-layer field/range filtering is well-scoped, parameterized correctly, and consistent with how sql.ref() is used elsewhere in the loader. Tests cover the main matrices (exact, IN, gt/gte/lt/lte, combined with taxonomy, with pagination). The minor-bump changeset is appropriate.
A few non-blocking observations below. The biggest one is the silent footgun behavior when field names don't exist or don't match the validation regex — a typo today returns the wrong rows with no signal, and one of the tests even codifies that. Worth at least mentioning in the JSDoc, even if behavior stays as-is.
| const conditions: ReturnType<typeof sql>[] = []; | ||
|
|
||
| for (const [key, value] of Object.entries(fields)) { | ||
| if (!FIELD_NAME_PATTERN.test(key)) continue; |
There was a problem hiding this comment.
Invalid field names are silently dropped here, and the test on line 122 of loader-field-filters.test.ts even codifies this (returns all entries when every key fails the regex). This is a debugging footgun: a typo like series_ vs seriesX would return wrong rows with no signal — particularly bad given the motivating use case is filtering ~4k chapters where "too many results" looks like a working query.
The same FIELD_NAME_PATTERN is used for column references throughout the loader, but in those paths (orderBy, cursor) the field comes from internal code, not user input. Here the input is external.
Consider either:
- Throwing a clear error for invalid keys (preferred — fail fast)
- Logging a
console.warnso the issue surfaces in dev - At minimum, documenting in the JSDoc on
wherethat invalid identifiers are silently ignored
Also applies to filtering by valid identifiers that aren't real columns — those will hit a SQL error, but the message will reference an unknown column rather than the user's mistake.
| } | ||
| } else { | ||
| conditions.push(sql`${ref} = ${value}`); | ||
| } |
There was a problem hiding this comment.
Two runtime values that the WhereValue type doesn't admit but JS users can still pass:
null→ falls through to theelsebranch and emitscolumn = NULL, which is never true in SQL. The user probably meantIS NULL. Silent footgun.undefined→ same path, emitscolumn = ?bound toundefined. Driver behavior varies (better-sqlite3 throws; pg coerces to NULL).
A single guard like if (value == null) continue; at the top of the loop body would make both well-behaved. If you want explicit null filtering as a future feature, that can be a separate { isNull: true } operator.
| @@ -423,7 +478,7 @@ export interface CollectionFilter { | |||
| /** | |||
| * Filter by field values or taxonomy terms | |||
There was a problem hiding this comment.
The expanded JSDoc on query.ts's CollectionFilter.where mentions ranges and field filters with examples; this copy still says only "Filter by field values or taxonomy terms". Worth syncing so the IDE hover shows the new capability regardless of which CollectionFilter symbol someone hits.
| * Filter by field values or taxonomy terms | |
| /** | |
| * Filter by field values, taxonomy terms, or ranges. | |
| * | |
| * Taxonomy names are detected automatically and filtered via JOIN. | |
| * Other keys are treated as column filters on the content table. | |
| * Invalid identifiers (not matching `/^[a-zA-Z_][a-zA-Z0-9_]*$/`) are silently ignored. | |
| * | |
| * @example { category: 'news' } - taxonomy term | |
| * @example { series: 'main' } - exact match on a content field | |
| * @example { published_at: { gte: '2024-01-01', lt: '2025-01-01' } } - date range | |
| */ |
| const limit = filter?.limit; | ||
| const cursor = filter?.cursor; | ||
| const where = filter?.where; | ||
| const where = filter?.where as Record<string, WhereValue> | undefined; |
There was a problem hiding this comment.
The cast is redundant — filter?.where is already typed as Record<string, WhereValue> | undefined after the interface change two lines up the file.
| const where = filter?.where as Record<string, WhereValue> | undefined; | |
| const where = filter?.where; |
| for (const [key, value] of Object.entries(where)) { | ||
| if (taxNames.has(key)) { | ||
| taxonomyFilters[key] = value; | ||
| if (taxNames.has(key) && !isWhereRange(value)) { |
There was a problem hiding this comment.
Edge case worth a thought: a range value against a taxonomy key (e.g. where: { category: { gte: 'a' } }) hits the else branch and is routed to fieldFilters, which will then emit ec_<type>.category >= 'a'. There's no category column on the content table (categories live in the join), so this produces a confusing "no such column" SQL error rather than a helpful "ranges aren't supported on taxonomies" message.
The TS type does allow WhereRange for any key. Two reasonable options:
- Detect this case explicitly and throw a clear error.
- Treat range-on-taxonomy as a no-op and document it.
Low priority since taxonomy slugs aren't a common range-filter target, but it's a sharp edge in the API surface.
| ? stableStringify(v as Record<string, unknown>) | ||
| : v; | ||
| } | ||
| return JSON.stringify(ordered); |
There was a problem hiding this comment.
Minor: the recursion returns a string but then assigns that string back as a value in ordered, so the outer JSON.stringify(ordered) re-escapes the inner JSON. It works (still deterministic), but the resulting cache key has confusing escape characters like '{\"foo\":\"{\\\"gte\\\":\\\"x\\\"}\"}'.
A cleaner version returns the ordered object and only stringifies once at the top:
| return JSON.stringify(ordered); | |
| function stableStringify(value: Record<string, unknown>): string { | |
| return JSON.stringify(stableOrder(value)); | |
| } | |
| function stableOrder(value: Record<string, unknown>): Record<string, unknown> { | |
| const keys = Object.keys(value).toSorted(); | |
| const ordered: Record<string, unknown> = {}; | |
| for (const k of keys) { | |
| const v = value[k]; | |
| ordered[k] = | |
| v !== null && typeof v === "object" && !Array.isArray(v) | |
| ? stableOrder(v as Record<string, unknown>) | |
| : v; | |
| } | |
| return ordered; | |
| } |
No behavior change beyond cache key string length; just easier to debug if you ever log the key.
|
Review submitted successfully. Posted 6 line-anchored comments covering:
No blocking issues — the feature is sound, parameterization is correct, tests are thorough. |
- Warn on invalid field names instead of silently ignoring - Skip null/undefined values to prevent SQL `= NULL` footgun - Warn and skip range operators on taxonomy keys - Warn when a second taxonomy key is passed (only one supported) - Handle missing column errors gracefully (return empty results) - Add dialect-safe isMissingColumnError to db-errors.ts - Sync JSDoc for `where` to document ranges and field filters - Remove redundant type cast on `filter?.where` - Fix stableStringify double-escaping by separating order and serialize - Add tests for null values and taxonomy range edge cases
Use existing isRecord() type guard instead of `as Record<string, unknown>` to satisfy oxlint --type-aware --deny-warnings.
There was a problem hiding this comment.
This is a well-scoped, cleanly implemented feature that solves a real pain point (SQL-level field filtering in getEmDashCollection). The approach fits EmDash's architecture: it layers onto the existing loader without breaking taxonomy filtering, validates identifiers before they reach SQL, parameterizes all values, preserves locale/cursor semantics, and keeps backwards compatibility.
The implementation is solid. I checked SQL safety, cache-key stability, backwards compatibility, locale handling, and cursor pagination correctness — all look good. The tests added are comprehensive for the happy path.
Two suggestions, neither blocking:
- The
isMissingColumnErrorPostgreSQL heuristic can falsely match table errors when the table name happens to contain "column" (e.g.relation "column_stats" does not exist). The end-behavior is harmless (same silent empty-result as a missing-table error), but the warning log will mislabel it as a filter issue. - The missing-column fallback path is untested. A test that drops a column or mocks a query returning a missing-column error would shore up coverage for that graceful-degradation branch.
| if (!message) return false; | ||
|
|
||
| // SQLite / D1: "no such column: foo" | ||
| if (message.includes("no such column")) return true; |
There was a problem hiding this comment.
[suggestion] The PostgreSQL heuristic can produce false positives when a missing table or relation happens to contain the word "column" in its name (e.g. relation "column_stats" does not exist). isMissingTableError already catches that case, so the net behavior is still correct, but the warning logged in loader.ts will mislabel a missing-table error as a missing-column error.
| if (message.includes("no such column")) return true; | |
| if (message.includes("does not exist") && message.includes("column") && !message.includes("relation")) return true; |
PostgreSQL missing-column errors don't include "relation", so excluding it eliminates the false positive without losing real matches.
| @@ -693,13 +761,17 @@ export function emdashLoader(): LiveLoader<EntryData, EntryFilter, CollectionFil | |||
| }, | |||
There was a problem hiding this comment.
[suggestion] The isMissingColumnError fallback path (returning empty results when a where filter references a non-existent column) is not covered by the new test file. Since this is a new graceful-degradation code path, a targeted test that triggers a missing-column error (e.g. by querying with a typo like serie or by temporarily dropping the column mid-test) would prevent regressions.
- Exclude "relation" keyword in PostgreSQL check to prevent false positives on table names containing "column" - Add test for missing column graceful fallback in where filter
There was a problem hiding this comment.
This is a clean follow-up to my previous review. Both prior suggestions have been addressed:
-
isMissingColumnErrorfalse positives — The PostgreSQL heuristic now excludes messages containing "relation", preventing table names likecolumn_statsfrom being misidentified as missing-column errors. -
Missing-column test coverage — A new test verifies that a
wherefilter referencing a non-existent column gracefully returns an empty result set with a warning log, rather than propagating the SQL error.
I also verified the additional follow-up commits:
stableStringifyhas been split intostableOrder+JSON.stringify, fixing a cache-key stability bug where nestedWhereRangeobjects with keys in different orders would produce different cache keys.- The unsafe type assertion in
stableOrderwas replaced with the existingisRecord()type guard.
The implementation remains sound: SQL identifiers are validated before reaching the query builder, all values are parameterized, taxonomy and field filters compose correctly, locale filtering and cursor pagination are preserved, and backwards compatibility is maintained. No new issues found.
What does this PR do?
The loader's
whereoption previously only processed taxonomy-based keys (via JOIN). Non-taxonomy field names were silently discarded, forcing sites with large content libraries to load entire collections into memory and filter in JavaScript.This adds two capabilities to
getEmDashCollection'swhereclause:{ series: "main" }or{ series: ["main", "side"] }{ published_at: { gte: "2024-01-01T00:00:00Z", lt: "2025-01-01T00:00:00Z" } }Both execute at the SQL layer with parameterized queries and validated identifiers. Taxonomy filtering remains unchanged (backward compatible).
Motivation: A fiction serialization site with ~4000 chapters spanning 10+ years. The timeline page needs to filter by year (
published_atrange) and by series (field exact match). Without SQL-level field filtering, the only options are wasteful full-table JS filtering or dropping to rawgetDb()and losing preview, visual editing, byline hydration, and request caching.Type of change
Checklist
pnpm typecheckpassespnpm lintpassespnpm testpasses (or targeted tests for my change)pnpm formathas been runAI-generated code disclosure
Screenshots / test output
Usage examples