fix: stabilize metadata filters on postgres #536

phernandez · 2026-02-01T03:28:02Z

Summary

fix Postgres metadata tag filters by casting bind params to jsonb instead of leaving named placeholders in SQL
simplify numeric metadata comparisons to cast JSON values to double precision; keep numeric params as floats
adjust metadata filter parsing to preserve numeric types for comparison operators

Testing

uv run pytest tests/repository/test_search_repository.py -k "metadata_filters" -q
BASIC_MEMORY_TEST_POSTGRES=1 uv run pytest tests/repository/test_search_repository.py -k "metadata_filters" -q

Signed-off-by: phernandez <[email protected]>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dbcc52e3e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-02-01T03:29:02Z

src/basic_memory/repository/postgres_search_repository.py

                if filt.op in {"gt", "gte", "lt", "lte", "between"}:
-                    if filt.comparison == "numeric":
-                        numeric_expr = (
-                            f"CASE WHEN ({text_expr}) ~ '^-?\\\\d+(\\\\.\\\\d+)?$' "
-                            f"THEN ({text_expr})::double precision END"
-                        )
-                        compare_expr = numeric_expr
-                    else:
-                        compare_expr = text_expr
+                    compare_expr = (
+                        f"({metadata_expr} #>> '{path}')::double precision"
+                        if filt.comparison == "numeric"


Restore numeric-guard to avoid cast failures

Casting #>> directly to double precision will raise invalid input syntax for type double precision if any row has non-numeric text at that metadata path (e.g., "N/A"), causing the entire search query to error out even though the filter value is numeric. The previous CASE+regex avoided this by returning NULL for non-numeric values; with the new unconditional cast, numeric comparisons now fail for mixed-type metadata.

Useful? React with 👍 / 👎.

github-actions · 2026-02-01T03:30:07Z

Claude finished @phernandez's task in 2m 19s —— View job

PR Review: Fix metadata filters for Postgres

I've reviewed this PR against the Basic Memory team checklist. Here's my analysis:

Summary of Changes

This PR fixes Postgres metadata filter handling by:

metadata_filters.py (lines 86-108): Preserving numeric types (as float) for comparison operators instead of converting them to strings via _normalize_scalar()
postgres_search_repository.py (lines 322-359):
- Replacing ::jsonb cast syntax with CAST(... AS jsonb) for better param binding
- Simplifying numeric comparison expressions by removing regex-based guards

Critical Issue: P1 Concern is Valid ⚠️

The codex review raises a legitimate concern about line 344. The previous implementation had safety guards:

# Previous (safe):
numeric_expr = (
    f"CASE WHEN ({text_expr}) ~ '^-?\\\\d+(\\\\.\\\\d+)?$' "
    f"THEN ({text_expr})::double precision END"
)

# Current (unsafe):
compare_expr = f"({metadata_expr} #>> '{path}')::double precision"

Problem: If any entity has non-numeric text (e.g., "N/A", "unknown", "pending") in a metadata field that another entity uses numerically, the entire query will fail with:

ERROR: invalid input syntax for type double precision: "N/A"

The previous CASE WHEN ... ~ regex ... THEN cast pattern safely returned NULL for non-numeric values, allowing the query to succeed and simply exclude those rows from numeric comparisons.

Why this matters: Metadata fields can legitimately have mixed types across entities. For example:

Entity A: {"priority": 1}
Entity B: {"priority": "high"}
Entity C: {"priority": "N/A"}

A search for {"priority": {"$gt": 0}} would crash instead of just matching Entity A.

Team Checklist Review

✅ Code Quality & Standards

Python 3.12+ type annotations: Proper typing throughout
Async patterns: Follows SQLAlchemy 2.0 async conventions
Line length: All lines under 100 characters
Formatting: Code follows ruff formatting standards
Repository pattern: Changes properly isolated to repository layer

⚠️ Testing & Documentation

Tests exist: test_search_metadata_filters_numeric_comparisons covers basic numeric filtering
Edge case coverage: Missing test for mixed-type metadata fields (numeric + text in same field across entities)
100% coverage: Changes are covered, but edge case may not be
Documentation: No updates needed for internal repository changes

✅ Architecture

Local-first: No impact on architecture
Composition: Repository pattern maintained
SQLAlchemy patterns: Proper use of text() queries with params

⚠️ Security & Performance

Input validation: Handled by parse_metadata_filters
Error handling: Numeric cast can fail with invalid input syntax error (regression from previous safe implementation)
Performance: CAST vs :: notation has no performance difference

✅ Compatibility

Cross-platform: Postgres-specific code properly isolated
No unicode/emojis: Clean code

Recommendations

Restore the numeric guard (CRITICAL):

# For postgres_search_repository.py line 343-347
if filt.comparison == "numeric":
    # Safe: Only cast if value matches numeric regex pattern
    compare_expr = (
        f"CASE WHEN ({metadata_expr} #>> '{path}') ~ '^-?\\d+(\\.\\d+)?$' "
        f"THEN ({metadata_expr} #>> '{path}')::double precision END"
    )
else:
    compare_expr = text_expr

Add edge case test:

async def test_search_metadata_filters_numeric_with_mixed_types(search_repository, session_maker):
    """Test numeric filters when some entities have non-numeric values."""
    entity_numeric = await _index_entity_with_metadata(
        search_repository, session_maker, "Has Number", {"score": 0.8}
    )
    entity_text = await _index_entity_with_metadata(
        search_repository, session_maker, "Has Text", {"score": "N/A"}
    )
    
    # Should match only numeric entity, not crash
    results = await search_repository.search(
        metadata_filters={"score": {"$gt": 0.5}}
    )
    assert {result.id for result in results} == {entity_numeric.id}

Simplification is good, but safety matters: The CAST syntax change (line 335) is fine, but removing the regex guard traded robustness for simplicity.

Verdict

Request Changes: The P1 concern about unsafe numeric casting is valid and represents a regression. While the metadata filter parsing improvements are good, the removal of the safety guard in postgres_search_repository.py line 343-347 needs to be reverted.

The previous implementation's CASE WHEN ... ~ regex pattern was not just defensive programming—it was necessary to handle real-world scenarios where metadata fields contain mixed types across entities.

Fix this issue →

Signed-off-by: phernandez <[email protected]>

github-actions · 2026-02-01T03:57:01Z

Claude finished @phernandez's task in 2m 30s —— View job

PR Review: Metadata Filter Fixes for Postgres

I've reviewed this PR against the Basic Memory team checklist and analyzed the P1 concern raised by both automated reviewers.

Critical Issue: P1 Concern is Valid ⚠️

The automated review correctly identifies a regression in error handling. Here's the problem:

What changed:

Previous (commit 8838571): Used CASE WHEN ... ~ regex ... THEN cast guard for safe numeric casting
Current (this PR): Direct cast without guard: ({metadata_expr} #>> '{path}')::double precision

Why this is a problem:

The filter parsing logic in metadata_filters.py:86-96 only checks if the filter value is numeric, not if the metadata values in the database are numeric. This means:

User searches with {"priority": {"$gt": 0}} - filter value is numeric ✓
Parser sets comparison = "numeric" based on filter value ✓
Postgres query tries to cast ALL metadata values for "priority" field:
- Entity A: {"priority": 1} → casts to 1.0 ✓
- Entity B: {"priority": "high"} → ERROR: invalid input syntax for type double precision: "high" ❌

The entire query fails instead of gracefully excluding Entity B.

Real-world scenario: Metadata fields can legitimately have mixed types across entities:

Early entities: {"confidence": "high"} (before numeric scoring)
Later entities: {"confidence": 0.85} (after implementing numeric scores)

A search for {"confidence": {"$gt": 0.7}} would crash instead of matching only numeric values.

Team Checklist Review

✅ Code Quality & Standards

Python 3.12+ type annotations - proper typing throughout
Async patterns - follows SQLAlchemy 2.0 conventions
Line length - all under 100 characters
Repository pattern - changes properly isolated
CAST vs :: syntax - functionally equivalent

⚠️ Testing & Documentation

Tests exist - test_search_metadata_filters_numeric_comparisons covers basic numeric filtering
Missing edge case test - no test for mixed-type metadata (numeric + text in same field)
100% coverage may be misleading - edge case not exercised
Documentation - no updates needed for internal changes

⚠️ Security & Error Handling

Input validation - handled by parse_metadata_filters
Error handling regression - unsafe numeric cast can crash queries
Performance - CAST notation has same performance as ::

✅ Architecture & Compatibility

Local-first architecture maintained
Cross-platform - Postgres-specific code properly isolated
No unicode/emojis in output

Recommendations

1. Restore the safety guard (CRITICAL):

In postgres_search_repository.py:343-347, restore the regex guard:

if filt.op in {"gt", "gte", "lt", "lte", "between"}:
    if filt.comparison == "numeric":
        # Safe: Only cast if value matches numeric pattern
        numeric_expr = (
            f"CASE WHEN ({text_expr}) ~ '^-?\\d+(\\.\\d+)?$' "
            f"THEN ({text_expr})::double precision END"
        )
        compare_expr = numeric_expr
    else:
        compare_expr = text_expr

2. Add edge case test:

@pytest.mark.asyncio
async def test_search_metadata_filters_numeric_with_mixed_types(search_repository, session_maker):
    """Numeric filters should handle mixed-type metadata gracefully."""
    entity_numeric = await _index_entity_with_metadata(
        search_repository, session_maker, "Has Number", {"score": 0.8}
    )
    entity_text = await _index_entity_with_metadata(
        search_repository, session_maker, "Has Text", {"score": "N/A"}
    )
    
    # Should match only numeric entity, not crash
    results = await search_repository.search(
        metadata_filters={"score": {"$gt": 0.5}}
    )
    assert {result.id for result in results} == {entity_numeric.id}

3. What's good about this PR:

✅ Preserving numeric types (as float) in metadata_filters.py is correct
✅ Using CAST(... AS jsonb) instead of ::jsonb for bind params is fine
✅ Simplifying the code is a good goal

Verdict

Request Changes - The P1 concern represents a real regression that needs to be fixed before merging. The CASE/WHEN regex guard wasn't defensive programming—it was necessary to handle mixed-type metadata gracefully.

The improvements to metadata filter parsing are good, but removing the safety guard trades robustness for simplicity, which could cause production query failures.

Fix the numeric casting issue →

fix: stabilize metadata filters on postgres

4dbcc52

Signed-off-by: phernandez <[email protected]>

chatgpt-codex-connector bot reviewed Feb 1, 2026

View reviewed changes

phernandez changed the title ~~Fix metadata filters for Postgres~~ fix: stabilize metadata filters on postgres Feb 1, 2026

test: align metadata filter expectations with numeric values

a1ba57d

Signed-off-by: phernandez <[email protected]>

phernandez merged commit 009e849 into main Feb 1, 2026
24 of 28 checks passed

phernandez deleted the fix/metadata-filter-postgres branch February 1, 2026 04:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: stabilize metadata filters on postgres #536

fix: stabilize metadata filters on postgres #536

Uh oh!

phernandez commented Feb 1, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Feb 1, 2026

Uh oh!

github-actions bot commented Feb 1, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: stabilize metadata filters on postgres #536

fix: stabilize metadata filters on postgres #536

Uh oh!

Conversation

phernandez commented Feb 1, 2026

Summary

Testing

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Fix metadata filters for Postgres

Summary of Changes

Critical Issue: P1 Concern is Valid ⚠️

Team Checklist Review

✅ Code Quality & Standards

⚠️ Testing & Documentation

✅ Architecture

⚠️ Security & Performance

✅ Compatibility

Recommendations

Verdict

Uh oh!

github-actions bot commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Metadata Filter Fixes for Postgres

Critical Issue: P1 Concern is Valid ⚠️

Team Checklist Review

✅ Code Quality & Standards

⚠️ Testing & Documentation

⚠️ Security & Error Handling

✅ Architecture & Compatibility

Recommendations

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Feb 1, 2026 •

edited

Loading

github-actions bot commented Feb 1, 2026 •

edited

Loading