Skip to content

Conversation

@phernandez
Copy link
Member

Summary

  • fix Postgres metadata tag filters by casting bind params to jsonb instead of leaving named placeholders in SQL
  • simplify numeric metadata comparisons to cast JSON values to double precision; keep numeric params as floats
  • adjust metadata filter parsing to preserve numeric types for comparison operators

Testing

  • uv run pytest tests/repository/test_search_repository.py -k "metadata_filters" -q
  • BASIC_MEMORY_TEST_POSTGRES=1 uv run pytest tests/repository/test_search_repository.py -k "metadata_filters" -q

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dbcc52e3e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 342 to +345
if filt.op in {"gt", "gte", "lt", "lte", "between"}:
if filt.comparison == "numeric":
numeric_expr = (
f"CASE WHEN ({text_expr}) ~ '^-?\\\\d+(\\\\.\\\\d+)?$' "
f"THEN ({text_expr})::double precision END"
)
compare_expr = numeric_expr
else:
compare_expr = text_expr
compare_expr = (
f"({metadata_expr} #>> '{path}')::double precision"
if filt.comparison == "numeric"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore numeric-guard to avoid cast failures

Casting #>> directly to double precision will raise invalid input syntax for type double precision if any row has non-numeric text at that metadata path (e.g., "N/A"), causing the entire search query to error out even though the filter value is numeric. The previous CASE+regex avoided this by returning NULL for non-numeric values; with the new unconditional cast, numeric comparisons now fail for mixed-type metadata.

Useful? React with 👍 / 👎.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 1, 2026

Claude finished @phernandez's task in 2m 19s —— View job


PR Review: Fix metadata filters for Postgres

I've reviewed this PR against the Basic Memory team checklist. Here's my analysis:


Summary of Changes

This PR fixes Postgres metadata filter handling by:

  1. metadata_filters.py (lines 86-108): Preserving numeric types (as float) for comparison operators instead of converting them to strings via _normalize_scalar()
  2. postgres_search_repository.py (lines 322-359):
    • Replacing ::jsonb cast syntax with CAST(... AS jsonb) for better param binding
    • Simplifying numeric comparison expressions by removing regex-based guards

Critical Issue: P1 Concern is Valid ⚠️

The codex review raises a legitimate concern about line 344. The previous implementation had safety guards:

# Previous (safe):
numeric_expr = (
    f"CASE WHEN ({text_expr}) ~ '^-?\\\\d+(\\\\.\\\\d+)?$' "
    f"THEN ({text_expr})::double precision END"
)
# Current (unsafe):
compare_expr = f"({metadata_expr} #>> '{path}')::double precision"

Problem: If any entity has non-numeric text (e.g., "N/A", "unknown", "pending") in a metadata field that another entity uses numerically, the entire query will fail with:

ERROR: invalid input syntax for type double precision: "N/A"

The previous CASE WHEN ... ~ regex ... THEN cast pattern safely returned NULL for non-numeric values, allowing the query to succeed and simply exclude those rows from numeric comparisons.

Why this matters: Metadata fields can legitimately have mixed types across entities. For example:

  • Entity A: {"priority": 1}
  • Entity B: {"priority": "high"}
  • Entity C: {"priority": "N/A"}

A search for {"priority": {"$gt": 0}} would crash instead of just matching Entity A.


Team Checklist Review

✅ Code Quality & Standards

  • Python 3.12+ type annotations: Proper typing throughout
  • Async patterns: Follows SQLAlchemy 2.0 async conventions
  • Line length: All lines under 100 characters
  • Formatting: Code follows ruff formatting standards
  • Repository pattern: Changes properly isolated to repository layer

⚠️ Testing & Documentation

  • Tests exist: test_search_metadata_filters_numeric_comparisons covers basic numeric filtering
  • Edge case coverage: Missing test for mixed-type metadata fields (numeric + text in same field across entities)
  • 100% coverage: Changes are covered, but edge case may not be
  • Documentation: No updates needed for internal repository changes

✅ Architecture

  • Local-first: No impact on architecture
  • Composition: Repository pattern maintained
  • SQLAlchemy patterns: Proper use of text() queries with params

⚠️ Security & Performance

  • Input validation: Handled by parse_metadata_filters
  • Error handling: Numeric cast can fail with invalid input syntax error (regression from previous safe implementation)
  • Performance: CAST vs :: notation has no performance difference

✅ Compatibility

  • Cross-platform: Postgres-specific code properly isolated
  • No unicode/emojis: Clean code

Recommendations

  1. Restore the numeric guard (CRITICAL):

    # For postgres_search_repository.py line 343-347
    if filt.comparison == "numeric":
        # Safe: Only cast if value matches numeric regex pattern
        compare_expr = (
            f"CASE WHEN ({metadata_expr} #>> '{path}') ~ '^-?\\d+(\\.\\d+)?$' "
            f"THEN ({metadata_expr} #>> '{path}')::double precision END"
        )
    else:
        compare_expr = text_expr
  2. Add edge case test:

    async def test_search_metadata_filters_numeric_with_mixed_types(search_repository, session_maker):
        """Test numeric filters when some entities have non-numeric values."""
        entity_numeric = await _index_entity_with_metadata(
            search_repository, session_maker, "Has Number", {"score": 0.8}
        )
        entity_text = await _index_entity_with_metadata(
            search_repository, session_maker, "Has Text", {"score": "N/A"}
        )
        
        # Should match only numeric entity, not crash
        results = await search_repository.search(
            metadata_filters={"score": {"$gt": 0.5}}
        )
        assert {result.id for result in results} == {entity_numeric.id}
  3. Simplification is good, but safety matters: The CAST syntax change (line 335) is fine, but removing the regex guard traded robustness for simplicity.


Verdict

Request Changes: The P1 concern about unsafe numeric casting is valid and represents a regression. While the metadata filter parsing improvements are good, the removal of the safety guard in postgres_search_repository.py line 343-347 needs to be reverted.

The previous implementation's CASE WHEN ... ~ regex pattern was not just defensive programming—it was necessary to handle real-world scenarios where metadata fields contain mixed types across entities.

Fix this issue →


@phernandez phernandez changed the title Fix metadata filters for Postgres fix: stabilize metadata filters on postgres Feb 1, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Feb 1, 2026

Claude finished @phernandez's task in 2m 30s —— View job


PR Review: Metadata Filter Fixes for Postgres

I've reviewed this PR against the Basic Memory team checklist and analyzed the P1 concern raised by both automated reviewers.


Critical Issue: P1 Concern is Valid ⚠️

The automated review correctly identifies a regression in error handling. Here's the problem:

What changed:

  • Previous (commit 8838571): Used CASE WHEN ... ~ regex ... THEN cast guard for safe numeric casting
  • Current (this PR): Direct cast without guard: ({metadata_expr} #>> '{path}')::double precision

Why this is a problem:

The filter parsing logic in metadata_filters.py:86-96 only checks if the filter value is numeric, not if the metadata values in the database are numeric. This means:

  1. User searches with {"priority": {"$gt": 0}} - filter value is numeric ✓
  2. Parser sets comparison = "numeric" based on filter value ✓
  3. Postgres query tries to cast ALL metadata values for "priority" field:
    • Entity A: {"priority": 1} → casts to 1.0
    • Entity B: {"priority": "high"}ERROR: invalid input syntax for type double precision: "high"

The entire query fails instead of gracefully excluding Entity B.

Real-world scenario: Metadata fields can legitimately have mixed types across entities:

  • Early entities: {"confidence": "high"} (before numeric scoring)
  • Later entities: {"confidence": 0.85} (after implementing numeric scores)

A search for {"confidence": {"$gt": 0.7}} would crash instead of matching only numeric values.


Team Checklist Review

✅ Code Quality & Standards

  • Python 3.12+ type annotations - proper typing throughout
  • Async patterns - follows SQLAlchemy 2.0 conventions
  • Line length - all under 100 characters
  • Repository pattern - changes properly isolated
  • CAST vs :: syntax - functionally equivalent

⚠️ Testing & Documentation

  • Tests exist - test_search_metadata_filters_numeric_comparisons covers basic numeric filtering
  • Missing edge case test - no test for mixed-type metadata (numeric + text in same field)
  • 100% coverage may be misleading - edge case not exercised
  • Documentation - no updates needed for internal changes

⚠️ Security & Error Handling

  • Input validation - handled by parse_metadata_filters
  • Error handling regression - unsafe numeric cast can crash queries
  • Performance - CAST notation has same performance as ::

✅ Architecture & Compatibility

  • Local-first architecture maintained
  • Cross-platform - Postgres-specific code properly isolated
  • No unicode/emojis in output

Recommendations

1. Restore the safety guard (CRITICAL):

In postgres_search_repository.py:343-347, restore the regex guard:

if filt.op in {"gt", "gte", "lt", "lte", "between"}:
    if filt.comparison == "numeric":
        # Safe: Only cast if value matches numeric pattern
        numeric_expr = (
            f"CASE WHEN ({text_expr}) ~ '^-?\\d+(\\.\\d+)?$' "
            f"THEN ({text_expr})::double precision END"
        )
        compare_expr = numeric_expr
    else:
        compare_expr = text_expr

2. Add edge case test:

@pytest.mark.asyncio
async def test_search_metadata_filters_numeric_with_mixed_types(search_repository, session_maker):
    """Numeric filters should handle mixed-type metadata gracefully."""
    entity_numeric = await _index_entity_with_metadata(
        search_repository, session_maker, "Has Number", {"score": 0.8}
    )
    entity_text = await _index_entity_with_metadata(
        search_repository, session_maker, "Has Text", {"score": "N/A"}
    )
    
    # Should match only numeric entity, not crash
    results = await search_repository.search(
        metadata_filters={"score": {"$gt": 0.5}}
    )
    assert {result.id for result in results} == {entity_numeric.id}

3. What's good about this PR:

  • ✅ Preserving numeric types (as float) in metadata_filters.py is correct
  • ✅ Using CAST(... AS jsonb) instead of ::jsonb for bind params is fine
  • ✅ Simplifying the code is a good goal

Verdict

Request Changes - The P1 concern represents a real regression that needs to be fixed before merging. The CASE/WHEN regex guard wasn't defensive programming—it was necessary to handle mixed-type metadata gracefully.

The improvements to metadata filter parsing are good, but removing the safety guard trades robustness for simplicity, which could cause production query failures.

Fix the numeric casting issue →


@phernandez phernandez merged commit 009e849 into main Feb 1, 2026
24 of 28 checks passed
@phernandez phernandez deleted the fix/metadata-filter-postgres branch February 1, 2026 04:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants