
Lauren/event similarity#173

Open
lauren1703 wants to merge 4 commits into main from lauren/eventSimilarity


@lauren1703 (Contributor) commented Mar 25, 2026

Overview

  • Added EventPostModel to track post-event relationships with source (user/similarity/nlp_context) and relevanceScore
  • Created migration for the eventPosts table
  • Added EventPostRepository with upsert, delete, paginated query, and user-tag-priority logic
  • Added EventPostService with processEventSimilarity (computes centroid from user-tagged post embeddings, finds similar posts via pgvector cosine distance, and stores similarity rows)
  • Added EventService for fetching combined event feeds and available event tags
  • Created EventController with GET /events/available-for-tagging/ and GET /events/:eventId/posts/ endpoints
  • Created hourly cron job to run similarity processing across all event tags
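The similarity step scores candidate posts by pgvector cosine distance to a centroid. To make the stored score concrete, here is a minimal in-memory sketch of the same quantity. It assumes the query scores rows as `1 - (embedding <=> centroid)`, where pgvector's `<=>` operator is cosine distance. The helper names are illustrative and are not part of the PR.

```typescript
// Illustrative helpers (not from the PR): in-memory equivalents of the
// pgvector expressions used by the similarity query.

/** Cosine distance, mirroring pgvector's `<=>` operator: 1 - cos(a, b). */
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector makes this NaN, which is one more reason to keep empty or zero embeddings out.
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Relevance score for a candidate post: 1 - cosine distance to the centroid. */
function similarityScore(embedding: number[], centroid: number[]): number {
  return 1 - cosineDistance(embedding, centroid);
}

// A post pointing the same way as the centroid scores 1; an orthogonal one scores 0.
console.log(similarityScore([1, 0], [2, 0])); // 1
console.log(similarityScore([1, 0], [0, 3])); // 0
```

Cosine distance ignores vector length, so the threshold compares direction only.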

Test Coverage

  • Added EventTest.test.ts covering event feed reads, pagination, source filtering, and user blocking

Summary by CodeRabbit

Release Notes

  • New Features

    • Event tagging: Tag posts with specific events
    • Event feeds: Browse and filter posts by their associated events
    • Intelligent recommendations: Automatically surface similar posts for tagged events
    • Advanced filtering: View posts by source (user-tagged or similarity-based recommendations)
    • Pagination support for browsing large event feeds
  • Documentation

    • API documentation updated with new event endpoints

@coderabbitai bot commented Mar 25, 2026

📝 Walkthrough


This pull request introduces an event feed system with post-event tag associations, background similarity processing, and corresponding API endpoints. The implementation includes database schema, repository methods, service layers for API and background processing, API endpoints, and comprehensive integration tests covering all workflows.

Changes

  • Configuration & Build (.gitignore, jest.config.js): Added the dumps/ directory to Git ignore rules and configured Jest's moduleNameMapper to resolve firebase-admin imports.
  • Database Schema (src/migrations/1769700000000-CreateEventPost.ts): New migration creates the eventPosts table with a UUID primary key, foreign keys to posts and event tags, a source field, an optional relevance score, a composite uniqueness constraint on (postId, eventTagId), and cascading deletes.
  • Domain Models (src/models/EventPostModel.ts, src/models/EventTagModel.ts, src/models/index.ts): New EventPostModel entity with TypeORM relations to posts and event tags; EventPostSource union type ("user" | "similarity" | "nlp_context").
  • Repositories (src/repositories/EventPostRepository.ts, src/repositories/EventTagRepository.ts, src/repositories/PostRepository.ts, src/repositories/index.ts): New EventPostRepository with upsert, fetch, delete, and pagination methods for post-event relationships; new getAllEventTags() on EventTagRepository; new findSimilarPostsForEvent() on PostRepository using pgvector similarity search; registration in the repository factory.
  • Service Layer (src/services/EventPostService.ts, src/services/EventService.ts): New EventPostService with a processEventSimilarity() cron handler that computes embedding centroids and discovers similar posts; new EventService with getAvailableEventTags() and getEventPosts() for API consumption, with filtering, pagination, and user-blocking logic.
  • Existing Service Updates (src/services/PostService.ts): Modified createPost(), addEventTagsToPost(), and removeEventTagsFromPost() to upsert/delete EventPostModel relationships through repository methods when managing event tags.
  • API Layer (src/api/controllers/EventController.ts, src/api/controllers/index.ts): New EventController with two authenticated GET endpoints, /event/available-for-tagging/ and /event/{eventTagId}/posts/, supporting pagination and optional source filtering.
  • Server Initialization (src/app.ts): Added a startEventSimilarityCron() call on HTTP server startup to initialize the hourly background job.
  • Cron Job (src/cron/eventSimilarityCron.ts): New module exporting startEventSimilarityCron(), which schedules an hourly cron (0 * * * *) that processes similarity for all event tags, handling errors per tag and logging completion.
  • Type Definitions (src/types/ApiResponses.ts): New exported types EventPostSource (re-export), PostWithSource, and GetEventPostsResponse for API responses with pagination metadata.
  • API Documentation (swagger.json): Added the Event tag, two new authenticated GET endpoints under /event/, and component schemas for EventTag, PostWithSource, and GetEventPostsResponse.
  • Integration Tests (src/tests/EventTest.test.ts, src/tests/controllers/ControllerFactory.ts, src/tests/data/DatabaseConnection.ts): Comprehensive test suite covering event tag retrieval, paginated event feeds, user/blocked-user filtering, post mutations with event tags, and similarity processing; a factory method for controller construction; database cleanup now includes the eventPosts table.

Sequence Diagrams

sequenceDiagram
    actor User
    participant API as EventController
    participant EventService
    participant EventPostRepo as EventPostRepository
    participant EventTagRepo as EventTagRepository
    participant PostRepo as PostRepository
    participant DB as Database

    User->>API: GET /event/{eventTagId}/posts?page=1&limit=10&source=user
    API->>EventService: getEventPosts(user, eventTagId, 1, 10, source)
    EventService->>EventTagRepo: findOne(eventTagId)
    EventTagRepo->>DB: Query EventTag
    DB-->>EventTagRepo: EventTag | null
    EventTagRepo-->>EventService: EventTag or throw NotFound
    
    par Parallel Fetch
        EventService->>EventPostRepo: getPostsForEvent(eventTagId, source, skip, limit)
        EventPostRepo->>DB: Query posts with joins (user, categories, eventTags)
        DB-->>EventPostRepo: EventPostModel[] with relations
        EventPostRepo-->>EventService: EventPostModel[]
    and
        EventService->>EventPostRepo: getPostCountForEvent(eventTagId, source)
        EventPostRepo->>DB: COUNT relationships
        DB-->>EventPostRepo: total count
        EventPostRepo-->>EventService: number
    end
    
    EventService->>PostRepo: getBlockedUserIds(user)
    PostRepo->>DB: Query UserBlock
    DB-->>PostRepo: blocked IDs
    PostRepo-->>EventService: string[]
    
    EventService->>EventService: Filter inactive & blocked users
    EventService->>EventService: Enrich posts with source & relevanceScore
    EventService-->>API: GetEventPostsResponse
    API-->>User: { posts[], total, page, limit }
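In this diagram, inactive and blocked authors are filtered out only after the page has been fetched and counted. The EventService review comment further down flags this. A small in-memory sketch shows why the order matters. It uses hypothetical types and no database. Filtering after `skip`/`limit` can return short pages and an inflated `total`. Applying the same predicate before paginating keeps both consistent.

```typescript
interface FeedRow {
  postId: string;
  authorId: string;
  authorActive: boolean;
}

interface Page {
  posts: FeedRow[];
  total: number;
}

const isVisible = (row: FeedRow, blocked: Set<string>): boolean =>
  row.authorActive && !blocked.has(row.authorId);

// Flow shown in the diagram: paginate and count first, then filter in memory.
function paginateThenFilter(rows: FeedRow[], blocked: Set<string>, skip: number, limit: number): Page {
  const page = rows.slice(skip, skip + limit);
  return {
    posts: page.filter((r) => isVisible(r, blocked)),
    total: rows.length, // still counts rows the user will never see
  };
}

// Suggested flow: one predicate shared by the page query and the count.
function filterThenPaginate(rows: FeedRow[], blocked: Set<string>, skip: number, limit: number): Page {
  const visible = rows.filter((r) => isVisible(r, blocked));
  return { posts: visible.slice(skip, skip + limit), total: visible.length };
}

const rows: FeedRow[] = [
  { postId: "p1", authorId: "blocked-user", authorActive: true },
  { postId: "p2", authorId: "a", authorActive: true },
  { postId: "p3", authorId: "b", authorActive: false },
  { postId: "p4", authorId: "c", authorActive: true },
];
const blocked = new Set(["blocked-user"]);

console.log(paginateThenFilter(rows, blocked, 0, 2).posts.map((r) => r.postId)); // [ 'p2' ]
console.log(filterThenPaginate(rows, blocked, 0, 2).posts.map((r) => r.postId)); // [ 'p2', 'p4' ]
```

In the repository, this would become the same WHERE conditions (author active, author not in the blocked list) on both the page query and the COUNT query.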
sequenceDiagram
    participant Cron as EventSimilarityCron
    participant EventPostService
    participant EventTagRepo as EventTagRepository
    participant PostRepo as PostRepository
    participant EventPostRepo as EventPostRepository
    participant DB as Database

    Cron->>Cron: 0 * * * * (hourly trigger)
    Cron->>EventTagRepo: getAllEventTags()
    EventTagRepo->>DB: SELECT * FROM event_tags ORDER BY name
    DB-->>EventTagRepo: EventTagModel[]
    EventTagRepo-->>Cron: tags

    loop For each EventTag
        Cron->>EventPostService: processEventSimilarity(tagId)
        
        EventPostService->>EventPostRepo: getUserTaggedPostsForEvent(tagId)
        EventPostRepo->>DB: Query user-sourced eventPosts
        DB-->>EventPostRepo: EventPostModel[]
        EventPostRepo-->>EventPostService: relationships
        
        Note over EventPostService: Exit if no user posts or no embeddings
        
        EventPostService->>EventPostService: computeCentroid(embeddings[])
        
        EventPostService->>PostRepo: findSimilarPostsForEvent(centroid, excludePostIds, threshold, limit)
        PostRepo->>DB: pgvector similarity search (1 - (embedding <=> centroid))
        DB-->>PostRepo: { post, score }[]
        PostRepo-->>EventPostService: similar posts with scores
        
        EventPostService->>EventPostRepo: deleteRelationshipsBySourceForEvent(tagId, "similarity")
        EventPostRepo->>DB: DELETE eventPosts WHERE eventTagId=tagId AND source='similarity'
        DB-->>EventPostRepo: ✓
        
        loop For each similar post
            EventPostService->>EventPostRepo: upsertRelationship(postId, tagId, "similarity", score)
            EventPostRepo->>DB: INSERT/UPDATE eventPost
            DB-->>EventPostRepo: EventPostModel
        end
        
        EventPostService-->>Cron: ✓ Processed
    end
    
    Cron->>Cron: Log completion
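According to the walkthrough, the cron handles errors per tag, so one failing event does not abort the hourly run. Below is a minimal sketch of that loop with the scheduler left out. The function and parameter names are hypothetical. `processTag` stands in for `EventPostService.processEventSimilarity`.

```typescript
interface TagLike {
  id: string;
  name: string;
}

interface RunSummary {
  processed: string[];
  failed: { tagId: string; error: string }[];
}

// Processes tags one at a time; an error for one tag is recorded and logged,
// and the loop continues with the next tag.
async function runSimilarityForAllTags(
  tags: TagLike[],
  processTag: (tagId: string) => Promise<void>,
  log: (msg: string) => void = console.log,
): Promise<RunSummary> {
  const summary: RunSummary = { processed: [], failed: [] };
  for (const tag of tags) {
    try {
      await processTag(tag.id);
      summary.processed.push(tag.id);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      summary.failed.push({ tagId: tag.id, error: message });
      log(`Event similarity failed for tag ${tag.name}: ${message}`);
    }
  }
  log(`Event similarity run complete: ${summary.processed.length} ok, ${summary.failed.length} failed`);
  return summary;
}
```

Running tags sequentially keeps at most one vector search in flight at a time. If the project uses node-cron, the hourly schedule would wrap this in `cron.schedule("0 * * * *", ...)`.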

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 Hoppy times with events bright,
Feed the posts, similarity's light!
Cron jobs tick, embeddings blend,
Event tags now make new friends!
A-hop-skip through the data sphere!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Title check (⚠️ Warning): The title 'Lauren/event similarity' refers to a feature branch name rather than describing the actual changes made in the PR. Resolution: Replace the title with a clear, descriptive summary of the main change, such as 'Add event similarity detection and event feed endpoints' or 'Implement event post tracking with ML-powered similarity matching'.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

  • Description check (✅ Passed): The PR description covers the main changes and test coverage but is missing several required template sections, including Changes Made details, how the feature works, repro steps, and how to enable the feature.



@coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (7)
src/app.ts (1)

29-29: Consider single-run protection for event similarity cron startup.

Line 203 starts the cron unconditionally on each app instance. In multi-replica deployments, this can trigger duplicate hourly jobs and unnecessary DB load. Consider gating with an env flag or leader/lock-based scheduling.

Also applies to: 203-203

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app.ts` at line 29, The cron is being started unconditionally via
startEventSimilarityCron which causes duplicate jobs in multi-replica
deployments; modify the startup to guard the call to startEventSimilarityCron by
either (A) honoring an environment flag (e.g., ENABLE_EVENT_SIMILARITY_CRON) and
only calling startEventSimilarityCron when that flag is true, or (B) acquiring a
leader/distributed lock (Redis/DB advisory lock) at startup and only calling
startEventSimilarityCron if the lock is obtained; update startup logic around
startEventSimilarityCron to check the chosen guard and log whether the instance
is running the cron or skipping it.
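Option (A) is the smallest change. The sketch below assumes a new `ENABLE_EVENT_SIMILARITY_CRON` variable. That name comes from the suggestion, not from existing config. The `start` callback would be the PR's `startEventSimilarityCron`. The flag is opt-in, so any instance without it skips the job.

```typescript
type Env = Record<string, string | undefined>;

// True only when this instance has been explicitly opted in.
function shouldRunEventSimilarityCron(env: Env): boolean {
  const flag = env.ENABLE_EVENT_SIMILARITY_CRON?.trim().toLowerCase();
  return flag === "true" || flag === "1";
}

// Guarded startup, meant to replace the unconditional call in app.ts.
function maybeStartEventSimilarityCron(
  start: () => void,
  env: Env,
  log: (msg: string) => void = console.log,
): boolean {
  if (!shouldRunEventSimilarityCron(env)) {
    log("Event similarity cron disabled on this instance (ENABLE_EVENT_SIMILARITY_CRON not set)");
    return false;
  }
  start();
  log("Event similarity cron started on this instance");
  return true;
}

// Assumed wiring in app.ts:
// maybeStartEventSimilarityCron(startEventSimilarityCron, process.env);
```

Replicas of the same deployment usually share environment variables. A flag therefore only prevents duplicates if exactly one instance (for example, a dedicated worker) sets it. Otherwise, option (B) with a lock is the safer choice.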
src/repositories/EventPostRepository.ts (2)

41-48: Hardcoded string 'user' should use the EventPostSource type.

Line 46 uses a hardcoded string 'user' instead of referencing a constant. While functionally correct, using a typed constant would be more maintainable.

♻️ Suggested fix
-      .andWhere("epr.source = 'user'")
+      .andWhere("epr.source = :source", { source: 'user' as EventPostSource })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/repositories/EventPostRepository.ts` around lines 41 - 48, The query in
getUserTaggedPostsForEvent uses the hardcoded string 'user'; replace it with a
parameterized value typed as EventPostSource to improve maintainability: import
EventPostSource and change the .andWhere clause to bind the value (e.g.
.andWhere("epr.source = :source", { source: 'user' as EventPostSource })). Note
that EventPostSource is a string union type, not an enum, so EventPostSource.User
only exists if a runtime constant object is introduced alongside the type.
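If `EventPostSource` stays a plain string union, as the walkthrough describes it, it has no runtime members. In that case `EventPostSource.User` does not compile. One common pattern that makes such a member reference valid is to derive the union from a `const` object. The sketch below assumes the three sources listed in the PR overview. It would replace the type declaration in `EventPostModel.ts`, which is an assumption about where the type lives.

```typescript
// A runtime object plus a type derived from it, so `EventPostSource.User`
// (a value) and `EventPostSource` (a type) both work.
const EventPostSource = {
  User: "user",
  Similarity: "similarity",
  NlpContext: "nlp_context",
} as const;

type EventPostSource = (typeof EventPostSource)[keyof typeof EventPostSource];

const EVENT_POST_SOURCE_VALUES: readonly string[] = Object.values(EventPostSource);

// Narrowing guard, e.g. for validating the optional `source` query parameter.
function isEventPostSource(value: string): value is EventPostSource {
  return EVENT_POST_SOURCE_VALUES.includes(value);
}

// Assumed repository usage:
// .andWhere("epr.source = :source", { source: EventPostSource.User })
```

Existing code that assigns string literals such as `"similarity"` to `EventPostSource` keeps compiling, because the derived type is the same union.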

79-84: Remove redundant type annotations.

ESLint correctly flags that skip: number = 0 and limit: number = 10 have types that can be trivially inferred from the default values.

♻️ Proposed fix
 public async getPostsForEvent(
   eventTagId: string,
   source?: EventPostSource,
-  skip: number = 0,
-  limit: number = 10,
+  skip = 0,
+  limit = 10,
 ): Promise<EventPostModel[]> {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/repositories/EventPostRepository.ts` around lines 79 - 84, The method
signature for getPostsForEvent in EventPostRepository.ts has redundant type
annotations on parameters skip: number = 0 and limit: number = 10; remove the
explicit ": number" annotations so the types are inferred from the default
values (i.e., change to skip = 0, limit = 10) while keeping the parameter names,
defaults, return type Promise<EventPostModel[]>, and the rest of the method
intact; locate the signature in the getPostsForEvent method and update it
accordingly.
src/services/EventPostService.ts (3)

87-98: Missing dimension validation in computeCentroid.

If embeddings have mismatched dimensions, the loop at lines 92-94 will silently produce incorrect results (shorter embeddings won't contribute to higher indices). While the caller filters to posts with embeddings, there's no guarantee they all have the same dimension.

🛡️ Proposed defensive check
 private computeCentroid(embeddings: number[][]): number[] {
   const dim = embeddings[0].length;
+  if (embeddings.some(e => e.length !== dim)) {
+    throw new Error(`Embedding dimension mismatch: expected ${dim}`);
+  }
   const centroid = new Array(dim).fill(0);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/EventPostService.ts` around lines 87 - 98, computeCentroid lacks
validation for embedding dimensionality which allows mismatched-length arrays to
silently produce wrong results; update computeCentroid to first assert
embeddings is non-empty, capture const dim = embeddings[0].length, then iterate
embeddings and throw a descriptive Error (or return a safe failure) if any
embedding.length !== dim (include the offending index and length in the
message), only then proceed to sum and average so all vectors contribute
correctly.
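Below is a sketch of the validated centroid that the prompt describes, written as a standalone function. The PR's version is a private method on `EventPostService`, and the error messages here are illustrative. The function also rejects empty input, which the proposed diff would otherwise turn into a `TypeError` on `embeddings[0].length`.

```typescript
// Element-wise mean of equally sized embedding vectors.
function computeCentroid(embeddings: number[][]): number[] {
  if (embeddings.length === 0) {
    throw new Error("computeCentroid: at least one embedding is required");
  }
  const dim = embeddings[0].length;
  embeddings.forEach((embedding, index) => {
    if (embedding.length !== dim) {
      throw new Error(
        `Embedding dimension mismatch at index ${index}: expected ${dim}, got ${embedding.length}`,
      );
    }
  });

  // Sum each coordinate across all embeddings, then divide by the count.
  const centroid = new Array<number>(dim).fill(0);
  for (const embedding of embeddings) {
    for (let i = 0; i < dim; i++) {
      centroid[i] += embedding[i];
    }
  }
  return centroid.map((sum) => sum / embeddings.length);
}

console.log(computeCentroid([[1, 2], [3, 4]])); // [ 2, 3 ]
```

Because the query later casts the centroid to `vector(512)`, the caller could also check `dim === 512` before running the search.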

24-26: Test environment skip pattern may hide integration bugs.

Skipping the entire method when NODE_ENV === "test" prevents unit testing of this logic. The test file works around this by temporarily overriding NODE_ENV, but this pattern is fragile.

Consider injecting a configuration flag or making the skip behavior configurable to allow proper testing without environment manipulation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/EventPostService.ts` around lines 24 - 26, The method
processEventSimilarity in EventPostService currently returns early based on
process.env.NODE_ENV === "test"; change this to use an injectable/configurable
flag so tests can control the behavior without mutating NODE_ENV: add a
constructor parameter (e.g., skipEventSimilarityInTest: boolean or a
ConfigService boolean like config.skipEventSimilarity) stored on the
EventPostService instance (or read from ConfigService), defaulting to the
existing NODE_ENV check for backwards compatibility, then replace the direct
process.env check in processEventSimilarity with a check against that injected
flag and update callers/tests to pass the desired value.
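Below is a sketch of the injectable flag. The names `EventSimilarityConfig`, `skipSimilarity`, and `EventPostServiceSketch` are hypothetical. An explicit option takes precedence. The existing `NODE_ENV === "test"` check is kept only as the default, so current callers behave the same. Tests can pass `{ skipSimilarity: false }` instead of mutating `process.env`.

```typescript
interface EventSimilarityConfig {
  skipSimilarity?: boolean;
}

// Explicit configuration wins; otherwise fall back to the PR's current
// behaviour of skipping in the test environment.
function resolveSkipSimilarity(config: EventSimilarityConfig, nodeEnv: string | undefined): boolean {
  return config.skipSimilarity ?? (nodeEnv === "test");
}

class EventPostServiceSketch {
  private readonly skipSimilarity: boolean;

  constructor(config: EventSimilarityConfig = {}, nodeEnv?: string) {
    this.skipSimilarity = resolveSkipSimilarity(config, nodeEnv);
  }

  // Returns false when processing was skipped, true when `run` was invoked.
  async processEventSimilarity(
    eventTagId: string,
    run: (eventTagId: string) => Promise<void>,
  ): Promise<boolean> {
    if (this.skipSimilarity) return false;
    await run(eventTagId);
    return true;
  }
}

// Production wiring (assumed): new EventPostServiceSketch({}, process.env.NODE_ENV)
// Test wiring: new EventPostServiceSketch({ skipSimilarity: false })
```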

71-78: Sequential upserts could be batched for performance.

The loop issues individual upsertRelationship calls sequentially. For large result sets (up to MAX_RESULTS = 100), this creates many round trips. Consider batching these operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/EventPostService.ts` around lines 71 - 78, The loop in
EventPostService that iterates over similarPosts and calls
eventPostRepo.upsertRelationship sequentially (up to MAX_RESULTS) causes many
round trips; change this to batch the operations by adding a repository method
(e.g., eventPostRepo.upsertRelationshipsBulk or upsertRelationshipBatch) that
accepts an array of {postId, eventTagId, source, relevanceScore} and performs a single bulk
upsert or transaction, or if DB/binding lacks bulk support, run the existing
upserts in parallel with controlled concurrency (Promise.all with a p-limit) to
avoid sequential awaits; update the code in the similarPosts handling block to
prepare the array of relationships from similarPosts and call the new
bulk/parallel mechanism instead of awaiting each
eventPostRepo.upsertRelationship inside the for loop.
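Below is a sketch of the batched variant. It assumes the project's TypeORM version provides `Repository.upsert(entities, conflictPaths)`. It also assumes the entity columns are `postId`, `eventTagId`, `source`, and `relevanceScore`, names taken from the PR description and migration summary. The unique constraint is on `(postId, eventTagId)`, and the service has already deleted old similarity rows. Any remaining conflict would therefore hit a user-tagged row and overwrite its `source`. For that reason, the sketch drops already-tagged posts before writing.

```typescript
type EventPostSource = "user" | "similarity" | "nlp_context";

interface SimilarPost {
  post: { id: string };
  score: number;
}

interface EventPostRow {
  postId: string;
  eventTagId: string;
  source: EventPostSource;
  relevanceScore: number;
}

// Structural subset of TypeORM's Repository<EventPostModel> used by this sketch.
interface UpsertCapableRepo {
  upsert(rows: EventPostRow[], conflictPaths: string[]): Promise<unknown>;
}

// One row per similar post, skipping posts users have already tagged for this event.
function buildSimilarityRows(
  similarPosts: SimilarPost[],
  eventTagId: string,
  userTaggedPostIds: Set<string>,
): EventPostRow[] {
  return similarPosts
    .filter(({ post }) => !userTaggedPostIds.has(post.id))
    .map(({ post, score }): EventPostRow => ({
      postId: post.id,
      eventTagId,
      source: "similarity",
      relevanceScore: score,
    }));
}

// A single round trip instead of one awaited upsert per post.
async function upsertSimilarityRows(repo: UpsertCapableRepo, rows: EventPostRow[]): Promise<void> {
  if (rows.length === 0) return;
  await repo.upsert(rows, ["postId", "eventTagId"]);
}
```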
src/tests/EventTest.test.ts (1)

479-481: Type assertion suggests a type mismatch.

The source: "similarity" as any cast indicates TypeORM's find options may not properly recognize the EventPostSource literal type. This works but is a code smell.

Consider defining a type-safe query interface or using the query builder for better type safety.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tests/EventTest.test.ts` around lines 479 - 481, The test is using a
loose cast "similarity" as any when calling conn.manager.find for
EventPostModel; replace the unsafe any cast with a type-safe value by using the
actual EventPostSource literal/enum or a proper typed query: ensure the string
"similarity" is typed as EventPostSource (or import/construct the enum/union
used by EventPostModel) when calling conn.manager.find (the simRows lookup), or
switch to conn.createQueryBuilder/EventPostModel with an explicit where clause
to avoid the type assertion.
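A likely cause of the `as any` is literal widening. When the `where` object is built in a variable first, TypeScript infers `source` as `string`, which no longer matches the `EventPostSource` union. The sketch below shows a fix. It assumes the union from the PR overview and an `eventTagId` column on the model, as the migration summary suggests. `similarityWhere` is a hypothetical test helper.

```typescript
type EventPostSource = "user" | "similarity" | "nlp_context";

interface EventPostWhere {
  eventTagId: string;
  source: EventPostSource;
}

// The annotated return type keeps `source` as the literal union instead of `string`.
function similarityWhere(eventTagId: string): EventPostWhere {
  return { eventTagId, source: "similarity" };
}

// Assumed usage in EventTest.test.ts:
// const simRows = await conn.manager.find(EventPostModel, { where: similarityWhere(tag.id) });
```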
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/repositories/PostRepository.ts`:
- Around line 555-572: In findSimilarPostsForEvent add a guard to exclude empty
embeddings before casting by ensuring array_length(post.embedding,1) = 512 (or
>0 / =512 as appropriate) in the WHERE clauses that use CAST(... AS vector(512))
so empty arrays ([]) are filtered out; update the scoreExpr usage and the
.andWhere chain in the method to include this array_length check alongside
"post.embedding IS NOT NULL". Apply the same array_length guard to the other
vector-distance queries: PostRepository.findSimilarPosts (both places it
builds/uses the vector cast) and the second ordering clause in
PostRepository.findSimilarPostsForEvent, and to
RequestRepository.findSimilarRequests so pgvector never receives an empty array
and avoids the dimension mismatch error.

In `@src/services/EventService.ts`:
- Around line 41-68: The current code applies active/blocked filtering after
calling eventPostRepo.getPostsForEvent and getPostCountForEvent, which causes
pagination and total to include rows that should be hidden; modify the repo
calls so filtering happens in the DB layer: retrieve blocked UIDs from
userRepo.getUserWithBlockedInfo(user.firebaseUid) first, then call
eventPostRepo.getPostsForEvent(eventTagId, source, skip, limit, {excludeUids:
blockedUids, requireActiveUser: true}) and
eventPostRepo.getPostCountForEvent(eventTagId, source, {excludeUids:
blockedUids, requireActiveUser: true}) (or add equivalent parameters/overloads)
so both the SELECT and COUNT apply the same predicates; remove the post-level
isActive/blocked filtering in EventService and keep only mapping to
PostWithSource for attaching source/relevanceScore.

In `@src/services/PostService.ts`:
- Around line 553-562: When removing the manual event tags in PostService (use
removeEventTagsRequest.eventTags, removedTags,
eventPostRepository.deleteRelationship and post.eventTags), ensure you also
clear any leftover similarity rows when the last user-tagged seed for that event
is gone: after deleting each (post.id, tag.id) and updating post.eventTags,
check if there are no remaining user-tagged posts or embeddings for the event
and then call the repository method that removes similarity rows
(eventPostRepository.deleteRelationshipsBySourceForEvent or the EventPostService
equivalent) for that event/source before saving the post; this mirrors the
cleanup logic found in EventPostService.ts (deleteRelationshipsBySourceForEvent)
to prevent stale event matches.
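Below is a sketch of the `PostService` cleanup described above. It assumes that `getPostCountForEvent(eventTagId, source)`, taken from the walkthrough diagram, returns the number of rows for that source. It reuses the `deleteRelationship` and `deleteRelationshipsBySourceForEvent` names from the comment, but their exact signatures are assumptions. The sketch checks only for remaining user-tagged rows. Checking whether those seeds still have embeddings would need an extra query.

```typescript
type EventPostSource = "user" | "similarity" | "nlp_context";

// Subset of EventPostRepository used by this sketch (signatures assumed).
interface EventPostRepoLike {
  deleteRelationship(postId: string, eventTagId: string): Promise<void>;
  getPostCountForEvent(eventTagId: string, source?: EventPostSource): Promise<number>;
  deleteRelationshipsBySourceForEvent(eventTagId: string, source: EventPostSource): Promise<void>;
}

// Removes a post's tag rows, then clears similarity rows for any event left
// without a user-tagged seed. Returns the event tag IDs that were cleared.
async function removeEventTagsWithCleanup(
  repo: EventPostRepoLike,
  postId: string,
  removedTagIds: string[],
): Promise<string[]> {
  const cleared: string[] = [];
  for (const eventTagId of removedTagIds) {
    await repo.deleteRelationship(postId, eventTagId);
    const remainingSeeds = await repo.getPostCountForEvent(eventTagId, "user");
    if (remainingSeeds === 0) {
      await repo.deleteRelationshipsBySourceForEvent(eventTagId, "similarity");
      cleared.push(eventTagId);
    }
  }
  return cleared;
}
```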


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 08335a9d-5164-42c9-bf02-4aacc68f66fc

📥 Commits

Reviewing files that changed from the base of the PR and between fd97ef7 and b22aa9c.

📒 Files selected for processing (22)
  • .gitignore
  • jest.config.js
  • src/api/controllers/EventController.ts
  • src/api/controllers/index.ts
  • src/app.ts
  • src/cron/eventSimilarityCron.ts
  • src/migrations/1769700000000-CreateEventPost.ts
  • src/models/EventPostModel.ts
  • src/models/EventTagModel.ts
  • src/models/index.ts
  • src/repositories/EventPostRepository.ts
  • src/repositories/EventTagRepository.ts
  • src/repositories/PostRepository.ts
  • src/repositories/index.ts
  • src/services/EventPostService.ts
  • src/services/EventService.ts
  • src/services/PostService.ts
  • src/tests/EventTest.test.ts
  • src/tests/controllers/ControllerFactory.ts
  • src/tests/data/DatabaseConnection.ts
  • src/types/ApiResponses.ts
  • swagger.json

Comment on lines +555 to +572
public async findSimilarPostsForEvent(
centroidEmbedding: number[],
excludePostIds: string[],
threshold: number,
limit: number = 100,
): Promise<{ post: PostModel; score: number }[]> {
const lit = `[${centroidEmbedding.join(",")}]`;
const scoreExpr = `(1 - (post.embedding::vector <=> CAST('${lit}' AS vector(512))))`;

// get IDs and scores with threshold filter + pagination
const qb = this.repository
.createQueryBuilder("post")
.select("post.id", "id")
.addSelect(scoreExpr, "score")
.where("post.embedding IS NOT NULL")
.andWhere("post.archive = false")
.andWhere("post.sold = false")
.andWhere(`${scoreExpr} >= :threshold`, { threshold });

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's check the PostService.ts file around line 117
head -n 130 src/services/PostService.ts | tail -n 25

Repository: cuappdev/resell-backend

Length of output: 1079


🏁 Script executed:

# Let's also check the full PostRepository method
sed -n '555,610p' src/repositories/PostRepository.ts

Repository: cuappdev/resell-backend

Length of output: 2059


🏁 Script executed:

# Check if there are any other references to failed embeddings
rg -A 3 -B 3 "embedding.*\[\]" src/services/PostService.ts

Repository: cuappdev/resell-backend

Length of output: 1088


🏁 Script executed:

# Let me verify pgvector behavior with empty arrays and dimension mismatches
# First, check if there's any pgvector-related documentation or version info
rg -i "pgvector|vector" package.json

Repository: cuappdev/resell-backend

Length of output: 92


🏁 Script executed:

# Check if there are database migrations or schema definitions
fd -t f "migration\|schema" --type f | head -20

Repository: cuappdev/resell-backend

Length of output: 49


🏁 Script executed:

# Look for any test files or examples showing pgvector usage
rg -l "embedding.*vector" --type ts --type js | head -10

Repository: cuappdev/resell-backend

Length of output: 208


🏁 Script executed:

# Check the pgvector migration to see the schema
cat src/migrations/1740007049335-pgvector.ts

Repository: cuappdev/resell-backend

Length of output: 1328


🏁 Script executed:

# Look for RequestRepository to see how it uses embeddings
grep -A 20 "embedding" src/repositories/RequestRepository.ts | head -40

Repository: cuappdev/resell-backend

Length of output: 1286


🏁 Script executed:

# Get the full findSimilarRequests method from RequestRepository
sed -n '/findSimilarRequests/,/^  public async/p' src/repositories/RequestRepository.ts | head -40

Repository: cuappdev/resell-backend

Length of output: 1054


🏁 Script executed:

# Check the exact dimensions being used - search for vector casting patterns
rg "vector\([0-9]+\)" src/

Repository: cuappdev/resell-backend

Length of output: 719


🏁 Script executed:

# Let's verify the exact behavior - search for how embeddings are used elsewhere
rg -B 5 -A 5 "embedding.*vector" src/ | head -50

Repository: cuappdev/resell-backend

Length of output: 3780


🏁 Script executed:

# Let me verify the array_length behavior and confirm the exact behavior
# Check if there are any array operations on embeddings
rg "array_length" src/

Repository: cuappdev/resell-backend

Length of output: 49


🏁 Script executed:

# Let's check if there are any other places where empty embeddings might cause issues
rg -B 3 -A 3 "embedding.*\[\]|\\[\\].*embedding" src/ | head -60

Repository: cuappdev/resell-backend

Length of output: 3069


🏁 Script executed:

# Let me search for how the embedding centroid is computed to understand the dimensions
rg -B 5 "centroidEmbedding" src/ | head -50

Repository: cuappdev/resell-backend

Length of output: 873


🏁 Script executed:

# Check SearchService to see how it handles empty embeddings
sed -n '/let embedding = "\[\]"/,/catch/p' src/services/SearchService.ts

Repository: cuappdev/resell-backend

Length of output: 1744


🏁 Script executed:

# Let me check where findSimilarPostsForEvent is called to understand the centroid computation
rg "findSimilarPostsForEvent" src/ -B 2 -A 2

Repository: cuappdev/resell-backend

Length of output: 763


🏁 Script executed:

# Let me also check if centroidEmbedding could have variable dimensions
rg -B 10 "findSimilarPostsForEvent" src/services/ | head -40

Repository: cuappdev/resell-backend

Length of output: 839


🏁 Script executed:

# Find the computeCentroid method to understand dimensions
rg -A 15 "computeCentroid" src/services/EventPostService.ts

Repository: cuappdev/resell-backend

Length of output: 963


🏁 Script executed:

# Also let's check the postsWithEmbeddings filtering
rg -B 10 "postsWithEmbeddings.length === 0" src/services/EventPostService.ts | head -30

Repository: cuappdev/resell-backend

Length of output: 450


🏁 Script executed:

# Let me verify the exact error that would occur with empty embeddings
# Check if there are any error handling or logs related to dimension mismatches
rg "dimension|mismatch" src/ -i

Repository: cuappdev/resell-backend

Length of output: 241


🏁 Script executed:

# Let me also check if other similar vector operations handle empty arrays
rg -B 3 -A 3 "::vector" src/repositories/ | head -60

Repository: cuappdev/resell-backend

Length of output: 2816


🌐 Web query:

pgvector empty array cast vector dimension mismatch PostgreSQL error

💡 Result:

The PostgreSQL error "different vector dimensions" in pgvector occurs when attempting operations (e.g., distance calculations like <->, <=>) between vectors of mismatched dimensions, such as a stored vector in a fixed-dimension column like vector(1536) and a query vector with fewer elements. Casting an empty array like '[]' or ARRAY[]::vector(N) to a typed vector(N) fails because a pgvector value must have at least one dimension, and a vector(N) value must have exactly N finite float elements; empty arrays have 0 dimensions. To represent a zero vector (all elements 0.0), generate an array of N zeros, e.g., array_fill(0.0::real, ARRAY[N])::vector(N). For variable-dimension columns (vector without typmod), filter by vector_dims(vec) = N before distance ops to avoid mismatches. Use NULL for no vector instead of empty arrays, as pgvector vectors cannot contain NULL/NaN/Inf elements.

Add array_length guard to exclude empty embeddings before vector cast.

Empty embeddings are stored as [] (e.g., from failed computations in PostService.ts), and PostgreSQL's IS NOT NULL check does not filter out empty arrays. When the query then casts such a row's embedding to vector and compares it with the vector(512) centroid, pgvector raises an error (an empty array is not a valid vector), which aborts the entire query.
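The same length check matters where the centroid is built. As a hedged sketch (the function below is hypothetical and is not the project's `computeCentroid`, whose body isn't shown here), averaging only embeddings of the expected length keeps an empty `[]` from skewing or corrupting the centroid:

```typescript
// Hypothetical sketch, not the project's computeCentroid: average only the
// embeddings that have exactly `dims` elements.
function computeCentroidSketch(
  embeddings: Array<number[] | null>,
  dims = 512,
): number[] | null {
  // Mirror the SQL guard: skip NULL, empty, and wrong-length embeddings.
  const usable = embeddings.filter(
    (e): e is number[] => e !== null && e.length === dims,
  );
  // No usable seed: the caller should stop (and clear stale rows) here.
  if (usable.length === 0) {
    return null;
  }
  // Element-wise sum, then divide by the number of usable seeds.
  const sum = new Array<number>(dims).fill(0);
  for (const e of usable) {
    for (let i = 0; i < dims; i++) {
      sum[i] += e[i];
    }
  }
  return sum.map((v) => v / usable.length);
}
```

Returning `null` gives the caller a single, explicit point at which to stop when no usable seed remains.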

Required fix
   public async findSimilarPostsForEvent(
     centroidEmbedding: number[],
     excludePostIds: string[],
     threshold: number,
     limit: number = 100,
   ): Promise<{ post: PostModel; score: number }[]> {
     const lit = `[${centroidEmbedding.join(",")}]`;
     const scoreExpr = `(1 - (post.embedding::vector <=> CAST('${lit}' AS vector(512))))`;

     const qb = this.repository
       .createQueryBuilder("post")
       .select("post.id", "id")
       .addSelect(scoreExpr, "score")
       .where("post.embedding IS NOT NULL")
+      .andWhere("array_length(post.embedding, 1) = 512")
       .andWhere("post.archive = false")
       .andWhere("post.sold = false")
       .andWhere(`${scoreExpr} >= :threshold`, { threshold });

Also apply this same guard to other vector distance queries in PostRepository (findSimilarPosts, findSimilarPostsForEvent in the second ordering clause) and RequestRepository (findSimilarRequests).
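One way to keep the guard identical across `findSimilarPosts`, `findSimilarPostsForEvent`, and `RequestRepository.findSimilarRequests` is to generate it from a single helper. A sketch, assuming every table exposes an `embedding` array column and that a bound `:centroid` parameter replaces the interpolated literal; both helper names are hypothetical. The CASE wrapper is an extra precaution: PostgreSQL does not guarantee the order in which AND-ed WHERE conditions are evaluated, and its documentation suggests CASE when evaluation order must be forced.

```typescript
// Hypothetical helpers shared by PostRepository and RequestRepository so every
// pgvector query applies the same embedding guard. Names are illustrative.
const EMBEDDING_DIMS = 512;

// WHERE fragment. array_length of an empty array is NULL, so "= 512" also
// excludes empty arrays, not just wrong-length ones.
function embeddingGuard(alias: string, dims = EMBEDDING_DIMS): string {
  return `${alias}.embedding IS NOT NULL AND array_length(${alias}.embedding, 1) = ${dims}`;
}

// Score = 1 - cosine distance (<=>), i.e. cosine similarity. The CASE makes
// the ::vector cast run only for rows that pass the length check; other rows
// get NULL, which never satisfies ">= :threshold".
// ":centroid" is a bound parameter holding a literal like "[0.1,0.2,...]".
function guardedScoreExpr(alias: string, dims = EMBEDDING_DIMS): string {
  return (
    `CASE WHEN array_length(${alias}.embedding, 1) = ${dims} ` +
    `THEN 1 - (${alias}.embedding::vector <=> CAST(:centroid AS vector(${dims}))) END`
  );
}
```

In a TypeORM query builder this might be used as `.where(embeddingGuard("post"))` followed by `.andWhere(guardedScoreExpr("post") + " >= :threshold", { threshold, centroid: lit })`; reusing the same expression in `addSelect` keeps the selected score and the filter in sync.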

🧰 Tools
🪛 ESLint

[error] 559-559: Type number trivially inferred from a number literal, remove type annotation.

(@typescript-eslint/no-inferrable-types)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/repositories/PostRepository.ts` around lines 555 - 572, In
findSimilarPostsForEvent add a guard to exclude empty embeddings before casting
by ensuring array_length(post.embedding,1) = 512 (or >0 / =512 as appropriate)
in the WHERE clauses that use CAST(... AS vector(512)) so empty arrays ([]) are
filtered out; update the scoreExpr usage and the .andWhere chain in the method
to include this array_length check alongside "post.embedding IS NOT NULL". Apply
the same array_length guard to the other vector-distance queries:
PostRepository.findSimilarPosts (both places it builds/uses the vector cast) and
the second ordering clause in PostRepository.findSimilarPostsForEvent, and to
RequestRepository.findSimilarRequests so pgvector never receives an empty array
and avoids the dimension mismatch error.

Comment on lines +41 to +68
const skip = (page - 1) * limit;

// Fetch paginated relationships and total count in parallel
const [relationships, total] = await Promise.all([
eventPostRepo.getPostsForEvent(eventTagId, source, skip, limit),
eventPostRepo.getPostCountForEvent(eventTagId, source),
]);

// Filter inactive and blocked users
const userWithBlockedInfo = await userRepo.getUserWithBlockedInfo(user.firebaseUid);
const blockedUids = new Set(
userWithBlockedInfo?.blocking?.map(u => u.firebaseUid) ?? [],
);

const filtered = relationships.filter(r =>
r.post?.user?.isActive &&
!blockedUids.has(r.post.user.firebaseUid),
);

// Attach source + relevanceScore to each post
const posts: PostWithSource[] = filtered.map(r =>
Object.assign(r.post, {
source: r.source,
relevanceScore: r.relevanceScore,
}),
);

return { posts, total, page, limit };

⚠️ Potential issue | 🟠 Major

Apply visibility filters before paginating and counting.

getPostsForEvent() and getPostCountForEvent() run before blocked and inactive authors are removed, so hidden rows still consume skip/limit slots and total can advertise pages the caller can never fill. Push the active/block predicates into the repository query/count instead of filtering after pagination here.
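A small in-memory model shows the effect. The `Row` shape and both functions are illustrative only, not project code. With a blocked author (b) and an inactive author (c) among five rows, page 1 with limit 2 comes back as one post with a total of 5 under the current order, versus two posts and a total of 3 when filtering first.

```typescript
// Illustrative model only: Row is a simplified eventPosts row joined to its
// post's author; it is not the project's entity.
interface Row {
  postId: string;
  authorUid: string;
  authorActive: boolean;
}

function isVisible(row: Row, blocked: Set<string>): boolean {
  return row.authorActive && !blocked.has(row.authorUid);
}

// Current order: page first, hide later. The page can come back short and
// `total` still counts hidden rows.
function paginateThenFilter(rows: Row[], blocked: Set<string>, page: number, limit: number) {
  const skip = (page - 1) * limit;
  const posts = rows.slice(skip, skip + limit).filter((r) => isVisible(r, blocked));
  return { posts, total: rows.length };
}

// Suggested order: one predicate feeds both the page and the count.
function filterThenPaginate(rows: Row[], blocked: Set<string>, page: number, limit: number) {
  const visible = rows.filter((r) => isVisible(r, blocked));
  const skip = (page - 1) * limit;
  return { posts: visible.slice(skip, skip + limit), total: visible.length };
}
```

In SQL terms, the same `isActive` and blocked-user predicates belong in the WHERE clause of both `getPostsForEvent` and `getPostCountForEvent`.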

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/EventService.ts` around lines 41 - 68, The current code applies
active/blocked filtering after calling eventPostRepo.getPostsForEvent and
getPostCountForEvent, which causes pagination and total to include rows that
should be hidden; modify the repo calls so filtering happens in the DB layer:
retrieve blocked UIDs from userRepo.getUserWithBlockedInfo(user.firebaseUid)
first, then call eventPostRepo.getPostsForEvent(eventTagId, source, skip, limit,
{excludeUids: blockedUids, requireActiveUser: true}) and
eventPostRepo.getPostCountForEvent(eventTagId, source, {excludeUids:
blockedUids, requireActiveUser: true}) (or add equivalent parameters/overloads)
so both the SELECT and COUNT apply the same predicates; remove the post-level
isActive/blocked filtering in EventService and keep only mapping to
PostWithSource for attaching source/relevanceScore.
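A hedged sketch of the shared predicate this prompt describes, assuming the post's author is joined under the alias `user`; `buildVisibilityFilter` is a hypothetical name. It uses TypeORM's `:...name` array-parameter syntax and skips the `NOT IN` clause when nothing is blocked, since an empty list would render as `NOT IN ()`, which PostgreSQL rejects.

```typescript
// Hypothetical helper: build the visibility predicates once so the page query
// and the count query receive identical clauses and parameters.
interface VisibilityFilter {
  clauses: string[];
  params: Record<string, unknown>;
}

function buildVisibilityFilter(
  blockedUids: string[],
  requireActiveUser = true,
): VisibilityFilter {
  const clauses: string[] = [];
  const params: Record<string, unknown> = {};
  if (requireActiveUser) {
    clauses.push("user.isActive = true");
  }
  // Only add NOT IN when the list is non-empty; "NOT IN ()" is invalid SQL.
  if (blockedUids.length > 0) {
    clauses.push("user.firebaseUid NOT IN (:...blockedUids)");
    params.blockedUids = blockedUids;
  }
  return { clauses, params };
}
```

Each returned clause could then be passed to `andWhere(clause, params)` on both the paginated query and the count query.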

Comment on lines +553 to 562
const tagsToRemoveNames = new Set(removeEventTagsRequest.eventTags);
const removedTags = post.eventTags?.filter(tag => tagsToRemoveNames.has(tag.name)) || [];

const eventPostRepository = Repositories.eventPost(transactionalEntityManager);
for (const tag of removedTags) {
await eventPostRepository.deleteRelationship(post.id, tag.id);
}

post.eventTags = post.eventTags?.filter(tag => !tagsToRemoveNames.has(tag.name)) || [];
return await postRepository.savePost(post);

⚠️ Potential issue | 🟠 Major

Clear similarity rows when the last manual tag disappears.

This only deletes the direct (postId, eventTagId) row. If that was the last user-tagged seed for the event, the remaining similarity rows stay behind, because src/services/EventPostService.ts (lines 29-41 and 56-60) returns early, before deleteRelationshipsBySourceForEvent(...) runs, whenever no user-tagged posts or embeddings remain. The feed can then keep showing stale event matches indefinitely.
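An in-memory sketch of the cleanup, with hypothetical names (`EventPostStoreSketch`, `removeUserTag`) and simplified rows standing in for the eventPosts table: after the direct row is deleted, the event's similarity rows are removed once no `user` rows remain. It models only the missing-seed case; the no-embeddings case would need the same delete moved ahead of the early return in EventPostService.

```typescript
// Illustrative model only (hypothetical names): rows stand in for eventPosts.
type EventPostSource = "user" | "similarity" | "nlp_context";

interface EventPostRow {
  postId: string;
  eventTagId: string;
  source: EventPostSource;
}

class EventPostStoreSketch {
  rows: EventPostRow[] = [];

  // Delete the direct (postId, eventTagId) row, whatever its source.
  deleteRelationship(postId: string, eventTagId: string): void {
    this.rows = this.rows.filter(
      (r) => !(r.postId === postId && r.eventTagId === eventTagId),
    );
  }

  // Delete every row of one source for one event.
  deleteBySourceForEvent(eventTagId: string, source: EventPostSource): void {
    this.rows = this.rows.filter(
      (r) => !(r.eventTagId === eventTagId && r.source === source),
    );
  }

  userSeedCount(eventTagId: string): number {
    return this.rows.filter(
      (r) => r.eventTagId === eventTagId && r.source === "user",
    ).length;
  }
}

// Removing a manual tag: drop the direct row, then drop the event's
// similarity rows if no user-tagged seed is left to justify them.
function removeUserTag(store: EventPostStoreSketch, postId: string, eventTagId: string): void {
  store.deleteRelationship(postId, eventTagId);
  if (store.userSeedCount(eventTagId) === 0) {
    store.deleteBySourceForEvent(eventTagId, "similarity");
  }
}
```

In PostService the equivalent calls would run on the same transactionalEntityManager, so the tag removal and the cleanup commit or roll back together.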

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/PostService.ts` around lines 553 - 562, When removing the manual
event tags in PostService (use removeEventTagsRequest.eventTags, removedTags,
eventPostRepository.deleteRelationship and post.eventTags), ensure you also
clear any leftover similarity rows when the last user-tagged seed for that event
is gone: after deleting each (post.id, tag.id) and updating post.eventTags,
check if there are no remaining user-tagged posts or embeddings for the event
and then call the repository method that removes similarity rows
(eventPostRepository.deleteRelationshipsBySourceForEvent or the EventPostService
equivalent) for that event/source before saving the post; this mirrors the
cleanup logic found in EventPostService.ts (deleteRelationshipsBySourceForEvent)
to prevent stale event matches.

@JoshD94 left a comment

lgtm after coderabbit fixes are added


2 participants