feat(index): add More Like This (MLT) query support by poyrazK · Pull Request #111 · poyrazK/cloudSearch

poyrazK · 2026-05-24T13:50:38Z

Summary

Add MLT query type for "find similar documents" functionality
Extract significant terms from reference document using TF*IDF scoring
Build boosted BoolQuery with must_not exclusion of source doc
Add boost field to TermQuery for per-term scoring control
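
The term-extraction step in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ScoredTerm` and `select_significant_terms` are hypothetical names, and the IDF formula `ln(1 + N/df)` is an assumption, since the PR text does not show the exact weighting. The `min_term_freq`, `min_doc_freq` and `max_query_terms` parameters are the ones named in the commit messages.

```rust
use std::collections::HashMap;

/// A term picked from the reference document with its significance score.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredTerm {
    pub term: String,
    pub score: f64,
}

/// Rank the reference document's terms by tf * idf and keep the best ones.
/// Assumption: idf = ln(1 + total_docs / df); the PR may weight differently.
pub fn select_significant_terms(
    term_freqs: &HashMap<String, usize>,
    doc_freqs: &HashMap<String, usize>,
    total_docs: usize,
    min_term_freq: usize,
    min_doc_freq: usize,
    max_query_terms: usize,
) -> Vec<ScoredTerm> {
    let mut scored: Vec<ScoredTerm> = term_freqs
        .iter()
        .filter_map(|(term, &tf)| {
            // Drop terms that are too rare inside the reference document.
            if tf < min_term_freq {
                return None;
            }
            // Drop terms unknown to the index or too rare across the corpus.
            let df = *doc_freqs.get(term)?;
            if df == 0 || df < min_doc_freq {
                return None;
            }
            let idf = (1.0 + total_docs as f64 / df as f64).ln();
            Some(ScoredTerm {
                term: term.clone(),
                score: tf as f64 * idf,
            })
        })
        .collect();

    // Highest score first; ties broken by term so the order is deterministic.
    scored.sort_by(|a, b| b.score.total_cmp(&a.score).then_with(|| a.term.cmp(&b.term)));
    scored.truncate(max_query_terms);
    scored
}
```

In a design like the one described, each surviving score could then be used as the per-term boost on the corresponding `TermQuery` inside the generated `BoolQuery`.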

Test plan

All existing tests pass
4 new MLT tests covering doc_id exclusion, like source, validation

Notes

MLT scoring requires flushed positions data: positions_readers only contain data after a segment has been flushed. The per-doc inverted index built during indexing is not yet integrated with positions_readers.

- Add MltQuery struct and Mlt variant to SearchQuery enum
- Add boost field to TermQuery for per-term scoring control
- Implement build_mlt_bool_query to extract significant terms from reference doc using TF*IDF significance scoring
- MLT transforms to BoolQuery before scoring; excludes source doc
- Add validation for doc_id xor like (mutually exclusive)
- Update all TermQuery constructors with new boost field
- Add get_query_terms and score_query handling for Mlt variant
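
The `doc_id` xor `like` rule could look something like the sketch below. The field types, the `MltSource` enum, the `MltError` type and the `resolve_mlt_source` function are all assumptions made for illustration; only the names `MltQuery`, `doc_id` and `like` come from this PR. Following the test notes, a query with neither source is treated here as "no source" (an empty result) rather than as an error.

```rust
/// Simplified stand-in for the PR's MltQuery; field types are assumed.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct MltQuery {
    pub doc_id: Option<String>,
    pub like: Option<String>,
}

/// Where the reference text comes from (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub enum MltSource<'a> {
    DocId(&'a str),
    Like(&'a str),
}

/// Hypothetical validation error for this sketch.
#[derive(Debug, PartialEq)]
pub enum MltError {
    /// Both doc_id and like were supplied; they are mutually exclusive.
    BothSourcesGiven,
}

/// Resolve the single allowed source, or Ok(None) when none is given.
pub fn resolve_mlt_source(q: &MltQuery) -> Result<Option<MltSource<'_>>, MltError> {
    match (q.doc_id.as_deref(), q.like.as_deref()) {
        (Some(_), Some(_)) => Err(MltError::BothSourcesGiven),
        (Some(id), None) => Ok(Some(MltSource::DocId(id))),
        (None, Some(text)) => Ok(Some(MltSource::Like(text))),
        // Neither given: the caller can return an empty result set.
        (None, None) => Ok(None),
    }
}
```

Matching on the pair of options keeps all four cases visible in one place, which makes it harder to forget the "both" or "neither" branch.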

Tests verify MLT query validation and behavior with current constraints:
- doc_id source exclusion works
- like parameter works with raw JSON
- min_term_freq filtering logic works
- Neither doc_id nor like returns empty
- min_doc_freq and max_query_terms constraints work

Note: MLT scoring requires flushed positions data, so tests verify the query returns empty until segment flushing integrates the per-doc inverted index with positions_readers.

coderabbitai · 2026-05-24T13:50:45Z

Warning

Review limit reached

@poyrazK, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 1 review/hour. Refill in 22 minutes and 10 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c460e401-ae0f-460c-ba5f-3bb96d6a6a3d

📥 Commits

Reviewing files that changed from the base of the PR and between 9f7775b and 5df34b4.

📒 Files selected for processing (6)

rust/crates/cloudsearch-api/src/lib.rs
rust/crates/cloudsearch-api/src/query_string.rs
rust/crates/cloudsearch-common/src/lib.rs
rust/crates/cloudsearch-common/tests/round_trip.rs
rust/crates/cloudsearch-index/src/lib.rs
rust/crates/cloudsearch-index/tests/coverage.rs

1. Track term frequencies per-field, not globally
   - Changed term_freqs from HashMap<String, usize> to HashMap<String, HashMap<String, usize>> (field -> term -> tf)
   - Use original field when building should_clauses
2. Move source doc exclusion to search level
   - Removed must_not from BoolQuery (field _id may not be indexed)
   - Filter out mlt.doc_id directly in search() before scoring
   - Return doc_id_to_exclude alongside transformed query
3. Applied cargo fmt fixes
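
The two behavioral changes in this commit can be pictured with a small sketch. It is an illustration under assumptions, not the PR's code: `count_terms_per_field` and `exclude_source` are hypothetical helpers, whitespace splitting with lowercasing stands in for the real analyzer, and document ids are modeled as plain strings.

```rust
use std::collections::HashMap;

/// field -> term -> term frequency, the shape described in the commit.
pub type FieldTermFreqs = HashMap<String, HashMap<String, usize>>;

/// Count terms separately for each field of the reference document.
/// Assumption: a naive whitespace tokenizer replaces the real analyzer.
pub fn count_terms_per_field(fields: &[(&str, &str)]) -> FieldTermFreqs {
    let mut freqs = FieldTermFreqs::new();
    for (field, text) in fields {
        let per_field = freqs.entry((*field).to_string()).or_default();
        for token in text.split_whitespace() {
            *per_field.entry(token.to_lowercase()).or_insert(0) += 1;
        }
    }
    freqs
}

/// Drop the source document from the candidates before scoring,
/// instead of relying on a must_not clause over an `_id` field.
pub fn exclude_source<'a>(
    candidates: &[&'a str],
    doc_id_to_exclude: Option<&str>,
) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|id| Some(*id) != doc_id_to_exclude)
        .collect()
}
```

Keeping counts per field means a term that occurs in `title` is only turned into a clause against `title`, and filtering by id at search level works even when `_id` is not an indexed field.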

poyrazK added 2 commits May 24, 2026 16:49

poyrazK wants to merge 3 commits into main from feature/mlt-query
