-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Create lucene doc Id mapping for offline segments when storeInSegmentFile is true #17437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for pre-building and managing Lucene docId mapping files for offline segments when storeInSegmentFile is enabled. Previously, these mappings were built on-the-fly during segment load, but now they are created during segment creation and included in the combined Pinot segment file.
Key changes:
- Pre-build docId mapping during segment creation when
storeInSegmentFileis enabled - Include docId mapping files when combining Lucene index files into the segment
- Clean up temporary mapping files after successful merge
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| LuceneTextIndexReader.java | Added timing logs and byte order handling for docId mapping buffer reads |
| LuceneTextIndexCreator.java | Added logic to build docId mapping during segment creation and cleanup after merge |
| LuceneTextIndexCombined.java | Extended file collection to include docId mapping files from segment directory |
...in/java/org/apache/pinot/segment/local/segment/creator/impl/text/LuceneTextIndexCreator.java
Outdated
Show resolved
Hide resolved
...in/java/org/apache/pinot/segment/local/segment/creator/impl/text/LuceneTextIndexCreator.java
Outdated
Show resolved
Hide resolved
...in/java/org/apache/pinot/segment/local/segment/creator/impl/text/LuceneTextIndexCreator.java
Show resolved
Hide resolved
...in/java/org/apache/pinot/segment/local/segment/creator/impl/text/LuceneTextIndexCreator.java
Outdated
Show resolved
Hide resolved
…al/segment/creator/impl/text/LuceneTextIndexCreator.java Co-authored-by: Copilot <[email protected]>
…al/segment/creator/impl/text/LuceneTextIndexCreator.java Co-authored-by: Copilot <[email protected]>
…al/segment/creator/impl/text/LuceneTextIndexCreator.java Co-authored-by: Copilot <[email protected]>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17437 +/- ##
============================================
+ Coverage 63.25% 63.28% +0.03%
Complexity 1474 1474
============================================
Files 3152 3161 +9
Lines 187861 188420 +559
Branches 28761 28828 +67
============================================
+ Hits 118833 119244 +411
- Misses 59809 59942 +133
- Partials 9219 9234 +15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
...in/java/org/apache/pinot/segment/local/segment/creator/impl/text/LuceneTextIndexCreator.java
Show resolved
Hide resolved
...in/java/org/apache/pinot/segment/local/segment/index/readers/text/LuceneTextIndexReader.java
Show resolved
Hide resolved
xiangfu0
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic is ok.
Can you add more tests for text index ops:
- add new index
- update index
- remove index
- add back index
And ensure each steps all the index files are correctly added/updated/removed?
|
Thanks @xiangfu0 We do have tests to test add/update/remove and add index back. I will add more dedicated tests in next PR. |
This PR adds support for building and managing Lucene docId mapping files for offline segments when
storeInSegmentFileis enabled. The changes ensure that the docId mapping is pre-built during segment creation and properly cleaned up after being merged into the Pinot segment file.storeInSegmentFileis enabled, the docId mapping file is now built during segment creation so it's available during segment load without requiring on-the-fly construction.Testing
storeInSegmentFileis enabled