Skip to content

Conversation

@raghavyadav01
Copy link
Collaborator

This PR adds support for building and managing Lucene docId mapping files for offline segments when storeInSegmentFile is enabled. The changes ensure that the docId mapping is pre-built during segment creation and properly cleaned up after being merged into the Pinot segment file.

  • Pre-build docId mapping during segment creation: When storeInSegmentFile is enabled, the docId mapping file is now built during segment creation so it's available during segment load without requiring on-the-fly construction.
  • Cleanup temporary mapping file: After the docIdMapping file has been successfully merged into the combined Pinot segment file, the temporary mapping file is deleted to avoid leaving orphaned files in the segment directory.

Testing

  • Verified that docId mapping is built correctly when storeInSegmentFile is enabled
  • Confirmed that mapping files are properly included when combining Lucene index files
  • Validated that temporary mapping files are cleaned up after merging

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for pre-building and managing Lucene docId mapping files for offline segments when storeInSegmentFile is enabled. Previously, these mappings were built on-the-fly during segment load, but now they are created during segment creation and included in the combined Pinot segment file.

Key changes:

  • Pre-build docId mapping during segment creation when storeInSegmentFile is enabled
  • Include docId mapping files when combining Lucene index files into the segment
  • Clean up temporary mapping files after successful merge

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
LuceneTextIndexReader.java Added timing logs and byte order handling for docId mapping buffer reads
LuceneTextIndexCreator.java Added logic to build docId mapping during segment creation and cleanup after merge
LuceneTextIndexCombined.java Extended file collection to include docId mapping files from segment directory

raghavyadav01 and others added 4 commits December 29, 2025 10:11
…al/segment/creator/impl/text/LuceneTextIndexCreator.java

Co-authored-by: Copilot <[email protected]>
…al/segment/creator/impl/text/LuceneTextIndexCreator.java

Co-authored-by: Copilot <[email protected]>
…al/segment/creator/impl/text/LuceneTextIndexCreator.java

Co-authored-by: Copilot <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Dec 29, 2025

Codecov Report

❌ Patch coverage is 72.72727% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.28%. Comparing base (bdd8347) to head (970f8ef).
⚠️ Report is 29 commits behind head on master.

Files with missing lines Patch % Lines
...ent/creator/impl/text/LuceneTextIndexCombined.java 63.63% 2 Missing and 2 partials ⚠️
...ment/creator/impl/text/LuceneTextIndexCreator.java 78.57% 2 Missing and 1 partial ⚠️
...ment/index/readers/text/LuceneTextIndexReader.java 75.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17437      +/-   ##
============================================
+ Coverage     63.25%   63.28%   +0.03%     
  Complexity     1474     1474              
============================================
  Files          3152     3161       +9     
  Lines        187861   188420     +559     
  Branches      28761    28828      +67     
============================================
+ Hits         118833   119244     +411     
- Misses        59809    59942     +133     
- Partials       9219     9234      +15     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.25% <72.72%> (+0.06%) ⬆️
java-21 63.23% <72.72%> (+<0.01%) ⬆️
temurin 63.28% <72.72%> (+0.03%) ⬆️
unittests 63.28% <72.72%> (+0.03%) ⬆️
unittests1 55.64% <66.66%> (-0.01%) ⬇️
unittests2 34.00% <72.72%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 requested a review from Copilot December 30, 2025 03:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Copy link
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is ok.
Can you add more tests for text index ops:

  1. add new index
  2. update index
  3. remove index
  4. add back index
    And ensure each steps all the index files are correctly added/updated/removed?

@xiangfu0 xiangfu0 merged commit 668bf02 into apache:master Jan 2, 2026
18 checks passed
@raghavyadav01
Copy link
Collaborator Author

Thanks @xiangfu0 We do have tests to test add/update/remove and add index back. I will add more dedicated tests in next PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants