DOC-754 | Added and removed consolidation options for inverted indexes and arangosearch Views
#850
base: main
This option is available from v3.12.7 onward:
Merge a subset of segments where the ratio of the largest segment size
to the combined segment size is within this threshold. Increasing the
The skew describes how much the segment files vary in size. It is a number between 0.0 and 1.0, calculated by dividing the size of the largest segment file in a set by the total size of all segment files in that set.
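A minimal Python sketch of the skew calculation described above, assuming segment sizes are given as plain byte counts. segment_skew is a hypothetical helper for illustration, not an ArangoDB API.

```python
from typing import Sequence


def segment_skew(sizes: Sequence[int]) -> float:
    """Skew of a set of segment files: largest file size divided by the total size.

    Hypothetical helper; sizes are assumed to be byte counts.
    """
    if not sizes or sum(sizes) <= 0:
        raise ValueError("need at least one segment with a positive size")
    return max(sizes) / sum(sizes)


# One dominant segment gives a skew close to 1.0.
print(segment_skew([10, 20, 70]))
# Equally sized segments give a skew of 1 / number of segments.
print(segment_skew([25, 25, 25, 25]))
```

A single segment always has a skew of 1.0, and the more evenly the sizes are spread, the closer the skew gets to its lower bound of 1 divided by the number of segments.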
A large threshold value allows large segment files to be merged with smaller ones, so consolidation occurs more frequently and fewer segment files remain on disk at any time. This may improve read performance and requires fewer file descriptors, but frequent consolidation increases the write load and therefore the write amplification.
A small threshold value, on the other hand, triggers consolidation only when there are many segment files of similar size. Consolidation occurs less frequently, which reduces write amplification but can leave a greater number of segment files on disk.
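The trade-off can be made concrete: k equally sized segments have a skew of exactly 1/k, so the threshold determines how many similar segments must accumulate before a merge qualifies. The sketch below computes that minimum. It assumes equally sized segments and that a merge needs at least two segments; min_equal_segments is a hypothetical helper, not part of ArangoDB.

```python
def min_equal_segments(max_skew_threshold: float) -> int:
    """Smallest number k of equally sized segments whose skew (1/k) is within the threshold.

    Hypothetical helper; assumes every merge needs at least two segments.
    """
    if not 0.0 < max_skew_threshold <= 1.0:
        raise ValueError("threshold must be in (0.0, 1.0]")
    k = 2
    # k equal segments have skew 1/k, so keep adding segments until 1/k fits.
    while 1 / k > max_skew_threshold:
        k += 1
    return k


for threshold in (0.9, 0.4, 0.25, 0.1):
    print(threshold, min_equal_segments(threshold))
```

Under these assumptions, a threshold of 0.4 needs at least three similar segments, while 0.1 needs ten. This is why a small threshold lets more segment files pile up on disk before consolidation kicks in.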
Multiple combinations of candidate segments are checked, and the one with the lowest skew is selected for consolidation. The selection picks the largest number of segments that together have the lowest skew, while ensuring that the size of the new consolidated segment stays below the configured segmentsBytesMax.
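The selection rule can be sketched as a brute-force search over segment subsets. This is a minimal illustration under stated assumptions, not ArangoDB's implementation. It prefers the largest number of segments first and breaks ties by the lowest skew; it treats "below segmentsBytesMax" as "not exceeding it"; and it assumes all sizes are positive. pick_skew_candidates is a hypothetical name.

```python
from itertools import combinations
from typing import Sequence, Tuple


def skew(sizes: Sequence[int]) -> float:
    """Largest segment size divided by the combined size (sizes assumed positive)."""
    return max(sizes) / sum(sizes)


def pick_skew_candidates(
    sizes: Sequence[int],
    max_skew_threshold: float,
    segments_bytes_max: int,
) -> Tuple[int, ...]:
    """Return the indexes of the segments to merge, or () if no subset qualifies.

    Exhaustive search over all subsets: only practical for a handful of
    segments, and not how a production merge policy would enumerate candidates.
    """
    n = len(sizes)
    # Walk from the largest subsets down, so the first subset size with a
    # valid candidate is also the largest possible number of segments.
    for k in range(n, 1, -1):
        best: Tuple[int, ...] = ()
        best_skew = float("inf")
        for combo in combinations(range(n), k):
            chosen = [sizes[i] for i in combo]
            # The merged segment must not exceed segmentsBytesMax.
            if sum(chosen) > segments_bytes_max:
                continue
            s = skew(chosen)
            # Among equally large subsets, keep the one with the lowest skew.
            if s <= max_skew_threshold and s < best_skew:
                best, best_skew = combo, s
        if best:
            return best
    return ()


sizes = [10, 10, 10, 100]
print(pick_skew_candidates(sizes, 0.4, 1000))  # the three small segments
print(pick_skew_candidates(sizes, 0.8, 1000))  # all four segments
```

With a threshold of 0.4, the large segment is left alone. Raising the threshold to 0.8 lets it be merged with the small ones, which matches the trade-off between small and large thresholds described above.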
calculated by dividing the number of deleted documents by the total
number of documents.
The segment files with the highest individual deletion ratio are
Segment files are first sorted in decreasing order of their individual deletion ratios. Then we look for the largest subset of segments whose collective deletion ratio is greater than or equal to minDeletionRatio. This subset is selected for cleanup ...
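The comment does not spell out how the largest subset is found. The following is a minimal sketch that assumes the search walks the sorted list and keeps the longest prefix whose collective deletion ratio stays at or above minDeletionRatio. Because segments are visited in order of decreasing individual ratio, adding one more segment can only lower or keep the collective ratio, so the scan can stop at the first segment that pushes it below the threshold. Segment and select_for_cleanup are hypothetical names.

```python
from typing import List, NamedTuple, Sequence


class Segment(NamedTuple):
    name: str
    deleted: int  # number of deleted documents in the segment
    total: int    # total number of documents in the segment


def select_for_cleanup(segments: Sequence[Segment], min_deletion_ratio: float) -> List[str]:
    """Longest prefix (by decreasing deletion ratio) whose collective ratio >= threshold."""
    # Skip empty segments to avoid dividing by zero.
    ordered = sorted(
        (s for s in segments if s.total > 0),
        key=lambda s: s.deleted / s.total,
        reverse=True,
    )
    chosen: List[str] = []
    deleted = total = 0
    for seg in ordered:
        new_deleted, new_total = deleted + seg.deleted, total + seg.total
        # The collective ratio never rises along this order, so stop at the first drop.
        if new_deleted / new_total < min_deletion_ratio:
            break
        chosen.append(seg.name)
        deleted, total = new_deleted, new_total
    return chosen


segments = [Segment("a", 8, 10), Segment("b", 1, 10), Segment("c", 5, 10)]
print(select_for_cleanup(segments, 0.5))
```

Here the sorted order is a (0.8), c (0.5), b (0.1). Taking a and c gives a collective ratio of 13/20 = 0.65, but adding b drops it to 14/30, which is below 0.5. So only a and c are selected.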
Description
Upstream PRs
Note
Documents the 3.12.7 consolidation changes by deprecating legacy tier options and adding maxSkewThreshold/minDeletionRatio, updating OpenAPI schemas and release notes for 3.12 and 4.0.
- … segmentsBytesFloor, segmentsMin, segmentsMax, minScore as only available up to v3.12.6 (or remove in 4.0 schemas).
- … consolidationPolicy options for tier: maxSkewThreshold (0.0–1.0, default 0.4) and minDeletionRatio (0.0–1.0, default 0.5) to inverted indexes and arangosearch Views; include descriptions, ranges, and defaults.
- … arangosearch View properties to document removed legacy options and new options with behavior/explanations.
- … arangosearch Views.
- … v3.12.7 and link to updated docs.

Written by Cursor Bugbot for commit d69e2b6.
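The summary lists the new tier options with their ranges and defaults. The sketch below assembles a consolidationPolicy object of type tier from those values and rejects the legacy options that are only available up to v3.12.6. tier_policy is a hypothetical helper, the 0.0–1.0 ranges are assumed to be inclusive, and the output is only the policy fragment, not a complete request to any ArangoDB endpoint.

```python
import json
from typing import Any, Dict

# Legacy tier options documented as only available up to v3.12.6.
LEGACY_TIER_OPTIONS = {"segmentsBytesFloor", "segmentsMin", "segmentsMax", "minScore"}


def tier_policy(
    max_skew_threshold: float = 0.4,
    min_deletion_ratio: float = 0.5,
    **other_options: Any,
) -> Dict[str, Any]:
    """Build a tier consolidationPolicy (hypothetical helper, not an ArangoDB API)."""
    for name, value in (
        ("maxSkewThreshold", max_skew_threshold),
        ("minDeletionRatio", min_deletion_ratio),
    ):
        # Assumption: both ends of the 0.0-1.0 range are allowed.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    legacy = LEGACY_TIER_OPTIONS.intersection(other_options)
    if legacy:
        raise ValueError(f"only available up to v3.12.6: {sorted(legacy)}")
    policy: Dict[str, Any] = {
        "type": "tier",
        "maxSkewThreshold": max_skew_threshold,
        "minDeletionRatio": min_deletion_ratio,
    }
    # Pass through remaining options such as segmentsBytesMax unchanged.
    policy.update(other_options)
    return policy


print(json.dumps({"consolidationPolicy": tier_policy(max_skew_threshold=0.3)}))
```

Rejecting the legacy names early mirrors their removal from the 4.0 schemas. Whether a server silently ignores or rejects them is not stated here, so the sketch takes the stricter route.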