@Simran-B Simran-B commented Dec 9, 2025

Description

Upstream PRs

  • 3.10:
  • 3.11:
  • 3.12:
  • 4.0:

Note

Documents the 3.12.7 consolidation changes by deprecating legacy tier options and adding maxSkewThreshold/minDeletionRatio, updating OpenAPI schemas and release notes for 3.12 and 4.0.

  • Docs (3.12, 4.0):
    • HTTP API (OpenAPI) updates:
      • Mark segmentsBytesFloor, segmentsMin, segmentsMax, minScore as only available up to v3.12.6 (or remove in 4.0 schemas).
      • Add new consolidationPolicy options for tier: maxSkewThreshold (0.0–1.0, default 0.4) and minDeletionRatio (0.0–1.0, default 0.5) to inverted indexes and arangosearch Views; include descriptions, ranges, and defaults.
    • Reference pages:
      • Update arangosearch View properties to document removed legacy options and new options with behavior/explanations.
    • Release notes (3.12):
      • Add sections detailing removed options and new options for both inverted indexes and arangosearch Views.
      • Introduce note about the new consolidation algorithm in v3.12.7 and link to updated docs.
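For orientation, a tier policy using the two new options might look roughly like the sketch below. The option names, ranges, and defaults are taken from the summary above; the `segmentsBytesMax` value is an arbitrary placeholder, and `check_policy` is a hypothetical client-side helper, not part of any ArangoDB driver.

```python
import json

# Illustrative "tier" consolidation policy using the option names from this PR.
# The segmentsBytesMax value is an arbitrary example, not a documented default.
consolidation_policy = {
    "type": "tier",
    "segmentsBytesMax": 8 * 1024**3,
    "maxSkewThreshold": 0.4,  # default stated in this PR
    "minDeletionRatio": 0.5,  # default stated in this PR
}


def check_policy(policy):
    """Hypothetical check that both new options are within 0.0-1.0."""
    for key in ("maxSkewThreshold", "minDeletionRatio"):
        value = policy[key]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must be between 0.0 and 1.0, got {value}")
    return policy


print(json.dumps(check_policy(consolidation_policy), indent=2))
```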

Written by Cursor Bugbot for commit d69e2b6.

@Simran-B Simran-B requested a review from k0ushal December 9, 2025 09:50
@Simran-B Simran-B added this to the 3.12.7 milestone Dec 9, 2025
@arangodb-docs-automation

Deploy Preview Available Via
https://deploy-preview-850--docs-hugo.netlify.app

@cla-bot cla-bot bot added the cla-signed label Dec 9, 2025
@Simran-B Simran-B self-assigned this Dec 9, 2025
This option is available from v3.12.7 onward:
Merge a subset of segments where the ratio of the largest segment size
to the combined segment size is within this threshold. Increasing the

@k0ushal k0ushal Dec 9, 2025

The skew describes how much segment files vary in size. It is a number between 0.0 and 1.0, calculated by dividing the largest file size in a set of segment files by the total size of that set.
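As a quick illustration of that definition (a sketch only; the function name `skew` and the sample sizes are made up for this example):

```python
def skew(segment_sizes):
    """Largest segment size divided by the combined size of the set."""
    total = sum(segment_sizes)
    if total == 0:
        raise ValueError("segment set must have a non-zero total size")
    return max(segment_sizes) / total


# Four equally sized segments: low skew.
print(skew([100, 100, 100, 100]))  # 0.25
# One segment dominates the set: high skew.
print(skew([900, 50, 50]))  # 0.9
```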

A large threshold value allows merging large segment files with smaller ones. Consolidation then occurs more frequently and there are fewer segment files on disk at all times. While this may improve read performance and reduce the number of file descriptors needed, frequent consolidation results in a higher write load and therefore higher write amplification.
On the other hand, a small threshold value triggers consolidation only when there is a large number of segment files that don't vary much in size. Consolidation occurs less frequently, which reduces write amplification, but it can result in a greater number of segment files on disk.

Multiple combinations of candidate segments are checked and the one with the lowest skew value is selected for consolidation. The selection process picks the largest number of segments that together have the lowest skew, while ensuring that the size of the new consolidated segment remains under the configured segmentsBytesMax.
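A rough sketch of how such a selection could work, not the actual implementation. It assumes candidates are contiguous runs of segments in size order, that "most segments" takes priority and the lowest skew breaks ties, and that a run needs at least two segments; `pick_candidate` and its parameters are hypothetical names.

```python
def pick_candidate(segment_sizes, max_skew_threshold, segments_bytes_max):
    """Return the run of segment sizes to merge, or [] if none qualifies.

    Assumption: candidates are contiguous runs of the size-sorted list.
    """
    sizes = sorted(segment_sizes)
    best, best_key = [], None
    for start in range(len(sizes)):
        total = 0
        for end in range(start, len(sizes)):
            total += sizes[end]
            if total > segments_bytes_max:
                break  # merged segment would exceed segmentsBytesMax
            count = end - start + 1
            if count < 2 or total == 0:
                continue  # nothing to merge yet
            run_skew = sizes[end] / total  # sizes[end] is the largest in the run
            if run_skew > max_skew_threshold:
                continue
            key = (-count, run_skew)  # more segments first, then lower skew
            if best_key is None or key < best_key:
                best, best_key = sizes[start:end + 1], key
    return best


# The three similar-sized segments are chosen; the 500-byte one is left out.
print(pick_candidate([10, 12, 11, 500], 0.4, 1000))  # [10, 11, 12]
```

Under this definition any two-segment run has a skew of at least 0.5, so with a threshold of 0.4 a qualifying run needs three or more segments.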

calculated by dividing the number of deleted documents by the total
number of documents.
The segment files with the highest individual deletion ratio are

@k0ushal k0ushal Dec 9, 2025

Segment files are first sorted in decreasing order of their individual deletion ratios. Then we look for the largest subset of segments whose collective deletion ratio is greater than or equal to minDeletionRatio. This subset is selected for cleanup ...
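A minimal sketch of that sort-then-select step, assuming "largest subset" means the longest prefix of the sorted list and that the collective ratio is total deleted documents divided by total documents. The name `pick_cleanup` and the `(deleted_docs, total_docs)` tuples are hypothetical.

```python
def pick_cleanup(segments, min_deletion_ratio):
    """segments: list of (deleted_docs, total_docs) tuples.

    Returns the longest prefix, in decreasing order of individual deletion
    ratio, whose collective deletion ratio is >= min_deletion_ratio.
    """
    ranked = sorted(
        (s for s in segments if s[1] > 0),  # skip empty segments
        key=lambda s: s[0] / s[1],
        reverse=True,
    )
    deleted = total = 0
    best = 0
    for i, (seg_deleted, seg_total) in enumerate(ranked, start=1):
        deleted += seg_deleted
        total += seg_total
        if deleted / total >= min_deletion_ratio:
            best = i
    return ranked[:best]


segments = [(90, 100), (10, 100), (60, 100), (0, 100)]
# Collective ratios of the prefixes: 0.9, 0.75, 0.533..., 0.4
print(pick_cleanup(segments, 0.5))  # [(90, 100), (60, 100), (10, 100)]
```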
