site/content/arangodb/3.12/develop/http-api/indexes/inverted.md: 55 changes (55 additions, 0 deletions)
@@ -567,6 +567,8 @@ paths:
default: tier
segmentsBytesFloor:
description: |
This option is only available up to v3.12.6:

Defines the value (in bytes) to treat all smaller segments as equal for
consolidation selection.
type: integer
@@ -578,21 +580,74 @@ paths:
default: 8589934592
segmentsMax:
description: |
This option is only available up to v3.12.6:

The maximum number of segments that are evaluated as candidates for
consolidation.
type: integer
default: 200
segmentsMin:
description: |
This option is only available up to v3.12.6:

The minimum number of segments that are evaluated as candidates for
consolidation.
type: integer
default: 50
minScore:
description: |
This option is only available up to v3.12.6:

Filter out consolidation candidates with a score less than this.
type: integer
default: 0
maxSkewThreshold:
description: |
This option is available from v3.12.7 onward:

Merge a subset of segments where the ratio of the largest segment size
to the combined segment size is within this threshold. Increasing the
threshold leads to fewer segment files and thus potentially higher
read performance and fewer file descriptors, but at the expense of more
frequent consolidations and thus a higher write load.

Review comment by @k0ushal (Dec 9, 2025):

The skew describes how much segment files vary in file size. It is a number between 0.0 and 1.0 and is calculated by dividing the largest file size of a set of segment files by the total size.

A large threshold value allows merging large segment files with smaller ones, so consolidation occurs more frequently and there are fewer segment files on disk at all times. While this can improve read performance and reduce the number of file descriptors needed, frequent consolidations result in a higher write load and thus higher write amplification.
On the other hand, a small threshold value triggers consolidation only when there is a large number of segment files that don't vary much in size. Consolidation occurs less frequently, reducing write amplification, but it can result in a greater number of segment files on disk.

Multiple combinations of candidate segments are checked and the one with the lowest skew value is selected for consolidation. The selection process picks the largest number of segments that together have the lowest skew while ensuring that the size of the new consolidated segment remains under the configured segmentsBytesMax.

The skew describes how much segment files vary in size. It is a number
between `0.0` and `1.0` and is calculated by dividing the largest file size
of a set of segment files by the total size.

Multiple combinations of candidate segments are checked and the one with
the lowest skew value is selected for consolidation. The selection favors
many segments over few, as long as the new merged segment stays below the
configured `segmentsBytesMax`. The skew threshold prevents unnecessary
consolidation of, for example, a big segment file with a very small one, where
the cost of writing a merged segment is higher than the gain in read performance.
type: number
minimum: 0.0
maximum: 1.0
default: 0.4
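The skew-based selection described for `maxSkewThreshold` can be sketched in Python. This is a minimal illustration under stated assumptions, not ArangoDB's implementation: segments are represented as plain byte sizes, and the candidate combinations are assumed to be contiguous runs of the size-sorted segment list (the description does not say how combinations are enumerated). The names `skew` and `select_by_skew` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def skew(sizes: Sequence[int]) -> float:
    """Largest segment size divided by the combined size of the set."""
    total = sum(sizes)
    return max(sizes) / total if total > 0 else 0.0


def select_by_skew(
    sizes: Sequence[int],
    max_skew_threshold: float,
    segments_bytes_max: int,
) -> Optional[List[int]]:
    """Return the candidate set with the lowest skew, or None.

    Hypothetical candidate generation: every contiguous run of the
    size-sorted segment list, so similarly sized segments stay together.
    """
    ordered = sorted(sizes)
    best: Optional[List[int]] = None
    best_key: Optional[Tuple[float, int]] = None
    for start in range(len(ordered)):
        total = 0
        for end in range(start, len(ordered)):
            total += ordered[end]
            if total > segments_bytes_max:
                break  # extending this run only makes the merged segment larger
            group = ordered[start:end + 1]
            if len(group) < 2:
                continue  # a single segment has nothing to merge with
            s = skew(group)
            if s > max_skew_threshold:
                continue  # sizes vary too much to be worth merging
            # Prefer the lowest skew; on a tie, prefer more segments.
            key = (s, -len(group))
            if best_key is None or key < best_key:
                best, best_key = group, key
    return best


# Usage: three small segments are merged, the large one is left alone.
print(select_by_skew([10, 10, 10, 100], max_skew_threshold=0.4, segments_bytes_max=1000))
# prints [10, 10, 10]
```

Under this formula, two equally sized segments have a skew of `0.5`, so with the default threshold of `0.4` they are not merged on their own; at least three similarly sized segments are needed.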
minDeletionRatio:
description: |
This option is available from v3.12.7 onward:

Clean up segments where the ratio of deleted documents is at least
this high. Decreasing the minimum ratio leads to earlier consolidation
of segments with many deleted documents and thus earlier reclamation of
disk space, but causes a higher write load.

The deletion ratio is the fraction of deleted documents across one
or more segment files. It is a number between `0.0` and `1.0` and is
calculated by dividing the number of deleted documents by the total
number of documents.

The segment files with the highest individual deletion ratio are
the candidates. As many candidates as possible are selected for
consolidation (in order of decreasing ratio), but the overall ratio
has to be at least `minDeletionRatio` and the new segment with the
active documents needs to be below the configured `segmentsBytesMax`.

Review comment by @k0ushal (Dec 9, 2025):

Segment files are first sorted in decreasing order of their individual deletion ratios. Then we look for the largest subset of segments whose collective deletion ratio is greater than or equal to minDeletionRatio. This subset is selected for cleanup ...
type: number
minimum: 0.0
maximum: 1.0
default: 0.5
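The cleanup selection described for `minDeletionRatio` can likewise be sketched in Python. This is an illustration, not ArangoDB's implementation, and it assumes a hypothetical `Segment` record holding document counts and the byte size of the segment's live (non-deleted) documents. Segments are sorted by decreasing deletion ratio, and the longest prefix is kept whose combined ratio is at least `min_deletion_ratio` and whose live data stays below `segments_bytes_max`. The function name `select_for_cleanup` is also hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Segment:
    """Hypothetical per-segment statistics used only for this sketch."""
    name: str
    docs_total: int
    docs_deleted: int
    live_bytes: int  # assumed size of the segment's non-deleted documents

    @property
    def deletion_ratio(self) -> float:
        return self.docs_deleted / self.docs_total if self.docs_total else 0.0


def select_for_cleanup(
    segments: Sequence[Segment],
    min_deletion_ratio: float,
    segments_bytes_max: int,
) -> List[Segment]:
    # Highest individual deletion ratio first.
    ordered = sorted(segments, key=lambda s: s.deletion_ratio, reverse=True)
    chosen: List[Segment] = []
    deleted = total = live = 0
    for seg in ordered:
        deleted += seg.docs_deleted
        total += seg.docs_total
        live += seg.live_bytes
        ratio = deleted / total if total else 0.0
        # Neither condition can recover once it fails, so stop at the first failure.
        if ratio < min_deletion_ratio or live >= segments_bytes_max:
            break
        chosen.append(seg)
    return chosen


segments = [
    Segment("a", docs_total=100, docs_deleted=80, live_bytes=200),
    Segment("b", docs_total=100, docs_deleted=50, live_bytes=500),
    Segment("c", docs_total=100, docs_deleted=10, live_bytes=900),
]
picked = select_for_cleanup(segments, min_deletion_ratio=0.5, segments_bytes_max=10_000)
print([s.name for s in picked])  # ['a', 'b']
```

Stopping at the first failing segment is sufficient here: the combined ratio is a weighted average of the individual ratios, so appending a segment with a lower ratio can never raise it, and the live size only grows. Adding segment `c` would drop the combined ratio to 140/300 (about 0.47), below the `0.5` minimum.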
writebufferIdle:
description: |
Maximum number of writers (segments) cached in the pool