support sharded parquet files in parquet converter and queryable #7189
What this PR does:
This PR stores the number of shards in the parquet converter marker as well as in the bucket index. This enables sharded parquet conversion, which is not supported today, on both the write path and the read path. The read path looks at the parquet marker to learn how many shards there are, and therefore how many files to read.
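As a rough illustration of the read-path idea, the sketch below decodes a marker and lists the files to open for each shard. The `ConverterMark` type, its JSON field names, and the `<shard>.labels.parquet` / `<shard>.chunks.parquet` naming are assumptions made for this example, not necessarily what the PR implements.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ConverterMark is a hypothetical, simplified stand-in for the parquet
// converter marker. The field names are assumptions for illustration.
type ConverterMark struct {
	Version int `json:"version"`
	Shards  int `json:"shards,omitempty"`
}

// ShardCount treats a missing or non-positive shard count as one shard,
// so markers written before sharding existed still resolve to one file set.
func (m ConverterMark) ShardCount() int {
	if m.Shards <= 0 {
		return 1
	}
	return m.Shards
}

// ShardFiles decodes the marker and returns the object names the read path
// would open. The file naming scheme here is assumed, not confirmed.
func ShardFiles(blockID string, rawMark []byte) ([]string, error) {
	var m ConverterMark
	if err := json.Unmarshal(rawMark, &m); err != nil {
		return nil, fmt.Errorf("decode converter mark: %w", err)
	}
	n := m.ShardCount()
	files := make([]string, 0, 2*n)
	for i := 0; i < n; i++ {
		files = append(files,
			fmt.Sprintf("%s/%d.labels.parquet", blockID, i),
			fmt.Sprintf("%s/%d.chunks.parquet", blockID, i),
		)
	}
	return files, nil
}
```

Defaulting to one shard when the field is absent is one way to keep existing markers readable without a migration.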
Note that to keep this PR small, I only changed the parquet queryable and left the parquet store gateway untouched. Ideally, the store gateway would load the bucket index so that it can tell how many shards each parquet block has, but today the parquet store gateway doesn't sync the bucket index at all.
The plan is to add more shard info to the parquet converter marker, such as the min and max metric name for each shard, so that we can prune the shards to query based on the metric name, since our parquet files are sorted by metric name. That is left for a future implementation.
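A minimal sketch of how such pruning could work, assuming each shard records an inclusive lexicographic range of metric names. The `ShardInfo` type and `PruneShards` function are hypothetical names for this example and are not part of this PR.

```go
package main

// ShardInfo is a hypothetical per-shard entry that a future marker version
// could carry. MinMetricName and MaxMetricName are inclusive bounds.
type ShardInfo struct {
	Index         int
	MinMetricName string
	MaxMetricName string
}

// PruneShards returns the indexes of shards whose metric name range can
// contain the queried metric. Go compares strings byte-wise, which matches
// a lexicographic sort on metric name.
func PruneShards(shards []ShardInfo, metric string) []int {
	var keep []int
	for _, s := range shards {
		if metric >= s.MinMetricName && metric <= s.MaxMetricName {
			keep = append(keep, s.Index)
		}
	}
	return keep
}
```

This only helps queries with an equality matcher on the metric name; a regex matcher or a query without a metric name would still need to read every shard.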
Which issue(s) this PR fixes:
Fixes #7175
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]