Parquet: Skip parquet conversion for blocks with too many labels #7524
Open
siddarth2810 wants to merge 10 commits into
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
- Add max-block-label-names limit, blocks exceeding it get a no-convert marker instead of being converted. Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
…correctly Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
…test Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
- Add a new cortex_parquet_converter_blocks_skipped_total counter with user and reason labels
- Extract "too_many_labels" to a constant to avoid string duplication
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
friedrichg
approved these changes
May 21, 2026
friedrichg
left a comment
Member
just one minor nit on the metrics that copilot suggested. pre-approving!
…ert marker exists Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
yeya24
reviewed
Jun 11, 2026
    // We don't convert blocks again if they already have a valid converter mark.
    if cortex_parquet.ValidConverterMarkVersion(marker.Version) {
        level.Debug(logger).Log("msg", "skipping block, no-convert marker already exists", "block", b.ULID.String())
Contributor
Is this the right log here?
    Version         int    `json:"version"`
    Reason          string `json:"reason"`
    LabelNamesCount int    `json:"label_names_count"`
    Threshold       int    `json:"threshold"`
Contributor
Do we need details like LabelNamesCount and Threshold in this file? The no-convert marker can be manually uploaded, too. Those details can be embedded in reason, or we can have another string field for them.
        continue
    }
    labelNamesCount := len(labelNames)
    if labelNamesCount > maxBlockLabelNames {
Contributor
A note: today the max column limit in parquet-go is 32767, IIRC. Since our parquet file has additional system columns, we need to keep some buffer when configuring the max block label names.
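The buffer mentioned above could be enforced when the setting is validated. This is a rough sketch under stated assumptions: the 32767 figure is the reviewer's recollection and is not verified here, `assumedSystemColumns` is a made-up placeholder for however many non-label columns the parquet file carries, and `validateMaxBlockLabelNames` is a hypothetical helper rather than anything in Cortex.

```go
package main

import "fmt"

const (
	// assumedMaxParquetColumns is the column limit recalled in the review
	// comment; treat it as unverified.
	assumedMaxParquetColumns = 32767
	// assumedSystemColumns is a placeholder for the additional system
	// columns in the parquet file; the real number is not given in the PR.
	assumedSystemColumns = 16
)

// validateMaxBlockLabelNames rejects limits that would leave no headroom
// for the system columns. Handling of zero or negative values is left out.
func validateMaxBlockLabelNames(limit int) error {
	maxSafe := assumedMaxParquetColumns - assumedSystemColumns
	if limit > maxSafe {
		return fmt.Errorf("max block label names %d exceeds safe maximum %d", limit, maxSafe)
	}
	return nil
}

func main() {
	for _, limit := range []int{30000, 32767} {
		if err := validateMaxBlockLabelNames(limit); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Println("ok:", limit)
	}
}
```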
What this PR does:
If a TSDB block exceeds a configurable threshold of distinct label names, the converter writes a parquet-no-convert-mark.json marker and skips the block.
parquet-converter.max-block-label-names limit
Which issue(s) this PR fixes:
Fixes #7195
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
- docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags