Skip to content

Parquet: Skip parquet conversion for blocks with too many labels#7524

Open
siddarth2810 wants to merge 10 commits into
cortexproject:masterfrom
siddarth2810:add-no-convert-marker
Open

Parquet: Skip parquet conversion for blocks with too many labels#7524
siddarth2810 wants to merge 10 commits into
cortexproject:masterfrom
siddarth2810:add-no-convert-marker

Conversation

@siddarth2810

@siddarth2810 siddarth2810 commented May 18, 2026

Copy link
Copy Markdown
Contributor

What this PR does:
If a TSDB block exceeds a configurable threshold of distinct label names, the converter writes a parquet-no-convert-mark.json marker and skips the block.

  • Added no-convert marker with read/write logic
  • Added parquet-converter.max-block-label-names limit

Which issue(s) this PR fixes:
Fixes #7195

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
- Add max-block-label-names limit, blocks exceeding it get a
  no-convert marker instead of being converted.

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
…correctly

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
…test

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
@siddarth2810 siddarth2810 marked this pull request as ready for review May 18, 2026 12:15
@dosubot dosubot Bot added go Pull requests that update Go code storage/blocks Blocks storage engine type/feature labels May 18, 2026
friedrichg

This comment was marked as resolved.

- Add a new cortex_parquet_converter_blocks_skipped_total counter with user and reason labels
- Extract "too_many_labels" to a constant to avoid string duplication

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>

This comment was marked as resolved.

@friedrichg friedrichg left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one minor nit on the metrics that copilot suggested. pre-approving!

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 21, 2026
…ert marker exists

Signed-off-by: Siddarth Gundu <siddarthg0910@gmail.com>
@friedrichg friedrichg requested a review from yeya24 May 22, 2026 19:24

// We don't convert blocks again if they already have a valid converter mark.
if cortex_parquet.ValidConverterMarkVersion(marker.Version) {
level.Debug(logger).Log("msg", "skipping block, no-convert marker already exists", "block", b.ULID.String())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right log here?

Version int `json:"version"`
Reason string `json:"reason"`
LabelNamesCount int `json:"label_names_count"`
Threshold int `json:"threshold"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need details like LabelNamesCount and Threshold in this file? The no convert marker can be manually uploaded, too. Those details can be embeded in reason or have another string field for that

continue
}
labelNamesCount := len(labelNames)
if labelNamesCount > maxBlockLabelNames {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note. Today the max column limit in parquet go is like 32767 IIRC. But since our parquet file has additional system columns, when configuring the max block label names we need to keep some buffer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go Pull requests that update Go code lgtm This PR has been approved by a maintainer size/L storage/blocks Blocks storage engine type/feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Parquet] Stop converting TSDB block to parquet if it has too many labels

4 participants