GH-45937: [C++][Parquet] support to encode, write and validate variant #50252
HuaHuaY wants to merge 2 commits into
b186aeb to f8d6cb9
::arrow::internal::Executor* executor_;
bool write_time_adjusted_to_utc_;
bool variant_validation_enabled_;
Why does this belong to ArrowWriterProperties rather than WriterProperties? If users use the low-level Parquet writer without the Arrow API, can serialized variant values no longer be validated?
Variant is a logical type, so its validation happens in the Arrow writer.
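To make the discussion concrete, here is a minimal sketch of the kind of top-level check a variant validator might run on a serialized value. It is not the PR's implementation: the function name `CheckValueHeader` is hypothetical, the header layout (basic type in the low 2 bits, a 6-bit value header above it) is taken from the Parquet Variant encoding spec, and treating primitive type IDs 0 through 20 as the defined set is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Basic types stored in the low 2 bits of a Variant value's first byte.
enum class BasicType : uint8_t { kPrimitive = 0, kShortString = 1, kObject = 2, kArray = 3 };

// Hypothetical top-level check: returns an empty string when the first byte is
// consistent with the buffer, otherwise a short description of the problem.
// Only the header byte is examined; objects and arrays are not traversed.
std::string CheckValueHeader(std::string_view value) {
  if (value.empty()) return "empty value";
  const uint8_t header = static_cast<uint8_t>(value[0]);
  const auto basic_type = static_cast<BasicType>(header & 0x03);
  const uint8_t value_header = header >> 2;  // upper 6 bits
  switch (basic_type) {
    case BasicType::kPrimitive:
      // Assumption: primitive type IDs 0..20 are the ones currently defined.
      if (value_header > 20) return "unknown primitive type id";
      return "";
    case BasicType::kShortString:
      // For short strings the value header is the byte length (0..63),
      // and the UTF-8 bytes follow the header byte directly.
      if (value.size() < 1u + value_header) return "short string truncated";
      return "";
    case BasicType::kObject:
    case BasicType::kArray:
      // A full validator would decode field IDs and offsets here and recurse.
      return "";
  }
  return "unreachable";
}
```

Nested objects and arrays would need recursive decoding of their field IDs and offsets (and a lookup into the metadata dictionary), which this sketch deliberately omits.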
return field->type()->storage_id() == Type::BINARY ||
       field->type()->storage_id() == Type::LARGE_BINARY;
bool IsSupportedPrimitiveTypedValue(const std::shared_ptr<DataType>& type) {
It seems that some Arrow primitive types are missing from here: https://arrow.apache.org/docs/format/CanonicalExtensions.html#primitive-type-mappings
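One way to audit coverage against the linked mapping table is an explicit switch from each Arrow type to the Variant primitive it would carry, so a missing type shows up as a missing case. The sketch uses a hypothetical, flattened `ArrowKind` enum instead of `arrow::DataType`, and the Variant IDs come from the Parquet Variant encoding spec; the exact set of Arrow types the canonical extension accepts should still be checked against the linked page.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, flattened stand-in for the Arrow types a typed_value column might
// use. Arrow's real API encodes units and time zones in arrow::DataType instead.
enum class ArrowKind {
  kBool, kInt8, kInt16, kInt32, kInt64, kFloat32, kFloat64,
  kDecimal32, kDecimal64, kDecimal128, kDate32, kTime64Micros,
  kTimestampMicrosUtc, kTimestampMicros, kTimestampNanosUtc, kTimestampNanos,
  kBinary, kString, kUuid, kList, kStruct
};

// Subset of the Variant primitive type IDs from the Parquet Variant encoding spec.
enum class VariantPrimitive : uint8_t {
  kBooleanTrue = 1, kInt8 = 3, kInt16 = 4, kInt32 = 5, kInt64 = 6,
  kDouble = 7, kDecimal4 = 8, kDecimal8 = 9, kDecimal16 = 10, kDate = 11,
  kTimestampMicros = 12, kTimestampNtzMicros = 13, kFloat = 14, kBinary = 15,
  kString = 16, kTimeNtzMicros = 17, kTimestampNanos = 18,
  kTimestampNtzNanos = 19, kUuid = 20
};

// Returns the Variant primitive a typed_value column of this kind would hold,
// or std::nullopt for non-primitive kinds such as lists and structs.
std::optional<VariantPrimitive> PrimitiveFor(ArrowKind kind) {
  switch (kind) {
    // Booleans use two IDs (true = 1, false = 2); the value, not the type, decides.
    case ArrowKind::kBool: return VariantPrimitive::kBooleanTrue;
    case ArrowKind::kInt8: return VariantPrimitive::kInt8;
    case ArrowKind::kInt16: return VariantPrimitive::kInt16;
    case ArrowKind::kInt32: return VariantPrimitive::kInt32;
    case ArrowKind::kInt64: return VariantPrimitive::kInt64;
    case ArrowKind::kFloat32: return VariantPrimitive::kFloat;
    case ArrowKind::kFloat64: return VariantPrimitive::kDouble;
    case ArrowKind::kDecimal32: return VariantPrimitive::kDecimal4;
    case ArrowKind::kDecimal64: return VariantPrimitive::kDecimal8;
    case ArrowKind::kDecimal128: return VariantPrimitive::kDecimal16;
    case ArrowKind::kDate32: return VariantPrimitive::kDate;
    case ArrowKind::kTime64Micros: return VariantPrimitive::kTimeNtzMicros;
    case ArrowKind::kTimestampMicrosUtc: return VariantPrimitive::kTimestampMicros;
    case ArrowKind::kTimestampMicros: return VariantPrimitive::kTimestampNtzMicros;
    case ArrowKind::kTimestampNanosUtc: return VariantPrimitive::kTimestampNanos;
    case ArrowKind::kTimestampNanos: return VariantPrimitive::kTimestampNtzNanos;
    case ArrowKind::kBinary: return VariantPrimitive::kBinary;
    case ArrowKind::kString: return VariantPrimitive::kString;
    case ArrowKind::kUuid: return VariantPrimitive::kUuid;
    case ArrowKind::kList:
    case ArrowKind::kStruct:
      return std::nullopt;
  }
  return std::nullopt;
}

bool IsSupportedPrimitiveTypedValue(ArrowKind kind) {
  return PrimitiveFor(kind).has_value();
}
```

Keeping the mapping in a single exhaustive switch also lets the compiler warn (with `-Wswitch`) when a new enumerator is added but not handled.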
struct ARROW_EXPORT VariantObjectField {
  std::string_view name;
  uint32_t field_id = 0;
uint8_t offset_size() const { return offset_size_; }
uint32_t dictionary_size() const { return static_cast<uint32_t>(strings_.size()); }
std::string_view string(uint32_t id) const;
Let's add a comment explaining that this is the dictionary.
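The dictionary in question is the Variant metadata buffer, which stores object field names once so that objects can refer to them by ID. A minimal reader could look like the sketch below, assuming the metadata layout from the Parquet Variant encoding spec: a header byte (version in bits 0-3, `offset_size_minus_one` in bits 6-7), then `dictionary_size`, then `dictionary_size + 1` offsets, then the concatenated string bytes. The class name `MetadataDictionary` is hypothetical; only the accessor names mirror the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical reader for the Variant metadata dictionary (field names by ID).
// Holds a view, so the underlying buffer must outlive this object.
class MetadataDictionary {
 public:
  explicit MetadataDictionary(std::string_view metadata) : metadata_(metadata) {
    if (metadata_.empty()) throw std::invalid_argument("empty metadata");
    const uint8_t header = static_cast<uint8_t>(metadata_[0]);
    if ((header & 0x0F) != 1) throw std::invalid_argument("unsupported version");
    offset_size_ = static_cast<uint8_t>(((header >> 6) & 0x03) + 1);
    assert(offset_size_ >= 1 && offset_size_ <= 4);
    size_ = ReadUnsigned(1);  // dictionary_size follows the header byte
    // Header byte + dictionary_size + (size_ + 1) offsets precede the strings.
    data_start_ = 1 + offset_size_ * (static_cast<size_t>(size_) + 2);
    if (data_start_ > metadata_.size()) throw std::invalid_argument("truncated offsets");
  }

  uint8_t offset_size() const { return offset_size_; }
  uint32_t dictionary_size() const { return size_; }

  // Returns the field name stored at dictionary index `id`.
  std::string_view string(uint32_t id) const {
    if (id >= size_) throw std::out_of_range("dictionary id out of range");
    const size_t begin = ReadUnsigned(1 + offset_size_ * (static_cast<size_t>(id) + 1));
    const size_t end = ReadUnsigned(1 + offset_size_ * (static_cast<size_t>(id) + 2));
    if (begin > end || data_start_ + end > metadata_.size()) {
      throw std::invalid_argument("corrupt string offsets");
    }
    return metadata_.substr(data_start_ + begin, end - begin);
  }

 private:
  // Reads an offset_size_-byte little-endian unsigned integer at byte `pos`.
  uint32_t ReadUnsigned(size_t pos) const {
    if (pos + offset_size_ > metadata_.size()) throw std::invalid_argument("truncated metadata");
    uint32_t value = 0;
    for (uint8_t i = 0; i < offset_size_; ++i) {
      value |= static_cast<uint32_t>(static_cast<uint8_t>(metadata_[pos + i])) << (8 * i);
    }
    return value;
  }

  std::string_view metadata_;
  uint8_t offset_size_ = 1;
  uint32_t size_ = 0;
  size_t data_start_ = 0;
};
```

For example, the bytes `01 02 00 01 03` followed by `abc` describe a two-entry dictionary where ID 0 is `"a"` and ID 1 is `"bc"`.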
}
Status PrimitivePayloadSize(std::string_view value, size_t offset,
                            VariantPrimitiveType primitive, size_t* size) {
Why not return Result<size_t>?
BTW, if we decide to move this to the parquet folder, we should use ParquetException instead of arrow::Result/Status.
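A sketch of the value-returning shape, written in the ParquetException style the second comment suggests. Here `ParquetException` is a local stand-in deriving from `std::runtime_error` so the block compiles on its own, the primitive is passed as a raw type ID instead of `VariantPrimitiveType`, and `offset` is assumed to point at the first byte after the primitive header. Payload sizes follow the Parquet Variant encoding spec (for example, a 1-byte scale plus 4 bytes for decimal4, and a 4-byte little-endian length prefix for binary and string).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Stand-in for parquet::ParquetException so this sketch is self-contained.
class ParquetException : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

// Returns the number of payload bytes after the primitive header byte,
// throwing instead of using a Status plus a size_t* out-parameter.
size_t PrimitivePayloadSize(std::string_view value, size_t offset, uint8_t primitive_id) {
  size_t size = 0;
  switch (primitive_id) {
    case 0: case 1: case 2: size = 0; break;       // null, true, false
    case 3: size = 1; break;                       // int8
    case 4: size = 2; break;                       // int16
    case 5: case 11: case 14: size = 4; break;     // int32, date, float
    case 6: case 7: case 12: case 13:
    case 17: case 18: case 19: size = 8; break;    // int64, double, timestamps, time
    case 8: size = 1 + 4; break;                   // decimal4: scale + 4-byte unscaled
    case 9: size = 1 + 8; break;                   // decimal8
    case 10: size = 1 + 16; break;                 // decimal16
    case 20: size = 16; break;                     // uuid
    case 15: case 16: {                            // binary, string
      if (offset > value.size() || value.size() - offset < 4) {
        throw ParquetException("truncated length prefix");
      }
      uint32_t length = 0;
      for (int i = 0; i < 4; ++i) {
        length |= static_cast<uint32_t>(static_cast<uint8_t>(value[offset + i])) << (8 * i);
      }
      size = 4 + static_cast<size_t>(length);
      break;
    }
    default:
      throw ParquetException("unknown primitive type id " +
                             std::to_string(static_cast<int>(primitive_id)));
  }
  // Validate that the whole payload fits in the buffer.
  if (offset > value.size() || value.size() - offset < size) {
    throw ParquetException("primitive payload exceeds value buffer");
  }
  return size;
}
```

With `arrow::Result<size_t>` the body would be essentially the same, with each `throw` replaced by `return Status::Invalid(...)`; either way the caller no longer needs a separate out-parameter.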
Rationale for this change
This PR supports:
typed_value array
This PR does not support:
typed_value shredded data from the value array
What changes are included in this PR?
Are these changes tested?
Yes.
Are there any user-facing changes?
variant_validation_enabled_ in ArrowWriterProperties.
cpp/src/arrow/extension/variant/.