GH-47252: [C++][Compute] Fix sort_indices for temporal types in arrow::Table.#50270

Open
nfrmtk wants to merge 2 commits into apache:main from nfrmtk:fix-sort_indices-of-timestamp-keys-in-table

nfrmtk commented Jun 26, 2026

Rationale for this change

I was unable to use compute::SortIndices on a table with timestamp sort keys because it crashed.

What changes are included in this PR?

Fix the crash. The comparator used for merging record batches was not converting the timestamp type to its physical variant, so it crashed on a null pointer dereference of the checked_cast result.
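
The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use Arrow: `Array`, `Int64Array`, `TimestampArray`, `ToPhysical` and `Less` are hypothetical stand-ins invented for this example, and `dynamic_cast` stands in for the checked downcast. The assumption is that a comparator written for the physical type cannot downcast an array that still carries its logical (timestamp) type, and that converting to the physical representation first makes the downcast succeed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Arrow classes.
struct Array {
  virtual ~Array() = default;
};

// Physical representation: plain 64-bit integers.
struct Int64Array : Array {
  explicit Int64Array(std::vector<int64_t> v) : values(std::move(v)) {}
  std::vector<int64_t> values;
};

// Logical type stored as int64, but a distinct class from Int64Array.
struct TimestampArray : Array {
  explicit TimestampArray(std::vector<int64_t> v) : values(std::move(v)) {}
  std::vector<int64_t> values;
};

// Convert a logical array to its physical representation (assumed helper).
// Arrays that are already physical are returned unchanged.
std::shared_ptr<Array> ToPhysical(const std::shared_ptr<Array>& arr) {
  assert(arr != nullptr);
  if (auto ts = std::dynamic_pointer_cast<TimestampArray>(arr)) {
    return std::make_shared<Int64Array>(ts->values);
  }
  return arr;
}

// Comparator that expects the physical type. It returns std::nullopt where
// an unchecked version would dereference a null downcast result.
std::optional<bool> Less(const Array& arr, std::size_t i, std::size_t j) {
  const auto* typed = dynamic_cast<const Int64Array*>(&arr);
  if (typed == nullptr) return std::nullopt;
  return typed->values[i] < typed->values[j];
}
```

In this sketch, calling `Less` on a `TimestampArray` directly yields no result, while calling it on `ToPhysical(...)` of the same array compares the underlying integers.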

Are these changes tested?

Yes.

Are there any user-facing changes?

This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)
I believe the explanation above already covers this.

@nfrmtk nfrmtk requested a review from pitrou as a code owner June 26, 2026 14:51
Copilot AI review requested due to automatic review settings June 26, 2026 14:51
@github-actions

⚠️ GitHub issue #47252 has been automatically assigned in GitHub to PR creator.

Copilot AI left a comment

Pull request overview

Fixes a crash in compute::SortIndices / Table::sort_by when sorting arrow::Table columns with temporal logical types (e.g., timestamp), by ensuring per-batch sort-key arrays are converted to their physical representation before comparator-based merges.

Changes:

  • Convert flattened per-record-batch sort-key arrays to the physical type (GetPhysicalType + GetPhysicalArray) when constructing ResolvedTableSortKey chunks.
  • Add a regression test covering Table multi-key sorting on timestamp columns across multiple JSON chunks (record batches).
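
The scenario the regression test targets (sort keys split across several chunks, with both null placements) can be sketched without Arrow as follows. This is an illustrative model only: `Chunk`, `NullPlacement`, `ResolveRows` and `SortIndicesSketch` are hypothetical names, the key is a single column of nullable 64-bit integers standing in for the physical storage of a timestamp column, and a stable sort replaces the per-batch sort and merge used by the real kernel.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical model: one sort key stored as several chunks of nullable
// int64 values (the assumed physical storage of a timestamp column).
using Chunk = std::vector<std::optional<int64_t>>;

enum class NullPlacement { AtStart, AtEnd };

// Map every global row index to its (chunk index, offset within chunk).
std::vector<std::pair<std::size_t, std::size_t>> ResolveRows(
    const std::vector<Chunk>& chunks) {
  std::vector<std::pair<std::size_t, std::size_t>> rows;
  for (std::size_t c = 0; c < chunks.size(); ++c) {
    for (std::size_t i = 0; i < chunks[c].size(); ++i) rows.emplace_back(c, i);
  }
  return rows;
}

// Return global row indices in ascending key order, with nulls grouped at
// the start or the end. Ties and nulls keep their original relative order.
std::vector<std::size_t> SortIndicesSketch(const std::vector<Chunk>& chunks,
                                           NullPlacement placement) {
  const auto rows = ResolveRows(chunks);
  std::vector<std::size_t> indices(rows.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  std::stable_sort(
      indices.begin(), indices.end(), [&](std::size_t lhs, std::size_t rhs) {
        const auto& a = chunks[rows[lhs].first][rows[lhs].second];
        const auto& b = chunks[rows[rhs].first][rows[rhs].second];
        if (a.has_value() != b.has_value()) {
          // Exactly one side is null: order it according to the placement.
          const bool a_null = !a.has_value();
          return placement == NullPlacement::AtStart ? a_null : !a_null;
        }
        if (!a.has_value()) return false;  // both null: equivalent
        return *a < *b;
      });
  return indices;
}
```

With the chunks `{30, null}` and `{10, 20}`, the global rows are numbered 0 to 3 across both chunks, so the returned indices refer to positions in the concatenated column rather than positions inside a chunk.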

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| cpp/src/arrow/compute/kernels/vector_sort_internal.h | Ensure table sort-key chunks use physical array types to prevent comparator downcast crashes for temporal logical types. |
| cpp/src/arrow/compute/kernels/vector_sort_test.cc | Add regression coverage for Table sort indices on timestamp keys across multiple chunks and null placement variants. |

Comment thread cpp/src/arrow/compute/kernels/vector_sort_internal.h Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 26, 2026 15:34

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
