feat: decimal support for gcd and lcm#22655
Conversation
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
|
@neilconway , @Jefffrey I'd appreciate hearing your thoughts on this PR |
| query error DataFusion error: Arrow error: Compute error: Signed integer overflow in GCD\(0, \-9223372036854775808\) | ||
| select gcd(0, -9223372036854775808); | ||
|
|
||
| # gcd decimal |
There was a problem hiding this comment.
could we add test cases for:
- negative number inputs
- decimals with different scale/precision inputs
- decimal point
1.23inputs
| } | ||
|
|
||
| /// Computes a binary math function for input arrays using a specified function. | ||
| /// Deprecated, use [`calculate_binary_math_cast`] instead. |
There was a problem hiding this comment.
im not sure if we should deprecate this as the calculate_binary_math_cast alternative introduces a new argument that is only really relevant for decimals 🤔
|
|
||
| /// Computes a binary math function for input arrays using a specified function | ||
| /// and applies rescaling to given precision and scale. | ||
| /// Deprecated, use [`calculate_binary_decimal_math_cast`] instead. |
There was a problem hiding this comment.
instead of just stating its deprecated might be better to explicitly mark it as so #[deprecated]
There was a problem hiding this comment.
Since the original function calculate_binary_math_cast is public, we cannot track its usage outside, and removal is only possible with deprecation. It shouldn't have been declared as public in the first place - so the new function, too.
What I can see as a plan:
- Deprecate it explicitly with the macro, as you suggest
- Port some usages in the
datafusionrepo in the next PR - Mark the new function as
pub(crate)as it is intended to be used locally only
There was a problem hiding this comment.
A second thought - marking it as deprecated fails a clippy check. Rolled back to an informational comment until the PR removes its usage in (2)
| /// - `right`: Right input array or scalar value | ||
| /// - `fun`: Function of type `F` | ||
| /// - `cast_target`: Data type to cast right operand to before applying function | ||
| pub fn calculate_binary_math_cast<L, R, O, F>( |
There was a problem hiding this comment.
should this just be a private function that the others use internally? otherwise its a bit confusing to have this as public when only decimals can really take advantage of cast_target
There was a problem hiding this comment.
That's true, made it private
Which issue does this PR close?
Rationale for this change
A binary gcd and lcm UDF in the datafusion-functions crate supports only Int64, but not Decimals. Adding missing support for decimals.
What changes are included in this PR?
common.rsto avoid inter-UDF dependencycalculate_binary_mathfor Decimals, updated it to accept a target type instead of rawDecimal128Type::DATA_TYPE- it causes scaling issues for these UDFs, see Improve scale support for binary decimal operations #19621A bit more on (4). The driving force is this failing example:
Previously in #19874, I suggested a more complicated solution to extend
calculate_binary_math. However, it only affected gcd/lcm and could be considered overkill. This PR extends these functions with an extra parametercast_targetforcalculate_binary_decimal_mathto perform a proper cast to the actual type used, rather than to the defaultDecimal128Type::DATA_TYPE- it is much lighter.Are these changes tested?
Are there any user-facing changes?
No