fix: Preserve integer values in round() for large Int64 and UInt64 inputs#22697
fix: Preserve integer values in round() for large Int64 and UInt64 inputs#22697pchintar wants to merge 1 commit into
Conversation
e565975 to
7832c0c
Compare
|
cc @neilconway |
7832c0c to
97a7294
Compare
|
@kumarUjjawal Thnx for the review! I've updated the implementation so that integer inputs now stay as integer types instead of being converted through I also added tests for the cases you mentioned:
Additionally, I updated the affected sqllogictest expectations since the plan output changed after removing the unnecessary cast to I've also rerun the relevant tests and manually verified all these cases. |
|
@kumarUjjawal could you pls re-run the CI checks? thnx again |
|
@kumarUjjawal This failure seems to be unrelated to my code. It failed before running DataFusion tests because GitHub Actions could not pull the Docker image: So this is a CI infrastructure/network failure, not a test failure perhaps? |
Yeah ignore this, it's Github issue |
So wdyt of my PR? is it good enough to be approved? or is it too early to tell? |
Will review tomorrow! |
Sure, thank you! |
| @@ -468,6 +487,11 @@ fn round_columnar( | |||
| let decimal_places_is_array = matches!(decimal_places, ColumnarValue::Array(_)); | |||
|
|
|||
| let arr: ArrayRef = match (value_array.data_type(), return_type) { | |||
There was a problem hiding this comment.
Please restore a fast path, when decimal_places is a non-negative scalar literal and input_type == return_type, return value_array directly, only fall into round_integer_array for negative/array decimal_places.
There was a problem hiding this comment.
Hi @kumarUjjawal I've restored the fast path for non-negative scalar decimal_places values while still using round_integer_array for negative and column-valued decimal_places.
Before:
(input_type, return_type)
if input_type == return_type && is_integer_data_type(return_type) =>
{
round_integer_array(value_array.as_ref(), decimal_places, return_type)?
}Now:
(input_type, return_type)
if input_type == return_type && is_integer_data_type(return_type) =>
{
match decimal_places {
ColumnarValue::Scalar(ScalarValue::Int32(Some(dp))) if *dp >= 0 => {
value_array
}
_ => round_integer_array(
value_array.as_ref(),
decimal_places,
return_type,
)?,
}
}97a7294 to
744cc36
Compare
|
@kumarUjjawal could you kindly pls re-run the checks now? thnx |
|
@kumarUjjawal all the checks seem to have passed |
|
Hi @Jefffrey could you pls check this PR and lmk if possible? thnx |
|
@pchintar let's add some tests for:
|
744cc36 to
d73d6c3
Compare
Hi @kumarUjjawal , I've added tests for all three cases in scalar.slt :
I also verified the queries manually. |
| @@ -630,6 +663,180 @@ fn round_columnar( | |||
| } | |||
| } | |||
|
|
|||
| fn round_integer_value(value: i128, decimal_places: i32) -> Result<i128, ArrowError> { | |||
There was a problem hiding this comment.
is converting all ints to i128 strictly needed for correctness, or its for ergonomics?
|
|
||
| let scalar = match return_type { | ||
| Int8 => ScalarValue::Int8(Some(i8::try_from(rounded).map_err(|_| { | ||
| ArrowError::ComputeError("Overflow while rounding Int8".to_string()) |
There was a problem hiding this comment.
these dont need to be ArrowErrors since we're return a datafusion error in this function
d73d6c3 to
29a59b9
Compare
|
Hi @Jefffrey I replaced the custom integer-type helper with |
|
Hi @Jefffrey just a kind reminder, could you pls approve this PR if everything looks fine, if not pls lmk, thanks again. |
|
Some of my comments haven't been addressed, could you please check them? |
6d96f9d to
5966609
Compare
5966609 to
c7d003c
Compare
|
Hi @Jefffrey I'm v.sorry for this late reply as I was on vacation last week actually. So you had provided me 5 suggestions earlier, I've addressed the 3 cleanup suggestions out of those 5:
For the remaining two points though:
I hope this reply helps clarify, thnx again! |
|
Hi @Jefffrey could you kindly provide any update on this? |
|
I feel the i128 conversion could be avoided with some generic trickery 🤔 For reference, I updated the benchmarks to have an i64 bench with scale 2 and with scale -2, and ran on this branch:
And to test, I modified the code to avoid the i128 conversion and just operate directly on i64: So the i128 conversion does seem to have a noticeable performance impact. We can always try to optimize in a followup if that is preference, just want to highlight this |
c7d003c to
8d15cdf
Compare
|
Hi @Jefffrey thnx for the benchmark comparison. So, I updated my integer rounding implementation to remove the |
Which issue does this PR close?
Rationale for this change
round()should not change integer values when the scale is non-negative, since no fractional digits need to be rounded.Currently, core
round()coerces largeInt64values throughFloat64, causing precision loss:Before/Current Buggy Output:
Expected:
The Spark-compatible
round()also fails forUInt64values abovei64::MAXeven when the scale is non-negative:Before/Current Buggy Output:
What changes are included in this PR?
Preserved integer inputs in core
round()for non-negative scales instead of routing them throughFloat64.Preserved
UInt64values in Spark-compatibleround()when the scale is non-negative, avoiding unnecessaryUInt64 -> i64conversion.Added SQLLogicTest coverage for:
Int64values above2^53in coreround().UInt64::MAXin Spark-compatibleround().Are these changes tested?
Yes.
I also verified the core regression queries manually:
Result:
Result:
Are there any user-facing changes?
No.