fix: Preserve integer values in round() for large Int64 and UInt64 inputs by pchintar · Pull Request #22697 · apache/datafusion

pchintar · 2026-06-01T12:24:57Z

Which issue does this PR close?

Closes round() mishandles large Int64 and UInt64 values #22696 .

Rationale for this change

round() should not change integer values when the scale is non-negative, since no fractional digits need to be rounded.

Currently, core round() coerces large Int64 values through Float64, causing precision loss:

SELECT round(arrow_cast(9007199254740993, 'Int64'));

Before/Current Buggy Output:

9007199254740992.0

Expected:

9007199254740993

The Spark-compatible round() also fails for UInt64 values above i64::MAX even when the scale is non-negative:

SELECT round(arrow_cast(18446744073709551615, 'UInt64'));

Before/Current Buggy Output:

round: UInt64 value 18446744073709551615 exceeds i64::MAX and cannot be rounded

What changes are included in this PR?

Preserved integer inputs in core round() for non-negative scales instead of routing them through Float64.
Preserved UInt64 values in Spark-compatible round() when the scale is non-negative, avoiding unnecessary UInt64 -> i64 conversion.
Added SQLLogicTest coverage for:
- Int64 values above 2^53 in core round().
- UInt64::MAX in Spark-compatible round().
- Both one-argument and two-argument forms.

Are these changes tested?

Yes.

cargo fmt --all
git diff --check
cargo test -p datafusion-sqllogictest --test sqllogictests -- spark/math/round.slt
cargo test -p datafusion-functions round
cargo test -p datafusion-spark round

I also verified the core regression queries manually:

SELECT arrow_typeof(round(arrow_cast(9007199254740993, 'Int64'))),
       round(arrow_cast(9007199254740993, 'Int64'));

Result:

Int64 9007199254740993

SELECT arrow_typeof(round(arrow_cast(9007199254740993, 'Int64'), 2)),
       round(arrow_cast(9007199254740993, 'Int64'), 2);

Result:

Int64 9007199254740993

Are there any user-facing changes?

No.

kumarUjjawal · 2026-06-02T06:34:21Z

cc @neilconway

pchintar · 2026-06-02T09:24:15Z

@kumarUjjawal Thnx for the review!

I've updated the implementation so that integer inputs now stay as integer types instead of being converted through Float64. This preserves large Int64 values that cannot be represented exactly as Float64.

I also added tests for the cases you mentioned:

negative decimal_places values (for example, round(125, -1))
column-valued decimal_places
large Int64 values above the exact Float64 range

Additionally, I updated the affected sqllogictest expectations since the plan output changed after removing the unnecessary cast to Float64.

I've also rerun the relevant tests and manually verified all these cases.

pchintar · 2026-06-02T10:43:49Z

@kumarUjjawal could you pls re-run the CI checks? thnx again

pchintar · 2026-06-02T10:59:13Z

@kumarUjjawal This failure seems to be unrelated to my code.

It failed before running DataFusion tests because GitHub Actions could not pull the Docker image:

docker pull amd64/rust
Get "https://registry-1.docker.io/v2/": context deadline exceeded

So this is a CI infrastructure/network failure, not a test failure perhaps?

kumarUjjawal · 2026-06-02T11:00:38Z

@kumarUjjawal This failure seems to be unrelated to my code.

It failed before running DataFusion tests because GitHub Actions could not pull the Docker image:
docker pull amd64/rust
Get "https://registry-1.docker.io/v2/": context deadline exceeded
So this is a CI infrastructure/network failure, not a test failure perhaps?

Yeah ignore this, it's Github issue

pchintar · 2026-06-02T11:03:03Z

Yeah ignore this, it's Github issue

So wdyt of my PR? is it good enough to be approved? or is it too early to tell?

kumarUjjawal · 2026-06-02T11:04:27Z

Yeah ignore this, it's Github issue

So wdyt of my PR? is it good enough to be approved? or is it too early to tell?

Will review tomorrow!

pchintar · 2026-06-02T11:05:19Z

Will review tomorrow!

Sure, thank you!

kumarUjjawal · 2026-06-03T08:50:22Z

@@ -468,6 +487,11 @@ fn round_columnar(
    let decimal_places_is_array = matches!(decimal_places, ColumnarValue::Array(_));

    let arr: ArrayRef = match (value_array.data_type(), return_type) {


Please restore a fast path, when decimal_places is a non-negative scalar literal and input_type == return_type, return value_array directly, only fall into round_integer_array for negative/array decimal_places.

Hi @kumarUjjawal I've restored the fast path for non-negative scalar decimal_places values while still using round_integer_array for negative and column-valued decimal_places.

Before:

(input_type, return_type) if input_type == return_type && is_integer_data_type(return_type) => { round_integer_array(value_array.as_ref(), decimal_places, return_type)? }

Now:

(input_type, return_type) if input_type == return_type && is_integer_data_type(return_type) => { match decimal_places { ColumnarValue::Scalar(ScalarValue::Int32(Some(dp))) if *dp >= 0 => { value_array } _ => round_integer_array( value_array.as_ref(), decimal_places, return_type, )?, } }

pchintar · 2026-06-03T10:24:19Z

@kumarUjjawal could you kindly pls re-run the checks now? thnx

pchintar · 2026-06-03T11:36:42Z

@kumarUjjawal all the checks seem to have passed

pchintar · 2026-06-06T04:20:18Z

Hi @Jefffrey could you pls check this PR and lmk if possible? thnx

kumarUjjawal · 2026-06-08T07:31:38Z

@pchintar let's add some tests for:

overflow-on-negative-scale case
unsigned 64-bit values
Rounding to a place larger than the number itself

pchintar · 2026-06-08T09:02:45Z

@pchintar let's add some tests for:

overflow-on-negative-scale case

unsigned 64-bit values

Rounding to a place larger than the number itself

Hi @kumarUjjawal , I've added tests for all three cases in scalar.slt :

overflow with Int64::MAX and negative scale
UInt64::MAX with one-arg and two-arg round
rounding 125 to a larger place (-5)

I also verified the queries manually.

Jefffrey · 2026-06-09T04:39:12Z

@@ -630,6 +663,180 @@ fn round_columnar(
    }
 }

+fn round_integer_value(value: i128, decimal_places: i32) -> Result<i128, ArrowError> {


is converting all ints to i128 strictly needed for correctness, or its for ergonomics?

Jefffrey · 2026-06-09T04:39:28Z

+
+    let scalar = match return_type {
+        Int8 => ScalarValue::Int8(Some(i8::try_from(rounded).map_err(|_| {
+            ArrowError::ComputeError("Overflow while rounding Int8".to_string())


these dont need to be ArrowErrors since we're return a datafusion error in this function

pchintar · 2026-06-09T07:02:52Z

Hi @Jefffrey I replaced the custom integer-type helper with DataType::is_integer() and restored the original UInt64 handling in the Spark implementation, since Spark doesn't expose a UInt64 type. I also reran the existing tests and manually verified the Int64/UInt64 rounding cases to make sure the behavior is unchanged.

pchintar · 2026-06-17T11:38:29Z

Hi @Jefffrey just a kind reminder, could you pls approve this PR if everything looks fine, if not pls lmk, thanks again.

Jefffrey · 2026-06-17T11:55:01Z

Some of my comments haven't been addressed, could you please check them?

pchintar · 2026-06-21T20:15:02Z

Hi @Jefffrey I'm v.sorry for this late reply as I was on vacation last week actually.

So you had provided me 5 suggestions earlier, I've addressed the 3 cleanup suggestions out of those 5:

replaced the custom integer-type helper with DataType::is_integer()
restored the original UInt64 path in the Spark implementation, since Spark doesn't expose a UInt64 type
simplified round_integer_scalar() to match on (value, return_type) directly, removing the separate input/output matches while keeping the same checked conversions and overflow behavior

For the remaining two points though:

I kept the shared i128 intermediate intentionally. It allows all integer types to use the same negative-scale rounding logic and then perform a checked conversion back to the original type, so overflow is detected consistently without duplicating the implementation for every integer width.
I also kept ArrowError in the rounding helpers. The array path goes through calculate_binary_math, whose callback is defined in terms of ArrowError, so keeping the same error type there avoids adding a separate conversion layer. The scalar path follows the same pattern for consistency.

I hope this reply helps clarify, thnx again!

pchintar · 2026-06-26T17:56:44Z

Hi @Jefffrey could you kindly provide any update on this?

Jefffrey · 2026-06-27T07:30:53Z

I feel the i128 conversion could be avoided with some generic trickery 🤔

For reference, I updated the benchmarks to have an i64 bench with scale 2 and with scale -2, and ran on this branch:

round size=1024/round_f64_array
                        time:   [813.88 ns 819.69 ns 823.61 ns]
                        change: [−2.9895% −2.2717% −1.7347%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low severe
  1 (10.00%) high mild
round size=1024/round_i64_array
                        time:   [104.98 ns 105.65 ns 106.55 ns]
                        change: [−2.5935% −1.3028% −0.0545%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
round size=1024/round_i64_array_neg_scale
                        time:   [4.4552 µs 4.4583 µs 4.4623 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

round size=4096/round_f64_array
                        time:   [2.7175 µs 2.7288 µs 2.7357 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low severe
round size=4096/round_i64_array
                        time:   [105.02 ns 105.41 ns 105.90 ns]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
round size=4096/round_i64_array_neg_scale
                        time:   [17.140 µs 17.144 µs 17.149 µs]

round size=8192/round_f64_array
                        time:   [5.6102 µs 5.6347 µs 5.6583 µs]
round size=8192/round_i64_array
                        time:   [104.78 ns 105.08 ns 105.37 ns]
round size=8192/round_i64_array_neg_scale
                        time:   [34.582 µs 34.604 µs 34.627 µs]

round_i64_array = scale 2, expected pass through fast since no work needed
round_i64_array_neg_scale = scale -2, needs to go through i128 path

And to test, I modified the code to avoid the i128 conversion and just operate directly on i64:

round size=1024/round_f64_array
                        time:   [810.04 ns 814.87 ns 817.61 ns]
                        change: [−1.4862% −0.5891% +0.2634%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low severe
round size=1024/round_i64_array
                        time:   [105.65 ns 105.78 ns 105.92 ns]
                        change: [−0.7381% +0.1219% +0.7794%] (p = 0.79 > 0.05)
                        No change in performance detected.
round size=1024/round_i64_array_neg_scale
                        time:   [2.3741 µs 2.3780 µs 2.3824 µs]
                        change: [−46.758% −46.661% −46.557%] (p = 0.00 < 0.05)
                        Performance has improved.

round size=4096/round_f64_array
                        time:   [2.6846 µs 2.6904 µs 2.6962 µs]
                        change: [−1.7566% −1.4088% −0.9277%] (p = 0.00 < 0.05)
                        Change within noise threshold.
round size=4096/round_i64_array
                        time:   [106.03 ns 106.31 ns 106.67 ns]
                        change: [+0.3251% +0.8590% +1.4019%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
round size=4096/round_i64_array_neg_scale
                        time:   [8.9223 µs 8.9313 µs 8.9400 µs]
                        change: [−47.965% −47.905% −47.852%] (p = 0.00 < 0.05)
                        Performance has improved.

round size=8192/round_f64_array
                        time:   [5.5491 µs 5.5718 µs 5.5949 µs]
                        change: [−1.6684% −1.1164% −0.5111%] (p = 0.00 < 0.05)
                        Change within noise threshold.
round size=8192/round_i64_array
                        time:   [109.97 ns 110.49 ns 111.02 ns]
                        change: [+4.5883% +5.1438% +5.7191%] (p = 0.00 < 0.05)
                        Performance has regressed.
round size=8192/round_i64_array_neg_scale
                        time:   [17.872 µs 17.946 µs 18.034 µs]
                        change: [−48.360% −48.138% −47.920%] (p = 0.00 < 0.05)
                        Performance has improved.

So the i128 conversion does seem to have a noticeable performance impact.

We can always try to optimize in a followup if that is preference, just want to highlight this

pchintar · 2026-06-27T13:19:55Z

Hi @Jefffrey thnx for the benchmark comparison. So, I updated my integer rounding implementation to remove the i128 conversion in the integer path. Instead of converting each integer value to i128 before rounding and then converting it back, the implementation now performs the rounding directly on the native integer type using PrimInt together with checked_pow. I also split the implementation into separate signed and unsigned paths so that the existing overflow handling and rounding behavior for each integer type are preserved, while avoiding the extra conversion overhead in the hot path.

github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation spark labels Jun 1, 2026

pchintar force-pushed the round-large-integer-handling branch from e565975 to 7832c0c Compare June 1, 2026 13:58

kumarUjjawal reviewed Jun 2, 2026

View reviewed changes

pchintar force-pushed the round-large-integer-handling branch from 7832c0c to 97a7294 Compare June 2, 2026 09:13

kumarUjjawal reviewed Jun 3, 2026

View reviewed changes

pchintar force-pushed the round-large-integer-handling branch from 97a7294 to 744cc36 Compare June 3, 2026 10:16

kumarUjjawal requested a review from Jefffrey June 3, 2026 11:38

pchintar force-pushed the round-large-integer-handling branch from 744cc36 to d73d6c3 Compare June 8, 2026 08:47

Jefffrey reviewed Jun 9, 2026

View reviewed changes

pchintar force-pushed the round-large-integer-handling branch from d73d6c3 to 29a59b9 Compare June 9, 2026 07:01

Jefffrey mentioned this pull request Jun 13, 2026

Fix round() precision loss and errors for large Int64/UInt64 values #22927

Open

Jefffrey mentioned this pull request Jun 20, 2026

fix: round large UInt64 values without narrowing #23033

Open

pchintar force-pushed the round-large-integer-handling branch 2 times, most recently from 6d96f9d to 5966609 Compare June 21, 2026 18:29

github-actions Bot added the auto detected api change Auto detected API change label Jun 21, 2026

pchintar force-pushed the round-large-integer-handling branch from 5966609 to c7d003c Compare June 21, 2026 19:45

github-actions Bot removed the auto detected api change Auto detected API change label Jun 21, 2026

Preserve integer precision in round()

8d15cdf

pchintar force-pushed the round-large-integer-handling branch from c7d003c to 8d15cdf Compare June 27, 2026 12:49

		@@ -468,6 +487,11 @@ fn round_columnar(
		let decimal_places_is_array = matches!(decimal_places, ColumnarValue::Array(_));

		let arr: ArrayRef = match (value_array.data_type(), return_type) {

Uh oh!

Conversation

pchintar commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kumarUjjawal commented Jun 2, 2026

Uh oh!

pchintar commented Jun 2, 2026

Uh oh!

pchintar commented Jun 2, 2026

Uh oh!

pchintar commented Jun 2, 2026

Uh oh!

kumarUjjawal commented Jun 2, 2026

Uh oh!

pchintar commented Jun 2, 2026

Uh oh!

kumarUjjawal commented Jun 2, 2026

Uh oh!

pchintar commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kumarUjjawal Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

pchintar Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pchintar commented Jun 3, 2026

Uh oh!

pchintar commented Jun 3, 2026

Uh oh!

pchintar commented Jun 6, 2026

Uh oh!

kumarUjjawal commented Jun 8, 2026

Uh oh!

pchintar commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Jefffrey Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Jefffrey Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

pchintar commented Jun 9, 2026

Uh oh!

pchintar commented Jun 17, 2026

Uh oh!

Jefffrey commented Jun 17, 2026

Uh oh!

pchintar commented Jun 21, 2026

Uh oh!

pchintar commented Jun 26, 2026

Uh oh!

Jefffrey commented Jun 27, 2026

Uh oh!

pchintar commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

pchintar commented Jun 1, 2026 •

edited

Loading

pchintar commented Jun 2, 2026 •

edited

Loading

pchintar Jun 3, 2026 •

edited

Loading

pchintar commented Jun 8, 2026 •

edited

Loading

pchintar commented Jun 27, 2026 •

edited

Loading