Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
#!/usr/bin/env -S uv run --script
#
# /// script
# requires-python = ">=3.14"
# dependencies = [
#     # "pandas<3",
#     "pandas==3.0.0rc0",
# ]
# ///
import datetime
import pandas as pd
print("Pandas version:", pd.__version__)
df = pd.DataFrame(
    {
        "A": pd.date_range("20230101", periods=3),
        "days_to_add": [1, 2, 3],
    }
)
print(df)
print(df.dtypes)
print("adding days via datetime.timedelta")
df_via_datetime = df.copy()
df_via_datetime["A"] = df["A"] + df["days_to_add"].apply(
lambda d: datetime.timedelta(days=d)
)
print(df_via_datetime)
print(df_via_datetime.dtypes)
print("adding days via pd.Timedelta")
df_via_timedelta = df.copy()
df_via_timedelta["A"] = df["A"] + pd.to_timedelta(df["days_to_add"], unit="d")
print(df_via_timedelta)
print(df_via_timedelta.dtypes)
pd.testing.assert_frame_equal(df_via_datetime, df_via_timedelta)
print("OK, both variants work the same!")Issue Description
Trying out pandas 3.0.0rc0, I get a zillion failed tests in my test suite. Some of them are expected (str / object), but most of the unexpected ones trace back to the datetime change from ns to us.
Consider the script above. Why does pd.to_timedelta use ns precision? When you run the script with pandas 3.0.0rc0, the assert fails because the dtypes differ; when you run it with pandas<3 instead, it passes.
With this change, pd.to_timedelta and datetime.timedelta suddenly behave differently. As a user, I find this very confusing. Why does it change to nanoseconds when I only have day granularity?
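For reference, the divergence is visible directly on the two intermediate results (a minimal sketch using the objects from the script above; the printed resolutions depend on the installed pandas version):

```python
via_datetime = df["A"] + df["days_to_add"].apply(lambda d: datetime.timedelta(days=d))
via_timedelta = df["A"] + pd.to_timedelta(df["days_to_add"], unit="d")
# On pandas<3 both are datetime64[ns]; on 3.0.0rc0 the two dtypes differ.
print(via_datetime.dtype, via_timedelta.dtype)
```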
That said, in my code I only work with daily data, so I don't really care about sub-second precision. However, I would like to be able to use pd.testing.assert_frame_equal and the like and get predictable results. I also don't want to blindly ignore dtypes, as float vs. int can be very relevant for me.
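One workaround I found for the comparison problem is to normalize all datetime columns to a single resolution before comparing, so dtypes still get checked. A minimal sketch, assuming Series.dt.as_unit (available since pandas 2.0); normalize_datetimes is just my own helper name:

```python
def normalize_datetimes(frame, unit="ns"):
    # Cast every datetime64 column to a fixed resolution; leave other dtypes alone.
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            out[col] = out[col].dt.as_unit(unit)
    return out

pd.testing.assert_frame_equal(
    normalize_datetimes(df_via_datetime), normalize_datetimes(df_via_timedelta)
)
```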
Furthermore, I have some code that works with pure integers (dates). Previously dtseries.to_numpy() did the job, as everything was in ns. Now I somehow need to normalize the values. How do you do that? The solution I came up with is:
factor = 1000 if dtseries.dtype == '<M8[us]' else 1
call_external_func(dtseries.astype("int64").to_numpy() * factor)
But this assumes that only the two resolutions us and ns exist.
So basically I am asking for ways to work with these kinds of problems. How do I set/change the underlying representation?
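The least fragile variant I can think of is again Series.dt.as_unit, which handles any of the supported resolutions (s, ms, us, ns) instead of hard-coding a factor. A sketch (call_external_func is the placeholder from above):

```python
# Force a known resolution first, then view the values as int64 nanoseconds.
# Caveat: upcasting coarse units to ns can overflow for dates far in the future/past.
call_external_func(dtseries.dt.as_unit("ns").astype("int64").to_numpy())
```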
This is certainly related to #63270 (filed just a few hours ago).
I thought I'd report this anyway, as it causes many date tests to fail and I fear I am not the only one affected.
Expected Behavior
Clearer reasons when/why precision changes.
Installed Versions
Doesn't matter.