Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@ Coverage Diff @@
##             main      #62       +/-   ##
===========================================
+ Coverage   52.41%   91.91%   +39.49%
===========================================
  Files          58       54        -4
  Lines       12936     4837     -8099
===========================================
- Hits         6781     4446     -2335
+ Misses       6155      391     -5764
```
For rollback behavior, can't we just make a copy of the original dataframe to do the next operation so we have something to fall back on?
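A minimal sketch of that copy-based rollback idea. The function name, the list-of-dicts stand-in for a dataframe, and the `operation` callable are all illustrative, not the PR's actual API; the point is only the copy-then-restore pattern.

```python
import copy


def apply_with_rollback(table, operation):
    """Run `operation` on `table`; fall back to an untouched copy on failure.

    `table` here is a plain list of row dicts standing in for whatever
    dataframe chronify would use (hypothetical example).
    """
    backup = copy.deepcopy(table)
    try:
        return operation(table)
    except Exception:
        return backup  # rollback: the pre-operation data survives


rows = [{"t": 1, "v": 10.0}, {"t": 2, "v": 20.0}]
ok = apply_with_rollback(rows, lambda r: [{**row, "v": row["v"] * 2} for row in r])
bad = apply_with_rollback(rows, lambda r: 1 / 0)  # operation fails; backup returned
```

The trade-off is memory: a deep copy of a large table can be expensive, which is why database-backed code usually prefers transactions over in-memory copies.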
lixiliu left a comment:
I am fine with switching to Ibis for its ability to avoid the ways we have to handle different backends now. There are fewer changes to the code structure than I expected.
I think Claude is overcomplicating the mapping logic a bit and could use some more iteration there.
```python
"""Convert time columns with from_schema to to_schema configuration."""

def _ensure_mapping_types_match_source(
```
Good call, but it is redundant with the data type handling later in the mapping process.
```python
df_mapping = _ensure_mapping_types_match_source(df_mapping, from_schema, backend)

# Debug: Print mapping DF around target time
try:
```
```python
if left_type is None or right_type is None:
    return left_col == right_col

left_is_unknown = not hasattr(left_type, "is_timestamp") or str(left_type).startswith(
```
Feels incomplete and inconsistent.
Generally we only want to change the right_table key because left_table is the input data, right_table is the mapping table. IMO it's safer to have the right key conform to the left key only.
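A minimal sketch of that "right conforms to left" rule. The helper name and the plain string type names are hypothetical stand-ins for whatever type objects Ibis would expose; only the mapping (right) table ever gets cast.

```python
def plan_right_key_casts(left_types, right_types, keys):
    """Return {column: target_type} casts to apply to the right table only.

    left_types/right_types are {column: type-name} dicts (illustrative).
    The left table is the input data and is never modified.
    """
    casts = {}
    for key in keys:
        left_t = left_types.get(key)
        right_t = right_types.get(key)
        if left_t is not None and right_t is not None and left_t != right_t:
            casts[key] = left_t  # right key must conform to the left key's type
    return casts


casts = plan_right_key_casts(
    {"timestamp": "timestamp", "id": "int64"},
    {"timestamp": "string", "id": "int64"},
    ["timestamp", "id"],
)
```

In Ibis terms, each entry would then become a `right_table[col].cast(target)` before the join, leaving the input data untouched.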
```python
if resampling_operation:
    query = query.group_by(*groupby_stmt)

predicates = _build_join_predicates(left_table, right_table, keys)
joined = left_table.join(right_table, predicates)
```
These join operations are clean, but the rest seems unnecessarily complicated.
I think Claude tried to preserve the existing way of handling potential column conflicts, which is to split the final columns into left_columns, right_columns, and joined_columns, but it also wrote additional code to handle potential conflicts all over again. That's why _build_select_columns and _build_query take so many input variables and are so complicated.
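The split-and-reconcile logic described above could likely collapse into a single pass over the joined columns. A sketch of that simplification (function and suffix naming are illustrative, not the PR's code):

```python
def resolve_output_columns(left_cols, right_cols, keys):
    """Choose final column names after a join, handling conflicts in one place.

    Join keys appear once (taken from the left table); any other right-side
    column that collides with a left-side name gets a suffix. All names here
    are hypothetical examples.
    """
    out = list(left_cols)
    for col in right_cols:
        if col in keys:
            continue  # the join key is already present from the left table
        out.append(f"{col}_right" if col in left_cols else col)
    return out


cols = resolve_output_columns(
    ["timestamp", "value"], ["timestamp", "value", "mapped_time"], ["timestamp"]
)
```

Centralizing the conflict rule this way would let _build_select_columns and _build_query shrink to a couple of arguments each.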
This is a prototype, mostly generated by Claude. The goal is to see if we can simplify interaction with dsgrid when running Spark jobs. The current code, based on SQLAlchemy, requires a separate Spark session: dsgrid creates a session with pyspark, and chronify relies on an Apache Thrift Server (Hive).
We would get the following benefits by migrating:
We would lose this functionality in SQLAlchemy:
Outstanding work: