
Fix transform graph #1077

Merged
melonora merged 7 commits into scverse:main from melonora:fix_transform_graph
Feb 21, 2026

@melonora
Collaborator

Closes #1064
Closes scverse/napari-spatialdata#388

This PR fixes the problem of transforming points data whose dask dataframe has multiple partitions. The problem occurred when writing a multi-partition dataframe to parquet, reading it back in with SpatialData, and then transforming the data.
It was caused by a collapse of the partitions when dask-expr (dask >2025.0.0) optimizes the dask graph. In transforms, this is prevented by explicitly passing the current partition lengths. However, doing so introduces a missing-dependency problem in the dask graph, which in turn is prevented by setting dask optimization to False while performing these specific operations.

@codecov

codecov bot commented Feb 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.95%. Comparing base (bc5b2ca) to head (abf3a40).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1077      +/-   ##
==========================================
+ Coverage   91.94%   91.95%   +0.01%     
==========================================
  Files          51       51              
  Lines        7673     7685      +12     
==========================================
+ Hits         7055     7067      +12     
  Misses        618      618              
Files with missing lines                         Coverage Δ
src/spatialdata/__init__.py                      100.00% <100.00%> (ø)
src/spatialdata/_core/operations/transform.py     91.48% <100.00%> (+0.11%) ⬆️
src/spatialdata/_utils.py                         85.43% <100.00%> (+0.92%) ⬆️

# to a collapse of the partitions. However, this causes a missing dependency problem, which for now is prevented
# by setting the optimization to False when performing this operation.
arrays.append(data[ax].to_dask_array(lengths=[len(part) for part in data.partitions]).reshape(-1, 1))
config.set({"optimization.tune.active": True})
Member

It should be restored to the old value; maybe it was already turned off globally. Anyway, good fix!

Collaborator Author

True. Not that I would expect many people to know how to tune this, but I will change it.
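The reviewer's point, restoring whatever value the user had instead of hard-coding `True`, can be handled by using `dask.config.set` as a context manager, which records the previous value (or the absence of the key) and restores it on exit. The sketch below assumes the config key from the diff excerpt above; `run_without_tuning` is a hypothetical helper, and whether dask honours the key depends on the installed dask version.

```python
import dask

# Config key copied from the PR's diff. Whether dask honours it depends on the
# installed dask version; dask.config.set accepts the key either way.
TUNE_KEY = "optimization.tune.active"


def run_without_tuning(fn, *args, **kwargs):
    """Call fn with TUNE_KEY temporarily set to False (hypothetical helper).

    dask.config.set used as a context manager restores the previous value,
    or removes the key if it was unset, on exit, even if fn raises.
    """
    with dask.config.set({TUNE_KEY: False}):
        return fn(*args, **kwargs)


def current_tune_setting():
    return dask.config.get(TUNE_KEY, None)


before = current_tune_setting()
print(run_without_tuning(current_tune_setting))  # False: value seen inside the block
print(current_tune_setting() == before)  # True: original value restored afterwards
```

Compared with calling `config.set(...)` twice, the context manager also covers the case where the wrapped operation raises, so the global setting is never left switched off.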

@melonora melonora force-pushed the fix_transform_graph branch from ee22026 to 7d92dd9 February 20, 2026 12:32
This is a note explaining the workaround in case people run into the partition collapse problem caused by dask graph optimization.
@melonora melonora force-pushed the fix_transform_graph branch 2 times, most recently from bbd1efc to e978fa1 February 20, 2026 21:09
The reason for this is that dask v2025.2.0 neither allows disabling graph optimization nor keeps partition sizes consistent. The option to turn optimization off was introduced in dask 2025.12.0.
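The thread does not show how the PR branches on the dask version, so the following is only a sketch of one way to gate such a workaround, assuming the 2025.12.0 threshold stated in the note above. `can_disable_optimization` is a hypothetical helper; it relies on `packaging`, which dask already depends on.

```python
from packaging.version import Version

# Threshold taken from the PR discussion, which states that turning graph
# optimization off was introduced in dask 2025.12.0.
DISABLE_OPTIMIZATION_SINCE = Version("2025.12.0")


def can_disable_optimization(dask_version: str) -> bool:
    """Return True if this dask version is expected to support disabling graph optimization."""
    # Version compares release segments numerically, so 2025.12.0 > 2025.2.0;
    # a plain string comparison would get this wrong.
    return Version(dask_version) >= DISABLE_OPTIMIZATION_SINCE


print(can_disable_optimization("2025.2.0"))  # False
print(can_disable_optimization("2025.12.0"))  # True
```

In practice the argument would be `dask.__version__`. Comparing parsed versions matters here precisely because the two calendar versions differ in a two-digit month field.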
@melonora melonora force-pushed the fix_transform_graph branch from 9076459 to 66384c2 February 20, 2026 22:30
@LucaMarconato
Member

Looks great! Two small comments:

  • In the docs, could you please add a link to the dask issue where this is discussed? This will have to be resolved upstream, since it is related to dask-expr, not spatialdata.
  • Could you double-check that after reading, operations like .compute() work without problems? I.e., does the problem occur only when reading, or in any downstream operation?

@melonora melonora merged commit 1383834 into scverse:main Feb 21, 2026
9 checks passed

2 participants