Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main    #1077      +/-   ##
==========================================
+ Coverage   91.94%   91.95%   +0.01%
==========================================
  Files          51       51
  Lines        7673     7685      +12
==========================================
+ Hits         7055     7067      +12
  Misses        618      618
# to collapse of the partitions. However, this causes a missing dependency problem, which for now is prevented
# by setting the optimization to False when performing this operation.
arrays.append(data[ax].to_dask_array(lengths=[len(part) for part in data.partitions]).reshape(-1, 1))
config.set({"optimization.tune.active": True})
Member
It should be restored to the old value; maybe it was already turned off globally. Anyway, good fix!
Collaborator
Author
True, not that I would expect many people to know how to tune this, but I will change it.
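A minimal sketch of the reviewer's suggestion, assuming `optimization.tune.active` (the key used in the diff) is the setting to toggle. `with_tuning_disabled` is a hypothetical helper, not code from the PR. It relies on `dask.config.set` used as a context manager, which restores whatever value the key had before instead of hard-coding `True` afterwards.

```python
import dask

# Config key copied from the diff under review; assumed to switch dask's
# graph-tuning optimization on or off in the installed dask version.
OPT_KEY = "optimization.tune.active"


def with_tuning_disabled(func, *args, **kwargs):
    """Call func(*args, **kwargs) with the tuning optimization switched off.

    Used as a context manager, dask.config.set restores the key's previous
    value on exit (or removes the key if it was unset), so a value the user
    chose globally is never overwritten with a hard-coded True.
    """
    with dask.config.set({OPT_KEY: False}):
        return func(*args, **kwargs)
```

For example, `with_tuning_disabled(series.to_dask_array, lengths=lengths)` (placeholder names) would build the array with tuning off and leave the global configuration as it found it, even if the call raises.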
ee22026 to 7d92dd9
This is a note explaining the workaround, in case people run into the partition collapse problem caused by dask graph optimization.
bbd1efc to e978fa1
The reason for this is that dask v2025.2.0 neither allows disabling graph optimization nor keeps partition sizes consistent. The ability to turn optimization off was introduced in dask 2025.12.0.
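Because the workaround depends on the dask release, any code that enables it conditionally has to compare versions numerically: as plain strings, "2025.12.0" sorts before "2025.2.0". A stdlib-only sketch, assuming the 2025.12.0 threshold stated in the note; `parse_release` and `supports_optimization_toggle` are hypothetical helpers, and in practice `packaging.version.Version` is the more robust choice.

```python
import re

# Assumed threshold, taken from the note: turning graph optimization off
# is only possible from this dask release onwards.
MIN_TOGGLE_VERSION = (2025, 12, 0)


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric release components, e.g. '2025.2.0' -> (2025, 2, 0)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def supports_optimization_toggle(version: str) -> bool:
    # Compare integer tuples, not strings, so that 12 > 2 in the month field.
    return parse_release(version) >= MIN_TOGGLE_VERSION
```

Suffixes such as `.dev3` are ignored by `parse_release`, so a development build of 2025.12.0 is treated like the release itself.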
9076459 to 66384c2
Member
Looks great! Two small comments:
Closes #1064
Closes scverse/napari-spatialdata#388
This PR fixes transforming points data when the dask dataframe has multiple partitions. The problem occurred when writing a multi-partition dataframe to parquet, reading it back in with SpatialData, and then transforming the data.
The problem occurred because dask-expr (dask >2025.0.0) collapses partitions when it optimizes the dask graph. In transforms, this is prevented by explicitly passing the current partition lengths. However, that introduces a missing dependency in the dask graph, which is in turn prevented by setting dask optimization to False while performing those specific operations.