Fix metrics for repartition when preserve_order=true#20924
Fix metrics for repartition when preserve_order=true#20924xanderbailey wants to merge 2 commits into
preserve_order=true#20924Conversation
5a7328a to
a5ccd80
Compare
a5ccd80 to
01598d9
Compare
| spill_stream, | ||
| 1, // Each receiver handles one input partition | ||
| BaselineMetrics::new(&metrics, partition), | ||
| BaselineMetrics::new(&intermediate_metrics, partition), |
There was a problem hiding this comment.
if we are just going to ignore the metrics, should we just remove them from PerPartitionStream ?
It seems like using a local copy of ExecutionPlanMetrics means they metrics in the PerPartitionStream are no longer accessable. So we can probably just remove the metrics to make it clearer they aren't used
There was a problem hiding this comment.
Yeah good question, we still use PerPartitionStream here
There was a problem hiding this comment.
Any thoughts here on what you'd prefer @alamb?
There was a problem hiding this comment.
Making it optional made the code inside PerPartitionStream less clean I think but I don't have a strong opinion I have to say so I'm happy to do whatever
|
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
|
Not stale please don’t close |
|
I’m away for a week or so and I’ll resolve the conflict then but I’d love to get the one merged if we can |
alamb
left a comment
There was a problem hiding this comment.
Thanks @xanderbailey -- this looks good to me
|
|
||
| /// preserve_order repartition should not double-count | ||
| /// output rows. | ||
| #[tokio::test] |
There was a problem hiding this comment.
I verified this test covers the new code
cargo test -p datafusion-physical-plan -- --nocapture
This fails without the code change
thread 'repartition::test::test_preserve_order_output_rows_not_double_counted' panicked at datafusion/physical-plan/src/repartition/mod.rs:2991:9:
assertion `left == right` failed: metrics output_rows (8) should match actual rows collected (4), not double-count
left: 8
right: 4
Verification
preserve_order=true
Which issue does this PR close?
Found this when working on #20875
Rationale for this change
Metric reporting was previously incorrect for
RepartitionExecif preserve order was set to true.What changes are included in this PR?
Create new metrics before creating
PerPartitionStreamAre these changes tested?
Yes and confirmed that this fails on main:
Are there any user-facing changes?