Is your feature request related to a problem or challenge?
Distribution::HashPartitioned is documented as requiring rows with equal key values to land in the same partition. While working through range partitioning, @2010YOUY01 and @gabotechs pointed out this is really a key-partitioning contract, not a requirement that the existing input is specifically hash partitioned.
This name has historically caused confusion / misuse, and as range partitioning support expands this continues to come up. The key point is that range partitioning can satisfy some single-input key partitioning requirements without specifically being hash partitioned.
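To make the contract concrete, here is a minimal sketch of why both schemes can co-locate equal keys: each assigns a partition as a pure function of the key value. The function names, the `i64` key type, and the split-point representation are assumptions for illustration only, not DataFusion code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical helper: hash partitioning picks a partition from the key's hash.
fn hash_partition(key: i64, partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % partitions as u64) as usize
}

// Hypothetical helper: range partitioning picks a partition from sorted split
// points. With split points [10, 20] the partitions cover
// (-inf, 10), [10, 20) and [20, +inf).
fn range_partition(key: i64, split_points: &[i64]) -> usize {
    // Index of the first split point that is greater than the key.
    split_points.partition_point(|&bound| bound <= key)
}
```

In both cases two rows with the same key always map to the same partition, which is all the key-partitioning contract asks for.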
Describe the solution you'd like
Clarify the API direction for this distribution requirement. Options discussed include:
- keep HashPartitioned but document it as historical naming for key partitioning (I am not a fan of this one)
- migrate to a KeyPartitioned name and have both Partitioning::Hash and compatible Partitioning::Range satisfy this
This issue is only about the per-input distribution requirement. Multi-input / join co-partitioning should be handled separately.
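As a rough sketch of what the second option could look like for a single input, the snippet below models a KeyPartitioned requirement that both hash and range partitioning can satisfy. All types here are simplified, hypothetical stand-ins (string column names instead of physical expressions), and the "partitioning columns must be a non-empty subset of the required keys" rule is an assumed compatibility check, not the rule DataFusion implements.

```rust
// Hypothetical, simplified types for illustration; not DataFusion's actual API.
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    Hash { keys: Vec<String>, partitions: usize },
    Range { keys: Vec<String>, partitions: usize },
    RoundRobin { partitions: usize },
}

#[derive(Debug, Clone, PartialEq)]
enum Distribution {
    Unspecified,
    // Rows with equal values for these keys must land in the same partition.
    KeyPartitioned(Vec<String>),
}

fn partition_count(partitioning: &Partitioning) -> usize {
    match partitioning {
        Partitioning::Hash { partitions, .. }
        | Partitioning::Range { partitions, .. }
        | Partitioning::RoundRobin { partitions } => *partitions,
    }
}

// Assumed rule: a partitioning satisfies KeyPartitioned when the partition is
// determined by a non-empty subset of the required keys, regardless of whether
// it is hash or range based.
fn satisfies(partitioning: &Partitioning, required: &Distribution) -> bool {
    match required {
        Distribution::Unspecified => true,
        Distribution::KeyPartitioned(required_keys) => {
            // A single partition trivially co-locates all equal keys.
            if partition_count(partitioning) == 1 {
                return true;
            }
            match partitioning {
                Partitioning::Hash { keys, .. } | Partitioning::Range { keys, .. } => {
                    !keys.is_empty() && keys.iter().all(|k| required_keys.contains(k))
                }
                Partitioning::RoundRobin { .. } => false,
            }
        }
    }
}
```

The subset rule works because equal values for the required keys imply equal values for any subset of them, so the rows are routed identically. Join co-partitioning across inputs needs more than this check and is intentionally left out.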
NOTE: I would prefer this rename / replacement to happen only after HashPartitioned distribution requirements for aggregations (a unary operator) and joins (a multi-input operator) can be satisfied via range partitioning, and before that point the public HashPartitioned variant should stay as it is. I want to take this approach to ensure we have worked out most of the kinks and nuances of this replacement before making large public API changes.
Additional context
Epic: #22395
Related PRs / discussion: