Feat:[Draft][22715] add dictionaries as a supported group column type#23187
Feat:[Draft][22715] add dictionaries as a supported group column type#23187Rich-T-kid wants to merge 14 commits into
Conversation
2b60132 to
1ffc7f0
Compare
3f7ff57 to
e6b6dce
Compare
|
@kumarUjjawal could you run the dictionary benchmarks on this PR? Thx |
| } | ||
| } | ||
| DataType::Dictionary(key_dt, value_dt) => { | ||
| let new_field = Field::new("", *value_dt.clone(), true); |
There was a problem hiding this comment.
Since this field is never read again it may be fine to ignore the name field.
should be weary of similar issues to #21765 (comment)
|
@kumarUjjawal wanted to bump this 😄 |
|
run benchmark dictionary_group_values |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing rich-T-kid/dictionary-groupValuesColumn-impl (eb41915) to 01bf68c (merge-base) diff using: dictionary_group_values File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagedictionary_group_values — base (merge-base)
dictionary_group_values — branch
File an issue against this benchmark runner |
eb41915 to
770abfe
Compare
770abfe to
243a557
Compare
|
@codex review |
@Rich-T-kid Thank you! I have been sick so I won't be available for review. I will probably get back next week. |
@kumarUjjawal Sorry to hear that. I hope you feel better! no rush on the review! |
4ee52da to
7af7080
Compare
Which issue does this PR close?
Rationale for this change
This PR introduces a specialized
GroupColumnimplementation for dictionary-typed columns insideGroupValuesColumn, allowing dictionary columns to participate in the columnar, vectorized aggregation path instead of the row-based fallback.The Implementation is only about 175+ lines of code. the remaining LOC is adding extensive test at the
GroupColumntrait level as well as testing theGroupValuesColumnGroupValues trait and how it inter-opts with multi-dictionary group by's.What changes are included in this PR?
DictionaryGroupValueBuilderstruct implementing theGroupColumntrait forDictionary-typed group-by columns, supporting a configurable subset of value typesGroupValuesColumn::try_new(thematches!block) to acceptDictionary(_, value_type)wherevalue_typeis already supported.emitAre these changes tested?
yes. a majority of this PR is test
Are there any user-facing changes?
no. this is a pure perf boost for users.