HIVE-29375: FULL OUTER JOIN is failing with Unexpected hash table key type DATE #6239

Aggarwal-Raghav · 2025-12-17T08:46:31Z

What changes were proposed in this pull request?

Check HIVE-29375 for repro and stacktrace

Why are the changes needed?

To support the DATE type as a join key in FULL OUTER JOIN, which is converted to a map join when hive.optimize.dynamic.partition.hashjoin=true;

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a new q file; will check the CI output.

mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_full_outer_join_date.q -Drat.skip -Dtest.output.overwrite -Pitests -pl itests/qtest


Aggarwal-Raghav · 2025-12-17T08:54:19Z

Explanation:

In Vectorizer.java, hashTableKeyType is set to DATE for a DATE column type:

hive/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java

Line 3455 in 5410b73

hashTableKeyType = HashTableKeyType.DATE;

But the DATE type is not handled in VectorMapJoinOuterGenerateResultOperator:

hive/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/VectorMapJoinOuterGenerateResultOperator.java

Line 821 in c237510

switch (hashTableKeyType) {

DOUBLE and TIMESTAMP columns work even without the patch because the default hashTableKeyType is MULTI_KEY:

hive/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java

Line 3429 in 5410b73

hashTableKeyType = HashTableKeyType.MULTI_KEY;
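The mismatch described above can be illustrated with a small, self-contained model. This is only a sketch: `KeyType`, `chooseKeyType`, and `dispatch` are hypothetical stand-ins, not the actual Hive classes or methods, and the assumption that DATE ends up on the long-key path after the fix is made purely for illustration.

```java
class HashTableKeyTypeSketch {
    // Simplified stand-in for Hive's HashTableKeyType; not the real enum.
    enum KeyType { INT, LONG, DATE, MULTI_KEY }

    // Planner side (hypothetical): pick a specialized key type for a single
    // join-key column, falling back to MULTI_KEY when there is none.
    static KeyType chooseKeyType(String columnType) {
        switch (columnType) {
            case "int":    return KeyType.INT;
            case "bigint": return KeyType.LONG;
            case "date":   return KeyType.DATE;
            default:       return KeyType.MULTI_KEY; // e.g. double, timestamp
        }
    }

    // Operator side (hypothetical): dispatch on the key type. With
    // dateHandled == false this models a switch that has no DATE case.
    static String dispatch(KeyType keyType, boolean dateHandled) {
        switch (keyType) {
            case INT:
            case LONG:
                return "long-key path";
            case DATE:
                if (dateHandled) {
                    return "long-key path"; // assumption: DATE is treated like a long key
                }
                throw new RuntimeException("Unexpected hash table key type " + keyType);
            case MULTI_KEY:
                return "multi-key path";
            default:
                throw new RuntimeException("Unexpected hash table key type " + keyType);
        }
    }

    public static void main(String[] args) {
        // double and timestamp fall back to MULTI_KEY and dispatch fine;
        // date gets its own key type and fails when the switch lacks a DATE case.
        for (String type : new String[] {"double", "timestamp", "date"}) {
            KeyType keyType = chooseKeyType(type);
            try {
                System.out.println(type + " -> " + keyType + " -> " + dispatch(keyType, false));
            } catch (RuntimeException e) {
                System.out.println(type + " -> " + keyType + " -> " + e.getMessage());
            }
        }
    }
}
```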

Aggarwal-Raghav · 2025-12-17T16:27:12Z

Will address the Sonar issues after the review comments. I'm willing to replace the if-else pattern and the old-style switch statements with JDK 21 switch expressions; reviewers can let me know. I will file a separate JIRA for that migration, especially in the vectorization code.
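For reference, a minimal sketch of what such a migration could look like. The enum and both methods are hypothetical and only model the shape of the code; they are not taken from the Hive sources. One possible benefit is that a switch expression over an enum without a `default` branch must be exhaustive, so a missing constant is reported at compile time instead of at runtime.

```java
class SwitchExpressionSketch {
    // Hypothetical stand-in for a key-type enum.
    enum KeyType { BOOLEAN, BYTE, SHORT, INT, DATE, LONG, STRING, MULTI_KEY }

    // Old-style switch statement: DATE is missing here, and the omission
    // only shows up at runtime through the default branch.
    static String oldStyle(KeyType keyType) {
        String path;
        switch (keyType) {
            case BOOLEAN:
            case BYTE:
            case SHORT:
            case INT:
            case LONG:
                path = "long";
                break;
            case STRING:
                path = "string";
                break;
            case MULTI_KEY:
                path = "multi-key";
                break;
            default:
                throw new RuntimeException("Unexpected hash table key type " + keyType);
        }
        return path;
    }

    // Switch expression: with no default branch, the compiler rejects the
    // code unless every enum constant (including DATE) is covered.
    static String newStyle(KeyType keyType) {
        return switch (keyType) {
            case BOOLEAN, BYTE, SHORT, INT, DATE, LONG -> "long";
            case STRING -> "string";
            case MULTI_KEY -> "multi-key";
        };
    }

    public static void main(String[] args) {
        System.out.println(newStyle(KeyType.DATE));
        try {
            oldStyle(KeyType.DATE);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```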

Aggarwal-Raghav · 2025-12-19T07:08:57Z

CC @zabetak , can you please help with the review?

zabetak · 2025-12-19T09:38:37Z

Hey @Aggarwal-Raghav, I am on holidays till Dec 29, with intermittent and not stable internet connection. Not sure if I will find time to check this before then.

Aggarwal-Raghav · 2025-12-19T09:41:47Z

> Hey @Aggarwal-Raghav, I am on holidays till Dec 29, with intermittent and not stable internet connection. Not sure if I will find time to check this before then.

No worries. Enjoy !! 😅

mdayakar · 2025-12-19T13:44:57Z

Hi @Aggarwal-Raghav,
I am not a vectorization expert, but from the code changes it looks like the places below are also missing the DATE HashTableKeyType. Please check.

[1] CheckFastRowHashMap.java

[2] VectorMapJoinOptimizedLongHashMap.java

[3] VectorMapJoinOptimizedLongHashMap.java

I also see there are many unit tests; maybe you can add more test cases for the DATE HashTableKeyType, for example in MapJoinTestConfig.java.

Aggarwal-Raghav · 2025-12-20T17:53:00Z

> Hi @Aggarwal-Raghav , I am not a vectorization feature expert but as per code changes I feel you missed below places adding DATE HashTableKeyType, please check.
>
> [1] CheckFastRowHashMap.java
>
> [2] VectorMapJoinOptimizedLongHashMap.java
>
> [3] VectorMapJoinOptimizedLongHashMap.java
>
> Also I could see there are many UT test cases, may be you can add more test cases related to DATE HashTableKeyType. For example MapJoinTestConfig.java

Thanks for the thorough review, @mdayakar. I checked the places you mentioned. [2] and [3] are already handled in this PR. For [1] and MapJoinTestConfig.java, which are test classes, I'll look into adding more test cases for the DATE type and update the PR shortly.

sonarqubecloud · 2025-12-22T11:01:50Z

Quality Gate passed

Issues
5 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

zabetak

Changes LGTM! I left some comments for minor improvements and potentially a few more unit tests.

(The Sonar issues are not worth fixing; at least not now)

zabetak · 2026-01-02T18:20:04Z

ql/src/test/queries/clientpositive/vector_full_outer_join_date.q

+-- Test timestamp column
+create table tbl3 (id int, event_date timestamp);
+create table tbl4 (id int, event_date timestamp);
+
+insert into tbl3 values (1, '2025-12-17 10:20:30'), (2, '2025-12-17 11:20:30');
+insert into tbl4 values (2, '2025-12-17 11:20:30'), (3, '2025-12-17 09:20:30');
+
+select tbl3.id, tbl3.event_date from tbl3 full outer join tbl4 on tbl3.event_date = tbl4.event_date order by tbl3.id;
+
+-- Test Double column
+create table tbl5 (id int, val double);
+create table tbl6 (id int, val double);
+
+insert into tbl5 values (1, 5.6D), (2, 3.2D);
+insert into tbl6 values (2, 3.2D), (3, 7.2D);
+
+select tbl5.id, tbl5.val from tbl5 full outer join tbl6 on tbl5.val = tbl6.val order by tbl5.id;


Why are we adding tests for TIMESTAMP and DOUBLE types? They don't seem to be in the same code path with DATE. Are we fixing anything with respect to those data types?

Not fixing anything for the DOUBLE and TIMESTAMP types; I added them only because they were not covered in vector_full_outer_join.q or vector_full_outer_join2.q. Will remove them.

zabetak · 2026-01-02T18:21:36Z

ql/src/test/queries/clientpositive/vector_full_outer_join_date.q

+insert into tbl1 values (1, '2023-01-01'), (2, '2023-01-02'), (3, '2023-01-03');
+insert into tbl2 values (2, '2023-01-02'), (3, '2023-01-04'), (4, '2023-01-05');
+
+select tbl1.id, tbl1.event_date from tbl1 full outer join tbl2 on tbl1.event_date = tbl2.event_date order by tbl1.id;


Since we are performing a join, it would be nice to also SELECT columns from tbl2; otherwise we can't tell whether the result is correct.

will use select *
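To see why projecting only the tbl1 columns hides part of the result, here is a small simulation of the join on the same sample rows. It is a plain nested-loop sketch written for illustration (the class and method names are made up) and has nothing to do with how Hive executes the query.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FullOuterJoinSketch {
    static final class Row {
        final int id;
        final LocalDate eventDate;

        Row(int id, String eventDate) {
            this.id = id;
            this.eventDate = LocalDate.parse(eventDate);
        }
    }

    static String fmt(Row row) {
        return row.id + "," + row.eventDate;
    }

    // Nested-loop full outer join on eventDate. Each output row is
    // "left.id,left.date,right.id,right.date", with NULL for a missing side.
    static List<String> fullOuterJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        Set<Row> matchedRight = new HashSet<>(); // identity-based: Row does not override equals
        for (Row l : left) {
            boolean matched = false;
            for (Row r : right) {
                if (l.eventDate.equals(r.eventDate)) {
                    out.add(fmt(l) + "," + fmt(r));
                    matchedRight.add(r);
                    matched = true;
                }
            }
            if (!matched) {
                out.add(fmt(l) + ",NULL,NULL");
            }
        }
        for (Row r : right) {
            if (!matchedRight.contains(r)) {
                out.add("NULL,NULL," + fmt(r));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> tbl1 = List.of(new Row(1, "2023-01-01"), new Row(2, "2023-01-02"), new Row(3, "2023-01-03"));
        List<Row> tbl2 = List.of(new Row(2, "2023-01-02"), new Row(3, "2023-01-04"), new Row(4, "2023-01-05"));
        fullOuterJoin(tbl1, tbl2).forEach(System.out::println);
    }
}
```

With only the tbl1 columns selected, the two unmatched tbl2 rows would both appear as `NULL,NULL`, so a wrong right-side key could go unnoticed in the expected output.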

zabetak · 2026-01-02T18:33:17Z

...g/apache/hadoop/hive/ql/exec/vector/mapjoin/optimized/VectorMapJoinOptimizedLongHashMap.java

        if (!keyBinarySortableDeserializeRead.readNextField()) {
          return false;
        }
        switch (hashMap.hashTableKeyType) {


The code here seems very similar to org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastLongHashUtil#deserializeLongKey. Should we use this method instead?

Ack, that's doable. IMO it would be better to move the class from

package org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast => package org.apache.hadoop.hive.ql.exec.vector.mapjoin

and rename it from VectorMapJoinFastLongHashUtil => VectorMapJoinLongHashUtil

There is a clear separation between the fast and optimized flows, so importing org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastLongHashUtil#deserializeLongKey into org.apache.hadoop.hive.ql.exec.vector.mapjoin.optimized.VectorMapJoinOptimizedLongHashMap could be confusing.

Let me know what you think.
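A rough sketch of the shape such a shared helper could take once it lives in a neutral package. Everything here is hypothetical: `FieldReader` is a stand-in for whatever the deserializer exposes after `readNextField()`, and representing a DATE key as days since the epoch is an assumption made for this example, not something confirmed in this thread.

```java
import java.time.LocalDate;

class LongKeyDeserializeSketch {
    // Hypothetical stand-in for a key-type enum.
    enum KeyType { BOOLEAN, BYTE, SHORT, INT, DATE, LONG, STRING, MULTI_KEY }

    // Hypothetical holder for the value of the most recently read field.
    static final class FieldReader {
        boolean currentBoolean;
        byte currentByte;
        short currentShort;
        int currentInt;
        long currentLong;
        int currentDateDays; // assumption: DATE represented as days since 1970-01-01
    }

    // Single place that widens every long-family key type to a long, so that
    // both hash table flavors could share it instead of duplicating the switch.
    static long deserializeLongKey(FieldReader reader, KeyType keyType) {
        switch (keyType) {
            case BOOLEAN: return reader.currentBoolean ? 1 : 0;
            case BYTE:    return reader.currentByte;
            case SHORT:   return reader.currentShort;
            case INT:     return reader.currentInt;
            case DATE:    return reader.currentDateDays;
            case LONG:    return reader.currentLong;
            default:
                throw new RuntimeException("Unexpected hash table key type " + keyType.name());
        }
    }

    public static void main(String[] args) {
        FieldReader reader = new FieldReader();
        reader.currentDateDays = (int) LocalDate.parse("2023-01-02").toEpochDay();
        System.out.println(deserializeLongKey(reader, KeyType.DATE));
    }
}
```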

zabetak · 2026-01-02T18:45:41Z

ql/src/test/queries/clientpositive/vector_full_outer_join_date.q

+insert into tbl1 values (1, '2023-01-01'), (2, '2023-01-02'), (3, '2023-01-03');
+insert into tbl2 values (2, '2023-01-02'), (3, '2023-01-04'), (4, '2023-01-05');
+
+select tbl1.id, tbl1.event_date from tbl1 full outer join tbl2 on tbl1.event_date = tbl2.event_date order by tbl1.id;


Can we also print the plan using explain vectorization detail, to ensure that we are indeed using the expected vectorized operator?

zabetak · 2026-01-02T18:51:11Z

...test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/TestVectorMapJoinFastRowHashMap.java

+        random,
+        VectorRandomRowSource.SupportedTypes.ALL,
+        4,
+        /* allowNulls */ false, /* isUnicodeOk */


nit: Drop redundant comments

zabetak · 2026-01-02T18:51:22Z

...test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/TestVectorMapJoinFastRowHashMap.java

+        HashTableKeyType.DATE,
+        verifyTable,
+        new String[] {"date"},
+        /* doClipping */ false, /* useExactBytes */


nit: Drop redundant comments

zabetak · 2026-01-02T19:01:57Z

ql/src/test/org/apache/hadoop/hive/ql/exec/vector/mapjoin/MapJoinTestConfig.java

+      case DATE:
+        hashTableKeyType = HashTableKeyType.DATE;
+        break;


Do we have unit tests exploiting this config? Do we need to add something in TestMapJoinOperator?

Will add a test case testDate0 in TestMapJoinOperator. testString0 does use the DATE type, but only as a value column, not as a join key.

Aggarwal-Raghav · 2026-01-03T15:17:43Z

Thanks for the review @zabetak, I'll accommodate the suggestions.

Aggarwal-Raghav · 2026-01-04T18:26:50Z

Will push changes based on your input on #6239 (comment)

HIVE-29375: FULL OUTER JOIN is failing with Unexpected hash table key…

90033cf

… type DATE

asf-ci-hive added the tests pending label Dec 17, 2025

asf-ci-hive added tests passed and removed tests pending labels Dec 17, 2025

Add DATE support in test code as well

b2674e5

asf-ci-hive added tests pending and removed tests passed labels Dec 22, 2025

asf-ci-hive added tests passed and removed tests pending labels Dec 22, 2025

zabetak approved these changes Jan 2, 2026

4 participants
