Describe the bug
USING / NATURAL join column is COALESCE(left.k, right.k). For a right-only row of a RIGHT/FULL outer join the left key is NULL-padded, so the merged key must take the right key's value. DataFusion instead resolves the unqualified key to the left column only, so it comes out NULL for right-only rows.
This is silent (the query plans and runs fine) and it corrupts downstream WHERE and GROUP BY on the key. INNER / LEFT JOIN USING are unaffected — the defect only appears when the left side is NULL-padded (RIGHT/FULL).
To Reproduce
create table a(k int, x int) as values (1, 10);
create table b(k int, y int) as values (4, 400);
select k from a full join b using (k) order by k nulls last; -- actual: 1, NULL expected: 1, 4
select k from a right join b using (k); -- actual: NULL expected: 4
select k from a natural full join b order by k nulls last; -- actual: 1, NULL expected: 1, 4
-- the wrong NULL key propagates downstream:
select * from a full join b using (k) where k = 4; -- actual: 0 rows expected: 1 row
select k, count(*) from a full join b using (k) group by k; -- actual: k=NULL expected: k=4
-- the explicit form is correct, confirming the divergence:
select coalesce(a.k, b.k) as k from a full join b on a.k = b.k order by k nulls last; -- 1, 4
Expected behavior
The merged USING/NATURAL key should equal COALESCE(left.k, right.k) for all join types, so right-only rows of a RIGHT/FULL join expose the right key value — matching the explicit coalesce(a.k, b.k) ... ON a.k = b.k form, PostgreSQL, and the SQL standard.
Additional context
The logical plan binds the merged key to the left column with no COALESCE:
> explain format indent select k from a full join b using (k);
Projection: a.k <-- should be coalesce(a.k, b.k)
Full Join: Using a.k = b.k
The USING/NATURAL constraint is lowered in datafusion/sql/src/relation/join.rs; the merged/unqualified key resolves to the left key during column/wildcard resolution rather than to COALESCE(left.k, right.k). There is currently no sqllogictest coverage for RIGHT/FULL USING/NATURAL, which is why it went unnoticed.
Describe the bug
USING/NATURALjoin column isCOALESCE(left.k, right.k). For a right-only row of aRIGHT/FULLouter join the left key is NULL-padded, so the merged key must take the right key's value. DataFusion instead resolves the unqualified key to the left column only, so it comes outNULLfor right-only rows.This is silent (the query plans and runs fine) and it corrupts downstream
WHEREandGROUP BYon the key.INNER/LEFT JOIN USINGare unaffected — the defect only appears when the left side is NULL-padded (RIGHT/FULL).To Reproduce
Expected behavior
The merged
USING/NATURALkey should equalCOALESCE(left.k, right.k)for all join types, so right-only rows of aRIGHT/FULLjoin expose the right key value — matching the explicitcoalesce(a.k, b.k) ... ON a.k = b.kform, PostgreSQL, and the SQL standard.Additional context
The logical plan binds the merged key to the left column with no
COALESCE:The
USING/NATURALconstraint is lowered indatafusion/sql/src/relation/join.rs; the merged/unqualified key resolves to the left key during column/wildcard resolution rather than toCOALESCE(left.k, right.k). There is currently no sqllogictest coverage forRIGHT/FULLUSING/NATURAL, which is why it went unnoticed.