Skip to content

RIGHT/FULL/NATURAL JOIN ... USING(k) does not coalesce the join key (returns NULL for right-only rows) #22881

Description

@nathanb9

Describe the bug

USING / NATURAL join column is COALESCE(left.k, right.k). For a right-only row of a RIGHT/FULL outer join the left key is NULL-padded, so the merged key must take the right key's value. DataFusion instead resolves the unqualified key to the left column only, so it comes out NULL for right-only rows.

This is silent (the query plans and runs fine) and it corrupts downstream WHERE and GROUP BY on the key. INNER / LEFT JOIN USING are unaffected — the defect only appears when the left side is NULL-padded (RIGHT/FULL).

To Reproduce

create table a(k int, x int) as values (1, 10);
create table b(k int, y int) as values (4, 400);

select k from a full join b using (k) order by k nulls last;   -- actual: 1, NULL   expected: 1, 4
select k from a right join b using (k);                        -- actual: NULL      expected: 4
select k from a natural full join b order by k nulls last;     -- actual: 1, NULL   expected: 1, 4

-- the wrong NULL key propagates downstream:
select * from a full join b using (k) where k = 4;             -- actual: 0 rows    expected: 1 row
select k, count(*) from a full join b using (k) group by k;    -- actual: k=NULL    expected: k=4

-- the explicit form is correct, confirming the divergence:
select coalesce(a.k, b.k) as k from a full join b on a.k = b.k order by k nulls last;  -- 1, 4

Expected behavior

The merged USING/NATURAL key should equal COALESCE(left.k, right.k) for all join types, so right-only rows of a RIGHT/FULL join expose the right key value — matching the explicit coalesce(a.k, b.k) ... ON a.k = b.k form, PostgreSQL, and the SQL standard.

Additional context

The logical plan binds the merged key to the left column with no COALESCE:

> explain format indent select k from a full join b using (k);

Projection: a.k                       <-- should be coalesce(a.k, b.k)
  Full Join: Using a.k = b.k

The USING/NATURAL constraint is lowered in datafusion/sql/src/relation/join.rs; the merged/unqualified key resolves to the left key during column/wildcard resolution rather than to COALESCE(left.k, right.k). There is currently no sqllogictest coverage for RIGHT/FULL USING/NATURAL, which is why it went unnoticed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions