[Epic] Add MERGE INTO support for DataFusion integration

### What's the feature you are trying to implement?

Add support for SQL `MERGE INTO` (UPSERT) operations in the iceberg-datafusion integration. This enables atomic row-level updates and inserts based on join conditions, essential for CDC pipelines, incremental updates, and data synchronization. I already have a [PoC branch](https://github.com/wirybeaver/iceberg-rust/commits/feature/merge-into/).

For reference, see the [insert_into support](https://github.com/apache/iceberg-rust/issues/1540) issue.

Spark-style **SPJ** (Storage-Partitioned Join) is the key optimization I want to introduce. DataFusion does not yet support `MERGE INTO` SQL parsing or logical planning, so I am contributing that to DataFusion as well: https://github.com/apache/datafusion/issues/20746.

**SQL Example:**
```sql
MERGE INTO target_table t
USING source_table s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value)
```
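As a rough illustration of what the statement above means (this is not code from the PoC branch), the sketch below applies the same two clauses to an in-memory table keyed by `id`. The function name `merge_into` and the `BTreeMap` representation are assumptions made for the example, and it assumes each `id` appears at most once in the source.

```rust
use std::collections::BTreeMap;

/// Applies the example MERGE to an in-memory `target` keyed by `id`.
/// Returns `(updated, inserted)` row counts.
/// Assumes source ids are unique; target rows with no source match are untouched.
fn merge_into(target: &mut BTreeMap<i64, String>, source: &[(i64, String)]) -> (usize, usize) {
    let mut updated = 0;
    let mut inserted = 0;
    for (id, value) in source {
        // `insert` returns the previous value when the key already existed.
        match target.insert(*id, value.clone()) {
            Some(_) => updated += 1, // WHEN MATCHED THEN UPDATE SET t.value = s.value
            None => inserted += 1,   // WHEN NOT MATCHED THEN INSERT (id, value)
        }
    }
    (updated, inserted)
}
```

The real operation works on immutable data files rather than a mutable map, which is why the execution plan below rewrites files instead of updating rows in place.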

---

### Query Execution Plan (CoW Mode)

#### Baseline Plan (Unpartitioned Table)

```
┌─────────────────────────────┐
│   IcebergMergeCommitExec    │  Commits via RowDelta transaction
│   (add + remove data files) │  Outputs: record count
└──────────────┬──────────────┘
               │
┌──────────────▼──────────────┐
│   CoalescePartitionsExec    │  Merges all partitions into
│                             │  single stream for atomic commit
└──────────────┬──────────────┘
               │
┌──────────────▼──────────────┐
│   IcebergMergeWriteExec     │  Writes merged rows to new Parquet
│                             │  files via TaskWriter; tracks
│                             │  _file values as deleted_files
└──────────────┬──────────────┘
               │
┌──────────────▼──────────────┐
│      IcebergMergeExec       │  FULL OUTER JOIN (HashJoinExec)
│                             │  Classifies rows:
│                             │    MATCHED → apply UPDATE exprs
│                             │    NOT MATCHED → apply INSERT exprs
└──────┬──────────────┬───────┘
       │              │
┌──────▼──────┐ ┌─────▼───────┐
│ IcebergTable│ │ Source Plan │
│    Scan     │ │  (any exec) │
│ (target,    │ │             │
│  with _file)│ │             │
└─────────────┘ └─────────────┘
```
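The row classification and `_file` tracking in the plan above can be sketched with plain Rust types. Everything here is hypothetical (`JoinedRow`, `MergeAction`, `classify`, `copy_on_write`) and does not come from the PoC branch. It also assumes one particular copy-on-write policy: only data files that contain at least one matched row are replaced, and unmatched target rows from those files are carried over unchanged.

```rust
use std::collections::BTreeSet;

/// One output row of the FULL OUTER JOIN; either side may be absent.
/// Hypothetical shape, for illustration only.
struct JoinedRow {
    /// Target side: (id, value, `_file` path the row was read from).
    target: Option<(i64, String, String)>,
    /// Source side: (id, value).
    source: Option<(i64, String)>,
}

#[derive(Debug, PartialEq)]
enum MergeAction {
    Update, // present on both sides: WHEN MATCHED
    Insert, // source only: WHEN NOT MATCHED
    Keep,   // target only: row is not changed by the MERGE
}

fn classify(row: &JoinedRow) -> Option<MergeAction> {
    match (&row.target, &row.source) {
        (Some(_), Some(_)) => Some(MergeAction::Update),
        (None, Some(_)) => Some(MergeAction::Insert),
        (Some(_), None) => Some(MergeAction::Keep),
        (None, None) => None, // a join never emits a row with both sides missing
    }
}

/// Returns the rows to write to new data files and the set of files to remove.
fn copy_on_write(rows: &[JoinedRow]) -> (Vec<(i64, String)>, BTreeSet<String>) {
    // Pass 1: every file holding a matched row must be replaced.
    let deleted_files: BTreeSet<String> = rows
        .iter()
        .filter(|r| classify(r) == Some(MergeAction::Update))
        .filter_map(|r| r.target.as_ref().map(|(_, _, file)| file.clone()))
        .collect();

    // Pass 2: build the contents of the replacement files.
    let mut written = Vec::new();
    for row in rows {
        match (&row.target, &row.source) {
            // Update: keep the target id, take the source value.
            (Some((id, _, _)), Some((_, value))) => written.push((*id, value.clone())),
            // Insert: new row straight from the source.
            (None, Some((id, value))) => written.push((*id, value.clone())),
            // Keep: carry the row over only if its file is being replaced.
            (Some((id, value, file)), None) if deleted_files.contains(file) => {
                written.push((*id, value.clone()))
            }
            _ => {}
        }
    }
    (written, deleted_files)
}
```

In the plan above, the two return values would correspond to the "add" and "remove" halves of the commit; the actual exec nodes operate on Arrow record batches rather than tuples.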

#### SPJ-Optimized Plan (Partitioned Table)

When **all** partition columns appear in the join keys and use hash-compatible
transforms (Identity or Bucket), the optimizer wraps both sides with
repartitioning to eliminate cross-partition shuffles:

```
┌─────────────────────────────┐
│   IcebergMergeCommitExec    │
└──────────────┬──────────────┘
               │
┌──────────────▼──────────────┐
│   CoalescePartitionsExec    │
└──────────────┬──────────────┘
               │
┌──────────────▼──────────────┐
│   IcebergMergeWriteExec     │
└──────────────┬──────────────┘
               │
┌──────────────▼──────────────┐
│      IcebergMergeExec       │
└──────┬──────────────┬───────┘
       │              │
┌──────▼──────┐ ┌─────▼───────┐
│Repartition  │ │Repartition  │
│Exec (Hash on│ │Exec (Hash on│
│ _partition) │ │ _partition) │
└──────┬──────┘ └─────┬───────┘
       │              │
┌──────▼──────┐ ┌─────▼───────┐
│Projection   │ │Projection   │
│Exec (adds   │ │Exec (adds   │
│_partition)  │ │_partition)  │
└──────┬──────┘ └─────┬───────┘
       │              │
┌──────▼──────┐ ┌─────▼───────┐
│ IcebergTable│ │ Source Plan │
│   Scan      │ │ (any exec)  │
│(target,     │ │             │
│ with _file) │ │             │
└─────────────┘ └─────────────┘
```
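The eligibility rule for this plan (every partition column is a join key and uses an Identity or Bucket transform) could be checked along the following lines. This is a minimal sketch under stated assumptions: `Transform`, `PartitionField`, and `spj_eligible` are simplified stand-ins invented for the example, not the iceberg-rust types or the PoC's optimizer rule.

```rust
/// Simplified stand-in for an Iceberg partition transform (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Transform {
    Identity,
    Bucket(u32),
    Truncate(u32),
    Day,
}

/// A partition field: the table column it derives from, plus its transform.
struct PartitionField {
    source_column: &'static str,
    transform: Transform,
}

/// Only Identity and Bucket are treated as hash-compatible, as described above.
fn is_hash_compatible(t: Transform) -> bool {
    matches!(t, Transform::Identity | Transform::Bucket(_))
}

/// True when the SPJ-style plan applies; otherwise fall back to the baseline plan.
fn spj_eligible(partition_fields: &[PartitionField], join_keys: &[&str]) -> bool {
    // An unpartitioned table has nothing to align on.
    !partition_fields.is_empty()
        && partition_fields.iter().all(|f| {
            is_hash_compatible(f.transform) && join_keys.contains(&f.source_column)
        })
}
```

For example, a table partitioned by `bucket(16, id)` and joined `ON t.id = s.id` would qualify, while the same table joined on a different column, or a table partitioned by a day transform, would use the baseline plan.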

---

The following tasks are already completed on the PoC branch. I will raise formal PRs one at a time, since the fork repo doesn't support stacked PRs.
- [x] https://github.com/apache/iceberg-rust/pull/2203
- [ ] Add IcebergMergeExec with HashJoinExec integration and row classification
- [ ] Add IcebergMergeWriteExec and IcebergMergeCommitExec nodes
- [ ] Implement full MERGE execution logic with file tracking
- [ ] Integrate MERGE INTO into IcebergTableProvider
- [ ] Add comprehensive MERGE INTO integration tests
- [ ] Add partition-aware merge optimization (Spark storage-partitioned join style)

### Willingness to contribute

I would be willing to contribute to this feature with guidance from the Iceberg Rust community.
