feat(rest): add scan plan endpoint support to REST catalog client #783
When a table is loaded from a REST catalog that advertises the PlanTableScan
endpoint, NewScan() now returns a RestTableScanBuilder whose Build() produces
a RestTableScan. PlanFiles() on that scan delegates manifest resolution to
the server via POST /plan, GET /plan/{id} (with exponential backoff),
POST /tasks/{id}, and DELETE /plan/{id} (best-effort cancel), instead of
reading manifests locally.
- Add RestTable, RestTableScanBuilder, RestTableScan and RestScanContext
- Promote DataTableScan::PlanFiles and TableScanBuilder::Build to virtual
- Convert RestCatalog::client_ and paths_ to shared_ptr so RestScanContext
can share ownership with live scans
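The `GET /plan/{id}` polling step can be sketched roughly as follows. This is a minimal sketch, not the PR's code. The `fetch_status` callback is a hypothetical stand-in for the REST call, and the backoff values (initial delay, multiplier, cap, attempt limit) are illustrative assumptions.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <optional>
#include <thread>

// Hypothetical plan states mirroring the REST plan-status values.
enum class PlanStatus { kSubmitted, kCompleted, kFailed, kCancelled };

// Illustrative backoff settings (assumed, not taken from the PR).
struct BackoffOptions {
  std::chrono::milliseconds initial_delay{100};
  double multiplier = 2.0;
  std::chrono::milliseconds max_delay{5000};
  int max_attempts = 20;
};

// Polls `fetch_status` (a stand-in for GET /plan/{id}) until the plan leaves
// kSubmitted. Between polls it sleeps for an exponentially growing delay
// capped at max_delay. Returns std::nullopt if the attempt budget runs out,
// at which point a caller would issue the best-effort DELETE /plan/{id}.
std::optional<PlanStatus> PollUntilDone(const std::function<PlanStatus()>& fetch_status,
                                        const BackoffOptions& opts) {
  auto delay = opts.initial_delay;
  for (int attempt = 0; attempt < opts.max_attempts; ++attempt) {
    PlanStatus status = fetch_status();
    if (status != PlanStatus::kSubmitted) return status;
    if (attempt + 1 == opts.max_attempts) break;  // no sleep after the final poll
    std::this_thread::sleep_for(delay);
    auto next = std::chrono::milliseconds(
        static_cast<long long>(delay.count() * opts.multiplier));
    delay = std::min(next, opts.max_delay);
  }
  return std::nullopt;
}
```

Capping the delay keeps a long-running plan from pushing the poll interval out indefinitely. The attempt limit bounds how long `PlanFiles()` can block before giving up and cancelling.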
auto table_catalog = std::make_shared<TableScopedCatalog>(
    shared_from_this(), context, identifier, table_config, table_session);
if (supported_endpoints_.contains(Endpoint::PlanTableScan())) {
This should also gate on the effective scan-planning-mode, not only on endpoint support. Java defaults to client-side planning and lets the table config override the client config. Without that check, a table can be forced into REST planning even when the configured mode is client, or can silently fall back to local planning when the mode is server but the endpoint is missing.
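The gating described in this comment could look roughly like the sketch below. It assumes the property key `scan-planning-mode` with values `client` and `server`, and it uses hypothetical helper names (`ParseMode`, `EffectiveMode`, `Decide`). It reports a mismatch instead of silently planning locally when server planning is requested but the endpoint is not advertised.

```cpp
#include <map>
#include <optional>
#include <string>

enum class ScanPlanningMode { kClient, kServer };

// Hypothetical outcome of the gating decision.
enum class PlanningDecision { kLocal, kRest, kMisconfigured };

using Config = std::map<std::string, std::string>;

// Reads the assumed "scan-planning-mode" key. A missing or unrecognized value
// yields nullopt, so the next config layer is consulted (kept simple here).
std::optional<ScanPlanningMode> ParseMode(const Config& config) {
  auto it = config.find("scan-planning-mode");
  if (it == config.end()) return std::nullopt;
  if (it->second == "client") return ScanPlanningMode::kClient;
  if (it->second == "server") return ScanPlanningMode::kServer;
  return std::nullopt;
}

// Table config overrides client config; the default is client-side planning.
ScanPlanningMode EffectiveMode(const Config& client_config, const Config& table_config) {
  if (auto mode = ParseMode(table_config)) return *mode;
  if (auto mode = ParseMode(client_config)) return *mode;
  return ScanPlanningMode::kClient;
}

PlanningDecision Decide(ScanPlanningMode mode, bool endpoint_supported) {
  if (mode == ScanPlanningMode::kClient) return PlanningDecision::kLocal;
  // Server planning requested: use REST only if the endpoint is advertised,
  // otherwise surface the mismatch rather than falling back silently.
  return endpoint_supported ? PlanningDecision::kRest : PlanningDecision::kMisconfigured;
}
```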
request.case_sensitive = context_.case_sensitive;
request.min_rows_requested = context_.min_rows_requested;
if (context_.from_snapshot_id.has_value() && context_.to_snapshot_id.has_value()) {
We need to set use-snapshot-schema for snapshot/time-travel and incremental scans. Java sends it for useSnapshot and start/end snapshot scans, and the REST spec says time travel should use the snapshot schema. Without it, schema-evolved tables can be planned against the current schema.
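A minimal sketch of the suggested fix, using hypothetical stand-ins for the scan context and plan request (the field names are illustrative, not the PR's actual types). The sketch sets `use_snapshot_schema` for both incremental (start/end snapshot) scans and explicit-snapshot or time-travel scans, and leaves it unset for a plain scan of the current snapshot.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical subset of the scan context.
struct ScanContext {
  std::optional<std::int64_t> snapshot_id;       // time travel / explicit snapshot
  std::optional<std::int64_t> from_snapshot_id;  // incremental scan start
  std::optional<std::int64_t> to_snapshot_id;    // incremental scan end
};

// Hypothetical subset of the plan request body.
struct PlanRequest {
  std::optional<std::int64_t> snapshot_id;
  std::optional<std::int64_t> start_snapshot_id;
  std::optional<std::int64_t> end_snapshot_id;
  bool use_snapshot_schema = false;  // would be serialized as "use-snapshot-schema"
};

PlanRequest BuildPlanRequest(const ScanContext& ctx) {
  PlanRequest request;
  if (ctx.from_snapshot_id && ctx.to_snapshot_id) {
    // Incremental scan: plan against the snapshot schema, not the current one.
    request.start_snapshot_id = ctx.from_snapshot_id;
    request.end_snapshot_id = ctx.to_snapshot_id;
    request.use_snapshot_schema = true;
  } else if (ctx.snapshot_id) {
    // Time travel / explicit snapshot: same rule.
    request.snapshot_id = ctx.snapshot_id;
    request.use_snapshot_schema = true;
  }
  return request;
}
```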
rest_context_.client->Post(path, json_request, /*headers=*/{},
                           *PlanErrorHandler::Instance(), *rest_context_.session));
ICEBERG_ASSIGN_OR_RAISE(auto json, FromJsonString(response.body()));
ICEBERG_ASSIGN_OR_RAISE(auto result,
Planning responses can include storage-credentials. Java switches to a scan-scoped FileIO built from those credentials, and the spec expects clients to use them for the returned tasks. Ignoring them means servers that vend temporary storage credentials can plan successfully but reads may fail.
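One part of honoring `storage-credentials` is picking which credential applies to a given task's file path. The sketch below assumes longest-prefix matching between the file path and each credential's `prefix`, which is how the prefixes appear to be intended. `StorageCredential` and `SelectCredential` are hypothetical names, and building the scan-scoped FileIO from the selected `config` is out of scope here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical mirror of a "storage-credentials" entry: a location prefix
// plus the FileIO properties to use for paths under it.
struct StorageCredential {
  std::string prefix;
  std::map<std::string, std::string> config;
};

// Returns the credential whose prefix is the longest match for `path`, or
// nullptr if none applies (the caller would then keep the table's FileIO).
const StorageCredential* SelectCredential(const std::string& path,
                                          const std::vector<StorageCredential>& creds) {
  const StorageCredential* best = nullptr;
  for (const auto& cred : creds) {
    bool matches = path.compare(0, cred.prefix.size(), cred.prefix) == 0;
    if (matches && (best == nullptr || cred.prefix.size() > best->prefix.size())) {
      best = &cred;
    }
  }
  return best;
}
```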
switch (result.plan_status) {
  case PlanStatus::kCompleted:
    return ResolveScanTasks(result.plan_tasks, result.file_scan_tasks, specs);
Once a plan-id is returned, the server may hold resources until all plan tasks are fetched or the plan is cancelled. If resolving paginated tasks fails partway through, this returns without cancelling the remaining plan; Java cancels from the scan-task iterable cleanup path.
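In C++ the Java cleanup-path behavior maps naturally onto an RAII guard. The sketch below uses a hypothetical `PlanCancelGuard` class, and the `cancel` callback stands in for the best-effort `DELETE /plan/{id}`. The guard is created once a plan-id is known and released only after every plan task has been resolved, so any early return (for example, one taken by `ICEBERG_ASSIGN_OR_RAISE`) triggers the cancel.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical RAII guard: cancels the plan on every exit path unless
// Release() was called after all tasks were fetched.
class PlanCancelGuard {
 public:
  PlanCancelGuard(std::string plan_id, std::function<void(const std::string&)> cancel)
      : plan_id_(std::move(plan_id)), cancel_(std::move(cancel)) {}
  PlanCancelGuard(const PlanCancelGuard&) = delete;
  PlanCancelGuard& operator=(const PlanCancelGuard&) = delete;

  ~PlanCancelGuard() {
    if (!active_ || !cancel_) return;
    try {
      cancel_(plan_id_);  // best-effort DELETE /plan/{id}
    } catch (...) {
      // Best-effort: never let a cancel failure escape a destructor.
    }
  }

  // Call after every plan task has been resolved successfully.
  void Release() { active_ = false; }

 private:
  std::string plan_id_;
  std::function<void(const std::string&)> cancel_;
  bool active_ = true;
};
```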