Describe the bug
When importing a Substrait physical plan containing ReadRel.LocalFiles, DataFusion builds a Parquet scan against ObjectStoreUrl::local_filesystem() and copies the plan-provided UriPath / UriPathGlob / UriFile / UriFolder value into ObjectMeta.location without a host-supplied root or object-store policy check.
In embeddings that accept Substrait physical plans from lower-trust callers, this can allow the imported plan to select process-local Parquet files outside the host's intended dataset roots.
Relevant code path on current main:
- datafusion/substrait/src/physical_plan/consumer.rs: FileScanConfigBuilder::new(ObjectStoreUrl::local_filesystem(), ...)
- datafusion/substrait/src/physical_plan/consumer.rs: cloned Substrait path becomes ObjectMeta { location: path.into(), ... }
- the configured scan is returned as DataSourceExec::from_data_source(...)
To Reproduce
- Import a Substrait physical plan using ReadRel.LocalFiles for a Parquet read.
- Set the file path in the serialized plan to a local path selected by the plan submitter.
- Execute the returned physical plan in a host process that accepts the imported plan.
I am intentionally not including a full payload in the public issue. The static source path above is enough to identify the behavior.
Expected behavior
Imported physical plans should not be able to directly choose arbitrary process-local filesystem paths unless the embedding host explicitly supplies that policy. Possible fixes include rejecting absolute/traversing paths, requiring an allowlisted root or object-store binding during physical plan import, or resolving imported file references through registered catalog/object-store policy rather than hard-coding the local filesystem.
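To make the first two options concrete, here is a minimal sketch of a lexical check that rejects absolute and traversing paths and resolves everything else under a host-supplied root. The helper name resolve_under_root and its signature are hypothetical, not an existing DataFusion API. The sketch assumes any URI scheme has already been stripped from the plan value, and it does not follow symlinks, so a real fix would likely need canonicalization or an object-store level check as well.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper (not part of DataFusion): resolve a plan-supplied
/// path against a host-configured root, rejecting anything that could
/// lexically escape that root.
fn resolve_under_root(root: &Path, plan_path: &str) -> Result<PathBuf, String> {
    let mut resolved = root.to_path_buf();
    for component in Path::new(plan_path).components() {
        match component {
            // Ordinary path segments are appended below the root.
            Component::Normal(part) => resolved.push(part),
            // "." is harmless and skipped.
            Component::CurDir => {}
            // ".." could climb out of the root, so refuse it outright.
            Component::ParentDir => {
                return Err(format!("path traversal rejected: {plan_path}"));
            }
            // A leading "/" (or a Windows drive prefix) means the plan is
            // choosing its own filesystem location.
            Component::RootDir | Component::Prefix(_) => {
                return Err(format!("absolute path rejected: {plan_path}"));
            }
        }
    }
    Ok(resolved)
}
```

With a root such as /srv/datasets (an illustrative value), a relative plan path like sales/2024.parquet resolves below the root, while /etc/passwd and ../secrets.parquet are both rejected before any ObjectMeta would be built.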
Additional context
This came from a local security review of Apache DataFusion at revision 38269f9c0cf1a80897aee588ea2daebe0aba4f6b. The impact depends on an embedding host exposing Substrait physical plan import across a trust boundary; DataFusion itself does not ship a standalone server/auth boundary in this repository.