Substrait physical plan import should not accept arbitrary local file paths #23171

Description

@ametel01

Describe the bug

When importing a Substrait physical plan containing ReadRel.LocalFiles, DataFusion builds a Parquet scan against ObjectStoreUrl::local_filesystem() and copies the plan-provided UriPath / UriPathGlob / UriFile / UriFolder value into ObjectMeta.location without a host-supplied root or object-store policy check.

In embeddings that accept Substrait physical plans from lower-trust callers, this can allow the imported plan to select process-local Parquet files outside the host's intended dataset roots.

Relevant code path on current main:

  • datafusion/substrait/src/physical_plan/consumer.rs: FileScanConfigBuilder::new(ObjectStoreUrl::local_filesystem(), ...)
  • datafusion/substrait/src/physical_plan/consumer.rs: cloned Substrait path becomes ObjectMeta { location: path.into(), ... }
  • the configured scan is returned as DataSourceExec::from_data_source(...)

To Reproduce

  1. Import a Substrait physical plan using ReadRel.LocalFiles for a Parquet read.
  2. Set the file path in the serialized plan to a local path selected by the plan submitter.
  3. Execute the returned physical plan in a host process that accepts the imported plan.

I am intentionally not including a full payload in the public issue. The static source path above is enough to identify the behavior.

Expected behavior

Imported physical plans should not be able to directly choose arbitrary process-local filesystem paths unless the embedding host explicitly supplies that policy. Possible fixes include rejecting absolute/traversing paths, requiring an allowlisted root or object-store binding during physical plan import, or resolving imported file references through registered catalog/object-store policy rather than hard-coding the local filesystem.
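As a rough illustration of the first two options (rejecting absolute/traversing paths and requiring an allowlisted root), here is a minimal std-only sketch. `AllowlistedRoot`, `PathPolicyError`, and `resolve` are hypothetical names, not existing DataFusion APIs, and the check is purely lexical: it does not follow symlinks or interpret URI schemes such as `file://`, which a real fix would also need to consider.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical host-supplied policy: every imported file reference must
/// resolve underneath a single allowlisted root directory.
#[derive(Debug)]
pub struct AllowlistedRoot {
    root: PathBuf,
}

/// Hypothetical error type describing why a plan-provided path was rejected.
#[derive(Debug, PartialEq)]
pub enum PathPolicyError {
    Empty,
    AbsoluteOrPrefixed(String),
    ParentTraversal(String),
}

impl AllowlistedRoot {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into() }
    }

    /// Lexically validate a plan-provided path and join it onto the root.
    /// Symlinks are NOT resolved here; this only inspects path components.
    pub fn resolve(&self, plan_path: &str) -> Result<PathBuf, PathPolicyError> {
        if plan_path.is_empty() {
            return Err(PathPolicyError::Empty);
        }
        let mut resolved = self.root.clone();
        for component in Path::new(plan_path).components() {
            match component {
                // Ordinary path segments are appended under the root.
                Component::Normal(part) => resolved.push(part),
                // "." segments are harmless and skipped.
                Component::CurDir => {}
                // ".." could escape the root, so reject it outright.
                Component::ParentDir => {
                    return Err(PathPolicyError::ParentTraversal(plan_path.to_string()))
                }
                // A leading "/" or a Windows drive/UNC prefix means the plan
                // is trying to pick its own filesystem location.
                Component::RootDir | Component::Prefix(_) => {
                    return Err(PathPolicyError::AbsoluteOrPrefixed(plan_path.to_string()))
                }
            }
        }
        Ok(resolved)
    }
}
```

Under these assumptions, a host that constructs `AllowlistedRoot::new("/srv/datasets")` (an example root) would accept a relative reference such as `sales/2024.parquet` and join it under the root, while `/etc/passwd` and `../secrets.parquet` would be rejected before any scan is built. Where such a check would hook into the Substrait consumer is left open here.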

Additional context

This came from a local security review of Apache DataFusion at revision 38269f9c0cf1a80897aee588ea2daebe0aba4f6b. The impact depends on an embedding host exposing Substrait physical plan import across a trust boundary; DataFusion itself does not ship a standalone server/auth boundary in this repository.
