
Move provenance RDF/vocabulary to the consumer; keep ProcessingRecord + ProvenanceStore as the TS seam #473

Description

@ddeboer

Problem

@lde/pipeline coins LDE vocabulary in consumer-facing output. FileLoadedSparqlProvenanceStore serialises each ProcessingRecord to RDF under https://w3id.org/lde/provenance# (sourceFingerprint, pipelineVersion, status), which then lands in the consumer's (e.g. DKG's) provenance graph.

Per the distribution-health design decision (netwerk-digitaal-erfgoed/dataset-register#2103, RDF emission & vocabulary boundary), LDE should be a behind-the-scenes implementation detail and coin no vocabulary that appears in consumer output. The consumer (NDE) owns the output vocabulary — standard W3C vocabularies plus def.nde.nl. The w3id.org/lde/provenance# terms are the one remaining leak.

The seam already exists

The pipeline core does not emit self-coined RDF for provenance:

  • ProcessingRecord is pure TypeScript (sourceFingerprint, pipelineVersion, generatedAt, status).
  • ProvenanceStore is a TS-in/TS-out interface (get(uri) → ProcessingRecord, set(uri, record)) — “free to back this with a triplestore, files, or anything else.”
  • The w3id.org/lde/provenance# vocabulary lives in exactly one concrete class: FileLoadedSparqlProvenanceStore.
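The seam described above can be sketched as two plain TypeScript types plus a store that never touches RDF. This is a minimal sketch, not the actual @lde/pipeline declarations. The field types, the free-form status string, Promise-returning methods, and get resolving to undefined for an unknown URI are all assumptions. InMemoryProvenanceStore is a hypothetical implementation that shows what "anything else" could mean.

```typescript
// Hypothetical sketch of the TS-in/TS-out seam; not the real @lde/pipeline types.
// Field types and the Promise-based signatures are assumptions.
interface ProcessingRecord {
  sourceFingerprint: string;
  pipelineVersion: string;
  generatedAt: Date;
  status: string; // the allowed status values are not specified in this issue
}

interface ProvenanceStore {
  // Assumed to resolve to undefined when nothing has been recorded for the URI.
  get(uri: string): Promise<ProcessingRecord | undefined>;
  set(uri: string, record: ProcessingRecord): Promise<void>;
}

// A backing store with no RDF at all: records stay plain TypeScript objects,
// so no vocabulary (LDE-coined or otherwise) is involved.
class InMemoryProvenanceStore implements ProvenanceStore {
  private readonly records = new Map<string, ProcessingRecord>();

  async get(uri: string): Promise<ProcessingRecord | undefined> {
    return this.records.get(uri);
  }

  async set(uri: string, record: ProcessingRecord): Promise<void> {
    this.records.set(uri, record);
  }
}
```

Because callers only see ProcessingRecord, swapping this for a file-, SPARQL- or consumer-backed store does not change any pipeline code.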

Proposal

Move the RDF serialisation (and its vocabulary choice) to the consumer, keeping ProcessingRecord + ProvenanceStore as the boundary:

  • The consumer (DKG) owns turning a ProcessingRecord into RDF under def.nde.nl, either by providing its own ProvenanceStore implementation, or by having the store delegate the record↔quads mapping to a consumer-supplied serialiser while the reusable file/SPARQL get/set mechanics stay in @lde/pipeline.
  • Do not inject predicate IRIs into the existing serialiser — that still couples the consumer to LDE's predicate choices and PROV-O shape. The boundary is the TypeScript record, so the consumer owns the entire shape.
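The delegating option can be sketched as follows, assuming a Promise-based interface and a simplified plain-string triple model in place of RDF/JS quads. Triple, ProvenanceSerialiser, DelegatingProvenanceStore and consumerSerialiser are hypothetical names, an in-memory map stands in for the file/SPARQL mechanics, and https://example.org/consumer# is a placeholder for whichever def.nde.nl terms DKG chooses. The store itself never names a predicate; only the consumer's serialiser does.

```typescript
// Hypothetical sketch: @lde/pipeline keeps get/set mechanics, the consumer owns record<->RDF.
// Triples use plain strings instead of RDF/JS terms for brevity.
interface ProcessingRecord {
  sourceFingerprint: string;
  pipelineVersion: string;
  generatedAt: Date;
  status: string;
}

interface ProvenanceStore {
  get(uri: string): Promise<ProcessingRecord | undefined>;
  set(uri: string, record: ProcessingRecord): Promise<void>;
}

// Subject and predicate are IRIs; object is an IRI or a literal's lexical form.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Supplied by the consumer: the only place where output vocabulary is chosen.
interface ProvenanceSerialiser {
  toTriples(uri: string, record: ProcessingRecord): Triple[];
  fromTriples(uri: string, triples: Triple[]): ProcessingRecord | undefined;
}

// Reusable mechanics; an in-memory map stands in for file/SPARQL storage.
class DelegatingProvenanceStore implements ProvenanceStore {
  private readonly graph = new Map<string, Triple[]>();

  constructor(private readonly serialiser: ProvenanceSerialiser) {}

  async get(uri: string): Promise<ProcessingRecord | undefined> {
    const triples = this.graph.get(uri);
    return triples === undefined ? undefined : this.serialiser.fromTriples(uri, triples);
  }

  async set(uri: string, record: ProcessingRecord): Promise<void> {
    this.graph.set(uri, this.serialiser.toTriples(uri, record));
  }
}

// Consumer-side serialiser. NS is a placeholder namespace; prov:generatedAtTime is PROV-O.
const NS = "https://example.org/consumer#";
const PROV_GENERATED_AT = "http://www.w3.org/ns/prov#generatedAtTime";

const consumerSerialiser: ProvenanceSerialiser = {
  toTriples(uri, record) {
    return [
      { subject: uri, predicate: `${NS}sourceFingerprint`, object: record.sourceFingerprint },
      { subject: uri, predicate: `${NS}pipelineVersion`, object: record.pipelineVersion },
      { subject: uri, predicate: PROV_GENERATED_AT, object: record.generatedAt.toISOString() },
      { subject: uri, predicate: `${NS}status`, object: record.status },
    ];
  },
  fromTriples(uri, triples) {
    // Object of the first triple about `uri` with the given predicate, if any.
    const value = (predicate: string) =>
      triples.find((t) => t.subject === uri && t.predicate === predicate)?.object;
    const sourceFingerprint = value(`${NS}sourceFingerprint`);
    const pipelineVersion = value(`${NS}pipelineVersion`);
    const generatedAt = value(PROV_GENERATED_AT);
    const status = value(`${NS}status`);
    if (
      sourceFingerprint === undefined ||
      pipelineVersion === undefined ||
      generatedAt === undefined ||
      status === undefined
    ) {
      return undefined;
    }
    return { sourceFingerprint, pipelineVersion, generatedAt: new Date(generatedAt), status };
  },
};
```

Unlike injecting predicate IRIs into a fixed serialiser, the consumer here controls the whole shape: it could just as well emit a separate PROV-O activity node or DQV measurements without any change to the store.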

Result: @lde/pipeline coins zero output vocabulary, and DKG's provenance graph uses def.nde.nl (or whatever the consumer chooses). Standard vocabularies in the output, such as the W3C's PROV-O and DQV or schema.org, are fine; the concern is only LDE-coined w3id.org/lde identifiers.
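One way to keep this result in place is a check over the pipeline's output. Below is a sketch of a hypothetical helper, findLdeCoinedIris, that collects every term in the https://w3id.org/lde/ namespace. It uses plain-string triples rather than RDF/JS quads, and where such a check would run (unit test, CI step) is left open.

```typescript
// Hypothetical guard: lists LDE-coined IRIs appearing anywhere in output triples.
// Triples are simplified to strings; in practice these would be RDF/JS quads.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

const LDE_NAMESPACE = "https://w3id.org/lde/";

function findLdeCoinedIris(triples: Triple[]): string[] {
  const found = new Set<string>();
  for (const { subject, predicate, object } of triples) {
    // Check every position: LDE terms could leak as classes or values, not only predicates.
    for (const term of [subject, predicate, object]) {
      if (term.startsWith(LDE_NAMESPACE)) found.add(term);
    }
  }
  return Array.from(found);
}
```

An empty result means the output uses only consumer-chosen or standard vocabularies; a non-empty result names exactly which LDE terms leaked.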

Notes
