Problem
@lde/pipeline coins LDE vocabulary in consumer-facing output. FileLoadedSparqlProvenanceStore serialises each ProcessingRecord to RDF under https://w3id.org/lde/provenance# (sourceFingerprint, pipelineVersion, status), which then lands in the consumer's (e.g. DKG's) provenance graph.
Per the distribution-health design decision (netwerk-digitaal-erfgoed/dataset-register#2103, RDF emission & vocabulary boundary), LDE should be a behind-the-scenes implementation detail and coin no vocabulary that appears in consumer output. The consumer (NDE) owns the output vocabulary — standard W3C vocabularies plus def.nde.nl. The w3id.org/lde/provenance# terms are the one remaining leak.
The seam already exists
The pipeline core does not emit self-coined RDF for provenance:
- ProcessingRecord is pure TypeScript (sourceFingerprint, pipelineVersion, generatedAt, status).
- ProvenanceStore is a TS-in/TS-out interface (get(uri) → ProcessingRecord, set(uri, record)), so implementations are “free to back this with a triplestore, files, or anything else.”
- The w3id.org/lde/provenance# vocabulary lives in exactly one concrete class: FileLoadedSparqlProvenanceStore.
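To make the seam concrete, here is a rough TypeScript sketch of the two boundary types. Only the names ProcessingRecord, ProvenanceStore, get, set and the four record fields come from this issue. The field types, the async signatures and the `| undefined` return are assumptions, and InMemoryProvenanceStore is a hypothetical implementation showing that a store can satisfy the interface without emitting any RDF at all.

```typescript
// Sketch of the boundary types. Field types are assumptions; only the
// field and method names come from the issue text.
interface ProcessingRecord {
  sourceFingerprint: string; // assumed: opaque hash/ETag of the source
  pipelineVersion: string;
  generatedAt: Date; // assumed to be a Date
  status: string; // assumed free-form; the real type may be a union
}

// TS-in/TS-out: nothing here mentions RDF or any vocabulary.
// The async signatures and `| undefined` are assumptions.
interface ProvenanceStore {
  get(uri: string): Promise<ProcessingRecord | undefined>;
  set(uri: string, record: ProcessingRecord): Promise<void>;
}

// "Anything else": a hypothetical in-memory backing that never touches RDF.
class InMemoryProvenanceStore implements ProvenanceStore {
  private readonly records = new Map<string, ProcessingRecord>();

  async get(uri: string): Promise<ProcessingRecord | undefined> {
    return this.records.get(uri);
  }

  async set(uri: string, record: ProcessingRecord): Promise<void> {
    this.records.set(uri, record);
  }
}
```

Because callers only ever see ProcessingRecord values, swapping this store for an RDF-backed one changes storage, not the pipeline's contract.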
Proposal
Move the RDF serialisation (and its vocabulary choice) to the consumer, keeping ProcessingRecord + ProvenanceStore as the boundary:
- The consumer (DKG) owns turning a ProcessingRecord into RDF under def.nde.nl, either by providing its own ProvenanceStore implementation, or by having the store delegate record↔quads conversion to a consumer-supplied serialiser while the reusable file/SPARQL get/set mechanics stay in @lde/pipeline.
- Do not inject predicate IRIs into the existing serialiser — that still couples the consumer to LDE's predicate choices and PROV-O shape. The boundary is the TypeScript record, so the consumer owns the entire shape.
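The second option (a consumer-supplied serialiser) could look roughly like this. Everything here is hypothetical: the ProvenanceSerialiser and SerialisingProvenanceStore names, the simplified Quad shape (a real implementation would more likely use RDF/JS terms), and the `https://def.nde.nl/placeholder#` namespace, which stands in for whatever terms NDE actually defines. The point it illustrates is that @lde/pipeline holds only storage mechanics, while the consumer picks every IRI and the whole graph shape, including reuse of a standard term such as PROV-O's `prov:generatedAtTime`.

```typescript
// Simplified term shapes for this sketch only (not the RDF/JS types).
interface Literal {
  value: string;
  datatype?: string;
}
interface Quad {
  subject: string;
  predicate: string;
  object: string | Literal;
}

interface ProcessingRecord {
  sourceFingerprint: string;
  pipelineVersion: string;
  generatedAt: Date; // assumed type
  status: string; // assumed type
}

// Hypothetical seam: the consumer decides every IRI and the whole shape.
interface ProvenanceSerialiser {
  toQuads(uri: string, record: ProcessingRecord): Quad[];
  fromQuads(uri: string, quads: readonly Quad[]): ProcessingRecord | undefined;
}

const XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime";
const PROV_GENERATED_AT = "http://www.w3.org/ns/prov#generatedAtTime";
// Placeholder namespace: the real def.nde.nl terms are for NDE to define.
const NDE = "https://def.nde.nl/placeholder#";

// Consumer-side (DKG) serialiser; this would live outside @lde/pipeline.
const ndeSerialiser: ProvenanceSerialiser = {
  toQuads(uri, record) {
    return [
      { subject: uri, predicate: `${NDE}sourceFingerprint`, object: { value: record.sourceFingerprint } },
      { subject: uri, predicate: `${NDE}pipelineVersion`, object: { value: record.pipelineVersion } },
      { subject: uri, predicate: `${NDE}status`, object: { value: record.status } },
      {
        subject: uri,
        predicate: PROV_GENERATED_AT,
        object: { value: record.generatedAt.toISOString(), datatype: XSD_DATETIME },
      },
    ];
  },
  fromQuads(uri, quads) {
    // Read the lexical value of the first quad matching subject + predicate.
    const value = (predicate: string): string | undefined => {
      const q = quads.find((x) => x.subject === uri && x.predicate === predicate);
      if (q === undefined) return undefined;
      return typeof q.object === "string" ? q.object : q.object.value;
    };
    const fingerprint = value(`${NDE}sourceFingerprint`);
    const version = value(`${NDE}pipelineVersion`);
    const status = value(`${NDE}status`);
    const generatedAt = value(PROV_GENERATED_AT);
    if (fingerprint === undefined || version === undefined || status === undefined || generatedAt === undefined) {
      return undefined;
    }
    return { sourceFingerprint: fingerprint, pipelineVersion: version, generatedAt: new Date(generatedAt), status };
  },
};

// Pipeline side (hypothetical): keeps the storage mechanics, here an
// in-memory quad list standing in for file/SPARQL I/O, and delegates
// record<->quads conversion to the injected serialiser.
class SerialisingProvenanceStore {
  private quads: Quad[] = [];

  constructor(private readonly serialiser: ProvenanceSerialiser) {}

  async get(uri: string): Promise<ProcessingRecord | undefined> {
    return this.serialiser.fromQuads(uri, this.quads);
  }

  async set(uri: string, record: ProcessingRecord): Promise<void> {
    // Drop earlier statements about this subject before writing new ones.
    this.quads = this.quads.filter((q) => q.subject !== uri);
    this.quads.push(...this.serialiser.toQuads(uri, record));
  }

  // Lets callers inspect what would be written to the provenance graph.
  dump(): readonly Quad[] {
    return this.quads;
  }
}
```

Unlike injecting predicate IRIs, this lets the consumer change the graph shape (for example, adding a PROV-O activity node) without any change to @lde/pipeline.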
Result: @lde/pipeline coins zero output vocabulary, and DKG's provenance graph uses def.nde.nl terms (or whatever vocabulary the consumer chooses). Standard W3C vocabularies (PROV-O, DQV, schema.org) in output are fine; the concern is only LDE-coined w3id.org/lde identifiers.
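One way a consumer could pin down this result is a small regression check over the quads that end up in its provenance graph. The helper below is hypothetical (its name and the simplified Quad shape are assumptions, not part of @lde/pipeline). It flags any IRI under https://w3id.org/lde/ in subject, predicate, object or literal datatype position, while letting PROV-O, DQV, schema.org and def.nde.nl terms through.

```typescript
// Simplified term shapes for this sketch only (not the RDF/JS types).
interface Literal {
  value: string;
  datatype?: string;
}
interface Quad {
  subject: string;
  predicate: string;
  object: string | Literal;
}

const LDE_NAMESPACE = "https://w3id.org/lde/";

// Collect every distinct IRI in the output that LDE itself coined.
// Standard W3C vocabularies and consumer namespaces are not flagged.
function findLdeCoinedIris(quads: readonly Quad[]): string[] {
  const leaked = new Set<string>();
  for (const q of quads) {
    const iris: string[] = [q.subject, q.predicate];
    if (typeof q.object === "string") {
      iris.push(q.object); // object is an IRI
    } else if (q.object.datatype !== undefined) {
      iris.push(q.object.datatype); // literal datatypes are IRIs too
    }
    for (const iri of iris) {
      if (iri.startsWith(LDE_NAMESPACE)) leaked.add(iri);
    }
  }
  return Array.from(leaked);
}
```

Run against the current FileLoadedSparqlProvenanceStore output, such a check would report the w3id.org/lde/provenance# predicates; after the migration it should return an empty list.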
Notes
- @lde/pipeline exposes the seam; dataset-knowledge-graph (and any other consumer) migrates to own the serialisation.
- The w3id.org/lde registration (perma-id/w3id.org PR) is no longer needed for output once this lands.