feat: add data_purposes to Dataset, DatasetCollection, DatasetField#39
…setField
Add data_purposes as an optional field at all levels of the dataset hierarchy, mirroring the existing data_categories pattern. This enables purpose-based access control (PBAC) by declaring which data purposes are allowed for each dataset, collection, field, and sub-field.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update fideslang dependency to use the feat/add-data-purposes-to-dataset-models branch, which adds data_purposes at dataset, collection, field, and sub-field levels.
Dependency: ethyca/fideslang#39
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Suppress mypy errors caused by newer pydantic versions resolving in CI:
- Add `type: ignore[misc]` for `ValidationInfo` explicit `Any` warnings
- Remove stale `type: ignore[assignment]` comments that are no longer needed
- Add `type: ignore[arg-type]` for `Optional` list `default_factory`
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused `_: ValidationInfo` param from `validate_object_fields` (a `model_validator` with `mode="after"` doesn't need it)
- Remove `Optional` from Taxonomy list fields that default to `[]` (the type now matches the actual default value)
- Remove the `disallow_any_explicit` mypy setting, which conflicts with pydantic's own types (`ValidationInfo`, `Dict[str, Any]`)
- Clean up all `# type: ignore` comments that are no longer needed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
adamsachs left a comment
OK, this all looks fine. i just left an inline note about `None` vs `[]`, but i can see this was a deliberate choice for backward compatibility reasons, which makes sense.
it does strike me as a bit odd that we have this `data_purposes` pointer without actually defining a `DataPurpose` model here in fideslang (unless i've missed it?). do we have other examples of doing that? it just seems like it leaves our model here a bit incomplete and not self-contained, and something doesn't feel 'right' about that.
i won't consider that a blocker, but i do want to hear your thoughts on that.
```python
data_purposes: Optional[List[FidesKey]] = Field(
    default=None,
    description="Array of Data Purpose resources, identified by `fides_key`, that apply to this field.",
)
```
is this deliberately an `Optional[List]` that defaults to `None` rather than a `List` that defaults to `[]`?
i realize this follows the data_categories convention above, but i just want to make sure we're not perpetuating a bad design choice. do we imagine that None will signify something different than an empty list ([]) in this case? perhaps the flexibility is good for a model as foundational as this one, because there may be many different applications, and it can be hard to anticipate whether some application may want to distinguish None from [] in the future, even if we don't have that use case now.
i can conceive of different meanings for None vs [] (e.g. None = 'hasn't yet been reviewed/annotated for data_purposes'; [] = 'reviewed and determined no data_purposes apply') so i think this choice is justifiable. but just wanted to raise this for thought/discussion/confirmation!
OK and now i see the PR note about backward compatibility. that seems like enough justification 👍
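The `None` vs `[]` distinction discussed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions: `FieldSketch` is a hypothetical stdlib dataclass standing in for the pydantic `DatasetFieldBase` model (with `FidesKey` simplified to `str`), and the hypothetical `describe_purposes` helper reads `None` as "not yet annotated" and `[]` as "reviewed, no purposes apply", following the reviewer's example rather than any documented fideslang semantics. The purpose keys are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical stand-in for fideslang's DatasetFieldBase; only the fields
# relevant to the None-vs-[] discussion are modeled, and FidesKey is a plain str.
@dataclass
class FieldSketch:
    name: str
    data_purposes: Optional[List[str]] = None  # None default, as in the PR


def describe_purposes(field: FieldSketch) -> str:
    """Hypothetical helper that gives None and [] different meanings."""
    if field.data_purposes is None:
        # Key never set: e.g. the field has not been reviewed for purposes yet.
        return "unannotated"
    if not field.data_purposes:
        # Explicit empty list: reviewed, and no purposes apply.
        return "no purposes"
    return ", ".join(field.data_purposes)


# A field definition written before data_purposes existed still constructs,
# because the new attribute has a default.
legacy = FieldSketch(name="email")
reviewed = FieldSketch(name="email", data_purposes=[])
tagged = FieldSketch(name="email", data_purposes=["marketing"])  # illustrative key

print(describe_purposes(legacy))    # unannotated
print(describe_purposes(reviewed))  # no purposes
print(describe_purposes(tagged))    # marketing
```

Defaulting to `None` keeps both states representable; a `List` defaulting to `[]` would collapse "unannotated" and "reviewed, none apply" into the same value.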
Description Of Changes
Add `data_purposes: Optional[List[FidesKey]]` to three classes in the dataset model hierarchy, mirroring the existing `data_categories` pattern:
- `DatasetFieldBase`: applies to individual fields (and sub-fields via `DatasetField` inheritance)
- `DatasetCollection`: applies to all fields in the collection
- `Dataset`: applies to all collections in the dataset

This enables purpose-based access control (PBAC): each level of the dataset hierarchy can declare which data purposes are allowed, and the PBAC engine evaluates consumer access against these purpose restrictions.
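The PR does not say how the PBAC engine combines purposes across the hierarchy described above. The sketch below assumes one plausible rule: a level whose `data_purposes` is `None` imposes no restriction, and a consumer's purpose is allowed for a field only if every level that does declare purposes (dataset, collection, the field, and any parent fields of a sub-field) lists it. `SketchDataset`, `SketchCollection`, `SketchField`, `field_levels`, and `is_purpose_allowed` are hypothetical simplifications, not fideslang or PBAC engine APIs, and the purpose keys are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

Purposes = Optional[List[str]]  # None = this level declares no restriction


# Hypothetical, simplified stand-ins for Dataset / DatasetCollection / DatasetField.
@dataclass
class SketchField:
    name: str
    data_purposes: Purposes = None
    fields: List["SketchField"] = field(default_factory=list)  # sub-fields


@dataclass
class SketchCollection:
    name: str
    fields: List[SketchField]
    data_purposes: Purposes = None


@dataclass
class SketchDataset:
    fides_key: str
    collections: List[SketchCollection]
    data_purposes: Purposes = None


def field_levels(dataset: SketchDataset, collection_name: str,
                 field_path: Sequence[str]) -> List[Purposes]:
    """Collect data_purposes from the dataset down to the addressed (sub-)field."""
    collection = next(c for c in dataset.collections if c.name == collection_name)
    levels: List[Purposes] = [dataset.data_purposes, collection.data_purposes]
    candidates = collection.fields
    for name in field_path:
        current = next(f for f in candidates if f.name == name)
        levels.append(current.data_purposes)
        candidates = current.fields
    return levels


def is_purpose_allowed(purpose: str, levels: Sequence[Purposes]) -> bool:
    """Assumed rule: every level that declares purposes must include `purpose`."""
    return all(declared is None or purpose in declared for declared in levels)


address = SketchField(name="address", fields=[
    SketchField(name="street"),
    SketchField(name="postal_code", data_purposes=["fraud_detection"]),
])
users = SketchCollection(
    name="users",
    fields=[SketchField(name="email", data_purposes=["marketing", "fraud_detection"]), address],
    data_purposes=["marketing", "fraud_detection", "analytics"],
)
dataset = SketchDataset(fides_key="app_db", collections=[users])  # dataset level: None

print(is_purpose_allowed("marketing", field_levels(dataset, "users", ["email"])))  # True
print(is_purpose_allowed("analytics", field_levels(dataset, "users", ["email"])))  # False
print(is_purpose_allowed("analytics", field_levels(dataset, "users", ["address", "street"])))  # True
print(is_purpose_allowed("marketing", field_levels(dataset, "users", ["address", "postal_code"])))  # False
```

Under this assumed rule, declaring purposes at a higher level narrows what lower levels can allow; an override rule (the most specific level wins) would be an equally plausible reading, and the real behavior depends on the PBAC engine.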
The field is `Optional` with a `None` default, so this is fully backward compatible: existing datasets without `data_purposes` are unaffected.
Code Changes
- `src/fideslang/models.py`: Add `data_purposes` field to `DatasetFieldBase`, `DatasetCollection`, and `Dataset`
Steps to Confirm
Pre-Merge Checklist
CHANGELOG.md

🤖 Generated with Claude Code