feat: add data_purposes to Dataset, DatasetCollection, DatasetField by galvana · Pull Request #39 · ethyca/fideslang

galvana · 2026-03-16T23:45:27Z

Description Of Changes

Add data_purposes: Optional[List[FidesKey]] to three classes in the dataset model hierarchy, mirroring the existing data_categories pattern:

- DatasetFieldBase: applies to individual fields (and sub-fields via DatasetField inheritance)
- DatasetCollection: applies to all fields in the collection
- Dataset: applies to all collections in the dataset

This enables purpose-based access control (PBAC): each level of the dataset hierarchy can declare which data purposes are allowed, and the PBAC engine evaluates consumer access against these purpose restrictions.
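The PR does not specify how the PBAC engine combines declarations from different levels, so the following stdlib-only Python is a hypothetical sketch, not the fides implementation. It assumes that the innermost level declaring `data_purposes` wins, that `None` means "not declared here, inherit from the parent", and that `[]` means "declared here, and no purpose is allowed". `FieldNode`, `CollectionNode`, `DatasetNode`, `effective_purposes`, and `is_allowed` are illustrative names, not fideslang APIs. Plain dataclasses stand in for the pydantic models.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class FieldNode:
    """Hypothetical stand-in for DatasetField (not the fideslang class)."""

    name: str
    data_purposes: Optional[List[str]] = None
    fields: List["FieldNode"] = field(default_factory=list)


@dataclass
class CollectionNode:
    """Hypothetical stand-in for DatasetCollection."""

    name: str
    data_purposes: Optional[List[str]] = None
    fields: List[FieldNode] = field(default_factory=list)


@dataclass
class DatasetNode:
    """Hypothetical stand-in for Dataset."""

    fides_key: str
    data_purposes: Optional[List[str]] = None
    collections: List[CollectionNode] = field(default_factory=list)


def effective_purposes(levels: Sequence[Optional[List[str]]]) -> Optional[List[str]]:
    """Return the declaration of the innermost level that sets one.

    `levels` runs from outermost (dataset) to innermost (field or sub-field).
    Assumed rule: None = "not declared here, inherit from the parent";
    [] = "declared here, and no purpose is allowed".
    """
    for declared in reversed(levels):
        if declared is not None:
            return declared
    return None  # nothing declared at any level


def is_allowed(
    dataset: DatasetNode, collection_name: str, field_path: Sequence[str], purpose: str
) -> bool:
    """Check one consumer purpose against the field at `field_path`.

    Raises StopIteration if the collection or a field name does not exist.
    """
    collection = next(c for c in dataset.collections if c.name == collection_name)
    levels: List[Optional[List[str]]] = [dataset.data_purposes, collection.data_purposes]
    children = collection.fields
    for name in field_path:
        node = next(f for f in children if f.name == name)
        levels.append(node.data_purposes)
        children = node.fields
    allowed = effective_purposes(levels)
    # Assumption: with no declaration anywhere, access is unrestricted.
    return allowed is None or purpose in allowed


ds = DatasetNode(
    fides_key="test",
    data_purposes=["marketing"],
    collections=[
        CollectionNode(
            name="c1",
            data_purposes=["analytics"],
            fields=[
                FieldNode(name="f1", data_purposes=["fraud"]),
                FieldNode(name="f2"),  # no declaration: inherits ["analytics"]
                FieldNode(name="f3", data_purposes=[]),  # explicitly allows nothing
                FieldNode(
                    name="nested",
                    fields=[FieldNode(name="sub1", data_purposes=["compliance"])],
                ),
            ],
        )
    ],
)

print(is_allowed(ds, "c1", ["f1"], "fraud"))  # True
print(is_allowed(ds, "c1", ["f1"], "analytics"))  # False
print(is_allowed(ds, "c1", ["f2"], "analytics"))  # True
print(is_allowed(ds, "c1", ["f3"], "analytics"))  # False
print(is_allowed(ds, "c1", ["nested", "sub1"], "compliance"))  # True
```

Under this assumed rule, the `None` default is what lets an undeclared field inherit its parent's purposes. An engine that intersected all levels instead would deny `f1` every purpose in this example, because marketing, analytics, and fraud never overlap.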

The field is Optional with a None default, so this is fully backward compatible — existing datasets without data_purposes are unaffected.

Code Changes

src/fideslang/models.py - Add data_purposes field to DatasetFieldBase, DatasetCollection, and Dataset

Steps to Confirm

```python
from fideslang.models import Dataset, DatasetCollection, DatasetField

# All levels including recursive sub-fields
d = Dataset(
    fides_key="test",
    data_purposes=["marketing"],
    collections=[
        DatasetCollection(
            name="c1",
            data_purposes=["analytics"],
            fields=[
                DatasetField(name="f1", data_purposes=["fraud"]),
                DatasetField(
                    name="nested",
                    fides_meta={"data_type": "object"},
                    fields=[DatasetField(name="sub1", data_purposes=["compliance"])],
                ),
            ],
        )
    ],
)

j = d.model_dump(mode="json")
assert j["data_purposes"] == ["marketing"]
assert j["collections"][0]["data_purposes"] == ["analytics"]
assert j["collections"][0]["fields"][0]["data_purposes"] == ["fraud"]
assert j["collections"][0]["fields"][1]["fields"][0]["data_purposes"] == ["compliance"]

# Backward compat: None when omitted
d2 = Dataset(fides_key="old", collections=[DatasetCollection(name="c", fields=[DatasetField(name="f")])])
assert d2.data_purposes is None
```

Pre-Merge Checklist

🤖 Generated with Claude Code

…setField

Add data_purposes as an optional field at all levels of the dataset hierarchy, mirroring the existing data_categories pattern. This enables purpose-based access control (PBAC) by declaring which data purposes are allowed for each dataset, collection, field, and sub-field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Update fideslang dependency to use the feat/add-data-purposes-to-dataset-models branch, which adds data_purposes at dataset, collection, field, and sub-field levels.

Dependency: ethyca/fideslang#39

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Suppress mypy errors caused by newer pydantic versions resolving in CI:

- Add type: ignore[misc] for ValidationInfo explicit Any warnings
- Remove stale type: ignore[assignment] comments no longer needed
- Add type: ignore[arg-type] for Optional list default_factory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove unused `_: ValidationInfo` param from `validate_object_fields` (model_validator mode="after" doesn't need it)
- Remove `Optional` from Taxonomy list fields that default to `[]` (type now matches the actual default value)
- Remove `disallow_any_explicit` mypy setting that conflicts with pydantic's own types (ValidationInfo, Dict[str, Any])
- Clean up all `# type: ignore` comments that are no longer needed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

adamsachs

OK, this all looks fine. I just left an inline note about None vs [], but I can see this was a deliberate choice for backward compatibility reasons, which makes sense.

It does strike me as a bit odd that we have this data_purposes pointer without actually defining a DataPurpose model here in fideslang (unless I've missed it). Do we have other examples of doing that? It just seems like it leaves our model here a bit incomplete and not self-contained, and something doesn't feel 'right' about that.

I won't consider that a blocker, but I do want to hear your thoughts on it.

adamsachs · 2026-03-25T17:24:41Z

src/fideslang/models.py

```diff
+    data_purposes: Optional[List[FidesKey]] = Field(
+        default=None,
+        description="Array of Data Purpose resources, identified by `fides_key`, that apply to this field.",
+    )
```


Is this deliberately an Optional[List] that defaults to None rather than a List that defaults to []?

I realize this follows the data_categories convention above, but I just want to make sure we're not perpetuating a bad design choice. Do we imagine that None will signify something different from an empty list ([]) in this case? Perhaps the flexibility is good for a model as foundational as this one: there may be many different applications, and it can be hard to anticipate whether some application will want to distinguish None from [] in the future, even if we don't have that use case now.

I can conceive of different meanings for None vs [] (e.g. None = 'hasn't yet been reviewed/annotated for data_purposes'; [] = 'reviewed and determined no data_purposes apply'), so I think this choice is justifiable. But I just wanted to raise this for thought/discussion/confirmation!

OK, and now I see the PR note about backward compatibility. That seems like enough justification 👍
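The None-vs-[] trade-off raised in this review can be illustrated with a minimal stdlib-only sketch. Plain dataclasses stand in for the pydantic models, and `OptionalStyle`, `ListStyle`, and `annotation_status` are hypothetical names; the two meanings mapped onto `None` and `[]` are the reviewer's examples, not fideslang semantics. It shows that an `Optional` field defaulting to `None` keeps "never annotated" distinct from "annotated with no purposes", whereas a list field defaulting to `[]` collapses the two.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OptionalStyle:
    """Hypothetical model using the convention chosen in this PR."""

    data_purposes: Optional[List[str]] = None


@dataclass
class ListStyle:
    """Hypothetical model using the alternative the reviewer asks about."""

    data_purposes: List[str] = field(default_factory=list)


def annotation_status(data_purposes: Optional[List[str]]) -> str:
    """Map a value onto the reviewer's example meanings for None vs []."""
    if data_purposes is None:
        return "not yet reviewed"
    if not data_purposes:
        return "reviewed, no data purposes apply"
    return "reviewed: " + ", ".join(data_purposes)


never_annotated: dict = {}  # e.g. a dataset written before this PR
annotated_empty = {"data_purposes": []}

# The Optional convention keeps the two cases apart...
print(annotation_status(OptionalStyle(**never_annotated).data_purposes))  # not yet reviewed
print(annotation_status(OptionalStyle(**annotated_empty).data_purposes))  # reviewed, no data purposes apply
# ...while a [] default makes them indistinguishable.
print(annotation_status(ListStyle(**never_annotated).data_purposes))  # reviewed, no data purposes apply
print(annotation_status(ListStyle(**annotated_empty).data_purposes))  # reviewed, no data purposes apply
```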

galvana mentioned this pull request Mar 16, 2026

Add data_purposes support to datasets ethyca/fides#7674

Merged

11 tasks

Adrian Galvan and others added 3 commits March 16, 2026 16:50

style: reformat with black (CI formatter)

22bc6b7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

galvana marked this pull request as ready for review March 25, 2026 17:07

galvana requested a review from adamsachs March 25, 2026 17:07

adamsachs approved these changes Mar 25, 2026


galvana merged commit c5b7861 into main Mar 25, 2026
37 checks passed

galvana merged 4 commits into main from feat/add-data-purposes-to-dataset-models

galvana commented Mar 16, 2026


adamsachs left a comment


adamsachs Mar 25, 2026


adamsachs Mar 25, 2026


2 participants
