
Refactor clinical_controls into a dedicated clinvar_variants table with dual identifier support #704

@bencap


Background

MaveDB currently stores ClinVar associations in a generic clinical_controls table, originally designed to hold associations from multiple external databases. In practice it only holds ClinVar data, and the original generalized design has become a liability: ClinVar's own identifier model requires two distinct fields — Allele ID and Variation ID — which don't fit cleanly into a generic structure.

Additionally, ClinVar has been converging on Variation ID as the canonical public identifier (anchoring their web UI, VCF files, and search), while Allele ID remains the correct handle for allele-level cross-references like gnomAD. MaveDB currently stores only the Allele ID, so our external ClinVar links use a secondary identifier.

Proposed Change

Replace clinical_controls with a dedicated clinvar_variants table with explicit fields for both identifiers:

  • clinvar_allele_id — for allele-level cross-references (gnomAD, etc.)
  • clinvar_variation_id — for external ClinVar links (clinvar/variation/{id})
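To make the split concrete, here is a minimal sketch of a single clinvar_variants row as a plain Python dataclass. It assumes integer IDs. The clinical_significance field, the CLINVAR_VARIATION_URL constant, and the clinvar_url property are illustrative names, not part of the proposal. A dataclass stands in for whatever ORM model the backend actually uses. The URL follows ClinVar's public https://www.ncbi.nlm.nih.gov/clinvar/variation/{id}/ pattern, and it is built from the Variation ID.

```python
from dataclasses import dataclass
from typing import Optional

# ClinVar's public variation page pattern; the constant name is hypothetical.
CLINVAR_VARIATION_URL = "https://www.ncbi.nlm.nih.gov/clinvar/variation/{id}/"


@dataclass(frozen=True)
class ClinvarVariant:
    """Sketch of one clinvar_variants row (the two ID field names come from the issue)."""

    clinvar_allele_id: int      # allele-level handle for cross-references such as gnomAD
    clinvar_variation_id: int   # canonical public identifier for external ClinVar links
    # Hypothetical extra column; the issue does not specify the rest of the schema.
    clinical_significance: Optional[str] = None

    @property
    def clinvar_url(self) -> str:
        # External links are built from the Variation ID, not the Allele ID.
        return CLINVAR_VARIATION_URL.format(id=self.clinvar_variation_id)
```

For example, `ClinvarVariant(clinvar_allele_id=111, clinvar_variation_id=222).clinvar_url` points at `.../clinvar/variation/222/`. The IDs here are placeholders, not a real ClinVar pair.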

Both fields can be populated from the ClinVar TSV, which contains both IDs, so no additional lookups are required. For the simple SNVs and indels MaveDB works with, the two identifiers map effectively 1-to-1.
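A sketch of reading both IDs in one pass follows. It assumes the ingested file uses the `#AlleleID` and `VariationID` column headers found in ClinVar's variant_summary.txt. If MaveDB ingests a different TSV, adjust the column names. The function name `read_clinvar_ids` is hypothetical. The function also checks the 1-to-1 assumption: an allele that shows up with two different Variation IDs raises an error instead of being silently overwritten.

```python
import csv
import io
from typing import Dict, TextIO


def read_clinvar_ids(handle: TextIO) -> Dict[int, int]:
    """Map ClinVar Allele ID -> Variation ID from a tab-separated ClinVar export.

    Assumes variant_summary.txt-style headers ("#AlleleID", "VariationID").
    """
    mapping: Dict[int, int] = {}
    # QUOTE_NONE: treat quote characters inside fields (e.g. variant names) literally.
    for row in csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        allele_id = int(row["#AlleleID"])
        variation_id = int(row["VariationID"])
        previous = mapping.setdefault(allele_id, variation_id)
        # The same allele may appear on several rows (e.g. one per assembly), which
        # is fine; a *different* Variation ID would break the 1-to-1 assumption.
        if previous != variation_id:
            raise ValueError(
                f"Allele ID {allele_id} maps to Variation IDs {previous} and {variation_id}"
            )
    return mapping


if __name__ == "__main__":
    # Placeholder IDs, not real ClinVar records.
    sample = "#AlleleID\tType\tVariationID\n111\tDeletion\t222\n111\tDeletion\t222\n"
    print(read_clinvar_ids(io.StringIO(sample)))
```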

This brings ClinVar annotations in line with other MaveDB annotations, which are organized around dedicated, source-specific data structures rather than a generic association table. If additional external database associations are added in the future, they should continue to follow this pattern.

Acceptance Criteria

  • clinical_controls replaced with clinvar_variants in the data model
  • Migration preserves all existing Allele ID data and backfills Variation IDs
  • ClinVar ingestion populates both fields from the TSV
  • External ClinVar links updated to use Variation ID
  • API response exposes both fields with clear naming
  • MaveMD UI links updated accordingly
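The migration criterion can be sketched as follows, using Python's built-in sqlite3 in memory so the example stays self-contained. The production migration would presumably target the backend's real database through its own migration tooling. The sketch makes several assumptions. The clinical_controls columns (`id`, `db_identifier` holding the Allele ID as text, `clinical_significance`) are hypothetical. The `variation_ids` mapping is an Allele ID to Variation ID dict built from the TSV, for example by a reader like the one sketched above. Primary keys are carried over so that anything referencing clinical_controls.id could be repointed. The old table is dropped only after every row has a Variation ID.

```python
import sqlite3
from typing import Dict


def migrate_clinical_controls(conn: sqlite3.Connection, variation_ids: Dict[int, int]) -> None:
    """Copy rows from a (hypothetical) clinical_controls schema into clinvar_variants."""
    conn.execute(
        """
        CREATE TABLE clinvar_variants (
            id INTEGER PRIMARY KEY,
            clinvar_allele_id INTEGER NOT NULL,
            clinvar_variation_id INTEGER,
            clinical_significance TEXT
        )
        """
    )
    # Keep existing primary keys and every stored Allele ID.
    conn.execute(
        """
        INSERT INTO clinvar_variants (id, clinvar_allele_id, clinical_significance)
        SELECT id, CAST(db_identifier AS INTEGER), clinical_significance
        FROM clinical_controls
        """
    )
    # Backfill Variation IDs from the Allele ID -> Variation ID mapping.
    conn.executemany(
        "UPDATE clinvar_variants SET clinvar_variation_id = ? WHERE clinvar_allele_id = ?",
        [(variation_id, allele_id) for allele_id, variation_id in variation_ids.items()],
    )
    missing = conn.execute(
        "SELECT COUNT(*) FROM clinvar_variants WHERE clinvar_variation_id IS NULL"
    ).fetchone()[0]
    if missing:
        # Leave clinical_controls in place so nothing is lost.
        raise RuntimeError(f"{missing} Allele IDs have no Variation ID in the TSV")
    conn.execute("DROP TABLE clinical_controls")
    conn.commit()
```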

Metadata

Assignees

No one assigned

    Labels

    app: backend (Task implementation touches the backend)
    app: database (Task implementation requires database changes)
    app: frontend (Task implementation touches the frontend)
    type: enhancement (Enhancement to an existing feature)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
