Reference and how-to docs for cdot. These live in the repo, versioned with the code.
- Using local downloaded JSON.gz files - load release files into the HGVS libraries.
- Biocommons HGVS examples - c→g / g→c, plus a T2T-CHM13v2.0 example.
- PyHGVS examples - legacy PyHGVS integration (prefer biocommons).
- FastaSeqFetcher - local FASTA sequence fetching (SeqRepo replacement).
- Advanced usage - fixing messy HGVS input (fix_hgvs/clean_hgvs) and read-ahead batch retrieval for bulk processing (RESTDataProvider.prefetch).
- Transcript-version safety - the opt-in safe version fallback: how cdot decides a version substitution is coordinate-preserving (is_version_substitution_safe), and the study behind it.
- JSON data format - every field in a cdot JSON(.gz) file, auto-generated from the typed models in cdot/models.py. Machine-readable JSON Schema alongside it.
- Coordinates & exon alignments - how exon coordinates, exon IDs and the alignment gap (CIGAR-like) strings work, with worked examples.
- GitHub release file details - what each released .json.gz file contains.
- Create data from scratch - build the JSON files yourself from GTF/GFF3.
- cdot vs UTA - how cdot compares to the Universal Transcript Archive.
- Design notes & project direction - why JSON, known issues, project goals.
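
Before reading the JSON data format page, it can help to peek inside a downloaded release file. The following is a minimal sketch using only the Python standard library, not cdot's own loader. It makes no assumption about which fields a cdot file contains: it reports whatever top-level keys it finds. The function names `load_json_gz` and `summarize_top_level` are hypothetical helpers introduced for illustration, and the sketch assumes the top level of the file is a JSON object.

```python
import gzip
import json


def load_json_gz(path):
    """Read a gzip-compressed JSON file and return the decoded object."""
    # "rt" opens the gzip stream in text mode so json.load can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def summarize_top_level(data):
    """Map each top-level key to (type name, size), with size None for scalars."""
    summary = {}
    for key, value in data.items():
        size = len(value) if isinstance(value, (dict, list, str)) else None
        summary[key] = (type(value).__name__, size)
    return summary
```

Calling `summarize_top_level(load_json_gz(path))` on a local .json.gz path gives a quick overview of the file's layout, which can then be checked against the field-by-field documentation.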