
Create a VariantData.from_arrays method #1074

Open
hyanwong wants to merge 2 commits into tskit-dev:main from hyanwong:in-mem-zarr

@hyanwong (Member) commented Feb 6, 2026

This makes an in-memory vdata object that can be used for testing. Fixes #924
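A rough sketch of what such an in-memory object needs to hold: the hypothetical helper below packs a phased genotype matrix into arrays named after VCF Zarr fields (`variant_position`, `variant_allele`, `call_genotype`, `call_genotype_phased`, ...). The name `phased_matrix_to_vcz_arrays` and its arguments are illustrative only, not the `from_arrays` signature in this PR, and the sketch stops short of writing a Zarr group that `VariantData` could open.

```python
import numpy as np


def phased_matrix_to_vcz_arrays(genotypes, positions, alleles, sequence_length, contig_id="1"):
    """Pack a phased genotype matrix into VCF Zarr-style named arrays.

    Hypothetical helper, not tsinfer API. Assumptions:
    genotypes: ints of shape (num_sites, num_samples, ploidy), -1 = missing,
        and every call is treated as phased.
    positions: 1-based site positions, one per site.
    alleles: one list of allele strings per site.
    """
    genotypes = np.asarray(genotypes, dtype=np.int8)
    if genotypes.ndim != 3:
        raise ValueError("genotypes must have shape (sites, samples, ploidy)")
    num_sites, num_samples, _ = genotypes.shape
    positions = np.asarray(positions, dtype=np.int32)
    if positions.shape != (num_sites,) or len(alleles) != num_sites:
        raise ValueError("need exactly one position and one allele list per site")
    # Pad allele lists with "" so they fit in a rectangular array.
    max_alleles = max((len(a) for a in alleles), default=1)
    allele_array = np.full((num_sites, max_alleles), "", dtype=object)
    for j, site_alleles in enumerate(alleles):
        allele_array[j, : len(site_alleles)] = site_alleles
    return {
        "contig_id": np.array([contig_id], dtype=object),
        "contig_length": np.array([sequence_length], dtype=np.int64),
        "variant_contig": np.zeros(num_sites, dtype=np.int8),  # all on contig 0
        "variant_position": positions,
        "variant_allele": allele_array,
        "sample_id": np.array([f"s{i}" for i in range(num_samples)], dtype=object),
        "call_genotype": genotypes,
        # The "phased" assumption becomes an explicit per-call flag.
        "call_genotype_phased": np.ones((num_sites, num_samples), dtype=bool),
    }
```

For example, `phased_matrix_to_vcz_arrays([[[0, 1], [1, 1]], [[0, 0], [1, 0]]], [10, 20], [["A", "T"], ["C", "G"]], 100)` describes two sites and two diploid samples. Keeping the phasing as a separate boolean array, rather than in an argument name, mirrors how VCF Zarr itself records it.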

@codecov bot commented Feb 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.16%. Comparing base (b436755) to head (ed5bd13).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1074   +/-   ##
=======================================
  Coverage   87.16%   87.16%           
=======================================
  Files           5        5           
  Lines        1792     1792           
  Branches      317      317           
=======================================
  Hits         1562     1562           
  Misses        140      140           
  Partials       90       90           
Flag Coverage Δ
C 87.16% <ø> (ø)

Flags with carried forward coverage won't be shown.


@hyanwong force-pushed the in-mem-zarr branch 3 times, most recently from c2bd995 to aeaa446 on February 6, 2026 at 14:36
This makes an in-memory vdata object that can be used for testing. Fixes tskit-dev#924
See tskit-dev#1028; however, we still have an out-by-one error when taking the sequence length from the VCF contig
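The commit message doesn't say where the out-by-one comes from. One way it can arise, assuming 1-based VCF positions are used unchanged as tskit coordinates: tskit requires every site position to satisfy 0 <= position < sequence_length, while a VCF variant on the last base of a contig has POS equal to the contig length. The hypothetical helper `sites_outside_sequence` below only flags such sites; it is a diagnostic sketch, not the fix discussed in tskit-dev#1028.

```python
import numpy as np


def sites_outside_sequence(positions, sequence_length):
    """Return positions that would be invalid as tskit site coordinates.

    tskit needs 0 <= position < sequence_length. A 1-based VCF POS on the
    final base of a contig equals the contig length, so it falls outside
    [0, L) when L is taken directly from the contig length.
    """
    positions = np.asarray(positions)
    bad = (positions < 0) | (positions >= sequence_length)
    return positions[bad]
```

With a contig length of 100, `sites_outside_sequence([1, 50, 100], 100)` flags the site at 100, whereas a sequence length of 101 flags nothing. Whether shifting the sequence length or the positions is the right correction depends on how the coordinates are mapped elsewhere.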
@jeromekelleher (Member) left a comment

I'm not sure this is the right way to go, and the API will result in more hard-to-maintain tests.

I'm thinking about some upstream infrastructure in bio2zarr which will make it easy to create Zarr groups that are valid VCF Zarr entities. Leave it with me for now, please.
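For a sense of what "valid" involves at the most basic level, here is a sketch that checks a dict of arrays for a handful of VCF Zarr fields and whether their shared dimensions (variants, samples, ploidy, alleles) agree. This is not bio2zarr code and not the infrastructure mentioned above; `REQUIRED_DIMS` and `check_vcz_dims` are hypothetical names, and a real VCF Zarr store carries many more fields plus metadata.

```python
import numpy as np

# Hypothetical minimal schema: array name -> dimension names. The names follow
# VCF Zarr dimension conventions but cover only a small subset of the spec.
REQUIRED_DIMS = {
    "variant_position": ("variants",),
    "variant_allele": ("variants", "alleles"),
    "sample_id": ("samples",),
    "call_genotype": ("variants", "samples", "ploidy"),
    "call_genotype_phased": ("variants", "samples"),
}


def check_vcz_dims(arrays):
    """Return a list of problems; an empty list means the arrays agree."""
    problems = []
    sizes = {}  # dimension name -> size fixed by the first array that uses it
    for name, dims in REQUIRED_DIMS.items():
        if name not in arrays:
            problems.append(f"missing array: {name}")
            continue
        shape = np.shape(arrays[name])
        if len(shape) != len(dims):
            problems.append(f"{name}: expected {len(dims)} dims, got {len(shape)}")
            continue
        for dim, size in zip(dims, shape):
            expected = sizes.setdefault(dim, size)
            if size != expected:
                problems.append(f"{name}: {dim} has size {size}, expected {expected}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to see every inconsistency in a hand-built test fixture at once.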

    @classmethod
    def from_arrays(
        cls,
        variant_matrix_phased,
Member

Why not call_genotype?

Member Author

We could indeed use that, but the _phased suffix is meant to signal to users that the genotypes are assumed to be phased. But if you are refactoring, then perhaps it's not worth me revisiting this?

If there is a neat bio2zarr way to make in-memory VariantData versions from simple arrays for testing, then great.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Class method to create simple VariantData files for demos

2 participants