Standalone scripts for post-processing, QC, and visualization of ARG tree sequences (.ts, .trees, .tsz).
conda env create -f environment.yml
conda activate argtest
Core dependencies are in environment.yml (numpy, matplotlib, tskit, tszip, msprime).
Builds an HTML summary of per-individual mutational load and writes outlier windows as BED when windowing is enabled.
Inputs:
- one tree sequence file
Key outputs:
- results/<name>.html
- results/<ts_stem>_outliers.bed (if --window-size is used)
- logs/<name>.log
Example:
python scripts/mutload_summary.py example_data/maize.tsz --window-size 50000 --cutoff 0.25 --out mutload.html
Main options:
- --window-size
- --cutoff
- --out
- --suffix-to-strip
Removes selected individuals either genome-wide (--individuals) or over BED intervals (--remove).
Inputs:
- one tree sequence file
- optional BED(s) with per-individual intervals
Key output:
- trimmed tree sequence (--out, or results/<ts_stem>_trimmed.tsz)
Example:
python scripts/trim_samples.py example_data/maize.tsz --individuals B73,Mo17 --out results/maize_trimmed.tsz
Main options:
- --individuals
- --remove
- --out
- --suffix-to-strip
Applies a shared accessibility-based mask across a directory of tree sequences by:
- inferring mutation map (*.mut_rate.p) from TS names,
- keeping windows with accessible bp >= --cutoff-bp,
- collapsing masked intervals and writing renamed output TS files.
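The window filter described above can be sketched in plain Python. This is a minimal illustration, not the code in scripts/trim_regions.py: the function names, the representation of accessibility as half-open (start, end) intervals, and the example numbers are all assumptions made for the sketch.

```python
def accessible_bp_per_window(accessible, seq_len, window_size):
    """Count accessible bp in each fixed-size window.

    accessible: sorted, non-overlapping half-open (start, end) intervals
    (hypothetical input format, not taken from the scripts).
    """
    n_windows = -(-seq_len // window_size)  # ceiling division
    counts = [0] * n_windows
    for start, end in accessible:
        end = min(end, seq_len)  # ignore anything past the sequence end
        w = start // window_size
        while start < end:
            w_end = min((w + 1) * window_size, seq_len)
            chunk_end = min(end, w_end)
            counts[w] += chunk_end - start
            start = chunk_end
            w += 1
    return counts


def split_windows(counts, seq_len, window_size, cutoff_bp):
    """Return (kept, masked) interval lists, merging adjacent windows."""
    kept, masked = [], []
    for i, bp in enumerate(counts):
        span = (i * window_size, min((i + 1) * window_size, seq_len))
        target = kept if bp >= cutoff_bp else masked
        if target and target[-1][1] == span[0]:
            target[-1] = (target[-1][0], span[1])  # extend previous interval
        else:
            target.append(span)
    return kept, masked


# Illustrative numbers only.
accessible = [(1000, 4000), (49000, 51000), (120000, 130000)]
counts = accessible_bp_per_window(accessible, 200000, 50000)
kept, masked = split_windows(counts, 200000, 50000, cutoff_bp=2500)
collapsed_length = sum(end - start for start, end in kept)
print(counts, kept, masked, collapsed_length)
```

With these inputs, windows 0 and 2 pass the 2500 bp cutoff; the two masked intervals would be the ones removed when the sequence is collapsed.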
Inputs:
- directory of tree sequences
- inferred mutation-rate map file(s)
Key outputs:
- collapsed tree sequences in output directory
- one summary log (collapse_log.txt by default)
Example:
python scripts/trim_regions.py \
--ts-dir /path/to/trees \
--window-size 50000 \
--cutoff-bp 2500 \
--pattern "*.tsz"
Main options:
- --ts-dir
- --window-size
- --cutoff-bp
- --out-dir
- --pattern
- --log
Generates SINGER-style validation/diagnostic plots (excluding coalescence/Ne curves) directly from a set of TS replicates.
Plots produced:
- mutational-load.png
- mutational-load-trace.png
- diversity-scatter.png
- diversity-skyline.png
- diversity-trace.png
- tajima-d-scatter.png
- tajima-d-skyline.png
- tajima-d-trace.png
- frequency-spectrum.png
- optional observed-vs-sim density plots (when --sim TSV is provided): diversity-density-vs-sim.png, tajima-d-density-vs-sim.png
- summary.txt
Notes:
- branch diversity is scaled by --mutation-rate for site-vs-branch comparison
- trace plots are branch-only MCMC outcomes
- --sim can read the simulation TSV from scripts/coalescence_ne_plots_from_ts.py --sim and compare observed vs simulated windowed pi and Tajima's D using overlapping semi-transparent density plots
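The scaling in the first note can be illustrated with a small sketch. Branch-mode diversity is measured in units of time, so multiplying it by a per-bp, per-generation mutation rate gives the expected site diversity. The helper names below are hypothetical, and treating --burnin-frac as "drop the first fraction of the ordered replicates" is an assumption about the script's behaviour, not something stated above.

```python
def drop_burnin(replicates, burnin_frac):
    """Discard the first burnin_frac of an ordered list of MCMC replicates.

    Assumption: replicates are ordered by MCMC iteration.
    """
    if not 0.0 <= burnin_frac < 1.0:
        raise ValueError("burnin_frac must be in [0, 1)")
    first_kept = int(len(replicates) * burnin_frac)
    return replicates[first_kept:]


def scale_branch_diversity(branch_pi, mutation_rate):
    """Convert branch-mode diversity (time units) to expected site diversity."""
    return [value * mutation_rate for value in branch_pi]


# Illustrative numbers only: one branch-diversity value per replicate.
branch_pi = [40000.0, 30000.0, 20000.0, 10000.0]
post_burnin = drop_burnin(branch_pi, burnin_frac=0.5)
expected_site_pi = scale_branch_diversity(post_burnin, mutation_rate=3.3e-8)
print(post_burnin, expected_site_pi)
```

After scaling, the branch-based values sit on the same axis as site-based diversity, which is what makes the scatter and skyline comparisons meaningful.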
Example:
python scripts/validation_plots_from_ts.py \
--ts-dir ~/crud/collapsed \
--pattern "*.tsz" \
--window-size 100000 \
--mutation-rate 3.3e-8 \
--burnin-frac 0.5 \
--out-dir results/validation_plots
Example with simulation comparison:
python scripts/validation_plots_from_ts.py \
--ts-dir ~/crud/collapsed \
--pattern "*.tsz" \
--window-size 100000 \
--mutation-rate 3.3e-8 \
--burnin-frac 0.5 \
--sim results/coalescence_ne_plots/sim-window-stats.tsv \
--out-dir results/validation_plots
Main options:
- --ts-dir
- --pattern
- --mutation-rate
- --burnin-frac
- --window-size
- --folded
- --sim
- --out-dir
- --prefix
Generates pair coalescence and effective population size plots from a set of TS replicates using explicit time bins.
Plots produced:
- pair-coalescence-pdf.png
- pair-coalescence-rates.png
- effective-pop-size.png (Ne = 1 / (2 * coal_rate))
- optional simulation summary table: sim-window-stats.tsv (when --sim > 0)
- summary.txt
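The Ne conversion quoted above, Ne = 1 / (2 * coal_rate), is simple enough to sketch directly. The function name and the choice to return NaN for bins with a zero rate are assumptions for illustration; the script may handle empty bins differently.

```python
import math


def ne_from_coal_rates(coal_rates):
    """Apply Ne = 1 / (2 * coal_rate) to each time bin.

    Bins with a non-positive rate yield NaN (an assumption of this sketch).
    """
    return [1.0 / (2.0 * rate) if rate > 0 else math.nan for rate in coal_rates]


# Illustrative pair coalescence rates per time bin (per generation).
coal_rates = [5e-5, 1e-4, 2.5e-5, 0.0]
ne = ne_from_coal_rates(coal_rates)
print(ne)
```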
Notes:
- time bins come from --time-bins-file (explicit bin edges)
- --time-adjust rescales plotted x-axis values by dividing time by a factor
- --year adds a red dashed vertical marker to the Ne plot
- optional --sim X mode converts the inferred piecewise-constant Ne trajectory into a one-deme Demes model and runs X coalescent simulations
- simulations are 1 Mb, use the same sample size as the observed TS, and use --mu for both mutation and recombination rates
- simulated nucleotide diversity and Tajima's D are computed in 50 Kb windows and written as a TSV
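As a rough picture of the --sim conversion, a piecewise-constant Ne trajectory can be laid out as a one-deme model in the Demes dictionary format, where epochs run from oldest to most recent and each epoch records its more recent boundary as end_time. The sketch below is an assumption-laden illustration: the function name, the deme name "pop", and the example bin edges are hypothetical, and the output has not been checked against the script or the demes package.

```python
def ne_to_demes_dict(bin_edges, ne_values, deme_name="pop"):
    """Build a one-deme Demes-style dict from piecewise-constant Ne.

    bin_edges: increasing times in generations ago, starting at 0, with one
    more entry than ne_values. The oldest epoch is open-ended, so the last
    edge is not used.
    """
    if len(bin_edges) != len(ne_values) + 1:
        raise ValueError("need exactly one more bin edge than Ne values")
    if bin_edges[0] != 0:
        raise ValueError("first bin edge must be 0 (the present)")
    epochs = []
    # Walk from the oldest bin to the most recent one.
    for i in range(len(ne_values) - 1, -1, -1):
        epochs.append(
            {"end_time": float(bin_edges[i]), "start_size": float(ne_values[i])}
        )
    return {
        "time_units": "generations",
        "demes": [{"name": deme_name, "epochs": epochs}],
    }


# Illustrative trajectory: three bins, oldest bin extends to infinity.
bin_edges = [0, 100, 1000, float("inf")]
ne_values = [5000, 10000, 20000]
model = ne_to_demes_dict(bin_edges, ne_values)
print(model)
```

A dictionary in this shape could then be handed to a Demes-aware simulator; how the script itself passes the model to msprime is not shown here.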
Example:
python scripts/coalescence_ne_plots_from_ts.py \
--ts-dir ~/crud/collapsed \
--pattern "*.tsz" \
--time-bins-file /path/to/time_bins.txt \
--burnin-frac 0.5 \
--time-adjust 6.19476 \
--year 534 \
--log-rates \
--out-dir results/coalescence_ne_plots
Example with simulations:
python scripts/coalescence_ne_plots_from_ts.py \
--ts-dir ~/crud/collapsed \
--pattern "*.tsz" \
--time-bins-file /path/to/time_bins.txt \
--burnin-frac 0.5 \
--time-adjust 6.19476 \
--year 534 \
--sim 100 \
--mu 3.3e-8 \
--sim-outfile results/coalescence_ne_plots/sim-window-stats.tsv \
--out-dir results/coalescence_ne_plots \
--log-rates
Main options:
- --ts-dir
- --pattern
- --time-bins-file
- --burnin-frac
- --tail-cutoff
- --time-adjust
- --year
- --sim
- --mu
- --sim-outfile
- --log-rates
- --out-dir
- --prefix
Renders one tree index from each of two tree sequences into a single HTML comparison.
Example:
python scripts/compare_trees_html.py a.tsz 9 b.tsz 9 --out tree_compare.html
Renders all trees from two tree sequences as a top/bottom-row scrollable HTML gallery.
Example:
python scripts/trees_gallery_html.py a.tsz b.tsz --out trees_gallery.html
scripts/argtest_common.py contains shared tree-sequence helpers used by multiple scripts:
- TS I/O (load_ts, dump_ts)
- mutational load/stat helpers
- trimming and masking helpers
Use this module for internal script imports.
- Generated logs/ and results/ are git-ignored.
- .DS_Store is git-ignored.