Preparation of datasets, exploratory data analysis, filtering, normalization of data.
feature_mapping.py,extra_feature_mapping.py- name mappings of features (from Russian to English short abbreviations).part1_ids_preprocessing.ipynb- preprocessing of VCF and phenotypes ids.part2_phenotype_preprocessing.ipynb- cleaning and preparation of phenotypes, renaming of cols, etc.part2.5_new_table_analysis.ipynb- the same, but with extra-features.part3_march_pheno_eda_and_normalysing.ipynb- EDA, filtering, normalization of data.
Common variants association study (CVAS) and Rare variants association study (RVAS).
sd3_gwas_com.ipynb- CVAS.final_rwas_pipeline_p1_hail_prepare.ipynb- first step of RVAS (preparation of tables).final_rwas_pipeline_p2_statistics_and_plots.ipynb- second step of RVAS (tests and plots).out_hail_gwas_com_sd3/- directpry with p values of cvas.out_hail_rvas/- directory with p values of rvas.
Validation of found SNPs: its annotation and statistical checks.
Variants_annotations_and_score_calculation.ipynb- the main script with all annotations, statistics counts, etc (columns_to_check.pyneed for this notebook).draw_data/- directory with datasets for R plots.other_data/- directory with outputs of this script, which are not needed for drawing pictures.
R code for drawing figures, and figures itself.
1_MH_QQ.R- draw Manhattan and QQ plots (files:Rectangular-Manhattan.*t*.pdfandQQplot*.pdfrespectively).2_PCA.R- draw plots for principal components from EDA (for sex and death). Images:images/pca_*.pdf.3_violin_regression.R- draw violin plots and regression on SNPS and associated features (fromdata/regression/regr_rs*_*.tsv). Images:images/regr_<rs>_<feature>.pdf.4_boxplots.R- draw boxplots for features by death and severity (fromdata/boxplots_analyses.tsv). Images:images/bozplot_violin_<death/severity>_<feature>.pdf.5_histplots.R- draw histplots. Imaged data:images/histogram_<feature>___top_10_score.pdf- histograms for score by severity/death/storm;images/score_hist.pdf- histogram pf the snps' score;images/histogram_<death/severity>_<feature>.pdf- histograms of features by death/severity.
images/- drawn figures.