pRoteomics is a repository of R-based workflows for spatial and systems-level proteomics analyses.
The repository has evolved from a primarily clusterProfiler/GSEA workflow into a broader analysis framework integrating:
- spatial proteomics
- differential enrichment analysis
- EWCE cell-type enrichment
- WGCNA and module analyses
- spatial region/layer network analysis
- bootstrap-based network stability analyses
- behavior and physiology coupling
- PRIDE/ProteomeXchange package preparation support
The current focus is high-resolution spatial proteomics across hippocampal region/layer structures combined with systems-level network and behavioral analyses.
01_preprocessing/
02_id_mapping/
03_qc_exploration/
04_differential_expression_enrichment/
05_celltype_enrichment_EWCE/
06_modules_WGCNA/
07_spatial_networks/
08_behavior_physio_coupling/
09_pride_submission/
09_export_pride_journal/
90_testing/
99_deprecated/
Canonical data and result folders:
data/raw/
data/metadata/
data/external/
data/processed/<module>/
results/figures/<module>/
results/tables/<module>/
results/source_data/<module>/
results/logs/<module>/
results/reports/<module>/
pride_submission/
Use R/paths.R for repo-relative paths. Local overrides should go in config/*.local.yml or environment variables such as PROTEOMICS_PROJECT_ROOT; do not commit machine-specific paths.
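As a rough illustration of the override behavior described above, a path helper could look like the sketch below. This is not the actual content of R/paths.R: the function names project_root() and repo_path() are hypothetical, and only the PROTEOMICS_PROJECT_ROOT variable comes from the text.

```r
# Hypothetical helpers; the real R/paths.R may be organized differently.
project_root <- function() {
  # Prefer the environment override, otherwise fall back to the working directory.
  root <- Sys.getenv("PROTEOMICS_PROJECT_ROOT", unset = "")
  if (!nzchar(root)) root <- getwd()
  normalizePath(root, winslash = "/", mustWork = FALSE)
}

repo_path <- function(...) {
  # Build a path relative to the resolved project root.
  file.path(project_root(), ...)
}

# Example: canonical processed-data folder for one module.
repo_path("data", "processed", "02_id_mapping")
```

Resolving the root in one place keeps machine-specific locations out of the analysis scripts themselves.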
raw proteomics matrices + metadata
→ preprocessing / metadata harmonization
→ UniProt and ID mapping
→ QC and exploratory structure analysis
→ differential enrichment and GSEA
→ EWCE cell-type enrichment
→ WGCNA and module analyses
→ spatial network analyses
→ bootstrap stability analyses
→ behavior and physiology coupling
→ PRIDE / ProteomeXchange package metadata generation
Purpose:
- metadata formatting
- matrix harmonization
- imputation
- merged metadata generation
Representative scripts:
01_impute.r
03_gct_extractR.r
04_format_metadata.r
Purpose:
- UniProt mapping
- clusterProfiler-compatible ID conversion
- WGCNA-compatible identifier harmonization
Representative scripts:
01_MapThatProt.r
02_MapThatProt_batch.r
The canonical contrast handoff is now 01_preprocessing/03_gct_extractR.r to 02_id_mapping/01_MapThatProt_batch.r, producing clusterProfiler-ready mapped files at data/processed/02_id_mapping/mapped/<dataset>/forward/per_file/. Run this per dataset family, such as neuron_neuropil, neuron_soma, or microglia.
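A small check along the following lines may help confirm that each dataset family has mapped files before enrichment is run. It is only a sketch: the helper mapped_dir() is hypothetical, and the three family names are the examples given above, not necessarily the complete set.

```r
# Dataset families named in the text; extend as needed.
dataset_families <- c("neuron_neuropil", "neuron_soma", "microglia")

# Hypothetical helper that builds the per-file output folder for one dataset.
mapped_dir <- function(dataset, root = ".") {
  file.path(root, "data", "processed", "02_id_mapping", "mapped",
            dataset, "forward", "per_file")
}

# Summarize how many mapped files each family currently has.
status <- data.frame(
  dataset = dataset_families,
  dir = vapply(dataset_families, mapped_dir, character(1)),
  stringsAsFactors = FALSE
)
status$n_files <- vapply(status$dir, function(d) length(list.files(d)), integer(1))
print(status)
```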
Purpose:
- PCA
- variance partitioning
- rank abundance analysis
- protein/peptide QC
Representative scripts:
03_pcaPlot.r
05_pcaPlot_v3.r
07_varPart.r
Purpose:
- GO enrichment
- GSEA
- pathway comparison
- publication-style enrichment figures
Representative scripts:
01_clusterProfiler.r
02_compareGO.r
03_compare_pathways.r
01_clusterProfiler.r now writes a manifest at:
data/processed/04_differential_expression_enrichment/clusterProfiler/<dataset>/clusterProfiler_manifest.csv
02_compareGO.r consumes that manifest instead of recursively discovering arbitrary CSVs. It filters by the required dataset column so neuron neuropil, neuron soma, and microglia runs do not overwrite or mix with each other. This preserves ontology, comparison, route category, route unit, simplification state, plot-used status, input hashes and config hashes across the clusterProfiler to compareGO handoff.
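The manifest-based handoff can be pictured with the sketch below. It is an assumption about how such a reader might work, not the code in 02_compareGO.r: the function name is hypothetical, and only the required dataset column is taken from the description above.

```r
# Hypothetical reader: load the manifest and keep rows for one dataset only.
read_manifest_for_dataset <- function(manifest_path, dataset) {
  manifest <- read.csv(manifest_path, stringsAsFactors = FALSE)
  # Fail early if the required column is absent.
  if (!"dataset" %in% names(manifest)) {
    stop("Manifest is missing the required 'dataset' column: ", manifest_path)
  }
  # %in% avoids NA rows that `==` would produce for missing dataset values.
  rows <- manifest[manifest$dataset %in% dataset, , drop = FALSE]
  if (nrow(rows) == 0) {
    stop("No manifest rows found for dataset '", dataset, "'")
  }
  rows
}
```

Filtering on an explicit column, rather than on folder names or file patterns, is what keeps runs from different dataset families from being mixed.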
Dry-run checks are available:
Rscript 04_differential_expression_enrichment/01_clusterProfiler.r --dry-run
Rscript 04_differential_expression_enrichment/02_compareGO.r --dry-run
Purpose:
- EWCE analyses
- spatial cell-type interpretation
- measured-proteome-aware enrichment workflows
Representative scripts:
01_EWCE_E9.r
01_EWCE_E9.r now uses canonical module folders and supports --dry-run.
Purpose:
- WGCNA
- module scoring
- module preservation
- overlap-based module generation
Representative scripts:
01_WGCNA.r
02_WGCNAtraitpreservation.r
03_module_spatial_networks.r
Phase 3 canonicalized the safer helper/downstream scripts 02_module_spatial_networks.r, 03_overlap_modules.r, and 91_module_score_v0.0.2.r. The central 01_WGCNA v.2.0.0.r has not been canonicalized yet; it remains flagged for a data-aware path refactor.
Purpose:
- anatomical region/layer relationship networks
- differential network analysis
- bootstrap network validation
- chord/network visualization
Representative scripts:
01_network_spatial_relations.r
02_differential_networks.r
03_bootstrap_network_stability.r
Phase 4 canonicalized 01_network_spatial_relations.r, and Phase 3 canonicalized 02_differential_networks.r, 03_bootstrap_network_stability.r, 04_bootstrap_differential_network_stability.r, 05_bootstrap_differential_network_figures.r, and 06_chord_diagram.r. The producer writes the spatial object at data/processed/07_spatial_networks/network_spatial_relations/network_spatial_relations_objects.rds, which downstream network and behavior scripts consume.
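A downstream script might guard the handoff roughly as follows. The path is the one stated above, taken relative to the repository root (an assumption); the helper name load_spatial_objects() is hypothetical, and the structure of the stored object is not described here, so the sketch returns it unchanged.

```r
# Path from the text above, assumed to be relative to the repository root.
spatial_rds <- file.path(
  "data", "processed", "07_spatial_networks",
  "network_spatial_relations", "network_spatial_relations_objects.rds"
)

# Hypothetical loader that fails with a clear message if the producer has not run.
load_spatial_objects <- function(path = spatial_rds) {
  if (!file.exists(path)) {
    stop("Spatial object not found; run 01_network_spatial_relations.r first: ", path)
  }
  readRDS(path)
}
```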
Purpose:
- connect proteomics and network structure with behavior and physiology
- movement/stress score integration
- systems-level phenotype coupling
Representative scripts:
01_correlate_proteomics_with_behavior.r
02_network_behavior_coupling.r
02_network_behavior_coupling.r now reads behavior files from data/external/behavior/, writes canonical outputs under results/*/08_behavior_physio_coupling/network_behavior_coupling/, and supports --dry-run.
Purpose:
- create a local PRIDE/ProteomeXchange package skeleton
- generate SDRF-style sample-to-file metadata
- generate a file manifest
- calculate MD5 checksums
- validate whether raw files, search outputs, processed results, and metadata are traceable
Main command:
source("09_pride_submission/00_make_pride_package.R")
This creates or updates local files under the gitignored PRIDE_package/ folder:
PRIDE_package/00_metadata/sdrf_proteomics.tsv
PRIDE_package/00_metadata/pride_file_manifest.tsv
PRIDE_package/00_metadata/checksum.md5
PRIDE_package/00_metadata/validation_report.tsv
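For orientation, MD5 checksums of this kind can be computed in base R with tools::md5sum(). The sketch below is an illustration only: write_md5_manifest() is a hypothetical helper, and the line layout it writes is an assumption that may not match what 00_make_pride_package.R produces.

```r
# Hypothetical helper: hash files and write one "<hash>  <file name>" line per file.
write_md5_manifest <- function(files, out_file) {
  sums <- tools::md5sum(files)  # named character vector; NA for unreadable files
  if (anyNA(sums)) {
    stop("Cannot read: ", paste(files[is.na(sums)], collapse = ", "))
  }
  writeLines(paste0(unname(sums), "  ", basename(files)), out_file)
  invisible(sums)
}
```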
Large raw/vendor mass spectrometry files should be uploaded to PRIDE, not committed to GitHub.
Detailed instructions are in:
09_pride_submission/README_PRIDE.md
Purpose:
- generate PRIDE-ready manifests
- stage sample metadata and SDRF-like metadata
- stage publication supplementary tables
- validate deposition readiness
- write methods/provenance summaries
Main scripts:
09_export_pride_journal/01_make_pride_manifest.R
09_export_pride_journal/02_make_sample_metadata.R
09_export_pride_journal/03_make_supplementary_tables.R
09_export_pride_journal/04_validate_pride_submission.R
09_export_pride_journal/05_make_methods_summary.R
Each export script supports --dry-run to validate expected folders and inputs without staging or writing analysis products.
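The general shape of a --dry-run switch in an Rscript entry point is sketched below. It is a generic pattern, not the code of the export scripts: the helper names check_inputs() and write_output() are hypothetical.

```r
# Detect the flag among the arguments passed after the script name.
args <- commandArgs(trailingOnly = TRUE)
dry_run <- "--dry-run" %in% args

# Hypothetical check: report which expected folders or files exist, creating nothing.
check_inputs <- function(paths) {
  data.frame(path = paths, exists = file.exists(paths), stringsAsFactors = FALSE)
}

# Hypothetical writer: in dry-run mode, only announce what would be written.
write_output <- function(df, path, dry_run) {
  if (dry_run) {
    message("[dry-run] would write: ", path)
    return(invisible(FALSE))
  }
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  write.table(df, path, sep = "\t", row.names = FALSE, quote = FALSE)
  invisible(TRUE)
}
```

Routing every write through one function that honors the flag makes it harder for a dry run to leave files behind by accident.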
A recommended execution order is documented in:
RUN_ORDER.md
The repository also contains:
- 90_testing/ for exploratory or developmental workflows
- 99_deprecated/ for archived legacy scripts retained for reproducibility
Recommended:
- R >= 4.2
- RStudio or VS Code
- macOS/Linux preferred for large workflows
Core package ecosystem includes:
- tidyverse
- clusterProfiler
- enrichplot
- WGCNA
- limma
- EWCE
- ggplot2
- openxlsx
- igraph
- ggraph
- patchwork
- pheatmap
- ComplexHeatmap
- lme4/lmerTest
- mgcv
Additional dependencies vary between workflows.
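A quick way to see which of the packages listed above are missing from a local library is sketched here. The helper missing_packages() is a hypothetical convenience and not part of the repository; installation sources are left out because they differ between packages.

```r
# Package names copied from the list above (lme4/lmerTest split into two entries).
core_packages <- c(
  "tidyverse", "clusterProfiler", "enrichplot", "WGCNA", "limma", "EWCE",
  "ggplot2", "openxlsx", "igraph", "ggraph", "patchwork", "pheatmap",
  "ComplexHeatmap", "lme4", "lmerTest", "mgcv"
)

# Hypothetical helper: return the packages that cannot be found, without loading any.
missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) length(find.package(p, quiet = TRUE)) > 0,
    logical(1)
  )
  pkgs[!installed]
}

# Warn if the running R version is older than the recommended minimum.
if (getRversion() < "4.2.0") {
  warning("R >= 4.2 is recommended; found ", as.character(getRversion()))
}

missing_packages(core_packages)
```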
Typical outputs include:
- publication-ready SVG/PDF figures
- enrichment tables
- network edge tables
- bootstrap stability summaries
- module score matrices
- EWCE outputs
- source-data tables
- QC reports
- PRIDE package metadata, manifests, checksums, and validation reports
The repository intentionally preserves:
- exploratory workflows
- older analysis versions
- testing scripts
- alternative implementations
This is done to maintain reproducibility of intermediate biological findings and figure-generation pipelines.
Older scripts are preferentially moved to 99_deprecated/ rather than deleted.
Tobias Pohl