FlashPPI is a contrastively trained model for protein-protein interaction (PPI) prediction, grounded in residue-level interactions, that enables full-proteome interaction prediction in minutes.
By reframing PPI prediction as a dense retrieval task, FlashPPI circumvents the
- Scalable: Reduces proteome-wide screening from days/months to minutes.
- Interpretable: Predicts fine-grained, residue-level 2D contact maps for retrieved interaction candidates.
- Genomic Priors: Leverages gLM2 initialization to capture cross-protein, multi-gene co-evolutionary signals.
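
To make the dense retrieval framing above concrete, here is a minimal sketch, not FlashPPI's actual implementation. It assumes each protein has already been encoded once into a fixed-size vector, so that candidate partners can be ranked by vector similarity instead of running a model on every pair. The function name `top_k_partners`, the random stand-in embeddings, the embedding size, and the use of cosine similarity are all illustrative assumptions.

```python
import numpy as np


def top_k_partners(embeddings: np.ndarray, k: int = 5):
    """Rank the k most similar partners for every protein (hypothetical helper)."""
    # L2-normalize rows so a dot product equals cosine similarity
    # (assumption: the real model may use a different scoring function).
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # A single matrix multiplication scores all pairs at once.
    scores = normed @ normed.T
    # Exclude self-pairs from the ranking (an illustrative choice only).
    np.fill_diagonal(scores, -np.inf)
    # Sort each row by descending score and keep the first k columns.
    idx = np.argsort(-scores, axis=1)[:, :k]
    return idx, np.take_along_axis(scores, idx, axis=1)


# Stand-in data: 200 "proteins" with 64-dimensional random embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 64))

idx, top_scores = top_k_partners(embeddings, k=5)
print(idx.shape, top_scores.shape)
```

In this sketch the expensive step (encoding) grows with the number of proteins rather than the number of pairs; only the cheap similarity lookup touches pairs. At larger scales the brute-force matrix product is commonly swapped for an approximate nearest-neighbor index.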
FlashPPI is integrated into seqhub.org. You can upload a FASTA and interactively explore whole-proteome networks and contact maps. Explore an example network here.
pip install -r requirements.txt

Optionally, install Flash Attention for faster inference on GPU:
pip install flash-attn --no-build-isolation

Run the prediction script by passing your proteome FASTA file. It will output a predictions file with predicted pairs of interacting proteins and confidence scores. Note: Requires a machine with at least 1 GPU.
python predict_proteome.py --fasta my_proteome.fasta --output predictions.csv

Predict interactions between two proteomes, for example a viral genome and its host genome.
python predict_cross_proteome.py \
--host_fasta host.fasta \
--viral_fasta virus.fasta \
--output predictions.csv

import torch
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer
seq1 = "MKTAYIAKQRQISFVKSHFSRQL"
seq2 = "MSTAGKVIKCKAAVLW"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("tattabio/flashppi", trust_remote_code=True)
model = AutoModel.from_pretrained("tattabio/flashppi", trust_remote_code=True).to(device).eval()
inputs1 = tokenizer(seq1, return_tensors="pt").to(device)
inputs2 = tokenizer(seq2, return_tensors="pt").to(device)
# Run both sequences through the model without tracking gradients
with torch.no_grad():
    outputs = model(
        input_ids1=inputs1["input_ids"],
        attention_mask1=inputs1["attention_mask"],
        input_ids2=inputs2["input_ids"],
        attention_mask2=inputs2["attention_mask"],
        return_dict=True,
    )
# Extract map and trim padding
contact_map = outputs.contact_map[0].cpu().numpy()
len1, len2 = inputs1["attention_mask"].sum().item(), inputs2["attention_mask"].sum().item()
contact_map = contact_map[:len1, :len2]
plt.imshow(contact_map, cmap="Blues", vmin=0, vmax=1)
plt.savefig("contact_map.png")

# Multi-GPU
accelerate launch --config_file configs_accelerate/multi_gpu.yaml -m flashppi.train configs_train/flashppi.yaml
# Single CPU (testing)
accelerate launch --config_file configs_accelerate/cpu.yaml -m flashppi.train configs_train/flashppi_ESM_small.yaml

This repository is licensed under the CC BY-NC 4.0 license. Free for academic and research use.
If you use FlashPPI or our datasets in your research, please cite:
@article {Cornman2026FlashPPI,
author = {Cornman, Andre and Tranzillo, Matt and Zulaybar, Nicolo G and Bouzit, Imane and Hwang, Yunha},
title = {Linear-time prediction of proteome-scale microbial protein interactions},
year = {2026},
doi = {10.64898/2026.03.01.708874},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2026/03/02/2026.03.01.708874},
journal = {bioRxiv}
}
