TattaBio/FlashPPI

FlashPPI: Linear-time prediction of proteome-scale microbial protein interactions

[Figure: FlashPPI model overview]

Model Description

FlashPPI is a contrastively trained model for protein-protein interaction (PPI) prediction, grounded in residue-level interactions, that enables full-proteome interaction prediction in minutes.

By reframing PPI prediction as a dense retrieval task, FlashPPI circumvents the $\mathcal{O}(N^2)$ computational bottleneck of traditional all-vs-all structural screening.

  • Scalable: Reduces proteome-wide screening from days or months to minutes.
  • Interpretable: Predicts fine-grained, residue-level 2D contact maps for retrieved interaction candidates.
  • Genomic Priors: Leverages gLM2 initialization to capture cross-protein, multi-gene co-evolutionary signals.
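To make the dense-retrieval framing concrete, here is a minimal sketch, assuming each protein has already been mapped to a fixed-size embedding by a single encoder pass. The embeddings below are random stand-ins, and `top_k_partners` is a hypothetical helper, not part of FlashPPI. The point is the cost profile: the expensive model runs N times rather than once per pair, and candidate partners come from a cheap similarity search. The brute-force similarity matrix used here is only for illustration; at scale it would typically be replaced by a nearest-neighbour index.

```python
import numpy as np


def top_k_partners(embeddings: np.ndarray, k: int) -> np.ndarray:
    """For each protein, return the indices of its k most similar other proteins."""
    # L2-normalise so that the dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = unit @ unit.T
    # A protein should not retrieve itself.
    np.fill_diagonal(scores, -np.inf)
    # Sort each row by descending score and keep the first k columns.
    return np.argsort(-scores, axis=1)[:, :k]


rng = np.random.default_rng(0)
n_proteins, dim = 1000, 64
# Random stand-in for the output of N encoder passes (assumption for illustration).
embeddings = rng.normal(size=(n_proteins, dim))
candidates = top_k_partners(embeddings, k=5)

encoder_passes = n_proteins                               # grows linearly with N
pairwise_model_calls = n_proteins * (n_proteins - 1) // 2  # all-vs-all pair count
print(candidates.shape, encoder_passes, pairwise_model_calls)
```

For 1,000 proteins this compares 1,000 encoder passes against 499,500 pairwise model evaluations, which is where the scalability gain comes from.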

Web Server

FlashPPI is integrated into seqhub.org. You can upload a FASTA file and interactively explore whole-proteome networks and contact maps. An example network is also available to explore.

Installation

pip install -r requirements.txt

Optionally, install Flash Attention for faster inference on GPU:

pip install flash-attn --no-build-isolation

Usage

Fast Proteome-wide PPI Screening (All-vs-All)

Pass your proteome FASTA file to the prediction script. It writes a predictions file listing the predicted interacting protein pairs with their confidence scores. Note: requires a machine with at least one GPU.

python predict_proteome.py --fasta my_proteome.fasta --output predictions.csv
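Before launching a GPU job it can help to sanity-check the input FASTA. The following is a minimal sketch of such a pre-flight check; `read_fasta` is a hypothetical helper written for this example and is not part of the FlashPPI codebase. It assumes the record ID is the first whitespace-delimited token of each header line.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_id: sequence}, rejecting duplicate IDs."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("header line without a record ID")
            current = header[0]
            if current in records:
                raise ValueError(f"duplicate record ID: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[current].append(line)
    # Join multi-line sequences into a single string per record.
    return {rid: "".join(parts) for rid, parts in records.items()}


example = """>protA some description
MKTAYIAKQR
QISFVKSHFSRQL
>protB
MSTAGKVIKCKAAVLW
""".splitlines()

proteome = read_fasta(example)
print(len(proteome), {rid: len(seq) for rid, seq in proteome.items()})
```

To check a real file, pass an open file handle instead of the example lines, for instance `read_fasta(open("my_proteome.fasta"))`.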

Cross-Proteome PPI Screening (Host–Viral)

Predict interactions between two proteomes, for example a viral genome and its host genome.

python predict_cross_proteome.py \
    --host_fasta host.fasta \
    --viral_fasta virus.fasta \
    --output predictions.csv
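Both screening scripts write a CSV of predicted pairs and confidence scores, so a common next step is keeping only high-confidence pairs. The sketch below shows one way to do that with the standard library. The column names (`protein_a`, `protein_b`, `score`) and the 0.5 threshold are assumptions made for illustration; check the header of your own `predictions.csv` and adjust them.

```python
import csv
import io


def filter_predictions(handle, score_column, threshold):
    """Return the CSV rows whose score_column value is >= threshold."""
    reader = csv.DictReader(handle)
    if score_column not in (reader.fieldnames or []):
        raise KeyError(
            f"column {score_column!r} not found; header is {reader.fieldnames}"
        )
    return [row for row in reader if float(row[score_column]) >= threshold]


# HYPOTHETICAL file contents: the real column names may differ.
demo = io.StringIO(
    "protein_a,protein_b,score\n"
    "P1,P2,0.91\n"
    "P1,P3,0.12\n"
    "P2,P4,0.77\n"
)
kept = filter_predictions(demo, score_column="score", threshold=0.5)
print([(row["protein_a"], row["protein_b"]) for row in kept])
```

With a real output file, replace `demo` with `open("predictions.csv", newline="")`.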

Visualizing contact predictions

import torch
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer

seq1 = "MKTAYIAKQRQISFVKSHFSRQL"
seq2 = "MSTAGKVIKCKAAVLW"

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("tattabio/flashppi", trust_remote_code=True)
model = AutoModel.from_pretrained("tattabio/flashppi", trust_remote_code=True).to(device).eval()

inputs1 = tokenizer(seq1, return_tensors="pt").to(device)
inputs2 = tokenizer(seq2, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(
        input_ids1=inputs1["input_ids"],
        attention_mask1=inputs1["attention_mask"],
        input_ids2=inputs2["input_ids"],
        attention_mask2=inputs2["attention_mask"],
        return_dict=True
    )

# Extract map and trim padding
contact_map = outputs.contact_map[0].cpu().numpy()
len1, len2 = inputs1["attention_mask"].sum().item(), inputs2["attention_mask"].sum().item()
contact_map = contact_map[:len1, :len2]

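plt.imshow(contact_map, cmap="Blues", vmin=0, vmax=1)
plt.xlabel("seq2 position")  # columns index the second sequence
plt.ylabel("seq1 position")  # rows index the first sequence
plt.savefig("contact_map.png")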
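Beyond plotting, it is often useful to list the strongest predicted contacts. The following is a minimal sketch, assuming the contact map is a 2D NumPy array of scores in [0, 1] as in the example above; `top_contacts` is a hypothetical helper and the demo map is random data. Indices are 0-based positions in the trimmed map; whether they line up exactly with residue numbers depends on any special tokens the tokenizer adds, which is not verified here.

```python
import numpy as np


def top_contacts(contact_map: np.ndarray, k: int = 5):
    """Return the k highest-scoring (row, col, score) entries, best first."""
    flat = contact_map.ravel()
    k = min(k, flat.size)
    # Indices of the k largest values, in descending order of score.
    order = np.argsort(flat)[::-1][:k]
    rows, cols = np.unravel_index(order, contact_map.shape)
    return [(int(i), int(j), float(contact_map[i, j])) for i, j in zip(rows, cols)]


rng = np.random.default_rng(0)
demo_map = rng.random((23, 16))  # random stand-in with the shape of seq1 x seq2
demo_map[4, 7] = 1.0             # plant one unambiguous strongest contact

best = top_contacts(demo_map, k=3)
for i, j, score in best:
    print(f"seq1[{i}] - seq2[{j}]: {score:.2f}")
```

In the README example, the same function could be applied directly to the trimmed `contact_map` array.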

Training

# Multi-GPU
accelerate launch --config_file configs_accelerate/multi_gpu.yaml -m flashppi.train configs_train/flashppi.yaml

# Single CPU (testing)
accelerate launch --config_file configs_accelerate/cpu.yaml -m flashppi.train configs_train/flashppi_ESM_small.yaml

License

This repository is licensed under the CC BY-NC 4.0 license. Free for academic and research use.

Citing

If you use FlashPPI or our datasets in your research, please cite:

@article{Cornman2026FlashPPI,
	author = {Cornman, Andre and Tranzillo, Matt and Zulaybar, Nicolo G and Bouzit, Imane and Hwang, Yunha},
	title = {Linear-time prediction of proteome-scale microbial protein interactions},
	year = {2026},
	doi = {10.64898/2026.03.01.708874},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2026/03/02/2026.03.01.708874},
	journal = {bioRxiv}
}
