Felix Draxler*, Justus Will*, Farrin Marouf Sofian, Theofanis Karaletsos, Sameer Singh, Stephan Mandt
Parallel Token Prediction (PTP) predicts consistent sequences of tokens in a single transformer call. It does so by moving the randomness involved in sampling autoregressive models into the input of the model, which makes the prediction of several tokens a deterministic function of that input.
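The idea of moving sampling randomness into the input can be illustrated with the Gumbel-max trick, where per-position noise is drawn up front and sampling becomes `argmax(logits + noise)`. This is a generic sketch of the reparameterization idea, not PTP's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, num_tokens = 50, 3

# Per-position Gumbel noise, drawn once and treated as an *input*
# rather than sampled inside the model.
noise = rng.gumbel(size=(num_tokens, vocab_size))

def sample_tokens(logits, noise):
    """Gumbel-max sampling: argmax(logits + noise) is distributed as
    softmax(logits). With the noise fixed as an input, the sampled
    sequence is a deterministic function of (logits, noise)."""
    return np.argmax(logits + noise, axis=-1)

# Hypothetical logits standing in for a transformer's output.
logits = rng.normal(size=(num_tokens, vocab_size))
a = sample_tokens(logits, noise)
b = sample_tokens(logits, noise)
assert (a == b).all()  # same noise input -> same token sequence
```

Because the randomness lives in the input, calling the function twice with the same noise yields the same tokens, which is what lets several tokens be predicted consistently at once.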
```
git clone https://github.com/mandt-lab/ptp
cd ptp
uv sync
source .venv/bin/activate
```

Run the following command to set up an experiment directory:
```
ptp_distill model_name dataset_name
```

- `model_name`: anything that works with HuggingFace `transformers.AutoModelForCausalLM.from_pretrained(model_name)`
- `dataset_name`: anything that can be loaded with HuggingFace `datasets.load_dataset(dataset_name)`
The command will then interactively set up an experiment config.
The main choice is how to treat the training data. In the paper, we train on completions by the base model, so we first need to generate them from prompts in the dataset. The code also supports training directly on dataset sequences, without consulting the model. This gives you results faster, but training on base model completions likely results in a larger inference speedup.
After completing the setup, use `ptp_pregenerate` to sample base model completions to train on (optional), and `ptp_train` to train your PTP model.
- Pregenerate teacher completions:

  ```
  ptp_pregenerate paper-checkpoints/[experiment-name]
  ```

- Train with:

  ```
  ptp_train paper-checkpoints/[experiment-name]
  ```
Here, `experiment-name` is a directory in the `paper-checkpoints` folder.
We are currently working on releasing our model weights.
For fast inference using Partial Quadratic Decoding, use this demo script:
```
ptp_generate [experiment-directory]
```

It responds like a chatbot, reporting how many tokens each model call generated, which roughly corresponds to the speedup over next-token prediction.
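To make the reported numbers concrete, here is a small sketch of how per-call token counts translate into a speedup estimate. The counts below are hypothetical, not output from the actual demo:

```python
# Hypothetical per-call token counts, as ptp_generate might report
# for one chat turn.
tokens_per_call = [4, 7, 3, 6, 5]

total_tokens = sum(tokens_per_call)  # tokens generated in this turn
num_calls = len(tokens_per_call)     # transformer calls used

# A next-token-prediction baseline needs one call per token, so the
# average tokens per call is a rough estimate of the speedup.
speedup = total_tokens / num_calls
print(speedup)  # 5.0
```

This is only a rough proxy: the true wall-clock speedup also depends on per-call overheads, which this arithmetic ignores.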
@inproceedings{
draxler2026parallel,
title={Parallel Token Prediction for Language Models},
author={Felix Draxler and Justus Will and Farrin Marouf Sofian and Theofanis Karaletsos and Sameer Singh and Stephan Mandt},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=AGJomYSrUG}
}