ReactSim-Bench: Benchmarking Reactive Behavior World Model Simulation in Autonomous Driving

Paper | Data | Baselines

ReactSim-Bench is the first benchmark for systematically evaluating the reactive capability of behavior world models in autonomous driving. It features:

Reactive closed-loop protocol with decoupled control. In ReactSim-Bench, the behavior world model controls the surrounding agents, while the autonomous vehicle (AV) is controlled by its own policy rather than by the world model.
Custom AV behaviors beyond the log. ReactSim-Bench contains 2,636 scenarios in which the AV behaves differently from the log, putting reactive pressure on the surrounding agents. The scenarios are grouped into three categories: longitudinal, directional, and lateral deviations.
Safety and feasibility metrics. ReactSim-Bench evaluates Agent-AV safety, agent-agent safety, map compliance, driving-direction compliance, and kinematic feasibility.
Multiple baselines. We implement Transformer-based (MTR), diffusion-based (CTG, VBD), and next-token-prediction-based (SMART, CATK, TrajTok) behavior world models on ReactSim-Bench as baselines.
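The decoupled-control protocol can be illustrated with a toy rollout loop. This is a minimal sketch, not ReactSim-Bench's actual API: the `State` type, `rollout`, the gap-keeping `toy_world_model`, the braking `hard_brake_av_policy`, and the step size `DT` are all hypothetical stand-ins. The point it shows is that at every step both models observe the same scene, but the world model only produces next states for the surrounding agents, while the AV's next state always comes from its own policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

DT = 0.1  # hypothetical simulation step, seconds


@dataclass(frozen=True)
class State:
    x: float   # longitudinal position, m
    y: float   # lateral position, m
    vx: float  # longitudinal speed, m/s
    vy: float  # lateral speed, m/s


Scene = Dict[str, State]
WorldModel = Callable[[Scene, str], Scene]  # predicts next states of surrounding agents
AVPolicy = Callable[[Scene, str], State]    # predicts the AV's own next state


def rollout(scene: Scene, av_id: str, world_model: WorldModel,
            av_policy: AVPolicy, steps: int) -> List[Scene]:
    """Closed-loop rollout: both models observe the same scene at every step."""
    history = [scene]
    for _ in range(steps):
        agents_next = world_model(scene, av_id)
        av_next = av_policy(scene, av_id)
        # Decoupled control: anything the world model says about the AV is discarded.
        agents_next = {k: v for k, v in agents_next.items() if k != av_id}
        scene = {**agents_next, av_id: av_next}
        history.append(scene)
    return history


def toy_world_model(scene: Scene, av_id: str) -> Scene:
    """Hypothetical reactive agents: slow down to keep a gap behind the AV."""
    av = scene[av_id]
    nxt = {}
    for aid, s in scene.items():
        if aid == av_id:
            continue
        vx = s.vx
        gap = av.x - s.x
        if 0.0 < gap < 30.0 and abs(av.y - s.y) < 2.0:  # AV ahead in the same lane
            # Cap speed at (gap - 2 m) / 1 s, i.e. a 1 s headway with a 2 m buffer.
            vx = min(vx, max(0.0, gap - 2.0))
        nxt[aid] = State(s.x + vx * DT, s.y + s.vy * DT, vx, s.vy)
    return nxt


def hard_brake_av_policy(scene: Scene, av_id: str) -> State:
    """Hypothetical AV behavior off the log: brake hard (a longitudinal deviation)."""
    s = scene[av_id]
    vx = max(0.0, s.vx - 0.8)
    return State(s.x + vx * DT, s.y, vx, s.vy)


# Usage: a faster agent approaches from behind while the AV brakes to a stop.
scene0 = {"av": State(8.0, 0.0, 5.0, 0.0), "a1": State(0.0, 0.0, 10.0, 0.0)}
history = rollout(scene0, "av", toy_world_model, hard_brake_av_policy, steps=20)
print(history[-1])
```

Because the AV ignores the world model, the surrounding agent has to react to a trajectory it cannot influence; that pressure is what the deviation scenarios are designed to create.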

Usage

Install the repository, then download and preprocess the data by following the documentation.
Set up the environment for each baseline, then train or evaluate it:
- MTR
- CTG
- VBD
- SMART / CATK / TrajTok
Follow the instructions to train and evaluate your own model on ReactSim-Bench.

Dataset

ReactSim-Bench is built on nuPlan and contains 2,636 test scenarios:

Category	Number of scenarios
Longitudinal deviation	937
Directional deviation	799
Lateral deviation	900
Total	2,636
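As a quick sanity check on the table, the three category counts add up to the reported total, and the categories are roughly balanced. The snippet below only re-does that arithmetic on the numbers listed above; the variable names are illustrative.

```python
# Scenario counts per deviation category, copied from the dataset table.
counts = {
    "Longitudinal deviation": 937,
    "Directional deviation": 799,
    "Lateral deviation": 900,
}

total = sum(counts.values())
assert total == 2636  # matches the reported total

# Share of each category, rounded to one decimal place.
for category, n in counts.items():
    print(f"{category}: {n} ({100 * n / total:.1f}%)")
# Longitudinal deviation: 937 (35.5%)
# Directional deviation: 799 (30.3%)
# Lateral deviation: 900 (34.1%)
```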

The data is available at Hugging Face.

Benchmark

Method	Agent-AV Collision Count	Agent-AV Risky Count	Agent-Agent Collision (%)	Offroad (%)	Direction Violation (%)	Acceleration Infeasibility (%)	Steering Infeasibility (%)
Log Replay	0.9829	1.5380	2.25	0.18	0.80	0.16	2.51
MTR	0.1457	0.5819	3.29	2.67	2.83	0.64	14.29
CTG	0.6195	0.9476	4.88	2.95	2.10	10.87	7.08
VBD	0.2276	0.4711	3.19	1.03	2.35	0.01	0.18
SMART	0.1419	0.3976	2.23	0.68	1.09	9.74	4.83
CATK	0.1426	0.4029	2.22	0.69	1.13	10.25	5.02
TrajTok	0.1407	0.4173	2.23	0.61	1.03	3.23	3.93
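The acceleration and steering infeasibility columns are not defined in this README. The sketch below shows one common way such kinematic checks can be computed, using finite differences for acceleration and a kinematic bicycle model for the implied steering angle. It is an assumption, not ReactSim-Bench's definition: the sampling interval `DT`, the bounds `MAX_ACCEL` and `MAX_STEER`, the `WHEELBASE`, and the per-trajectory aggregation in `infeasible_rate` are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Trajectory = Sequence[Tuple[float, float]]  # (x, y) positions in metres

DT = 0.1                         # hypothetical sampling interval, s
MAX_ACCEL = 6.0                  # hypothetical |acceleration| bound, m/s^2
WHEELBASE = 3.0                  # hypothetical wheelbase, m
MAX_STEER = math.radians(35.0)   # hypothetical max front-wheel angle, rad


def speeds_and_headings(traj: Trajectory) -> Tuple[List[float], List[float]]:
    """Per-step speed (m/s) and heading (rad) from consecutive positions."""
    v, h = [], []
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        dx, dy = x1 - x0, y1 - y0
        v.append(math.hypot(dx, dy) / DT)
        h.append(math.atan2(dy, dx))
    return v, h


def accel_infeasible(traj: Trajectory) -> bool:
    """True if any finite-difference acceleration exceeds MAX_ACCEL."""
    v, _ = speeds_and_headings(traj)
    return any(abs(b - a) / DT > MAX_ACCEL for a, b in zip(v, v[1:]))


def steering_infeasible(traj: Trajectory) -> bool:
    """True if the bicycle-model steering angle implied by the path exceeds MAX_STEER."""
    v, h = speeds_and_headings(traj)
    for i in range(1, len(h)):
        if v[i] < 1e-3 or v[i - 1] < 1e-3:  # heading is undefined when (nearly) stationary
            continue
        d = h[i] - h[i - 1]
        dyaw = math.atan2(math.sin(d), math.cos(d))  # wrap heading change to [-pi, pi]
        curvature = dyaw / (v[i] * DT)               # heading change per metre travelled
        if abs(math.atan(WHEELBASE * curvature)) > MAX_STEER:
            return True
    return False


def infeasible_rate(trajs: Sequence[Trajectory],
                    check: Callable[[Trajectory], bool]) -> float:
    """Percentage of trajectories flagged by `check`."""
    return 100.0 * sum(check(t) for t in trajs) / len(trajs)


# Usage on hand-made trajectories.
straight = [(i * 1.0, 0.0) for i in range(20)]                          # constant 10 m/s
jerky = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0)]    # 10 -> 20 m/s in one step
turn = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]                             # 90 degree turn in one step
print(infeasible_rate([straight, jerky], accel_infeasible))  # 50.0
```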

The checkpoints of baselines are available at Hugging Face.

Citation

If you find ReactSim-Bench useful, please cite:

@article{reactsimbench,
    title={ReactSim-Bench: Benchmarking Reactive Behavior World Model Simulation in Autonomous Driving}, 
    author={Zhiyuan Zhang and Yanlun Peng and Jianing Zhang and Xianda Guo and Zehan Huang and Haoran Liu and Qifeng Li and Shaofeng Zhang and Xiaosong Jia and Junchi Yan},
    year={2026},
    eprint={2606.14058},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
}
