
Thinklab-SJTU/ReactSim-Bench


ReactSim-Bench: Benchmarking Reactive Behavior World Model Simulation in Autonomous Driving

ReactSim-Bench is the first benchmark for systematically evaluating the reactive capability of behavior world models in autonomous driving. It contains:

  • Reactive closed-loop protocol with decoupled control. In ReactSim-Bench, the behavior world model controls the surrounding agents, while the autonomous vehicle (AV) is controlled by its own policy rather than by the world model.
  • Custom AV behaviors beyond the log. ReactSim-Bench contains 2,636 scenarios in which the AV behaves differently from the log, creating reactive pressure on the surrounding agents. The scenarios are grouped into three categories: longitudinal, directional, and lateral deviations.
  • Safety and feasibility metrics. ReactSim-Bench evaluates agent-AV safety, agent-agent safety, map compliance, driving-direction compliance, and kinematic feasibility.
  • Multiple baselines. We implement Transformer-based (MTR), diffusion-based (CTG, VBD), and next-token-prediction-based (SMART, CATK, TrajTok) behavior world models on ReactSim-Bench as baselines.
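The decoupled protocol can be sketched as a rollout loop in which two independent controllers read the same scene snapshot at every step. The AV policy advances only the AV, and the world model advances only the surrounding agents. This is a minimal illustration and not ReactSim-Bench's actual API. The `State` layout, `DT`, the `AVPolicy`/`BehaviorWorldModel` protocols, and the toy `DeceleratingAV` and `BrakeWhenCloseModel` controllers are all hypothetical. The decelerating AV stands in for a longitudinal deviation from the log.

```python
import math
from typing import Dict, List, Protocol, Tuple

# Hypothetical state layout: (x [m], y [m], heading [rad], speed [m/s]).
State = Tuple[float, float, float, float]
DT = 0.1  # assumed simulation step in seconds


def propagate(s: State, accel: float) -> State:
    """Advance one step with a constant-heading kinematic update (speed clipped at 0)."""
    x, y, h, v = s
    v_next = max(0.0, v + accel * DT)
    return (x + v_next * math.cos(h) * DT, y + v_next * math.sin(h) * DT, h, v_next)


class AVPolicy(Protocol):
    def act(self, av: State, agents: Dict[int, State]) -> State: ...


class BehaviorWorldModel(Protocol):
    def step(self, av: State, agents: Dict[int, State]) -> Dict[int, State]: ...


def rollout(policy: AVPolicy, world_model: BehaviorWorldModel, av: State,
            agents: Dict[int, State], num_steps: int) -> List[Tuple[State, Dict[int, State]]]:
    history = [(av, dict(agents))]
    for _ in range(num_steps):
        # Both controllers see the same snapshot of the current scene.
        next_av = policy.act(av, agents)            # the AV is driven only by its own policy
        next_agents = world_model.step(av, agents)  # agents are driven only by the world model
        av, agents = next_av, next_agents
        history.append((av, dict(agents)))
    return history


class DeceleratingAV:
    """Hypothetical AV policy that brakes at 2 m/s^2, unlike the logged AV."""
    def act(self, av: State, agents: Dict[int, State]) -> State:
        return propagate(av, accel=-2.0)


class BrakeWhenCloseModel:
    """Toy reactive world model: an agent brakes hard while within 10 m of the AV."""
    def step(self, av: State, agents: Dict[int, State]) -> Dict[int, State]:
        out = {}
        for aid, s in agents.items():
            dist = math.hypot(s[0] - av[0], s[1] - av[1])
            out[aid] = propagate(s, accel=-4.0 if dist < 10.0 else 0.0)
        return out


# Usage: a follower starts 8 m behind the AV at the same speed.
hist = rollout(DeceleratingAV(), BrakeWhenCloseModel(),
               av=(0.0, 0.0, 0.0, 10.0), agents={1: (-8.0, 0.0, 0.0, 10.0)}, num_steps=20)
```

The key design point is that `rollout` never lets the world model write the AV state. Any reactive pressure the AV creates must therefore be absorbed by the agents' own reactions.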

Usage

  1. Install the repository, download the data, and preprocess it by following the installation document.
  2. Set up the environment for each baseline, then train or evaluate it.
  3. Follow the instructions to train and evaluate your own model on ReactSim-Bench.

Dataset

ReactSim-Bench is built on nuPlan and contains 2,636 test scenarios:

| Category | Number of scenarios |
|---|---|
| Longitudinal deviation | 937 |
| Directional deviation | 799 |
| Lateral deviation | 900 |
| Total | 2,636 |
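As a quick consistency check, the per-category counts add up to the stated total. The snippet below also derives each category's share of the benchmark. The counts are copied from the table, and the variable names are illustrative only.

```python
# Scenario counts per deviation category, copied from the dataset table.
SCENARIOS = {
    "Longitudinal deviation": 937,
    "Directional deviation": 799,
    "Lateral deviation": 900,
}
REPORTED_TOTAL = 2636

total = sum(SCENARIOS.values())
assert total == REPORTED_TOTAL, f"category counts sum to {total}, not {REPORTED_TOTAL}"

# Percentage of the benchmark that each category represents, rounded to 0.1 %.
shares = {name: round(100 * count / total, 1) for name, count in SCENARIOS.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```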

The data is available at Hugging Face.

Benchmark

| Method | A-AV Coll. Count | A-AV Risky Count | A-A Coll. (%) | Offroad (%) | Direction Violation (%) | Acceleration Infeasibility (%) | Steering Infeasibility (%) |
|---|---|---|---|---|---|---|---|
| Log Replay | 0.9829 | 1.5380 | 2.25 | 0.18 | 0.80 | 0.16 | 2.51 |
| MTR | 0.1457 | 0.5819 | 3.29 | 2.67 | 2.83 | 0.64 | 14.29 |
| CTG | 0.6195 | 0.9476 | 4.88 | 2.95 | 2.10 | 10.87 | 7.08 |
| VBD | 0.2276 | 0.4711 | 3.19 | 1.03 | 2.35 | 0.01 | 0.18 |
| SMART | 0.1419 | 0.3976 | 2.23 | 0.68 | 1.09 | 9.74 | 4.83 |
| CATK | 0.1426 | 0.4029 | 2.22 | 0.69 | 1.13 | 10.25 | 5.02 |
| TrajTok | 0.1407 | 0.4173 | 2.23 | 0.61 | 1.03 | 3.23 | 3.93 |

A-AV denotes agent-AV and A-A denotes agent-agent.
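Several metrics in the benchmark table are reported as percentages. The README does not specify how they are computed. The sketch below shows one common way to implement a kinematic-feasibility rate: finite-difference speed and heading, then count the checked agent-steps that exceed a bound. This is not ReactSim-Bench's definition. The 0.1 s step, the `(x, y, heading)` trajectory layout, and the thresholds `MAX_ACCEL` and `MAX_YAW_RATE` are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

DT = 0.1            # assumed time step between trajectory points [s]
MAX_ACCEL = 6.0     # hypothetical bound on |longitudinal acceleration| [m/s^2]
MAX_YAW_RATE = 1.0  # hypothetical bound on |heading rate| [rad/s]

Pose = Tuple[float, float, float]  # (x [m], y [m], heading [rad])


def wrap_angle(a: float) -> float:
    """Wrap an angle difference into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def infeasibility_rates(trajs: Sequence[List[Pose]]) -> Tuple[float, float]:
    """Return (acceleration %, steering %) of checked agent-steps that exceed the bounds."""
    accel_bad = steer_bad = checked = 0
    for traj in trajs:
        # Speed between consecutive poses, estimated from displacement.
        speeds = [math.hypot(x1 - x0, y1 - y0) / DT
                  for (x0, y0, _), (x1, y1, _) in zip(traj, traj[1:])]
        for i in range(1, len(speeds)):
            accel = (speeds[i] - speeds[i - 1]) / DT
            yaw_rate = wrap_angle(traj[i + 1][2] - traj[i][2]) / DT
            checked += 1
            accel_bad += int(abs(accel) > MAX_ACCEL)
            steer_bad += int(abs(yaw_rate) > MAX_YAW_RATE)
    if checked == 0:
        return 0.0, 0.0
    return 100 * accel_bad / checked, 100 * steer_bad / checked
```

Normalizing by checked agent-steps, rather than by agents, is itself a design choice. A per-agent or per-scenario rate would generally give different numbers.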

Checkpoints for the baselines are available at Hugging Face.

Citation

If you find ReactSim-Bench useful, please cite:

@article{reactsimbench,
    title={ReactSim-Bench: Benchmarking Reactive Behavior World Model Simulation in Autonomous Driving}, 
    author={Zhiyuan Zhang and Yanlun Peng and Jianing Zhang and Xianda Guo and Zehan Huang and Haoran Liu and Qifeng Li and Shaofeng Zhang and Xiaosong Jia and Junchi Yan},
    year={2026},
    eprint={2606.14058},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
}
