Unison: Benchmarking Unified Multimodal Models via Synergistic Understanding and Generation

Jinyu Liu, Xincheng Shuai, Henghui Ding, Yu-Gang Jiang

Fudan University

TL;DR: Unison evaluates Unified Multimodal Models (UMMs) on how well their understanding and generation capabilities work together, across four dimensions: Internal Consistency, Understanding-Guided Generation, Generation-Guided Understanding, and Mutual Enhancement. The automatic evaluation model, Unison-Judge, achieves 88.7% alignment with human judgments.

🔥 Updates

[2026/06/26] Annotations about human consistency are released.
[2026/06/25] Unison-Bench and Unison-Judge are released.

✅ TODO

Inference and evaluation scripts
Unison Benchmark data and Unison-Judge model weights
The UMM toolkit TorchUMM will support Unison in the next few days
Evaluation results for more recent open-source models (Emu3.5, Ovis-U1, Ming series, etc.) and the latest closed-source models (GPT-5.5 and Gemini 3.1 series)

📬 Contact: If you have any questions, feel free to contact us at liujy24@m.fudan.edu.cn.

📖 Overview

📊 Evaluation Results

Open-Source Unified Multimodal Models

Model	Params	Internal Consistency			Und.-Guided Gen.			Gen-Guided Und.			Mutual Enhancement			Overall
Model	Params	Und.	Gen.	Uni.	Und.	Gen.	Uni.	Und.	Gen.	Uni.	Und.	Gen.	Uni.	Overall
Show-o	1.3B	88.3	64.7	58.5	8.9	-	-	12.0	-	-	-	-	-	-
Janus-Pro	1.5B	94.4	47.1	45.0	0.3	-	-	19.2	-	-	-	-	-	-
Show-o2	1.5B	96.0	67.9	65.8	26.7	-	-	9.4	-	-	-	-	-	-
D-DiT	2B	86.5	65.0	58.1	0.2	-	-	6.8	-	-	-	-	-	-
ILLUME+	3B	43.4	19.9	10.5	10.3	7.7	9.0	11.3	30.1	15.1	1.0	5.5	3.2	9.4
Janus-Pro	7B	95.7	71.7	69.8	3.2	-	-	15.1	-	-	-	-	-	-
Show-o2	7B	97.2	73.8	72.5	9.9	-	-	9.2	-	-	-	-	-	-
ILLUME+	7B	80.2	20.4	16.7	12.4	10.4	11.4	11.3	27.7	13.9	2.7	6.8	4.8	11.7
OmniGen2 🥈	7B	92.3	79.0	74.5	61.3	42.6	52.0	19.7	41.9	30.9	45.0	50.3	47.7	51.3
TokenFlow	14B	93.0	47.1	44.5	20.1	-	-	17.0	-	-	-	-	-	-
BAGEL 🥇	14B	96.0	82.5	80.3	57.6	78.1	67.9	28.2	41.6	32.0	7.2	57.7	32.5	53.2
SEED-X	17B	82.8	38.9	34.2	18.6	13.7	16.1	13.5	27.4	20.8	0.2	16.8	8.5	19.9
UniWorld-V1 🥉	19B	92.6	68.5	65.1	63.4	26.4	44.9	22.8	32.0	26.9	46.4	16.2	31.3	42.1

Closed-Source Models

Model	Params	Internal Consistency			Und.-Guided Gen.			Gen-Guided Und.			Mutual Enhancement			Overall
Model	Params	Und.	Gen.	Uni.	Und.	Gen.	Uni.	Und.	Gen.	Uni.	Und.	Gen.	Uni.	Overall
Gemini 3 Pro	-	98.3	88.1	86.9	71.0	82.8	76.9	42.2	46.5	43.9	65.3	77.4	71.4	69.8
GPT-5.2	-	98.6	86.3	84.7	69.7	85.7	77.7	44.8	58.2	52.7	69.1	71.2	70.2	71.3
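
A note on reading the tables: for every row that reports all four Uni. scores, the Overall value is numerically consistent with the unweighted mean of those four scores. This is an observation from the published numbers, not a formula stated in this README, so treat it as an assumption. The sketch below recomputes the mean and compares it with the reported Overall, allowing a small tolerance because the table values are already rounded. The task abbreviations (IC, UGG, GGU, ME) follow the `TASKS` values used by the launch scripts.

```python
# Uni. scores per dimension (IC, UGG, GGU, ME) and the reported Overall,
# copied from the two tables above. Rows with "-" entries are left out.
REPORTED = {
    "ILLUME+ (3B)": ((10.5, 9.0, 15.1, 3.2), 9.4),
    "ILLUME+ (7B)": ((16.7, 11.4, 13.9, 4.8), 11.7),
    "OmniGen2": ((74.5, 52.0, 30.9, 47.7), 51.3),
    "BAGEL": ((80.3, 67.9, 32.0, 32.5), 53.2),
    "SEED-X": ((34.2, 16.1, 20.8, 8.5), 19.9),
    "UniWorld-V1": ((65.1, 44.9, 26.9, 31.3), 42.1),
    "Gemini 3 Pro": ((86.9, 76.9, 43.9, 71.4), 69.8),
    "GPT-5.2": ((84.7, 77.7, 52.7, 70.2), 71.3),
}


def mean_uni(uni_scores):
    """Unweighted mean of the per-dimension Uni. scores."""
    return sum(uni_scores) / len(uni_scores)


def max_deviation(rows=REPORTED):
    """Largest gap between the recomputed mean and the reported Overall."""
    return max(abs(mean_uni(uni) - overall) for uni, overall in rows.values())


if __name__ == "__main__":
    for name, (uni, overall) in REPORTED.items():
        print(f"{name}: mean of Uni. = {mean_uni(uni):.2f}, reported Overall = {overall}")
```

The per-dimension Uni. scores themselves are not simple averages of the Und. and Gen. columns (see BAGEL under Gen-Guided Und.), so this check applies to the Overall column only.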

📦 Data Preparation

Download Unison-Bench from HuggingFace into data/ at the repo root:

huggingface-cli download FudanCVL/Unison \
    --repo-type dataset --local-dir data/

The expected layout:

Unison/
└── data/
    ├── Internal_Consistency/
    ├── Und_Guided_Gen/
    ├── Gen_Guided_Und/
    └── Mutual_Enhancement/

Both launch scripts default to DATA_DIR=../data, so no extra flags are needed. To use a different path, pass --data-dir /path/to/data or set DATA_DIR.
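
Before launching a long run, it can help to confirm that the download produced the four dimension folders listed above. The following is a minimal sketch, not part of the repository; the function name `missing_dimensions` is hypothetical, and it checks only that the folders exist, not their contents.

```python
from pathlib import Path

# Folder names taken from the expected layout shown above.
EXPECTED_DIMENSIONS = (
    "Internal_Consistency",
    "Und_Guided_Gen",
    "Gen_Guided_Und",
    "Mutual_Enhancement",
)


def missing_dimensions(data_dir):
    """Return the expected dimension folders that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_DIMENSIONS if not (root / name).is_dir()]


if __name__ == "__main__":
    import sys

    # Defaults to data/ relative to the current directory (the repo root).
    target = sys.argv[1] if len(sys.argv) > 1 else "data"
    missing = missing_dimensions(target)
    print("layout OK" if not missing else "missing: " + ", ".join(missing))
```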

🛠️ Installation

Step 1 — Base environment

cd Inference_Pipeline
UMM=/data/Unified_Models ./setup_envs.sh base

Creates the unison conda env from the root requirements.txt. This env covers both the inference and the evaluation pipeline.

Step 2 — Per-model environments

# All models at once
UMM=/data/Unified_Models ./setup_envs.sh

# Or selected models
UMM=/data/Unified_Models ./setup_envs.sh bagel omnigen2

Group	conda env	Upstream repo
`bagel`	`bagel`	`ByteDance-Seed/Bagel`
`janus`	`janus`	`deepseek-ai/Janus`
`omnigen2`	`omnigen2`	`VectorSpaceLab/OmniGen2`
`seedx`	`seedx`	`AILab-CVC/SEED-X`
`showo`	`showo2`	`showlab/Show-o`
`tokenflow`	`tokenflow`	`ByteVisionLab/TokenFlow`
`uniworld`	`univa`	`PKU-YuanGroup/UniWorld`
`illume`	`illume`	`illume-unified-mllm/ILLUME_plus`
`ddit`	`d-dit`	`zijieli-Jlee/Dual-Diffusion`

Each group clones its upstream repo into $UMM/<Repo> and installs it into the corresponding conda env. The script is idempotent; logs go to setup_logs/.

🤗 Model Weights

Benchmark model weights

Model configs in Inference_Pipeline/config/*.json reference local weight paths using the placeholder root /path/to/Unified_Models/.... Edit each config to point at your local checkout, e.g.:

{
  "model_name": "UniWorld-V1",
  "model_path": "/path/to/Unified_Models/UniWorld/UniWorld-V1/model_weights/UniWorld-V1",
  "api_type": "uniworld",
  "conda_env": "univa",
  "capabilities": ["understanding", "generation", "editing"]
}
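
Editing every config by hand is error-prone, so the placeholder root can also be swapped programmatically. The sketch below is an illustration only and is not shipped with the repository. It assumes the placeholder appears in the `model_path` key (as in the example above) and that weights live under the same `UMM` root used by the setup scripts; the helper name `repoint_config` is hypothetical. Configs may hold paths in other keys, which this sketch leaves untouched.

```python
import json
import os
from pathlib import Path

PLACEHOLDER_ROOT = "/path/to/Unified_Models"


def repoint_config(config_file, local_root):
    """Replace the placeholder root in model_path; return True if the file changed."""
    path = Path(config_file)
    config = json.loads(path.read_text(encoding="utf-8"))
    model_path = config.get("model_path", "")
    if not isinstance(model_path, str) or not model_path.startswith(PLACEHOLDER_ROOT):
        return False
    # Keep everything after the placeholder root and prepend the local root.
    config["model_path"] = local_root.rstrip("/") + model_path[len(PLACEHOLDER_ROOT):]
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    # Run from Inference_Pipeline/ with UMM set, e.g. UMM=/data/Unified_Models.
    local_root = os.environ.get("UMM")
    if local_root is None:
        print("UMM is not set; nothing changed")
    else:
        for config_file in sorted(Path("config").glob("*.json")):
            status = "updated" if repoint_config(config_file, local_root) else "unchanged"
            print(f"{config_file}: {status}")
```

The rewrite is applied once per file: after the placeholder has been replaced, a second run reports the config as unchanged.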

download_weights.sh fetches weights for all model backends. Set the local weight root and pick models:

UMM=/data/Unified_Models ./download_weights.sh                 # everything
UMM=/data/Unified_Models ./download_weights.sh bagel showo1    # selected groups

Gated repos (FLUX.1-dev, SD3) need huggingface-cli login + license acceptance. Run setup_envs.sh and download_weights.sh with the same UMM so code and weights share one root.

Unison-Judge

The default evaluation backend runs Unison-Judge.

Where to put it: download the checkpoint into Evaluation_Pipeline/unison-judge/. That is the default path used by evaluate_unison.py and run_evaluate_unison.sh, so no flags are needed:

Unison/
└── Evaluation_Pipeline/
    └── unison-judge/            # <- put Unison-Judge weights here
        ├── config.json
        ├── model-*.safetensors
        └── ...

To keep it elsewhere, set LOCAL_JUDGE_MODEL=/path/to/judge or pass --local-model-path /path/to/judge. No local judge weights are needed when using the api backend.
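
A quick pre-flight check can catch an incomplete judge download before evaluation starts. The sketch below is a hypothetical helper, not part of the repository. It looks only for the two items named in the layout above (`config.json` and at least one `model-*.safetensors` shard) and mirrors the `LOCAL_JUDGE_MODEL` fallback; how the real scripts resolve the flag against the environment variable is not shown in this README.

```python
import os
from pathlib import Path

# Default location relative to Evaluation_Pipeline/, as described above.
DEFAULT_JUDGE_DIR = "unison-judge"


def judge_dir_problems(judge_dir):
    """Return a list of human-readable problems; an empty list means the folder looks usable."""
    root = Path(judge_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")
    if not any(root.glob("model-*.safetensors")):
        problems.append("no model-*.safetensors shards found")
    return problems


if __name__ == "__main__":
    judge_dir = os.environ.get("LOCAL_JUDGE_MODEL", DEFAULT_JUDGE_DIR)
    problems = judge_dir_problems(judge_dir)
    print(f"{judge_dir}: " + ("looks complete" if not problems else "; ".join(problems)))
```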

🚀 Inference

cd Inference_Pipeline

# Run all tasks on one model
GPUS=0,1,2,3,4,5,6,7 MODELS=BAGEL-7B-MoT TASKS=IC,UGG,GGU,ME ./run.sh

# Select tasks or test with 2 items
GPUS=0,1,2,3 MODELS=UniWorld-V1 TASKS=IC,GGU ./run.sh
GPUS=0 MODELS=Janus-Pro-7B TEST_MODE=true ./run.sh

Results are written to result/<ModelName>/<TaskID>/<TaskID>_<ModelName>_results.csv.
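
The path pattern above can be turned into a small helper for checking which tasks have finished for a given model. This is a sketch under two assumptions: that `<TaskID>` is the same abbreviation passed in `TASKS` (IC, UGG, GGU, ME), and that `result/` is relative to `Inference_Pipeline/`. The function names are hypothetical.

```python
from pathlib import Path


def result_csv(model_name, task_id, result_root="result"):
    """Build result/<ModelName>/<TaskID>/<TaskID>_<ModelName>_results.csv."""
    return Path(result_root) / model_name / task_id / f"{task_id}_{model_name}_results.csv"


def missing_results(model_name, task_ids, result_root="result"):
    """Return the task IDs whose result CSV does not exist yet."""
    return [t for t in task_ids if not result_csv(model_name, t, result_root).is_file()]


if __name__ == "__main__":
    tasks = ["IC", "UGG", "GGU", "ME"]
    print(result_csv("BAGEL-7B-MoT", "IC").as_posix())
    print("still missing:", missing_results("BAGEL-7B-MoT", tasks))
```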

📐 Evaluation

cd Evaluation_Pipeline

# Local judge (default) — uses Unison-Judge weights
GPU_IDS=0,1,2,3 MODELS=BAGEL-7B-MoT ./run_evaluate_unison.sh

# Select tasks or evaluate several models at once
MODELS=BAGEL-7B-MoT TASKS=IC,GGU ./run_evaluate_unison.sh
MODELS="BAGEL-7B-MoT,UniWorld-V1" GPU_IDS=0,1,2,3,4,5,6,7 ./run_evaluate_unison.sh

# Closed-source model API judge
JUDGE_BACKEND=api OPENAI_API_KEY=sk-... MODELS=UniWorld-V1 ./run_evaluate_unison.sh

# Aggregate results across models
python aggregate_results.py   # -> evaluation_summary.json

Output per model: eval_<ModelName>.json.
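
To see which models already have evaluation output, the file naming convention alone is enough. The sketch below lists model names from `eval_<ModelName>.json` files in a directory; it deliberately does not read the files, since their JSON schema is not documented here. The output directory and the function name `evaluated_models` are assumptions.

```python
from pathlib import Path

PREFIX = "eval_"
SUFFIX = ".json"


def evaluated_models(output_dir="."):
    """Return sorted model names recovered from eval_<ModelName>.json file names."""
    return sorted(
        p.name[len(PREFIX):-len(SUFFIX)]
        for p in Path(output_dir).glob(f"{PREFIX}*{SUFFIX}")
    )


if __name__ == "__main__":
    print(evaluated_models())
```

Note that `evaluation_summary.json` does not match the `eval_` prefix, so the aggregate file is not mistaken for a per-model result.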

🙏 Acknowledgement

We sincerely thank the open-source community for its outstanding contributions. Unison-Judge is built upon Qwen3-VL. The evaluated models, including BAGEL, UniWorld, OmniGen2, Show-o, Janus-Pro, SEED-X, TokenFlow, ILLUME+, and D-DiT, among others, form the foundation of this benchmark. We are grateful to all the authors for making their work publicly available.

📝 Citation

If you find this work useful, please cite:

@inproceedings{liu2026unison,
  title     = {{Unison}: Benchmarking Unified Multimodal Models via Synergistic Understanding and Generation},
  author    = {Liu, Jinyu and Shuai, Xincheng and Ding, Henghui and Jiang, Yu-Gang},
  booktitle = {International Conference on Machine Learning},
  year      = {2026}
}
