Add parallel eval runner for understanding benchmarks (with multiple GPUs, the CLI can run understanding benchmarks with data-parallel inference) #5
Open
MqLeet wants to merge 1 commit into AIFrontierLab:main from
Conversation
Introduces two small modules under src/umm/eval/ that lift the distributed-init / sharding / shard-merge boilerplate out of each understanding-class eval CLI.

* src/umm/eval/distributed.py — DistInfo dataclass, dist init/barrier/all-reduce, rank-shard path, glob-based shard merge/cleanup. Lazy torch import so single-card callers pay no import cost.
* src/umm/eval/runner.py — run_sharded_inference(): round-robin sample assignment by sample_idx, per-rank JSONL shard append (flush+fsync), resume via caller-supplied done_ids, optional global max_samples cap. Accepts an infer_fn callable so the runner is unit-testable without a real model.

Refactors the mmbench/mme/mmmu/mathvista/mmvet eval CLIs to use the runner. mathvista and mmvet gain parallel support; the others have their duplicated dist plumbing replaced. The runner does only "shard inference + shard merge". Each CLI keeps its post-processing (Excel/JSON output, calculation.py invocation, mathvista's LLM extraction) behind `if rank == 0:`.

Behavior in single-card mode: final user-facing outputs are identical. Mid-run checkpoint formats change (mme TSV→JSONL during the run; mathvista and mmvet dict-JSON→JSONL), so a partial run from the prior code cannot be resumed by this code; fresh runs work identically.

Out of scope (follow-up PRs): each backbone adapter's LOCAL_RANK handling — only show_o currently honors LOCAL_RANK; the others default to cuda:0 or device_map="auto" and need adaptation before they work correctly under torchrun multi-rank.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
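The commit message above describes run_sharded_inference() in enough detail to sketch its core loop. The sketch below is an illustrative reconstruction, not the actual code from the PR: the signature, the DistInfo fields, and the record schema are assumptions; only the behaviors it encodes (round-robin by sample_idx, append-only JSONL with flush+fsync, done_ids resume, global max_samples cap, injectable infer_fn) come from the description.

```python
import json
import os
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, FrozenSet

@dataclass
class DistInfo:
    # Minimal stand-in for the PR's DistInfo dataclass (fields assumed).
    rank: int = 0
    world_size: int = 1

def run_sharded_inference(
    samples: Iterable,
    infer_fn: Callable,
    shard_path: str,
    dist: DistInfo,
    done_ids: FrozenSet[int] = frozenset(),
    max_samples: Optional[int] = None,
) -> None:
    with open(shard_path, "a", encoding="utf-8") as f:
        for sample_idx, sample in enumerate(samples):
            if max_samples is not None and sample_idx >= max_samples:
                break  # global cap: applied to the dataset, not per rank
            if sample_idx % dist.world_size != dist.rank:
                continue  # round-robin assignment: another rank owns this one
            if sample_idx in done_ids:
                continue  # resume: caller already found this id in the shard
            record = {"sample_idx": sample_idx, "output": infer_fn(sample)}
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # checkpoint survives a hard kill mid-run
```

Because infer_fn is injected, the loop can be exercised with a plain lambda in place of a real model, which is what makes the runner unit-testable.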
Hi @ApiaoSamaa, thanks for open-sourcing this excellent work! I'm a fan of Jindong's work. A few days ago I was using torchumm to run understanding-task benchmarks and adapted it myself for parallel inference in a multi-GPU environment, so I'm submitting this PR to support the project~
Summary
Introduces a small, focused runner for distributed sharded inference and refactors the 5 understanding-class eval CLIs (mmbench, mme, mmmu, mathvista, mmvet) to use it. Lifts the distributed-init / round-robin sharding / per-rank JSONL checkpoint / rank-0 merge boilerplate into one place.
After this PR, all 5 CLIs can run under torchrun --nproc_per_node=N for data-parallel evaluation, and single-card behavior is preserved (final user-facing output files are byte-equivalent for benchmarks that already had a defined output format).
What's added?
distributed runner changes
per-cli changes
All 5 CLIs (mmbench, mme, mmmu, mathvista, mmvet) gain parallel support and share the same shape:
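The rank-0 side of that shared shape is a merge of the per-rank JSONL shards. This is a hypothetical sketch of the glob-based merge/cleanup that distributed.py is described as providing; the function name, shard filename pattern, and return type are assumptions, while the behaviors (glob per-rank shards, combine, clean up) come from the PR description. Sorting by sample_idx is one way to keep the merged order stable regardless of world size.

```python
import glob
import json
import os

def merge_shards(out_dir: str, pattern: str = "shard_rank*.jsonl") -> list:
    # Gather every per-rank shard file written during inference.
    paths = sorted(glob.glob(os.path.join(out_dir, pattern)))
    records = []
    for p in paths:
        with open(p, encoding="utf-8") as f:
            records.extend(json.loads(line) for line in f if line.strip())
    # Restore dataset order so the merged result is independent of world size.
    records.sort(key=lambda r: r["sample_idx"])
    # Cleanup: shards are intermediate state, only the merged result remains.
    for p in paths:
        os.remove(p)
    return records
```

Each CLI would then feed the merged records into its existing post-processing (Excel/JSON output, calculation.py, mathvista's LLM extraction) under `if rank == 0:`.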
Usage
Single-GPU (unchanged)
PYTHONPATH=src python -m umm.cli.main eval --config <cfg>.yaml

Multi-GPUs

PYTHONPATH=src torchrun --nproc_per_node=8 -m umm.cli.main eval --config <cfg>.yaml

TIPS
I have adapted mme, mmbench, mmmu, mathvista, and mmvet, but have not adapted the generation tasks. Beyond that, a few follow-up PRs may be needed:
Backbone LOCAL_RANK adaptation. Every backbones/models/adapter.py file needs its GPU device assignment updated:
and replace

with
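The concrete before/after snippets did not survive in this copy of the PR. As a hedged sketch of the adaptation being asked for: the PR notes that only show_o honors LOCAL_RANK while other adapters default to cuda:0, so the change would replace a hard-coded device with one derived from the LOCAL_RANK environment variable that torchrun sets per process. The helper name below is illustrative, not from the PR.

```python
import os

def pick_device() -> str:
    # torchrun exports LOCAL_RANK per worker; on a single card it is unset,
    # so this degrades to rank 0 and behaves like the current cuda:0 default.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    try:
        import torch  # lazy import, mirroring the PR's single-card policy
        if torch.cuda.is_available():
            return f"cuda:{local_rank}"
    except ImportError:
        pass
    return "cpu"  # CPU fallback when torch or CUDA is unavailable
```

An adapter would then load its model with `model.to(pick_device())` instead of a literal `"cuda:0"`, which is what makes multi-rank torchrun place each replica on its own GPU.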