Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration

Official implementation for "Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration" accepted by ICLR 2026.

Authors: Sukwon Yun, Jie Peng, Pingzhi Li, Wendong Fan, Jie Chen, James Zou, Guohao Li, and Tianlong Chen

Graph-of-Agents (GoA) Overview

A test-time inference framework that dynamically selects, evaluates, and orchestrates multiple specialized language models as a collaborative graph to solve diverse tasks.

1. Environment Setup

conda create -n goa python=3.10 -y
conda activate goa
pip install -r requirements.txt

2. Serving Models with vLLM

Each model runs as a separate vLLM server on its own GPU. Launch each in a separate terminal (or use screen/tmux). We recommend serving each model on a different GPU to handle calls efficiently during multiprocessing (we used 6 × A6000 GPUs for experiments with an agent pool of six models):

CUDA_VISIBLE_DEVICES=0 vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000
CUDA_VISIBLE_DEVICES=1 vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --port 8001
CUDA_VISIBLE_DEVICES=2 vllm serve mistralai/Mathstral-7B-v0.1 --port 8002
CUDA_VISIBLE_DEVICES=3 vllm serve ContactDoctor/Bio-Medical-Llama-3-8B --port 8003
CUDA_VISIBLE_DEVICES=4 vllm serve instruction-pretrain/finance-Llama3-8B --port 8004
CUDA_VISIBLE_DEVICES=5 vllm serve Equall/Saul-7B-Instruct-v1 --port 8005

Verify a server is running:

curl http://localhost:8000/v1/models

The model endpoints are configured in endpoint.py. Update the URLs and ports there if your setup differs.

3. Running GoA

Dev run (small sample for quick testing):

python main.py \
    --data GPQA \
    --eval dev \
    --reference_models qwen,qwen_coder,mathstral,biomedical_llama,finance_llama,saul \
    --meta_llm qwen \
    --graph_pooling_method max \
    --top_k 3 \
    --seed 0

Full evaluation:

python main.py \
    --data GPQA \
    --eval test \
    --reference_models qwen,qwen_coder,mathstral,biomedical_llama,finance_llama,saul \
    --meta_llm qwen \
    --graph_pooling_method weighted_mean \
    --top_k 3 \
    --seed 0

Arguments:

Argument	Description	Default
`--data`	Dataset: `GPQA`, `MMLU`, `MMLU_Pro`, `MATH`, `AIME24`, `MedMCQA`, `human_eval`	`GPQA`
`--eval`	`dev` (small sample) or `test` (full evaluation)	`test`
`--reference_models`	Comma-separated model keys from `endpoint.py`	`qwen,qwen_coder,...`
`--meta_llm`	General-purpose model used for node sampling and graph pooling	`qwen`
`--graph_pooling_method`	`max`, or `mean`	`mean`
`--top_k`	Number of models to select per question	`3`
`--threshold`	Minimum edge score to keep a model in the graph	`0.05`
`--rounds`	Number of message-passing rounds	`1`
`--temperature`	Sampling temperature	`0.7`
`--max_tokens`	Max tokens per generation	`800`
`--num_proc`	Number of parallel workers	`1`
`--seed`	Random seed	`0`

Results are saved to outputs/{data}/{eval}/.

4. Adding New Model

Step 1: Generate a model card

Use generate_model_card.py to automatically extract model information from HuggingFace:

python generate_model_card.py \
    --model_id "meta-llama/Meta-Llama-3-8B-Instruct" \
    --name "llama3" \
    --url "http://localhost:8006/v1/completions" \
    --domain "general" \
    --llm_model "Qwen/Qwen2.5-7B-Instruct" \
    --llm_endpoint "http://localhost:8000/v1/completions"

This prints a ready-to-paste dictionary entry.

Step 2: Add to endpoint.py

Copy the generated entry into endpoint.py:

model_endpoint_dict = {
    # ... existing models ...

    "llama3": {
        "url": "http://localhost:8006/v1/completions",
        "model_id": "meta-llama/Meta-Llama-3-8B-Instruct",
        "max_tokens": 4096,
        "domain": "general",
        "model_card": "- **Domain**: General-purpose\n- **Task Specialization**: ..."
    }
}

Step 3: Serve and run

# Serve the new model
CUDA_VISIBLE_DEVICES=6 vllm serve meta-llama/Meta-Llama-3-8B-Instruct --port 8006

# Include it in the agent pool
python main.py \
    --data GPQA \
    --eval test \
    --reference_models qwen,qwen_coder,mathstral,biomedical_llama,finance_llama,saul,llama3 \
    --top_k 3

Project Structure

GoA_final/
├── main.py                  # Clean evaluation script
├── modules.py               # Core GoA pipeline (prompts + graph operations)
├── utils.py                 # Utilities (LLM calls, parsing, evaluation)
├── endpoint.py              # Model endpoint configurations
├── generate_model_card.py   # Tool to generate model cards for new models
├── run.sh                   # Example run script
├── requirements.txt         # Python dependencies
└── data/
    ├── dev/                 # Small dev samples for testing
    └── test/                # Full test sets

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration

Graph-of-Agents (GoA) Overview

1. Environment Setup

2. Serving Models with vLLM

3. Running GoA

4. Adding New Model

Step 1: Generate a model card

Step 2: Add to endpoint.py

Step 3: Serve and run

Project Structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
assets		assets
data		data
README.md		README.md
endpoint.py		endpoint.py
generate_model_card.py		generate_model_card.py
main.py		main.py
modules.py		modules.py
requirements.txt		requirements.txt
run.sh		run.sh
utils.py		utils.py

Folders and files

Latest commit

History

Repository files navigation

Graph-of-Agents: A Graph-based Framework for Multi-Agent LLM Collaboration

Graph-of-Agents (GoA) Overview

1. Environment Setup

2. Serving Models with vLLM

3. Running GoA

4. Adding New Model

Step 1: Generate a model card

Step 2: Add to endpoint.py

Step 3: Serve and run

Project Structure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages