Add Exgentic Benchmarks MCP Server #178

Draft
yoavkatz wants to merge 8 commits into kagenti:main from yoavkatz:feature/exgentic-mcp-server

@yoavkatz commented Mar 16, 2026

Summary

This PR adds a Docker-based MCP server for Exgentic benchmarks.

For kagenti/kagenti#963, as part of Epic kagenti/kagenti#962.

Changes

  • Created directory with:
    • Dockerfile: Multi-stage build using uv for fast package installation
    • entrypoint.sh: Runtime script to start the MCP server
    • README.md: Comprehensive documentation with examples
    • .dockerignore: Build optimization

Features

  • ✅ Supports building images with specific benchmarks (e.g., tau2, webarena, miniwob)
  • ✅ Uses uv for fast package installation
  • ✅ Includes git configuration to handle large repository clones (HTTP/1.1, increased buffers)
  • ✅ Benchmark is installed at build time for faster startup
  • ✅ Configurable HOST/PORT via environment variables
  • ✅ Runs as non-root user (UID 1001)
  • ✅ Successfully tested with tau2 benchmark (114 tasks loaded)
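
The git settings listed above are not spelled out in this description. A minimal sketch of what they could look like, assuming the standard `http.version` and `http.postBuffer` keys; the function name and the 500 MB buffer size are illustrative assumptions, not values taken from the Dockerfile:

```shell
# Sketch only: apply git HTTP settings commonly used to work around
# failed large clones. Scope arguments are passed straight to `git config`,
# e.g. `--global` during an image build or `--file <path>` when experimenting.
configure_git_for_large_clones() {
    # Force HTTP/1.1, since some HTTP/2 transfers of large packs fail midway.
    git config "$@" http.version HTTP/1.1
    # Raise the HTTP buffer to 500 MB (assumed value, tune as needed).
    git config "$@" http.postBuffer 524288000
}
```

In an image build this would typically be called as `configure_git_for_large_clones --global` before the benchmark repositories are cloned.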

Testing

Built and tested the Docker image:

  • Image size: 2.43 GB
  • Successfully loaded tau2 benchmark with 114 tasks
  • MCP server started and responded correctly to HTTP requests
  • 3 management tools available (list_tasks, create_session, delete_session)

Usage

# Build for tau2 benchmark
cd mcp/exgentic_benchmarks
podman build --build-arg BENCHMARK_NAME=tau2 -t exgentic-mcp-tau2 .

# Run the server
podman run -p 8000:8000 exgentic-mcp-tau2
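
The run command above publishes port 8000, and the PR describes HOST and PORT as configurable through environment variables. A rough sketch of how an entrypoint might resolve them; the function name and the fallback host `0.0.0.0` are assumptions, and the real entrypoint.sh may differ:

```shell
# Sketch only: resolve the listen address from HOST and PORT.
# The fallback values are assumptions; 8000 matches the port published above.
resolve_listen_addr() {
    host=${HOST:-0.0.0.0}
    port=${PORT:-8000}
    # Reject anything that is not a plain port number before starting.
    case "$port" in
        ''|*[!0-9]*) echo "invalid PORT: $port" >&2; return 1 ;;
    esac
    printf '%s:%s\n' "$host" "$port"
}
```

Under these assumptions, a different port could be selected at run time with something like `podman run -e PORT=9000 -p 9000:9000 exgentic-mcp-tau2`.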

Documentation

See mcp/exgentic_benchmarks/README.md for complete documentation including:

  • Quick start guide
  • Build instructions for different benchmarks
  • Configuration options
  • Troubleshooting guide
  • Advanced usage examples

- Created Docker-based MCP server for Exgentic benchmarks
- Supports building images with specific benchmarks (e.g., tau2)
- Uses uv for fast package installation
- Includes git configuration to handle large repository clones
- Benchmark is installed at build time for faster startup
- Configurable HOST/PORT via environment variables
- Runs as non-root user (UID 1001)
- Comprehensive documentation with examples
- Successfully tested with tau2 benchmark (114 tasks loaded)

Signed-off-by: Yoav Katz <katz@il.ibm.com>
@yoavkatz requested a review from a team as a code owner, March 16, 2026 14:22
@yoavkatz marked this pull request as draft, March 16, 2026 14:23
@pdettori requested a review from kellyaa, March 16, 2026 14:49
- Modified entrypoint.sh to support EXGENTIC_SET_* environment variables
- Environment variables are converted to --set arguments for exgentic mcp command
- Format: EXGENTIC_SET_BENCHMARK_USER_SIMULATOR_MODEL -> --set benchmark.user_simulator_model
- Updated README with detailed documentation and examples
- Added common parameter list and usage examples
- Supports setting user_simulator_model, agent_model, max_steps, and other benchmark parameters
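
A minimal sketch of the conversion described in this commit, not the actual entrypoint.sh. It assumes that the first underscore after the `EXGENTIC_SET_` prefix separates the section from the key, and that values are passed in `key=value` form; neither detail is confirmed here, and the function name is hypothetical:

```shell
# Sketch only: print one --set argument pair per EXGENTIC_SET_* variable.
# Assumed mapping: EXGENTIC_SET_<SECTION>_<KEY>=v  ->  --set <section>.<key>=v
build_set_args() {
    # Keep only well-formed variable names so the eval below stays safe.
    env | sed -n 's/^\(EXGENTIC_SET_[A-Z0-9][A-Z0-9]*_[A-Z0-9_][A-Z0-9_]*\)=.*/\1/p' | sort |
    while read -r name; do
        rest=${name#EXGENTIC_SET_}   # e.g. BENCHMARK_USER_SIMULATOR_MODEL
        section=${rest%%_*}          # text before the first underscore
        key=${rest#*_}               # everything after the first underscore
        eval "value=\${$name}"       # look up the variable's value by name
        path=$(printf '%s.%s' "$section" "$key" | tr '[:upper:]' '[:lower:]')
        printf '%s\n%s\n' '--set' "$path=$value"
    done
}
```

Printing one argument per line keeps values containing spaces intact, so a caller can read the lines back into an argument list before invoking `exgentic mcp`.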

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
@esnible (Contributor) left a comment

Can you add, either to the README, or to this PR, the rationale for adding this to the Kagenti examples? Have you been testing this with Kagenti -- the README only talks about testing with Docker. The instructions talk about different builds but Kagenti builds only once -- do we need those instructions?

- Updated Dockerfile permissions for Kubernetes compatibility (group permissions)
- Changed ownership from 1001:1001 to 1001:0 for OpenShift/Kubernetes
- Added group read/write/execute permissions (g+rwX)
- Added --disable-dns-rebinding-protection flag to allow Kubernetes service access
- This fixes the 'Invalid Host header' error when accessing from within cluster
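
The ownership and permission changes in this commit follow the usual pattern for images that must run under an arbitrary UID in group 0. A rough shell equivalent of what the Dockerfile step might do; the function name is hypothetical and the actual Dockerfile lines are not shown in this PR description:

```shell
# Sketch only: make a directory usable by an arbitrary UID in group 0,
# which is how OpenShift runs containers by default.
prepare_for_arbitrary_uid() {
    dir=$1
    # Changing ownership needs root, which is the case during an image build.
    if [ "$(id -u)" -eq 0 ]; then
        chown -R 1001:0 "$dir"
    fi
    # g+rwX: group read/write everywhere, execute only on directories and
    # on files that are already executable.
    chmod -R g+rwX "$dir"
}
```

Using a capital `X` rather than `x` avoids marking plain data files as executable while still letting group members traverse directories.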

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
@yoavkatz (Author)

> Can you add, either to the README, or to this PR, the rationale for adding this to the Kagenti examples? Have you been testing this with Kagenti -- the README only talks about testing with Docker. The instructions talk about different builds but Kagenti builds only once -- do we need those instructions?

This PR extends the work done by @kellyaa on adding an AppWorld MCP server, so that multiple benchmarks supported by the Exgentic framework are covered in a systematic way.

I have now added references to the related Kagenti issue (kagenti/kagenti#963) and Epic (kagenti/kagenti#962) for clearer context.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Added ENV HOME=/app after USER 1001 to ensure proper home directory
- Deleted .env.openai (replaced with benchmark-specific env files)
- This ensures tools/libraries write to /app where user has permissions

Signed-off-by: Yoav Katz <katz@il.ibm.com>