
feat(benchmark): add MiniMax as LLM provider for HomeSec-Bench #176

Open
octo-patch wants to merge 1 commit into SharpAI:develop from octo-patch:feature/add-minimax-provider

Conversation

@octo-patch

Summary

Add the MiniMax Cloud API as a first-class LLM provider for the Home Security AI Benchmark (HomeSec-Bench), enabling users to benchmark MiniMax models (M2.7, M2.7-highspeed, M2.5, M2.5-highspeed) against local Qwen/DeepSeek models and cloud providers such as OpenAI.

Changes

  • Provider presets system with an auto-configured base URL and default model for MiniMax (see the sketch below)
  • MiniMax auto-detection via API type, base URL pattern, or the MINIMAX_API_KEY env var
  • Temperature clamping to [0, 1.0] for MiniMax API compatibility
  • Cloud API recognition enables stream_options for token tracking
  • config.yaml llmProvider selector and minimaxModel dropdown for the Aegis UI
  • Documentation: SKILL.md updated with a providers table; MiniMax mentioned in README.md
  • 52 tests (49 unit + 3 integration) covering all provider logic
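
A minimal sketch of how the preset, detection, and clamping pieces could fit together is below. Helper names, the exact base-URL path, and the AEGIS_LLM_BASE_URL variable are assumptions here; the actual logic lives in scripts/run-benchmark.cjs and may differ.

```js
// Provider presets: base URL and default model, auto-configured per provider.
const PROVIDER_PRESETS = {
  minimax: {
    baseUrl: 'https://api.minimax.io/v1', // PR states api.minimax.io; the /v1 path is assumed
    defaultModel: 'MiniMax-M2.7',
    isCloudApi: true, // cloud APIs get stream_options for token usage tracking
  },
};

// Auto-detect MiniMax from the API type, the base URL pattern, or a
// MINIMAX_API_KEY set without AEGIS_LLM_API_KEY.
function detectProvider(env = process.env) {
  if (env.AEGIS_LLM_API_TYPE === 'minimax') return 'minimax';
  if ((env.AEGIS_LLM_BASE_URL || '').includes('minimax')) return 'minimax';
  if (!env.AEGIS_LLM_API_KEY && env.MINIMAX_API_KEY) return 'minimax';
  return env.AEGIS_LLM_API_TYPE || 'builtin';
}

// Clamp the temperature into [0, 1.0] for MiniMax API compatibility.
function clampTemperature(temperature, provider) {
  if (provider !== 'minimax') return temperature;
  return Math.min(Math.max(temperature, 0), 1.0);
}

module.exports = { PROVIDER_PRESETS, detectProvider, clampTemperature };
```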

Files Changed (5 files, 538 additions)

| File | Changes |
| --- | --- |
| scripts/run-benchmark.cjs | Provider presets, auto-detection, temperature clamping |
| config.yaml | llmProvider and minimaxModel parameters |
| SKILL.md | Provider docs, env vars, standalone examples |
| README.md | MiniMax benchmark mention |
| tests/minimax-provider.test.cjs | 49 unit + 3 integration tests |

Usage

AEGIS_LLM_API_TYPE=minimax MINIMAX_API_KEY=your-key node scripts/run-benchmark.cjs
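
For runs driven by the Aegis UI, the same selection goes through the two new config.yaml parameters. The excerpt below is an assumed layout showing only those two keys; the surrounding configuration is omitted.

```yaml
# Assumed config.yaml excerpt; only the parameters added by this PR are shown.
llmProvider: minimax          # one of: builtin, openai, minimax
minimaxModel: MiniMax-M2.7    # or MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.5-highspeed
```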

MiniMax Models

| Model | Context | Notes |
| --- | --- | --- |
| MiniMax-M2.7 | 1M tokens | Latest flagship |
| MiniMax-M2.7-highspeed | 1M tokens | Optimized for throughput |
| MiniMax-M2.5 | 204K tokens | Previous generation |
| MiniMax-M2.5-highspeed | 204K tokens | Fast inference |

Test Plan

  • All 49 unit tests pass (an illustrative sketch follows this list)
  • All 3 integration tests pass with live MiniMax API
  • Existing benchmark functionality unchanged
  • Run full HomeSec-Bench suite with MiniMax-M2.7
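
For illustration, one of the temperature-clamping unit tests could look like the sketch below, using Node's built-in test runner. clampTemperature is re-implemented inline as a stand-in for whatever helper tests/minimax-provider.test.cjs actually exercises.

```js
const assert = require('node:assert');
const { test } = require('node:test');

// Inline stand-in for the clamping helper under test (illustrative only).
const clampTemperature = (t) => Math.min(Math.max(t, 0), 1.0);

test('temperature is clamped into the MiniMax range [0, 1.0]', () => {
  assert.strictEqual(clampTemperature(1.7), 1.0);
  assert.strictEqual(clampTemperature(-0.2), 0);
  assert.strictEqual(clampTemperature(0.6), 0.6);
});
```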

feat(benchmark): add MiniMax as LLM provider for HomeSec-Bench

Add MiniMax Cloud API (M2.7, M2.7-highspeed, M2.5, M2.5-highspeed) as a
built-in provider preset for the Home Security AI Benchmark, enabling users
to benchmark MiniMax models against local and other cloud LLMs.

Changes:
- Provider presets system with auto-configured base URL for MiniMax
  (api.minimax.io) and OpenAI
- MiniMax auto-detection via AEGIS_LLM_API_TYPE=minimax or base URL
- Temperature clamping [0, 1.0] for MiniMax API compatibility
- MINIMAX_API_KEY env var as fallback when AEGIS_LLM_API_KEY is not set
- MiniMax recognized as cloud API for stream_options support
- config.yaml: llmProvider selector (builtin/openai/minimax) and
  minimaxModel selector (M2.7, M2.7-highspeed, M2.5, M2.5-highspeed)
- Updated SKILL.md with provider docs, supported providers table,
  standalone usage examples
- README.md: mention MiniMax as benchmark provider option
- 52 tests (49 unit + 3 integration) covering provider resolution,
  model defaults, detection, temperature clamping, config validation
@solderzzc changed the base branch from master to develop on March 24, 2026 at 16:59
@solderzzc (Member)

@octo-patch, thanks for your PR; I understand the requirement is to enable MiniMax.
We are now working on a Swift version of mlx-server: https://github.com/SharpAI/mlx-server
We will check the whole pipeline for MiniMax 2.7 integration.

@octo-patch (Author)

Thank you @solderzzc! Sounds good — looking forward to hearing back once you've had a chance to review the full pipeline. Happy to adjust anything on this PR in the meantime.

@solderzzc (Member)

[image]

Just to give you a heads up, we are working on integrating and testing more API providers, and will have a release soon.

@octo-patch (Author)

Thanks for the update @solderzzc! Great to see the progress on multi-provider API integration. Looking forward to the release — let me know if there's anything I should adjust on this PR to align with the new architecture.

@solderzzc (Member)

Hi @octo-patch,
We are preparing a new release with many more APIs integrated, so it needs a bit more time for testing. Here is an update in the meantime. The integration doesn't require any changes to the benchmark code, since the APIs are configured with the existing environment variables:

| Provider | Model | Score (Total 96) | Accuracy % | Avg TTFT (ms) | Avg Speed (tok/s) |
| --- | --- | --- | --- | --- | --- |
| OpenAI | gpt-5.4-2026-03-05 | 94 | 97.9% | 601 | 73.4 |
| Anthropic | claude-opus-4-20250514 | 93 | 96.9% | 1336 | 1.8 |
| OpenAI | gpt-5.4-mini-2026-03-17 | 92 | 95.8% | 553 | 234.5 |
| Alibaba Cloud | qwen3-max | 92 | 95.8% | 1170 | 5.9 |
| Anthropic | claude-sonnet-4-20250514 | 91 | 94.8% | 1223 | 2.6 |
| MiniMax | MiniMax-M2.7-highspeed | 90 | 93.8% | 1492 | 3.0 |
| Moonshot AI | kimi-k2-0905-preview | 90 | 93.8% | 2200 | 62.5 |
| Anthropic | claude-opus-4-6 | 90 | 93.8% | 1876 | 2.3 |
| OpenAI | gpt-5.4-nano-2026-03-17 | 89 | 92.7% | 508 | 136.4 |
| MiniMax | MiniMax-M2.5-highspeed | 89 | 92.7% | 1792 | 2.8 |
| Alibaba Cloud | qwen-plus | 89 | 92.7% | 535 | 11.7 |
| Anthropic | claude-haiku-4-5 | 89 | 92.7% | 530 | 5.3 |
| Anthropic | claude-sonnet-4-6 | 89 | 92.7% | 1368 | 2.6 |
| Other Cloud | grok-4-1-fast-non-reasoning | 89 | 92.7% | 447 | 496.1 |
| DeepSeek | deepseek-chat | 88 | 91.7% | 1481 | 21.4 |
| MiniMax | MiniMax-M2.7 | 88 | 91.7% | 3791 | 1.5 |
| MiniMax | MiniMax-M2.5 | 86 | 89.6% | 3230 | 199.1 |
| Alibaba Cloud | qwen-flash | 86 | 89.6% | 398 | 28.5 |
| Alibaba Cloud | qwen3.5-plus | 82 | 85.4% | 1255 | 16.3 |
| Alibaba Cloud | qwen3.5-flash | 81 | 84.4% | 804 | 26.7 |
| MiniMax | MiniMax-M2.1 | 77 | 80.2% | 4530 | 195.7 |
| Moonshot AI | kimi-k2-thinking | 70 | 72.9% | 1119 | 28.3 |
| Moonshot AI | kimi-k2-turbo-preview | 66 | 68.8% | 773 | 134.9 |
| DeepSeek | deepseek-reasoner | 63 | 65.6% | 1137 | 34.5 |
| OpenAI | gpt-5-mini-2025-08-07 | 60 | 62.5% | 7248 | 72.6 |
| Moonshot AI | kimi-k2.5 | 58 | 60.4% | 836 | 52.6 |

