
feat(benchmarks): add ACEBench support, fix #1025 #1386

Open
haoruilee wants to merge 2 commits into modelscope:main from haoruilee:codex/acebench-support


@haoruilee
Contributor

This PR adds ACEBench support to EvalScope.

  • Add evalscope/benchmarks/acebench/acebench_adapter.py, inheriting from DefaultDataAdapter.
  • Register the acebench benchmark with normal, special, and agent subsets.
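The registration step can be illustrated with a self-contained sketch of the decorator-based registry pattern. Everything in it (BENCHMARK_REGISTRY, BenchmarkMeta, register_benchmark, and the placeholder DefaultDataAdapter) is a hypothetical stand-in; EvalScope's real registration API may use different names and arguments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Type


@dataclass
class BenchmarkMeta:
    """Metadata recorded for each registered benchmark (hypothetical)."""
    name: str
    subset_list: List[str]
    adapter_cls: Type


# Hypothetical global registry: benchmark name -> metadata.
BENCHMARK_REGISTRY: Dict[str, BenchmarkMeta] = {}


def register_benchmark(name: str, subset_list: List[str]) -> Callable[[Type], Type]:
    """Class decorator that records an adapter class under a benchmark name."""
    def decorator(cls: Type) -> Type:
        if name in BENCHMARK_REGISTRY:
            raise ValueError(f"benchmark {name!r} is already registered")
        BENCHMARK_REGISTRY[name] = BenchmarkMeta(name, list(subset_list), cls)
        return cls
    return decorator


class DefaultDataAdapter:
    """Placeholder for the real base class named in the PR."""


@register_benchmark("acebench", subset_list=["normal", "special", "agent"])
class ACEBenchAdapter(DefaultDataAdapter):
    """Would hold record parsing and scoring logic for ACEBench."""
```

Registering one name with a subset list lets a runner choose normal, special, or agent at evaluation time without needing a separate adapter class per subset.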

I ran a real end-to-end test against my own API.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces support for the ACEBench tool-use benchmark, adding documentation, metadata, an adapter, utility functions, and corresponding unit tests. The review feedback focuses on improving the robustness of the adapter and utility implementations by recommending defensive checks against null or missing values, handling potential parsing exceptions (such as invalid JSON or AST parsing errors), ensuring type safety for unexpected inputs, and standardizing boolean string coercion.
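The review asks for handling of invalid JSON and AST parsing errors. A minimal sketch of such defensive parsing follows, assuming that ACEBench model outputs are Python-style call lists such as [get_weather(city='Beijing')]. The names safe_json_loads and parse_call_list are hypothetical and do not correspond to the actual functions in the PR's utils.py.

```python
import ast
import json
from typing import Any, Dict, List, Optional, Tuple


def safe_json_loads(text: Any, default: Any = None) -> Any:
    """Parse JSON; return `default` for None, non-strings, or invalid JSON."""
    if not isinstance(text, str):
        return default
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return default


def parse_call_list(text: Any) -> Optional[List[Tuple[str, Dict[str, Any]]]]:
    """Parse "[f(a=1), g(b='x')]" into [(name, kwargs), ...]; None on failure."""
    if not isinstance(text, str) or not text.strip():
        return None
    try:
        tree = ast.parse(text.strip(), mode="eval")
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return None
    node = tree.body
    # Accept either a list of calls or a single bare call.
    calls = node.elts if isinstance(node, ast.List) else [node]
    result = []
    for call in calls:
        # Only plain `name(key=value, ...)` calls are accepted.
        if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
            return None
        if call.args:  # positional arguments are rejected
            return None
        kwargs: Dict[str, Any] = {}
        for kw in call.keywords:
            if kw.arg is None:  # `**mapping` unpacking has no keyword name
                return None
            try:
                # literal_eval only evaluates literals, never arbitrary code.
                kwargs[kw.arg] = ast.literal_eval(kw.value)
            except (ValueError, TypeError, SyntaxError):
                return None
        result.append((call.func.id, kwargs))
    return result
```

Returning None instead of raising lets a caller score a malformed model output as incorrect rather than aborting the whole evaluation run.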

Comment threads:
  • evalscope/benchmarks/acebench/utils.py (outdated)
  • evalscope/benchmarks/acebench/acebench_adapter.py (outdated)
  • evalscope/benchmarks/acebench/acebench_adapter.py (outdated)
  • evalscope/benchmarks/acebench/utils.py
  • evalscope/benchmarks/acebench/utils.py (outdated)
  • evalscope/benchmarks/acebench/utils.py
  • evalscope/benchmarks/acebench/utils.py
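The review summary also recommends standardizing boolean string coercion and guarding against null, missing, or wrongly typed fields. A hypothetical pair of helpers illustrates one way to do this. The names to_bool and get_typed, as well as the accepted true and false spellings, are assumptions and are not taken from the PR.

```python
from typing import Any, Mapping, Optional, Type, TypeVar

T = TypeVar("T")

# Assumed spellings; adjust to whatever the dataset actually uses.
_TRUE = {"true", "1", "yes", "y", "t"}
_FALSE = {"false", "0", "no", "n", "f"}


def to_bool(value: Any, default: bool = False) -> bool:
    """Coerce bools, numbers, and common strings to bool; otherwise `default`."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
    return default  # None, unknown strings, and other types


def get_typed(record: Optional[Mapping[str, Any]], key: str,
              expected: Type[T], default: T) -> T:
    """Return record[key] if the record exists and the value has the expected type."""
    if not isinstance(record, Mapping):
        return default
    value = record.get(key)
    return value if isinstance(value, expected) else default
```

With a single coercion function, the adapter and the utilities interpret "True", "yes", and 1 the same way, so they cannot disagree about a flag.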