Benchmark dataset comparing Autofix Bot with other code and security review tools on the OpenSSF CVE Benchmark.
| Tool | Description |
|---|---|
| Autofix Bot | AI agent for deep code review |
| Claude Code | Anthropic's CLI security review |
| Cursor Bugbot | Cursor's PR review bot |
| CodeRabbit | AI code review platform |
| Semgrep (CE) | Static analysis (Community Edition) |
Final evaluation results in JSONL format with fields:
- `cve_id`: CVE identifier
- `variant`: `fixed` or `unfixed`
- `detected_issues`: Issues found by the tool
- `TP`, `FP`, `TN`, `FN`: Classification metrics
- `judge_reasoning`: Explanation of the judgment
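
As a rough illustration of how these records might be consumed, the sketch below sums the `TP`, `FP`, `TN`, and `FN` fields across JSONL lines and derives precision, recall, and accuracy. It assumes each of those fields holds a number (or boolean) per record, which this README does not confirm; the function name `aggregate_metrics` is hypothetical and not part of the dataset.

```python
import json
from typing import Iterable, Optional


def aggregate_metrics(lines: Iterable[str]) -> dict:
    """Sum TP/FP/TN/FN over JSONL records and derive summary ratios.

    Assumption: TP, FP, TN, FN are numeric (or boolean) per record.
    Missing fields are treated as 0.
    """
    totals = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        for key in totals:
            totals[key] += int(record.get(key, 0))

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        # Return None instead of raising when a ratio is undefined.
        return numerator / denominator if denominator else None

    tp, fp, tn, fn = (totals[k] for k in ("TP", "FP", "TN", "FN"))
    return {
        **totals,
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```

Because an open file object iterates over its lines, the function can be applied directly to a results file, for example `aggregate_metrics(open(path))` with `path` pointing at one tool's JSONL output. Undefined ratios come back as `None` rather than raising, so a tool with no positive predictions does not break the comparison.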
Intermediate formatted results from each tool, normalized for comparison.
Original tool outputs per CVE, preserving the exact response from each tool.
The archive/ directory contains prompts and data from earlier benchmark runs: