Autofix Bot Bench

Benchmark dataset comparing Autofix Bot against other code and security review tools on the OpenSSF CVE Benchmark.

Benchmarked Tools

| Tool | Description |
| --- | --- |
| Autofix Bot | AI agent for deep code review |
| Claude Code | Anthropic's CLI security review |
| Cursor Bugbot | Cursor's PR review bot |
| CodeRabbit | AI code review platform |
| Semgrep (CE) | Static analysis (Community Edition) |

Data Format

Judged Results (`benchmarks/judged-results/`)

Final evaluation results in JSONL format with fields:

- `cve_id`: CVE identifier
- `variant`: `fixed` or `unfixed`
- `detected_issues`: Issues found by the tool
- `TP`, `FP`, `TN`, `FN`: Classification outcomes (true positive, false positive, true negative, false negative)
- `judge_reasoning`: Explanation of the judgment
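
As a rough illustration of how these fields might be consumed, the sketch below tallies the `TP`, `FP`, `TN`, and `FN` fields across JSONL records and derives precision and recall. It assumes each of those fields holds a number (or boolean) per record, which this README does not specify; the helper names `tally` and `precision_recall` and the sample records (including the placeholder CVE IDs) are hypothetical and not taken from the dataset.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def tally(lines: Iterable[str]) -> Counter:
    """Sum the TP/FP/TN/FN fields over JSONL lines (one JSON object per line)."""
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key in ("TP", "FP", "TN", "FN"):
            # Assumption: each field is numeric or boolean; missing/null counts as 0.
            totals[key] += int(record.get(key) or 0)
    return totals


def precision_recall(totals: Counter) -> Tuple[float, float]:
    """Compute precision and recall, returning 0.0 when a denominator is zero."""
    tp, fp, fn = totals["TP"], totals["FP"], totals["FN"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up placeholder records for illustration only; not real benchmark data.
sample_lines = [
    '{"cve_id": "CVE-0000-0001", "variant": "unfixed", "detected_issues": [], "TP": 1, "FP": 0, "TN": 0, "FN": 0, "judge_reasoning": "placeholder"}',
    '{"cve_id": "CVE-0000-0001", "variant": "fixed", "detected_issues": [], "TP": 0, "FP": 1, "TN": 0, "FN": 0, "judge_reasoning": "placeholder"}',
    '{"cve_id": "CVE-0000-0002", "variant": "unfixed", "detected_issues": [], "TP": 0, "FP": 0, "TN": 0, "FN": 1, "judge_reasoning": "placeholder"}',
]

totals = tally(sample_lines)
precision, recall = precision_recall(totals)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.50
```

To run this against a real file, pass an open file handle from `benchmarks/judged-results/` to `tally` in place of `sample_lines`; the exact file names in that directory are not listed here.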

Processed Results (`benchmarks/processed/`)

Intermediate formatted results from each tool, normalized for comparison.

Raw Output (`benchmarks/raw-output/`)

Original tool outputs per CVE, preserving the exact response from each tool.

DeepSourceCorp/autofix-bot-bench
