Add versioned eval process protocol #1857

Draft
xeophon wants to merge 8 commits into main from codex/eval-process-protocol

xeophon (Member) commented Jun 24, 2026

Overview

Adds a stable, versioned process boundary for eval hosts, while Verifiers remains the sole owner of typed Pydantic configuration, dynamic taskset/harness resolution, execution, and artifact serialization.

Process contract

  • eval --protocol-version advertises the process and trace schema versions plus supported operations.
  • eval resolve --format json <args> resolves argv without mutating global process state or starting execution.
  • Hosts pass the returned run ID to eval run <args> --uuid <run-id>, so resolution and execution share one in-memory identity.
  • Bare eval invocation remains available for direct CLI use.
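A host-side sketch of the handshake, resolve, and run sequence above. It is illustrative only and rests on assumptions not stated in this PR: the entry point is invoked as plain `eval` (configurable through the hypothetical `EVAL_CMD`), `--protocol-version` prints JSON with an `operations` list, and `resolve --format json` exposes the run ID under a `uuid` key. The subprocess runner is injectable so the flow can be exercised without a real CLI.

```python
import json
import subprocess
from typing import Callable, Sequence

# Assumption: how the eval entry point is invoked; adjust for the real wrapper.
EVAL_CMD: tuple[str, ...] = ("eval",)

# A runner takes argv and returns stdout; injectable for testing.
Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run a command and return stdout; raises CalledProcessError on non-zero exit."""
    return subprocess.run(list(argv), check=True, capture_output=True, text=True).stdout


def check_protocol(runner: Runner = run_cli,
                   required_ops: Sequence[str] = ("resolve", "run")) -> dict:
    # Assumption: --protocol-version prints JSON with an "operations" list.
    info = json.loads(runner([*EVAL_CMD, "--protocol-version"]))
    missing = [op for op in required_ops if op not in info.get("operations", [])]
    if missing:
        raise RuntimeError(f"eval does not advertise operations: {missing}")
    return info


def resolve_run_id(eval_args: Sequence[str], runner: Runner = run_cli) -> str:
    # Resolution only: per the contract, this does not start execution.
    out = runner([*EVAL_CMD, "resolve", "--format", "json", *eval_args])
    # Assumption: the resolved payload carries the run ID under "uuid".
    return json.loads(out)["uuid"]


def run_argv(eval_args: Sequence[str], run_id: str) -> list[str]:
    # Same args as resolution plus the resolved ID, so both steps share one run identity.
    return [*EVAL_CMD, "run", *eval_args, "--uuid", run_id]
```

Keeping the argv identical between `resolve` and `run` and only appending `--uuid` means the host never needs to interpret the resolved configuration itself; Verifiers stays the single owner of resolution.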

Artifacts

Verifiers keeps the existing artifact set: the resolved config.toml, the append-only results.jsonl of Trace records, and eval.log. The process protocol adds no sidecar file.
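Because results.jsonl is append-only, a host that reads it while a run is still writing may see a final line that is not yet complete. A minimal reader sketch, assuming one JSON object per newline-terminated line and that only the last line can be partial; the helper name `read_complete_records` is hypothetical and not part of Verifiers.

```python
import json
from pathlib import Path


def read_complete_records(path) -> list:
    """Return JSON objects from every newline-terminated line of an append-only JSONL file.

    Bytes after the last newline are treated as a record still being written and skipped.
    Splitting raw bytes also avoids decoding a multi-byte character cut off mid-write.
    """
    data = Path(path).read_bytes()
    *complete, _partial_tail = data.split(b"\n")
    return [json.loads(line) for line in complete if line.strip()]
```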

Resume reloads config.toml, writes into the selected output directory, and derives its local run identity from that directory.

Consumer

Prime CLI integration: PrimeIntellect-ai/prime#760

Comment thread on verifiers/v1/cli/eval/main.py (outdated)
Comment thread on verifiers/v1/cli/eval/resolver.py (outdated)
Comment on lines +21 to +27
resume_dir, rest = split_resume(args)
if resume_dir is not None and rest:
    raise ValueError("--resume takes no other arguments")
if resume_dir is not None:
    config = load_resume_config(resume_dir)
    config.uuid = resume_dir.resolve().name
    return config

🟡 Medium eval/resolver.py:21

resolve_eval() calls split_resume() and load_resume_config(), which raise SystemExit on invalid --resume input. A host process embedding this API gets terminated instead of receiving an exception it can catch and handle. Consider wrapping the resume handling in a try/except that re-raises as ValueError or another standard exception.
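One way to apply the suggestion is a narrow context manager that converts `SystemExit` into `ValueError` around only the resume helpers. This is a sketch, not the PR's fix: `exit_as_value_error`, `resolve_resume`, and `load_resume_config_stub` are hypothetical names, and the stub merely stands in for a helper that exits on bad input.

```python
import contextlib
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def exit_as_value_error(context: str) -> Iterator[None]:
    """Convert SystemExit raised inside the block into a catchable ValueError."""
    try:
        yield
    except SystemExit as exc:
        # Chain the original exit so callers can still inspect it via __cause__.
        raise ValueError(f"{context}: {exc.code}") from exc


def load_resume_config_stub(resume_dir: Path) -> dict:
    """Hypothetical stand-in for a CLI helper that calls SystemExit on bad input."""
    if not (resume_dir / "config.toml").is_file():
        raise SystemExit(f"no config.toml in {resume_dir}")
    return {}


def resolve_resume(resume_dir: Path) -> dict:
    # Only the resume helpers are wrapped, so unrelated exits are not swallowed.
    with exit_as_value_error("invalid --resume"):
        config = load_resume_config_stub(resume_dir)
    config["uuid"] = resume_dir.resolve().name
    return config
```

An alternative with the same effect is to have the helpers raise `ValueError` directly and let only the CLI entry point translate errors into an exit status; that keeps `SystemExit` out of the library path entirely.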

xeophon added 2 commits June 25, 2026 15:17
…o codex/eval-process-protocol

# Conflicts:
#	verifiers/v1/cli/eval/main.py
#	verifiers/v1/cli/eval/resolver.py
xeophon changed the title from "[codex] add versioned eval process protocol" to "Add versioned eval process protocol" on Jun 25, 2026
Comment thread on verifiers/v1/cli/eval/main.py