chore(sync): upstream protoAgent v0.16.0 — incl. roxy's own #595/#598 fixes by mabry1985 · Pull Request #59 · protoLabsAI/roxy

mabry1985 · 2026-06-06T09:18:34Z

Syncs roxy to protoAgent v0.16.0 (was 20 behind → 0). Merge commit, not squash. Parents 4a769c4 + 4afd97d.

Cleanest sync yet — only pyproject.toml (version) conflicted (kept roxy's 0.15.0; the release bump follows). The seam adoption (#57) + CHANGELOG merge strategy did their job: server/a2a.py/chat.py stay zero-delta, README/docs/changelog auto-merged, and test_a2a_handler.py kept roxy's confidence assertion with no conflict.

Notable in this batch

roxy's own upstream fixes land in roxy: fix(mcp) #595 (tool error degrades, was roxy#58) + feat(a2a) #598 loopback-card guard (was roxy#31). The guard activates roxy's pre-set a2a.require_routable_url: true — verified live: [a2a] card URL http://roxy:7870/a2a is routable (check passed).
Infra/security: #581 default-bind 127.0.0.1 (containers expose 0.0.0.0), #591 bearer-gate console + OpenAI-compat APIs, #582 LLM timeout/retries, #583 default-on SSRF denylist, #584 HEALTHCHECK/probes, #580 launch the server/ package.
New: ADR-0024 code_with CLI coding agents over ACP (#596/#599/#600).

Verified live on ava

Healthy, v0.16.0 base, clean boot, loopback guard passes, console gate leaves healthz/card open (both 200), 965 venv tests pass (only the known protolabs_a2a-not-in-venv card tests skip locally; CI runs them). A2A smoke green: chat + fleet sees 6 + per-project memory PASS. Base preserved (0 behind upstream).

Merge with --merge (not squash). A roxy release follows.

🤖 Generated with Claude Code

…-#572) (#579)

Sweep the docs + README for staleness left by the ADR-0023 server/ decomposition and the #570-#572 operator-fork seams:

- New config-reference sections: a2a (skills + description, #570) and security (callback_allowlist, #572); plugins guide gains register_a2a_skill + register_thread_id_resolver.
- agent-card reference + add-a-skill guide + TEMPLATE.md rewritten to the config-driven card (declare a2a.skills / register_a2a_skill; don't edit server/a2a.py). Fixed the add-a-skill test example (was importing the gone _SKILL_SPECS).
- server.py -> server/ package across every living guide/reference (module paths to server/agent_init.py / server/chat.py / server/a2a.py); fixed the BROKEN 'python server.py' launch command in TEMPLATE.md -> 'python -m server'.
- a2a_handler.py (deleted in the A2A 1.0 migration) re-pointed to its real homes: a2a_auth.py (bearer/token), a2a_stores.py (webhook SSRF), a2a_executor.py (worldstate-delta), server/chat.py (caller-trace); diagram boxes relabeled to the conceptual 'A2A handler'.

Docs build green; no code touched.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

… CI guard) (#580)

ADR 0023 promoted server.py → the server/ package (launch: python -m server), but the k8s manifest and the OpenShell sandbox-create script still ran 'python server.py', which exits immediately ('can't open file server.py'), i.e. CrashLoopBackOff on k8s. PYTHONPATH=/opt/protoagent is baked as an image ENV so it survives the command override; switch both to 'python -m server'.

Add a CI guard (checks.yml) that fails on any 'python server.py' invocation in scripts/manifests/Dockerfiles so this can't regress.

Prod-readiness audit P0 (batch A).
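
The regression guard described in #580 could be sketched roughly as below. This is an illustration only: the commit message does not show how checks.yml implements the check, so the regex, the function name `find_stale_launches`, and the in-memory `files` mapping are all assumptions.

```python
import re

# Assumed pattern: a direct `python server.py` / `python3 server.py` invocation.
# It deliberately does not match `python -m server`.
STALE_LAUNCH = re.compile(r"\bpython3?\s+server\.py\b")


def find_stale_launches(files: dict[str, str]) -> list[tuple[str, int]]:
    """Return (path, line_number) for every line that still launches server.py."""
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if STALE_LAUNCH.search(line):
                hits.append((path, lineno))
    return hits
```

A CI step would exit non-zero when the returned list is non-empty. Note that this pattern would miss an exec-form invocation such as `["python", "server.py"]`; a real guard would need to cover that shape too.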

…y (#581)

Prod-readiness audit P0 (network-boundary half). The server hardcoded host=0.0.0.0, so a local/desktop run exposed the operator/console + OpenAI-compat API (/api/*, /api/chat, /v1/*), which are NOT auth-gated, to anything that could reach the port.

- New --host arg / PROTOAGENT_HOST env, defaulting to 127.0.0.1 (loopback). Local + desktop-sidecar runs are now loopback-only by default.
- Containers bind 0.0.0.0 explicitly: entrypoint.sh passes --host 0.0.0.0, and the k8s + OpenShell command-overrides (which bypass the entrypoint) pass it too. Their boundary is the published port + network policy, not the in-container bind.
- Loud startup WARNING when binding non-loopback with no A2A auth token configured.

Verified: boot smoke binds 127.0.0.1 by default (Uvicorn running on http://127.0.0.1); 1047 tests green.

The credential-boundary half (bearer-gate /api/* + /v1/*, which needs console-token plumbing) follows in a separate PR.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

…client (#582)

Prod-readiness audit P1 (resilience).

- The model client had NO timeout: a hung/slow LiteLLM gateway blocked the turn (and the A2A task / SSE stream) indefinitely; under load these pile up. Add model.request_timeout (default 120s) + model.max_retries (default 2), passed as timeout=/max_retries= to ChatOpenAI in graph/llm.py.
- _a2a_push_client (httpx.AsyncClient) was created but never closed: connection pool leak on shutdown/reload (bites the desktop-sidecar restart loop). Close it best-effort in the shutdown hook.

Verified: 1047+ tests green (new test_llm asserts the bounds); boot + clean SIGTERM shutdown smoke shows no push-client errors.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
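
As a rough illustration of the bounds added in #582, the config-to-kwargs mapping might look like the sketch below. The helper name `llm_bounds` and the dict-shaped config are assumptions; only the option names and defaults (request_timeout 120s, max_retries 2) come from the commit message.

```python
DEFAULT_REQUEST_TIMEOUT = 120  # seconds, per the commit message
DEFAULT_MAX_RETRIES = 2


def llm_bounds(model_cfg: dict) -> dict:
    """Translate the `model` config section into client keyword arguments."""
    return {
        "timeout": float(model_cfg.get("request_timeout", DEFAULT_REQUEST_TIMEOUT)),
        "max_retries": int(model_cfg.get("max_retries", DEFAULT_MAX_RETRIES)),
    }


# Usage sketch (needs langchain-openai installed; not executed here):
#   from langchain_openai import ChatOpenAI
#   llm = ChatOpenAI(model=model_name, **llm_bounds(cfg.get("model", {})))
```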

… (#583)

Prod-readiness audit P1. egress.check_url was a host allowlist that was permissive (off) by default, so with no allowlist set, the model could fetch_url an internal service or http://169.254.169.254/ (cloud metadata). And fetch_url used follow_redirects=True, so a public URL 30x-redirecting to an internal host bypassed the check entirely.

- egress.check_url now applies a default-on private-IP denylist when no allowlist is set: resolve the host, block loopback/link-local/private/multicast/reserved/metadata (+ unresolvable). Public hosts still work with no config. An allowlist, when set, is the explicit-trust path and bypasses the denylist (mirrors a2a_stores.is_safe_webhook_url).
- fetch_url disables auto-redirects and follows manually, re-checking each hop's host against egress (max 5 hops; refuses non-http(s) redirects).

Verified: test_egress updated for the new default (public allowed, private/metadata blocked, allowlist bypass); 1047+ tests green.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
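
The denylist and per-hop re-check from #583 can be sketched as follows. This is not the upstream egress module: the signatures, the injectable `resolve` and `fetch_once` callables, and the `fetch_checked` name are assumptions made so the example runs without network access, and the allowlist bypass is omitted.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit


def _resolve(host: str) -> list[str]:
    """Resolve a host to its IP addresses (raises OSError if unresolvable)."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def _is_blocked_ip(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return (addr.is_loopback or addr.is_link_local or addr.is_private
            or addr.is_multicast or addr.is_reserved or addr.is_unspecified)


def check_url(url: str, resolve=_resolve) -> bool:
    """True if the URL is http(s) and every resolved address is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        ips = resolve(parts.hostname)
    except OSError:
        return False  # unresolvable hosts are blocked
    return bool(ips) and not any(_is_blocked_ip(ip) for ip in ips)


def fetch_checked(url: str, fetch_once, resolve=_resolve, max_hops: int = 5):
    """Follow redirects manually, re-checking every hop against check_url.

    fetch_once(url) is a hypothetical callable returning
    (status, location_header_or_None, body) without following redirects.
    """
    for _ in range(max_hops + 1):
        if not check_url(url, resolve):
            raise PermissionError(f"egress blocked: {url}")
        status, location, body = fetch_once(url)
        if status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)
            continue
        return body
    raise RuntimeError("too many redirects")
```

The cloud-metadata address 169.254.169.254 falls in the link-local range, so it is refused both as a direct target and as a redirect destination.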

…584)

Prod-readiness audit P1. The app exposes a correct readiness endpoint (/healthz returns 200 only when STATE.graph is compiled, 503 during cold start) but nothing used it: compose restart:unless-stopped couldn't detect a hung-but-alive process, and k8s wouldn't gate traffic on readiness (requests hit the pod during the model-cold-start window).

- Dockerfile HEALTHCHECK (curl /healthz; start-period 60s for first-compile).
- docker-compose healthcheck block.
- k8s readinessProbe (gates traffic) + livenessProbe (generous: initialDelay 90s + failureThreshold 6 so a slow first compile can't trigger a restart loop).

curl is already in the image. Both YAML files parse.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

… (#585)

The "Attest build provenance" step calls actions/attest-build-provenance, which needs the GitHub attestations feature, unavailable on private repos without a paid plan. A fork that cuts a release therefore gets a noisy red error on a step it can't use (the step is already continue-on-error, so the release still publishes, but the failed annotation is misleading).

Gate it behind an opt-in `ATTESTATIONS_ENABLED` repo variable, mirroring the existing `RELEASE_ENABLED` guard: forks leave it unset and the step is skipped cleanly; repos with the feature set the variable to `true`. No workflow edit is needed, so it doesn't conflict when forks re-sync from upstream. continue-on-error is kept as a belt-and-suspenders.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

…a turn (#586)

Prod-readiness audit P1 (the durable fix the memory note keeps flagging). Every wire bug this repo shipped (CRLF SSE #563, A2A 404s / eval rot #524, lean-image FastAPI gap #426) was unit-green but wire-broken because nothing booted the real server and exercised the actual transport.

- scripts/fake_openai_server.py: a zero-dep OpenAI-compat endpoint that streams a canned completion, so a real A2A turn reaches a terminal state without a gateway.
- scripts/live_smoke.py: boots `python -m server --ui none` against the fake model, waits /healthz, fetches the agent card, POSTs a real SendStreamingMessage turn, and asserts the SSE frames decode + reach a terminal frame.
- checks.yml: new 'A2A live smoke (lean tier)' job that installs requirements-core.txt (the production image's deps; also guards the lean-import gap class) and runs it.

Verified locally: lean server boots, card serves, streaming turn completes end-to-end against the fake model.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
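
The "SSE frames decode + reach a terminal frame" assertion from #586 might be checked along these lines. This is a sketch, not the contents of scripts/live_smoke.py: the `{"state": ...}` frame shape and both function names are hypothetical, and the terminal state names are borrowed from the metrics commit in this batch.

```python
import json

# Assumed terminal states; the real frame schema is not shown in the PR.
TERMINAL_STATES = {"completed", "failed", "canceled"}


def iter_sse_data(raw: str):
    """Yield the JSON payload of each SSE event, tolerating CRLF line endings."""
    normalized = raw.replace("\r\n", "\n").replace("\r", "\n")
    for block in normalized.split("\n\n"):
        data_lines = [
            line[5:].removeprefix(" ")  # drop "data:" and one optional space
            for line in block.split("\n")
            if line.startswith("data:")
        ]
        if data_lines:
            yield json.loads("\n".join(data_lines))


def reached_terminal(raw: str) -> bool:
    """True if any decoded frame carries a terminal state."""
    return any(frame.get("state") in TERMINAL_STATES for frame in iter_sse_data(raw))
```

Normalizing line endings before splitting is what lets the same parser accept both LF and CRLF streams, the class of bug the commit message attributes to #563.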

Prod-readiness audit P1: no static analysis was enforced (no ruff/mypy in pyproject.toml or CI), and the untyped wire boundaries are where the recent CRLF/tool-call bugs lived.

- pyproject.toml [tool.ruff]: select E/F/W; ignore the stylistic rules the codebase intentionally uses (E402 lazy imports, E501 long comment lines, E702/E731 semicolon/lambda style); per-file-ignore F401 in **/__init__.py (the re-export surfaces, esp. server/__init__.py).
- Cleaned the real signals it surfaced: 45 unused imports auto-removed across the tree + 4 dead assignments (incl. a stale old_config the Discord-reconnect detection never used). One F841 false-positive (parenthesized multi-with) noqa'd with a note.
- checks.yml: new 'Lint (ruff)' job, ruff pinned so a release can't fail the gate.

ruff check . is clean; 1049 tests green.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

…ction (#588)

Prod-readiness audit P1/P2 (observability + data integrity).

audit.py:

- Instance-scope the path via paths.scope_leaf (ADR 0004), so two instances on a shared FS no longer interleave records into one file. Path resolves LAZILY so it picks up PROTOAGENT_INSTANCE (seeded after import).
- Rotate at a 50MB cap (keep one .1 backup), so the log can't fill the disk.
- get_recent reads only the last 512KB (tail), not the whole file → no OOM on a large log.
- Cap _session_stats (OrderedDict, evict oldest): no slow memory leak.

redaction.py:

- Add provider token shapes that a tool error could echo into the audit log: Discord bot, Google OAuth (ya29.) + API key (AIza), GitHub (gh?_), Slack (xox?-), AWS (AKIA), client_secret; + DISCORD_BOT_TOKEN/GATEWAY_API_KEY/GH_PAT/refresh_token to the env-key + assignment lists. Wired into _redact_string_simple.

Verified: benign text untouched (no over-redaction); 1054 tests green; ruff clean.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
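
The three audit.py bounds in #588 (size-capped rotation, tail-only reads, capped per-session stats) can be illustrated with the sketch below. The function names and signatures are hypothetical; only the 50MB cap, the single .1 backup, the 512KB tail, and the evict-oldest OrderedDict come from the commit message.

```python
import os
from collections import OrderedDict

MAX_BYTES = 50 * 1024 * 1024   # rotate at 50MB
TAIL_BYTES = 512 * 1024        # read at most the last 512KB


def rotate_if_needed(path: str, max_bytes: int = MAX_BYTES) -> bool:
    """Move the log to path.1 (replacing any older backup) once it hits the cap."""
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        os.replace(path, path + ".1")
        return True
    return False


def read_tail_lines(path: str, tail_bytes: int = TAIL_BYTES) -> list[str]:
    """Return the records found in the last tail_bytes of the file."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        start = max(0, fh.tell() - tail_bytes)
        fh.seek(start)
        chunk = fh.read()
    lines = chunk.split(b"\n")
    if start > 0:
        lines = lines[1:]  # the first piece is usually a partial record
    return [line.decode("utf-8", "replace") for line in lines if line]


def bump_session(stats: OrderedDict, session_id: str, cap: int) -> None:
    """Count one event for a session, evicting the oldest entries past cap."""
    stats[session_id] = stats.get(session_id, 0) + 1
    stats.move_to_end(session_id)
    while len(stats) > cap:
        stats.popitem(last=False)
```

Dropping the first piece after a mid-file seek can discard one complete record when the seek lands exactly on a line boundary; for a "recent entries" view that trade-off seems acceptable.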

… (#589)

Prod-readiness audit P2 (data integrity). These four sqlite stores opened connections with no PRAGMAs, unlike the checkpointer + knowledge store, so under concurrent access (scheduler tick + console read + agent write) they threw 'database is locked' instead of waiting, and got no WAL concurrency.

Add journal_mode=WAL + busy_timeout=5000 to each connect (matching graph/checkpointer). busy_timeout especially helps beads' single shared connection.

Verified: PRAGMAs apply (journal_mode=wal, busy_timeout=5000); 65 store tests green; ruff clean.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
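
A minimal sketch of the connect pattern #589 describes, using the standard-library sqlite3 module. The helper name `connect_store` is hypothetical; the two PRAGMA values are the ones named in the commit message.

```python
import sqlite3


def connect_store(path: str) -> sqlite3.Connection:
    """Open a sqlite store with WAL journaling and a 5-second busy timeout."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer is active.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait up to 5000 ms for a lock instead of failing with 'database is locked'.
    conn.execute("PRAGMA busy_timeout=5000")
    return conn
```

WAL applies to file-backed databases; an in-memory database reports its journal mode as `memory`, so verification needs a real file path.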

…(#591)

Prod-readiness audit P0 (credential boundary; completes the console-auth fix after the localhost-default bind). The A2A guard only covered /a2a, so /api/* (run subagents, rewrite config/SOUL, schedule jobs), /api/chat, and /v1/* were unauthenticated even when an A2A token was configured.

- a2a_auth: guard /a2a + /api/ + /v1/ (engages only when a token is set; the no-token default stays open so the local console works). Exempt the read-only /api/events SSE stream (EventSource can't send a bearer; it exposes no action). /healthz, /.well-known, /metrics, static /app are outside the prefixes → public.
- console (api.ts): send Authorization: Bearer from localStorage protoagent.authToken on every fetch-based API + A2A call (blank ⇒ none), so a token-protected deployment's console authenticates. Default (no token) unchanged.
- docs: env-vars auth section documents the scope + the console token.

Verified: 11 auth tests (incl. /api+/v1 guarded when token set, /api/events + /healthz public, all-open when no token); web build + chat e2e green; LIVE-smoked a booted server with A2A_AUTH_TOKEN: /api/runtime/status 401 without / 200 with, /a2a 401 without, /healthz 200.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

Prod-readiness audit P2. metrics.py covered LLM/tool/cost/sessions but had no A2A-turn signal, so 'turns are failing' or 'turns backing up' couldn't be alerted from /metrics (the richest signal lived only in the telemetry SQL store).

- metrics.py: a2a_turns_total{state} counter (completed/failed/canceled, low cardinality) + a2a_turn_seconds latency histogram + record_a2a_turn(state, dur).
- server/a2a.py: emit it from _record_a2a_telemetry, independent of the SQL store, best-effort so it never affects a turn.
- tests/test_metrics.py (the audit flagged none existed).

1059 tests green; ruff clean.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

…urs (closes #590) (#593)

Two footguns bit every fork sync this cycle (gina/jon/protoTrader):

1. Squash-merging an upstream-sync PR breaks the fork's merge base: the behind count stays inflated and every later sync re-conflicts on integrated code.
2. The inherited CHANGELOG merge=union splices upstream's whole changelog into the fork on each sync.

Promote the roxy-sync dev note into a fork-agnostic guide (docs/guides/upstream-sync.md): the procedure (fetch both, branch off origin/main, real merge), the merge-NOT-squash rule (+ the protoTrader before/after evidence), the CHANGELOG merge=ours fork switch (+ the merge.ours.driver caveat), and the now-tiny conflict surface (the operator-fork contract means edits → file a seam). Register it in the sidebar; add a FORKS note to the template's .gitattributes pointing at it (template keeps merge=union for its internal flow); mark the old roxy note superseded.

Docs build green.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

…4 (#596)

Bring ORBIS's ACP-client pattern into protoAgent so the lead agent can hand a real, repo-scoped coding job to a purpose-built CLI coding agent (protoCLI `proto`, Claude Code, Codex, Gemini CLI) and get the result back.

New opt-in `coding_agent` plugin adds one tool, `code_with(agent, task)`, backed by an ACP client (port of ORBIS's acp/client.py): launch the agent as a subprocess, drive one session over JSON-RPC 2.0 on its stdio (initialize → session/new → session/prompt), accumulate agent_message_chunk as the answer, auto-allow session/request_permission. One client (subprocess + session) cached per agent so follow-up calls continue the thread; a per-agent lock serializes turns.

Security posture (PR1, ADR 0024): ships DISABLED with an empty agent list. Each agent's workdir is config-pinned (the tool takes only agent+task, never a path) and auto-allowed: the coding agent self-governs within its sandbox dir. HITL gating + live A2A narration land in later PRs.

- plugins/coding_agent/{__init__.py,acp_client.py,protoagent.plugin.yaml}
- tests/test_coding_agent_plugin.py: config normalization, tool wiring, and a real ACP wire exchange against a fake agent subprocess (9 cases)
- ADR 0024 + index; docs/guides/coding-agents.md + sidebar; plugins guide note; CHANGELOG [Unreleased]

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
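
The wire exchange described in #596 (JSON-RPC 2.0 requests out, agent_message_chunk notifications accumulated into the answer) might be sketched as below. This is not the plugin's acp_client.py. The newline-delimited framing, the `session/update` notification name, and the `update.sessionUpdate` / `content.text` field layout are assumptions about the ACP message shape, and both helper names are hypothetical.

```python
import json


def rpc_request(req_id: int, method: str, params: dict) -> str:
    """Encode one JSON-RPC 2.0 request as a newline-delimited frame (assumed framing)."""
    message = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    return json.dumps(message) + "\n"


def collect_answer(lines, prompt_id: int) -> str:
    """Join streamed message chunks until the session/prompt response arrives.

    `lines` is any iterable of JSON strings read from the agent's stdout.
    """
    chunks = []
    for line in lines:
        msg = json.loads(line)
        if msg.get("method") == "session/update":  # assumed notification name
            update = msg["params"]["update"]
            if update.get("sessionUpdate") == "agent_message_chunk":
                chunks.append(update["content"].get("text", ""))
        elif msg.get("id") == prompt_id and "method" not in msg:
            break  # the response to session/prompt ends the turn
    return "".join(chunks)
```

A client would write `rpc_request(1, "initialize", ...)`, `rpc_request(2, "session/new", ...)`, and `rpc_request(3, "session/prompt", ...)` to the subprocess's stdin in that order, then pass its stdout lines to `collect_answer(..., prompt_id=3)`.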

… - ADR 0024 (PR3) (#599)

Adds safety controls to the coding_agent plugin (#596 shipped the base):

- Per-agent by-kind permission policy applied to the coding agent's session/request_permission requests (keyed on toolCall.kind):
  * auto (allow all; default, unchanged from PR1)
  * allowlist (allow all but execute/delete)
  * readonly (allow only read-like kinds)

  Overridable with allow_kinds / deny_kinds. AcpClient gains a pluggable permission resolver (default stays auto-allow, so PR1 behaviour is unchanged).
- Per-call consent gate: `confirm: true` makes code_with ask the operator via ask_human before each call (runs before any side effect, so LangGraph resume re-execution is idempotent). code_with drops @with_fallback so the interrupt control-flow exception propagates; the I/O is guarded with a local net.
- Agent recipes for protoCLI, Claude Code, Codex, Gemini CLI (docs + manifest).

Per-action live HITL is documented as deferred: pausing a blocking subprocess session mid-turn is incompatible with interrupt()'s checkpoint/re-run model (same coupling that blocks live narration). readonly/allowlist give deterministic per-action control; confirm gives a per-call human gate.

- 17 new/updated tests incl. a wire test that a readonly policy rejects the fake agent's `edit` permission request (36 pass total).
- ADR 0024 scope + posture updated; guide permission section rewritten; CHANGELOG.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
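
The by-kind permission policy from #599 can be pictured as a small resolver. This is a sketch under assumptions: the function name `is_allowed`, the set of kinds counted as "read-like", and the override precedence (deny_kinds first, then allow_kinds, then the policy) are not stated in the commit message.

```python
# Assumption: which kinds count as read-like is not spelled out in the PR.
READ_LIKE_KINDS = frozenset({"read", "search"})
# Named in the commit message as the kinds `allowlist` refuses.
RISKY_KINDS = frozenset({"execute", "delete"})


def is_allowed(kind: str, policy: str = "auto",
               allow_kinds: tuple = (), deny_kinds: tuple = ()) -> bool:
    """Decide one session/request_permission request from its toolCall.kind."""
    if kind in deny_kinds:      # assumed: explicit denies win
        return False
    if kind in allow_kinds:     # assumed: explicit allows beat the policy
        return True
    if policy == "auto":
        return True
    if policy == "allowlist":
        return kind not in RISKY_KINDS
    if policy == "readonly":
        return kind in READ_LIKE_KINDS
    raise ValueError(f"unknown policy: {policy!r}")
```

Under this sketch a readonly agent refuses an `edit` request, which is the case the wire test in the commit message exercises.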

…R 0024 (PR4) (#600)

Completes the ADR 0024 PR plan with the coding-agent eval, plus the runner mechanism it needs:

- eval runner: a case may declare `requires_env: [VAR, …]`; when any is unset the case is SKIPPED (new CaseResult.skipped; shown SKIP, excluded from the pass/fail tally and from the non-zero exit) instead of run. Generally useful: any case needing an optional integration can gate on it without breaking the default board.
- tasks.json: `code_with_delegation` (kind=ask, gated on EVAL_CODING_AGENT): drives a live A2A turn that asks the agent to use code_with and asserts (audit channel) the tool fired. Skips by default.
- tests: _requirements_unmet, board counts skipped separately, case present + gated (test_eval_coverage, 22 pass; full eval suite 36 pass).
- docs: evals guide (requires_env), coding-agents guide (Eval it), ADR 0024 scope, CHANGELOG.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
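
The `requires_env` gating in #600 could work roughly like the sketch below. The names `requirements_unmet` and `run_board`, the dict-shaped cases, and treating an empty value as unset are assumptions; the real runner's CaseResult type is not reproduced here.

```python
def requirements_unmet(case: dict, env: dict) -> list[str]:
    """Return the required variables that are missing (or empty) in env."""
    return [var for var in case.get("requires_env", []) if not env.get(var)]


def run_board(cases: list[dict], run_case, env: dict) -> tuple[dict, int]:
    """Run every case whose requirements are met; skipped cases never fail the board."""
    tally = {"pass": 0, "fail": 0, "skip": 0}
    for case in cases:
        if requirements_unmet(case, env):
            tally["skip"] += 1   # excluded from pass/fail and from the exit code
            continue
        tally["pass" if run_case(case) else "fail"] += 1
    exit_code = 1 if tally["fail"] else 0
    return tally, exit_code
```

With EVAL_CODING_AGENT unset, a gated case lands in the skip column and the board still exits 0.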

… turn (#594) (#595)

MCP tools were appended in build_mcp_tools without handle_tool_error, so when langchain-mcp-adapters raises ToolException on an isError result (e.g. a server 404 from a stale id), it propagates out of the ToolNode and fails the whole A2A turn. A single recoverable tool error (stale arg, transient 4xx) shouldn't kill an otherwise-fine turn.

Set handle_tool_error on each kept MCP tool so the exception is caught inside BaseTool.arun and returned to the model as a tool result it can act on (retry, skip, or adapt). Every fork that configures MCP servers inherits this.

Tests: ToolException degrades via handle_tool_error; build_mcp_tools wires the handler onto kept tools; handler returns a non-fatal, actionable message.

Closes #594.

Co-authored-by: Josh (via Claude) <josh@protolabs.studio>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
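
A sketch of the wiring #595 describes. LangChain's BaseTool accepts a callable for `handle_tool_error`, which receives the raised exception and returns the string handed back to the model. The handler wording and the names `degrade_tool_error` and `wire_error_handler` are hypothetical, and the example sets the attribute on stand-in objects so it runs without LangChain installed.

```python
def degrade_tool_error(exc: Exception) -> str:
    """Turn a tool failure into a non-fatal result the model can act on."""
    return (
        f"Tool call failed: {exc}. This error is recoverable: "
        "retry with corrected arguments, skip this step, or adapt the plan."
    )


def wire_error_handler(tools: list) -> list:
    """Attach the handler to every kept tool so errors degrade instead of raising."""
    for tool in tools:
        tool.handle_tool_error = degrade_tool_error
    return tools
```

With the handler set, a stale-id 404 comes back as an ordinary tool result rather than an exception that escapes the ToolNode.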

…ard URL (#598)

* feat(a2a): opt-in guard: refuse to start on a loopback card URL (protoLabsAI/protoAgent#597)

  A deployed agent that advertises http://127.0.0.1:.../a2a (e.g. A2A_PUBLIC_URL unset after a redeploy) is silently unreachable to remote consumers, a config regression no test catches, surfacing only at first cross-host dispatch.

  Add a2a.require_routable_url (default false). When set, assert_routable_card_url() runs at startup: if the resolved card URL host is loopback (127.0.0.1/localhost/::1/0.0.0.0) it logs a clear error pointing at A2A_PUBLIC_URL and exits non-zero. Off by default so local + desktop runs still advertise loopback correctly.

  Tests cover each loopback form, off-by-default, and the routable-passes case.

  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(lint): drop unused `import types` in test_routable_card_url

  The ruff gate (ruff==0.15.10) flagged F401: `types` is imported but never used. Removes it so the Lint check passes.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Josh (via Claude) <josh@protolabs.studio>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
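
The startup guard from #598 comes down to a host check on the resolved card URL. The sketch below reuses the function name from the commit message, but its signature, the helper `is_loopback_card_url`, and the error wording are assumptions rather than the upstream implementation.

```python
from urllib.parse import urlsplit

# The loopback forms listed in the commit message.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1", "0.0.0.0"}


def is_loopback_card_url(url: str) -> bool:
    """True if the card URL's host is one of the loopback forms."""
    host = (urlsplit(url).hostname or "").lower()
    return host in LOOPBACK_HOSTS


def assert_routable_card_url(url: str, require_routable: bool) -> None:
    """Exit non-zero when the guard is on and the card URL is loopback."""
    if require_routable and is_loopback_card_url(url):
        # SystemExit with a message prints it to stderr and exits with status 1.
        raise SystemExit(
            f"[a2a] card URL {url} is not routable; set A2A_PUBLIC_URL "
            "to an address remote consumers can reach"
        )
```

With the flag off, a loopback URL passes untouched, which keeps local and desktop runs working as the commit message requires.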

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

…-v0160 # Conflicts: # pyproject.toml

protoquinn · 2026-06-06T09:18:36Z

👀 Quinn is reviewing — verdict (PASS / WARN / FAIL) + findings to follow.

protoquinn

QA Audit — PR #59 | chore(sync): upstream protoAgent v0.16.0 — incl. roxy's own #595/#598 fixes

VERDICT: WARN (CI not yet terminal — formal PASS/FAIL follows on next pass)

CI Status

Web E2E smoke: queued
Verify workspace config: queued
A2A live smoke (lean tier): queued
Python tests: queued
Lint (ruff): queued

Diff Review
Sync to protoAgent v0.16.0 via merge commit (4a769c4 + 4afd97d). Notable changes:

.github/workflows/checks.yml: Two new jobs — ruff lint (pinned 0.15.10) and a lean-tier A2A live smoke using scripts/live_smoke.py + requirements-core.txt. Both are additive; no regressions to existing jobs.
.github/workflows/release.yml: Fork-friendly ATTESTATIONS_ENABLED gate on provenance attestations step — upstream made this conditional so forks don't break on the unavailable feature. Correct.
CHANGELOG.md: merge=union on both sides produces a merged changelog. Per PR description roxy's own entries are preserved; verifiable at merge time.
Dockerfile: Adds HEALTHCHECK --interval=30s --start-period=60s hitting /healthz. Standard practice; does not interfere with existing CMD.
.gitattributes: Adds fork guidance comment for CHANGELOG.md merge=ours switch. Documentation only.
README.md: Architecture table updated to reflect server/ package layout (ADR 0023). Clean.

Observations

[MEDIUM] scripts/live_smoke.py is referenced in the new A2A live smoke job but its existence was not verifiable by this reviewer — the diff_summary is truncated at ~4400 lines and the file may be beyond the preview window. Flagging as a Gap: scripts/live_smoke.py should be confirmed to exist in the repo before the job runs, or the CI job will fail at execution. If it exists in the merge commit, the job is correctly scoped and this Gap closes.
[INFO] No CodeRabbit threads. No unresolved review comments.
[INFO] pyproject.toml version conflict resolved: roxy keeps 0.15.0 (the release bump follows post-merge). No functional change to the lockfile in this PR.
[INFO] The new ruff job is pinned to a specific version (0.15.10) to prevent an uncontrolled rule addition from breaking the gate — this is a good practice worth noting.

Checks needed for formal PASS:

All 5 CI jobs must reach terminal (pass or gracefully skipped).
scripts/live_smoke.py confirmed present in the merge commit tree.
Python test suite at 965 passing (per PR author's claim).

— Quinn, QA Engineer

protoquinn · 2026-06-06T09:19:03Z

Submitted COMMENT review on #59.

Status: CI is still queued (5 checks in flight). The PR is structurally clean — merge commit, no conflicts, fork-friendly attestations gate, new jobs properly scoped. One Gap flagged: scripts/live_smoke.py existence unverified in the truncated diff. Once all checks reach terminal state, a re-review will determine whether to APPROVE or REQUEST_CHANGES.

protoquinn

CI terminal-green, no blockers on prior review — auto-approving on green (#748).

mabry1985 and others added 21 commits June 6, 2026 03:42

chore: release v0.16.0 (#601)

4afd97d

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Merge remote-tracking branch 'upstream/main' into chore/sync-upstream…

42a551e

…-v0160 # Conflicts: # pyproject.toml

protoquinn Bot reviewed Jun 6, 2026

protoquinn Bot approved these changes Jun 6, 2026

mabry1985 merged commit 9456d6b into main Jun 6, 2026
5 checks passed

mabry1985 merged 21 commits into main from chore/sync-upstream-v0160
