
chore(sync): upstream protoAgent v0.16.0 — incl. roxy's own #595/#598 fixes #59

Merged
mabry1985 merged 21 commits into main from chore/sync-upstream-v0160 on Jun 6, 2026

Conversation

@mabry1985
Contributor

Syncs roxy to protoAgent v0.16.0 (was 20 behind → 0). Merge commit, not squash. Parents 4a769c4 + 4afd97d.

Cleanest sync yet — only pyproject.toml (version) conflicted (kept roxy's 0.15.0; the release bump follows). The seam adoption (#57) + CHANGELOG merge strategy did their job: server/a2a.py/chat.py stay zero-delta, README/docs/changelog auto-merged, and test_a2a_handler.py kept roxy's confidence assertion with no conflict.

Notable in this batch

  • roxy's own upstream fixes land in roxy: fix(mcp) #595 (tool error degrades, was roxy#58) + feat(a2a) #598 loopback-card guard (was roxy#31). The guard activates roxy's pre-set a2a.require_routable_url: true — verified live: [a2a] card URL http://roxy:7870/a2a is routable (check passed).
  • Infra/security: #581 default-bind 127.0.0.1 (containers expose 0.0.0.0), #591 bearer-gate console + OpenAI-compat APIs, #582 LLM timeout/retries, #583 default-on SSRF denylist, #584 HEALTHCHECK/probes, #580 launch the server/ package.
  • New: ADR-0024 code_with CLI coding agents over ACP (#596/#599/#600).

Verified live on ava

Healthy, v0.16.0 base, clean boot, loopback guard passes, console gate leaves healthz/card open (both 200), 965 venv tests pass (only the known protolabs_a2a-not-in-venv card tests skip locally; CI runs them). A2A smoke green: chat + fleet sees 6 + per-project memory PASS. Base preserved (0 behind upstream).

Merge with --merge (not squash). A roxy release follows.

🤖 Generated with Claude Code

mabry1985 and others added 21 commits June 6, 2026 03:42
…-#572) (#579)

Sweep the docs + README for staleness left by the ADR-0023 server/ decomposition
and the #570-#572 operator-fork seams:

- New config-reference sections: a2a (skills + description, #570) and security
  (callback_allowlist, #572); plugins guide gains register_a2a_skill +
  register_thread_id_resolver.
- agent-card reference + add-a-skill guide + TEMPLATE.md rewritten to the
  config-driven card (declare a2a.skills / register_a2a_skill — don't edit
  server/a2a.py). Fixed the add-a-skill test example (was importing the gone
  _SKILL_SPECS).
- server.py -> server/ package across every living guide/reference (module paths
  to server/agent_init.py / server/chat.py / server/a2a.py); fixed the BROKEN
  'python server.py' launch command in TEMPLATE.md -> 'python -m server'.
- a2a_handler.py (deleted in the A2A 1.0 migration) re-pointed to its real homes:
  a2a_auth.py (bearer/token), a2a_stores.py (webhook SSRF), a2a_executor.py
  (worldstate-delta), server/chat.py (caller-trace); diagram boxes relabeled to
  the conceptual 'A2A handler'.

Docs build green; no code touched.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
… CI guard) (#580)

ADR 0023 promoted server.py → the server/ package (launch: python -m server),
but the k8s manifest and the OpenShell sandbox-create script still ran
'python server.py' — which exits immediately ('can't open file server.py'),
i.e. CrashLoopBackOff on k8s. PYTHONPATH=/opt/protoagent is baked as an image
ENV so it survives the command override; switch both to 'python -m server'.

Add a CI guard (checks.yml) that fails on any 'python server.py' invocation in
scripts/manifests/Dockerfiles so this can't regress.

Prod-readiness audit P0 (batch A).
…y (#581)

Prod-readiness audit P0 (network-boundary half). The server hardcoded
host=0.0.0.0, so a local/desktop run exposed the operator/console + OpenAI-compat
API (/api/*, /api/chat, /v1/*) — which are NOT auth-gated — to anything that
could reach the port.

- New --host arg / PROTOAGENT_HOST env, defaulting to 127.0.0.1 (loopback). Local
  + desktop-sidecar runs are now loopback-only by default.
- Containers bind 0.0.0.0 explicitly: entrypoint.sh passes --host 0.0.0.0, and the
  k8s + OpenShell command-overrides (which bypass the entrypoint) pass it too.
  Their boundary is the published port + network policy, not the in-container bind.
- Loud startup WARNING when binding non-loopback with no A2A auth token configured.

Verified: boot smoke binds 127.0.0.1 by default (Uvicorn running on
http://127.0.0.1); 1047 tests green. The credential-boundary half (bearer-gate
/api/* + /v1/*, which needs console-token plumbing) follows in a separate PR.
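
A minimal sketch of the bind-host resolution and the startup warning described above. The function names (`resolve_host`, `warn_if_exposed`) are hypothetical; only the `--host` flag, the `PROTOAGENT_HOST` variable, and the 127.0.0.1 default come from this commit message.

```python
import argparse
import ipaddress
import logging
import os

log = logging.getLogger("server")


def resolve_host(argv=None, env=None):
    """Pick the bind host: --host flag, then PROTOAGENT_HOST, then loopback."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", default=env.get("PROTOAGENT_HOST", "127.0.0.1"))
    args, _unknown = parser.parse_known_args(argv)
    return args.host


def is_loopback(host):
    """True for 'localhost' and loopback IP literals; other names count as exposed."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal, so it cannot be classified here


def warn_if_exposed(host, auth_token):
    """Log a WARNING and return True when binding non-loopback without a token."""
    if not is_loopback(host) and not auth_token:
        log.warning("binding %s with no A2A auth token configured", host)
        return True
    return False
```

Under these assumptions a container entrypoint would pass `--host 0.0.0.0` explicitly, while a bare local run falls through to loopback.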

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…client (#582)

Prod-readiness audit P1 (resilience).

- The model client had NO timeout — a hung/slow LiteLLM gateway blocked the turn
  (and the A2A task / SSE stream) indefinitely; under load these pile up. Add
  model.request_timeout (default 120s) + model.max_retries (default 2), passed as
  timeout=/max_retries= to ChatOpenAI in graph/llm.py.
- _a2a_push_client (httpx.AsyncClient) was created but never closed — connection
  pool leak on shutdown/reload (bites the desktop-sidecar restart loop). Close it
  best-effort in the shutdown hook.

Verified: 1047+ tests green (new test_llm asserts the bounds); boot + clean
SIGTERM shutdown smoke shows no push-client errors.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
… (#583)

Prod-readiness audit P1. egress.check_url was a host allowlist that was
permissive (off) by default — so with no allowlist set, the model could
fetch_url an internal service or http://169.254.169.254/ (cloud metadata). And
fetch_url used follow_redirects=True, so a public URL 30x-redirecting to an
internal host bypassed the check entirely.

- egress.check_url now applies a default-on private-IP denylist when no allowlist
  is set: resolve the host, block loopback/link-local/private/multicast/reserved/
  metadata (+ unresolvable). Public hosts still work with no config. An allowlist,
  when set, is the explicit-trust path and bypasses the denylist (mirrors
  a2a_stores.is_safe_webhook_url).
- fetch_url disables auto-redirects and follows manually, re-checking each hop's
  host against egress (max 5 hops; refuses non-http(s) redirects).

Verified: test_egress updated for the new default (public allowed, private/metadata
blocked, allowlist bypass); 1047+ tests green.
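
The denylist and the per-hop redirect re-check can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `egress.py`: `check_url`'s signature here is simplified, and `fetch` is a hypothetical callable returning `(status, location, body)` so the sketch stays independent of any HTTP client.

```python
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit


def ip_is_blocked(ip_text):
    """True for loopback, link-local, private, multicast, reserved, unspecified."""
    ip = ipaddress.ip_address(ip_text.split("%")[0])  # drop any IPv6 scope id
    return (ip.is_loopback or ip.is_link_local or ip.is_private
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def host_is_blocked(host):
    """Resolve the host and block it if any address is internal (or none resolve)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # unresolvable hosts are refused
    return any(ip_is_blocked(info[4][0]) for info in infos)


def check_url(url, allowlist=None):
    """Allow http(s) URLs; an allowlist, when set, replaces the denylist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    if allowlist:
        return parts.hostname in allowlist
    return not host_is_blocked(parts.hostname)


def fetch_checked(url, fetch, max_hops=5):
    """Follow redirects manually, re-checking every hop against check_url."""
    for _ in range(max_hops + 1):
        if not check_url(url):
            raise PermissionError(f"egress blocked: {url}")
        status, location, body = fetch(url)  # hypothetical transport callable
        if status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)
            continue
        return body
    raise RuntimeError("too many redirects")
```

Because `check_url` rejects any scheme other than http(s), a redirect to a non-http(s) target is refused by the same re-check.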

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…584)

Prod-readiness audit P1. The app exposes a correct readiness endpoint (/healthz
returns 200 only when STATE.graph is compiled, 503 during cold start) but nothing
used it — compose restart:unless-stopped couldn't detect a hung-but-alive
process, and k8s wouldn't gate traffic on readiness (requests hit the pod during
the model-cold-start window).

- Dockerfile HEALTHCHECK (curl /healthz; start-period 60s for first-compile).
- docker-compose healthcheck block.
- k8s readinessProbe (gates traffic) + livenessProbe (generous: initialDelay 90s
  + failureThreshold 6 so a slow first compile can't trigger a restart loop).

curl is already in the image. Both YAML files parse.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
… (#585)

The "Attest build provenance" step calls actions/attest-build-provenance,
which needs the GitHub attestations feature — unavailable on private repos
without a paid plan. A fork that cuts a release therefore gets a noisy red
error on a step it can't use (the step is already continue-on-error, so the
release still publishes, but the failed annotation is misleading).

Gate it behind an opt-in `ATTESTATIONS_ENABLED` repo variable, mirroring the
existing `RELEASE_ENABLED` guard: forks leave it unset and the step is skipped
cleanly; repos with the feature set the variable to `true` — no workflow edit,
so it doesn't conflict when forks re-sync from upstream. continue-on-error is
kept as belt-and-suspenders.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…a turn (#586)

Prod-readiness audit P1 (the durable fix the memory note keeps flagging). Every
wire bug this repo shipped (CRLF SSE #563, A2A 404s / eval rot #524, lean-image
FastAPI gap #426) was unit-green but wire-broken because nothing booted the real
server and exercised the actual transport.

- scripts/fake_openai_server.py — a zero-dep OpenAI-compat endpoint that streams a
  canned completion, so a real A2A turn reaches a terminal state without a gateway.
- scripts/live_smoke.py — boots `python -m server --ui none` against the fake model,
  waits /healthz, fetches the agent card, POSTs a real SendStreamingMessage turn,
  and asserts the SSE frames decode + reach a terminal frame.
- checks.yml: new 'A2A live smoke (lean tier)' job that installs requirements-core.txt
  (the production image's deps — also guards the lean-import gap class) and runs it.

Verified locally: lean server boots, card serves, streaming turn completes
end-to-end against the fake model.
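
The "SSE frames decode + reach a terminal frame" assertion might look something like the sketch below. The payload shape (`{"status": {"state": ...}}`) and the helper names are assumptions made for illustration; real A2A frames are wrapped in a JSON-RPC envelope that `scripts/live_smoke.py` would have to unwrap.

```python
import json

# States treated as terminal here (assumed set, mirroring the usual A2A terminal states).
TERMINAL_STATES = {"completed", "failed", "canceled"}


def iter_sse_data(raw):
    """Yield the data payload of each SSE frame, tolerating CRLF line endings."""
    text = raw.decode("utf-8").replace("\r\n", "\n").replace("\r", "\n")
    for frame in text.split("\n\n"):
        parts = []
        for line in frame.split("\n"):
            if line.startswith("data:"):
                value = line[5:]
                # The SSE format strips a single leading space after the colon.
                parts.append(value[1:] if value.startswith(" ") else value)
        if parts:
            yield "\n".join(parts)


def reached_terminal(raw):
    """Decode every frame as JSON and report whether any carries a terminal state."""
    for data in iter_sse_data(raw):
        state = json.loads(data).get("status", {}).get("state")
        if state in TERMINAL_STATES:
            return True
    return False
```

Normalizing CRLF before splitting is the part that targets the #563 class of bug: a decoder that only splits on `\n\n` never sees a frame boundary in a CRLF stream.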

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Prod-readiness audit P1 — no static analysis was enforced (no ruff/mypy in
pyproject.toml or CI), and the untyped wire boundaries are where the recent
CRLF/tool-call bugs lived.

- pyproject.toml [tool.ruff]: select E/F/W; ignore the stylistic rules the
  codebase intentionally uses (E402 lazy imports, E501 long comment lines,
  E702/E731 semicolon/lambda style); per-file-ignore F401 in **/__init__.py
  (the re-export surfaces, esp. server/__init__.py).
- Cleaned the real signals it surfaced: 45 unused imports auto-removed across the
  tree + 4 dead assignments (incl. a stale old_config the Discord-reconnect
  detection never used). One F841 false-positive (parenthesized multi-with)
  noqa'd with a note.
- checks.yml: new 'Lint (ruff)' job, ruff pinned so a release can't fail the gate.

ruff check . is clean; 1049 tests green.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…ction (#588)

Prod-readiness audit P1/P2 (observability + data integrity).

audit.py:
- Instance-scope the path via paths.scope_leaf (ADR 0004) — two instances on a
  shared FS no longer interleave records into one file. Path resolves LAZILY so
  it picks up PROTOAGENT_INSTANCE (seeded after import).
- Rotate at a 50MB cap (keep one .1 backup) — the log can't fill the disk.
- get_recent reads only the last 512KB (tail), not the whole file → no OOM on a
  large log.
- Cap _session_stats (OrderedDict, evict oldest) — no slow memory leak.

redaction.py:
- Add provider token shapes that a tool error could echo into the audit log:
  Discord bot, Google OAuth (ya29.) + API key (AIza), GitHub (gh?_), Slack (xox?-),
  AWS (AKIA), client_secret; + DISCORD_BOT_TOKEN/GATEWAY_API_KEY/GH_PAT/refresh_token
  to the env-key + assignment lists. Wired into _redact_string_simple.

Verified: benign text untouched (no over-redaction); 1054 tests green; ruff clean.
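
A rough sketch of the three audit.py bounds (size-capped rotation with one backup, tail-only reads, capped per-session stats). Function names and the `MAX_SESSIONS` value are hypothetical; the 50MB and 512KB figures are the ones quoted above.

```python
import os
from collections import OrderedDict

MAX_BYTES = 50 * 1024 * 1024   # rotate once the log reaches this size
TAIL_BYTES = 512 * 1024        # get_recent reads at most this much
MAX_SESSIONS = 1000            # hypothetical cap, not taken from the source


def append_record(path, line, max_bytes=MAX_BYTES):
    """Append one record, rotating to a single '.1' backup at the size cap."""
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        os.replace(path, path + ".1")  # overwrites any older backup
    with open(path, "a", encoding="utf-8", newline="\n") as fh:
        fh.write(line.rstrip("\n") + "\n")


def get_recent(path, limit=50, tail_bytes=TAIL_BYTES):
    """Return up to `limit` trailing lines, reading only the file's tail."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        start = max(0, fh.tell() - tail_bytes)
        fh.seek(start)
        chunk = fh.read()
    lines = chunk.decode("utf-8", errors="replace").splitlines()
    if start > 0:
        lines = lines[1:]  # the first line may be cut mid-record
    return lines[-limit:]


def bump_session(stats, session_id, max_sessions=MAX_SESSIONS):
    """Count an event for a session, evicting the oldest entries past the cap."""
    stats[session_id] = stats.get(session_id, 0) + 1
    stats.move_to_end(session_id)
    while len(stats) > max_sessions:
        stats.popitem(last=False)
```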

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
… (#589)

Prod-readiness audit P2 (data integrity). These four sqlite stores opened
connections with no PRAGMAs, unlike the checkpointer + knowledge store — so under
concurrent access (scheduler tick + console read + agent write) they threw
'database is locked' instead of waiting, and got no WAL concurrency. Add
journal_mode=WAL + busy_timeout=5000 to each connect (matching graph/checkpointer).
busy_timeout especially helps beads' single shared connection.

Verified: PRAGMAs apply (journal_mode=wal, busy_timeout=5000); 65 store tests green; ruff clean.
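
For reference, applying the two PRAGMAs on connect looks roughly like this with the standard `sqlite3` module. `connect_store` is a hypothetical helper name; the PRAGMA values are the ones named above.

```python
import sqlite3


def connect_store(path):
    """Open a store connection with WAL journaling and a 5s busy timeout."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5000 ms on a locked db
    return conn
```

WAL only takes effect on file-backed databases; an in-memory database keeps reporting `memory` as its journal mode.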

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…(#591)

Prod-readiness audit P0 (credential boundary — completes the console-auth fix
after the localhost-default bind). The A2A guard only covered /a2a, so /api/*
(run subagents, rewrite config/SOUL, schedule jobs), /api/chat, and /v1/* were
unauthenticated even when an A2A token was configured.

- a2a_auth: guard /a2a + /api/ + /v1/ (engages only when a token is set — the
  no-token default stays open so the local console works). Exempt the read-only
  /api/events SSE stream (EventSource can't send a bearer; it exposes no action).
  /healthz, /.well-known, /metrics, static /app are outside the prefixes → public.
- console (api.ts): send Authorization: Bearer from localStorage
  protoagent.authToken on every fetch-based API + A2A call (blank ⇒ none), so a
  token-protected deployment's console authenticates. Default (no token) unchanged.
- docs: env-vars auth section documents the scope + the console token.

Verified: 11 auth tests (incl. /api+/v1 guarded when token set, /api/events +
/healthz public, all-open when no token); web build + chat e2e green; LIVE-smoked
a booted server with A2A_AUTH_TOKEN — /api/runtime/status 401 without / 200 with,
/a2a 401 without, /healthz 200.
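
The scoping rules above reduce to a small path check. This is a sketch of the decision logic only, with hypothetical names (`requires_auth`, `is_allowed`); the real guard lives in a2a_auth and is wired into the request pipeline.

```python
import hmac

GUARDED_PREFIXES = ("/a2a", "/api/", "/v1/")
EXEMPT_PATHS = ("/api/events",)  # read-only SSE stream; EventSource cannot send a bearer


def requires_auth(path, token):
    """The gate engages only when a token is configured and the path is guarded."""
    if not token:
        return False  # no-token default stays open so the local console works
    if path in EXEMPT_PATHS:
        return False
    return path.startswith(GUARDED_PREFIXES)


def is_allowed(path, authorization, token):
    """Check the Authorization header value against the expected bearer."""
    if not requires_auth(path, token):
        return True
    expected = f"Bearer {token}"
    return hmac.compare_digest((authorization or "").encode(), expected.encode())
```

`/healthz`, `/.well-known`, `/metrics`, and static `/app` pass because they match none of the guarded prefixes, not because of an explicit exemption.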

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Prod-readiness audit P2. metrics.py covered LLM/tool/cost/sessions but had no
A2A-turn signal — so 'turns are failing' or 'turns backing up' couldn't be
alerted from /metrics (the richest signal lived only in the telemetry SQL store).

- metrics.py: a2a_turns_total{state} counter (completed/failed/canceled — low
  cardinality) + a2a_turn_seconds latency histogram + record_a2a_turn(state, dur).
- server/a2a.py: emit it from _record_a2a_telemetry, independent of the SQL store,
  best-effort so it never affects a turn.
- tests/test_metrics.py (the audit flagged none existed).

1059 tests green; ruff clean.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…urs (closes #590) (#593)

Two footguns bit every fork sync this cycle (gina/jon/protoTrader):
1. Squash-merging an upstream-sync PR breaks the fork's merge base — the behind
   count stays inflated and every later sync re-conflicts on integrated code.
2. The inherited CHANGELOG merge=union splices upstream's whole changelog into
   the fork on each sync.

Promote the roxy-sync dev note into a fork-agnostic guide
(docs/guides/upstream-sync.md): the procedure (fetch both, branch off origin/main,
real merge), the merge-NOT-squash rule (+ the protoTrader before/after evidence),
the CHANGELOG merge=ours fork switch (+ the merge.ours.driver caveat), and the
now-tiny conflict surface (the operator-fork contract means edits → file a seam).

Register it in the sidebar; add a FORKS note to the template's .gitattributes
pointing at it (template keeps merge=union for its internal flow); mark the old
roxy note superseded. Docs build green.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…4 (#596)

Bring ORBIS's ACP-client pattern into protoAgent so the lead agent can hand a
real, repo-scoped coding job to a purpose-built CLI coding agent (protoCLI
`proto`, Claude Code, Codex, Gemini CLI) and get the result back.

New opt-in `coding_agent` plugin adds one tool, `code_with(agent, task)`, backed
by an ACP client (port of ORBIS's acp/client.py): launch the agent as a
subprocess, drive one session over JSON-RPC 2.0 on its stdio (initialize →
session/new → session/prompt), accumulate agent_message_chunk as the answer,
auto-allow session/request_permission. One client (subprocess + session) cached
per agent so follow-up calls continue the thread; a per-agent lock serializes
turns.

Security posture (PR1, ADR 0024): ships DISABLED with an empty agent list. Each
agent's workdir is config-pinned (the tool takes only agent+task, never a path)
and auto-allowed — the coding agent self-governs within its sandbox dir. HITL
gating + live A2A narration land in later PRs.

- plugins/coding_agent/{__init__.py,acp_client.py,protoagent.plugin.yaml}
- tests/test_coding_agent_plugin.py — config normalization, tool wiring, and a
  real ACP wire exchange against a fake agent subprocess (9 cases)
- ADR 0024 + index; docs/guides/coding-agents.md + sidebar; plugins guide note;
  CHANGELOG [Unreleased]
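
The initialize → session/new → session/prompt exchange can be sketched with an injected transport, as below. This is not the plugin's acp_client.py: `run_turn`, `send`, and `recv` are hypothetical, and the parameter and notification field shapes (`protocolVersion`, `cwd`, `mcpServers`, `sessionUpdate`, `content.text`) reflect one reading of ACP and should be treated as assumptions. Permission requests are left out of this sketch.

```python
import json


def run_turn(send, recv, workdir, task):
    """Drive one ACP-style session and return the accumulated agent text.

    send(line) writes one newline-terminated JSON-RPC message to the agent's
    stdin; recv() returns the next decoded message from its stdout.
    """
    next_id = 0
    chunks = []

    def collect(msg):
        # Accumulate agent_message_chunk notifications as the answer text.
        if msg.get("method") != "session/update":
            return
        update = msg.get("params", {}).get("update", {})
        if update.get("sessionUpdate") == "agent_message_chunk":
            chunks.append(update.get("content", {}).get("text", ""))

    def call(method, params):
        nonlocal next_id
        next_id += 1
        request = {"jsonrpc": "2.0", "id": next_id, "method": method, "params": params}
        send(json.dumps(request) + "\n")
        while True:
            msg = recv()
            if msg.get("id") == next_id and "method" not in msg:
                if "error" in msg:
                    raise RuntimeError(msg["error"])
                return msg.get("result", {})
            collect(msg)  # anything else is a notification arriving mid-call

    call("initialize", {"protocolVersion": 1})
    session = call("session/new", {"cwd": workdir, "mcpServers": []})
    call("session/prompt", {
        "sessionId": session["sessionId"],
        "prompt": [{"type": "text", "text": task}],
    })
    return "".join(chunks)
```

Taking `workdir` from configuration rather than from the tool arguments is what keeps the path out of the model's hands, matching the posture described above.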

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
… — ADR 0024 (PR3) (#599)

Adds safety controls to the coding_agent plugin (#596 shipped the base):

- Per-agent by-kind permission policy applied to the coding agent's
  session/request_permission requests (keyed on toolCall.kind):
    * auto (allow all — default, unchanged from PR1)
    * allowlist (allow all but execute/delete)
    * readonly (allow only read-like kinds)
  Overridable with allow_kinds / deny_kinds. AcpClient gains a pluggable
  permission resolver (default stays auto-allow, so PR1 behaviour is unchanged).
- Per-call consent gate: `confirm: true` makes code_with ask the operator via
  ask_human before each call (runs before any side effect, so LangGraph resume
  re-execution is idempotent). code_with drops @with_fallback so the interrupt
  control-flow exception propagates; the I/O is guarded with a local net.
- Agent recipes for protoCLI, Claude Code, Codex, Gemini CLI (docs + manifest).

Per-action live HITL is documented as deferred: pausing a blocking subprocess
session mid-turn is incompatible with interrupt()'s checkpoint/re-run model
(same coupling that blocks live narration). readonly/allowlist give
deterministic per-action control; confirm gives a per-call human gate.

- 17 new/updated tests incl. a wire test that a readonly policy rejects the
  fake agent's `edit` permission request (36 pass total).
- ADR 0024 scope + posture updated; guide permission section rewritten;
  CHANGELOG.
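
The by-kind policy can be illustrated with a single decision function. The names, the contents of `READ_LIKE`, and the precedence of the overrides (deny first, then allow, then the base policy) are assumptions for the sketch; the commit message only fixes the three policy names and that allowlist excludes execute/delete.

```python
READ_LIKE = frozenset({"read", "search"})       # assumed set of read-like kinds
DANGEROUS = frozenset({"execute", "delete"})    # excluded by the allowlist policy


def permission_allowed(kind, policy="auto", allow_kinds=(), deny_kinds=()):
    """Decide one session/request_permission request from its toolCall.kind."""
    if kind in deny_kinds:        # explicit deny wins (assumed precedence)
        return False
    if kind in allow_kinds:       # explicit allow overrides the base policy
        return True
    if policy == "auto":
        return True
    if policy == "allowlist":
        return kind not in DANGEROUS
    if policy == "readonly":
        return kind in READ_LIKE
    raise ValueError(f"unknown policy: {policy}")
```

With this shape, `permission_allowed("edit", policy="readonly")` is false, which is the case the wire test above exercises.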

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…R 0024 (PR4) (#600)

Completes the ADR 0024 PR plan with the coding-agent eval, plus the runner
mechanism it needs:

- eval runner: a case may declare `requires_env: [VAR, …]`; when any is unset
  the case is SKIPPED (new CaseResult.skipped; shown SKIP, excluded from the
  pass/fail tally and from the non-zero exit) instead of run. Generally useful —
  any case needing an optional integration can gate on it without breaking the
  default board.
- tasks.json: `code_with_delegation` (kind=ask, gated on EVAL_CODING_AGENT) —
  drives a live A2A turn that asks the agent to use code_with and asserts (audit
  channel) the tool fired. Skips by default.
- tests: _requirements_unmet, board counts skipped separately, case present +
  gated (test_eval_coverage, 22 pass; full eval suite 36 pass).
- docs: evals guide (requires_env), coding-agents guide (Eval it), ADR 0024
  scope, CHANGELOG.
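
A minimal sketch of the `requires_env` gate and the separate skip tally. `requirements_unmet` and `run_board` are illustrative stand-ins (the runner's own helper is `_requirements_unmet`, whose signature is not shown here), and cases are modeled as plain dicts.

```python
import os


def requirements_unmet(case, env=None):
    """Return the required env vars that are not set for this case."""
    env = os.environ if env is None else env
    return [var for var in case.get("requires_env", []) if var not in env]


def run_board(cases, run_case, env=None):
    """Run gated cases; skipped ones stay out of pass/fail and the exit code."""
    tally = {"pass": 0, "fail": 0, "skip": 0}
    for case in cases:
        if requirements_unmet(case, env):
            tally["skip"] += 1
            continue
        tally["pass" if run_case(case) else "fail"] += 1
    exit_code = 1 if tally["fail"] else 0
    return tally, exit_code
```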

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
… turn (#594) (#595)

MCP tools were appended in build_mcp_tools without handle_tool_error, so when
langchain-mcp-adapters raises ToolException on an isError result (e.g. a server
404 from a stale id), it propagates out of the ToolNode and fails the whole A2A
turn. A single recoverable tool error (stale arg, transient 4xx) shouldn't kill
an otherwise-fine turn.

Set handle_tool_error on each kept MCP tool so the exception is caught inside
BaseTool.arun and returned to the model as a tool result it can act on (retry,
skip, or adapt). Every fork that configures MCP servers inherits this.

Tests: ToolException degrades via handle_tool_error; build_mcp_tools wires the
handler onto kept tools; handler returns a non-fatal, actionable message.

Closes #594.
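
In outline, the wiring amounts to assigning a callable to each tool's `handle_tool_error`, which LangChain's BaseTool accepts as a bool, a string, or a function of the exception. The sketch below uses hypothetical names (`tool_error_message`, `wire_error_handlers`) and only sets the attribute, so it works on any object; in the real code path the tools are the MCP tools kept by build_mcp_tools.

```python
def tool_error_message(exc):
    """Turn a tool failure into a non-fatal, actionable tool result."""
    return (
        f"Tool call failed: {exc}. "
        "The arguments may be stale or the error transient; "
        "retry with corrected arguments, skip this step, or adapt."
    )


def wire_error_handlers(tools):
    """Attach the handler to every kept tool so errors degrade instead of raising."""
    for tool in tools:
        tool.handle_tool_error = tool_error_message
    return tools
```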

Co-authored-by: Josh (via Claude) <josh@protolabs.studio>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ard URL (#598)

* feat(a2a): opt-in guard — refuse to start on a loopback card URL (protoLabsAI/protoAgent#597)

A deployed agent that advertises http://127.0.0.1:.../a2a (e.g. A2A_PUBLIC_URL
unset after a redeploy) is silently unreachable to remote consumers — a config
regression no test catches, surfacing only at first cross-host dispatch.

Add a2a.require_routable_url (default false). When set, assert_routable_card_url()
runs at startup: if the resolved card URL host is loopback (127.0.0.1/localhost/
::1/0.0.0.0) it logs a clear error pointing at A2A_PUBLIC_URL and exits non-zero.
Off by default so local + desktop runs still advertise loopback correctly.

Tests cover each loopback form, off-by-default, and the routable-passes case.
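
The check itself is small; a sketch under stated assumptions follows. `assert_routable_card_url` is the name used above, but its signature here (a URL plus a `require_routable` flag) and the `is_routable_card_url` helper are guesses for illustration.

```python
from urllib.parse import urlsplit

# The loopback forms listed above.
LOOPBACK_HOSTS = frozenset({"127.0.0.1", "localhost", "::1", "0.0.0.0"})


def is_routable_card_url(url):
    """True when the card URL has a host that is not one of the loopback forms."""
    host = (urlsplit(url).hostname or "").lower()
    return bool(host) and host not in LOOPBACK_HOSTS


def assert_routable_card_url(url, require_routable=False):
    """When the guard is on, exit non-zero on a loopback card URL."""
    if not require_routable:
        return  # off by default so local and desktop runs can advertise loopback
    if not is_routable_card_url(url):
        raise SystemExit(
            f"[a2a] card URL {url} is loopback; set A2A_PUBLIC_URL to a routable address"
        )
```

`urlsplit` strips the brackets from an IPv6 literal, so `http://[::1]:7870/a2a` resolves to the host `::1` and is caught by the same set.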

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(lint): drop unused `import types` in test_routable_card_url

The ruff gate (ruff==0.15.10) flagged F401 — `types` is imported but never
used. Removes it so the Lint check passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Josh (via Claude) <josh@protolabs.studio>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@protoquinn

protoquinn Bot commented Jun 6, 2026

👀 Quinn is reviewing — verdict (PASS / WARN / FAIL) + findings to follow.


@protoquinn protoquinn Bot left a comment


QA Audit — PR #59 | chore(sync): upstream protoAgent v0.16.0 — incl. roxy's own #595/#598 fixes

VERDICT: WARN (CI not yet terminal — formal PASS/FAIL follows on next pass)


CI Status

  • Web E2E smoke: queued
  • Verify workspace config: queued
  • A2A live smoke (lean tier): queued
  • Python tests: queued
  • Lint (ruff): queued

Diff Review
Sync to protoAgent v0.16.0 via merge commit (4a769c4 + 4afd97d). Notable changes:

  • .github/workflows/checks.yml: Two new jobs — ruff lint (pinned 0.15.10) and a lean-tier A2A live smoke using scripts/live_smoke.py + requirements-core.txt. Both are additive; no regressions to existing jobs.
  • .github/workflows/release.yml: Fork-friendly ATTESTATIONS_ENABLED gate on provenance attestations step — upstream made this conditional so forks don't break on the unavailable feature. Correct.
  • CHANGELOG.md: merge=union on both sides produces a merged changelog. Per PR description roxy's own entries are preserved; verifiable at merge time.
  • Dockerfile: Adds HEALTHCHECK --interval=30s --start-period=60s hitting /healthz. Standard practice; does not interfere with existing CMD.
  • .gitattributes: Adds fork guidance comment for CHANGELOG.md merge=ours switch. Documentation only.
  • README.md: Architecture table updated to reflect server/ package layout (ADR 0023). Clean.

Observations

  • [MEDIUM] scripts/live_smoke.py is referenced in the new A2A live smoke job but its existence was not verifiable by this reviewer — the diff_summary is truncated at ~4400 lines and the file may be beyond the preview window. Flagging as a Gap: scripts/live_smoke.py should be confirmed to exist in the repo before the job runs, or the CI job will fail at execution. If it exists in the merge commit, the job is correctly scoped and this Gap closes.
  • [INFO] No CodeRabbit threads. No unresolved review comments.
  • [INFO] pyproject.toml version conflict resolved: roxy keeps 0.15.0 (the release bump follows post-merge). No functional change to the lockfile in this PR.
  • [INFO] The new ruff job is pinned to a specific version (0.15.10) to prevent an uncontrolled rule addition from breaking the gate — this is a good practice worth noting.

Checks needed for formal PASS:

  1. All 5 CI jobs must reach terminal (pass or gracefully skipped).
  2. scripts/live_smoke.py confirmed present in the merge commit tree.
  3. Python test suite at 965 passing (per PR author's claim).

— Quinn, QA Engineer

@protoquinn

protoquinn Bot commented Jun 6, 2026

Submitted COMMENT review on #59.

Status: CI is still queued (5 checks in flight). The PR is structurally clean — merge commit, no conflicts, fork-friendly attestations gate, new jobs properly scoped. One Gap flagged: scripts/live_smoke.py existence unverified in the truncated diff. Once all checks reach terminal state, a re-review will determine whether to APPROVE or REQUEST_CHANGES.


@protoquinn protoquinn Bot left a comment


CI terminal-green, no blockers on prior review — auto-approving on green (#748).

@mabry1985 mabry1985 merged commit 9456d6b into main Jun 6, 2026
5 checks passed