Scaffold Responses API v1 prototype #625
Conversation
Add /v1/responses ingress, ResponseSessionGAgent (opaque response.id ownership + lifecycle), ResponsesAgentToolStateGAgent (forwarded tool call state), and current-state projection plumbing for both. Wires tool classifier (forward/substitute/additive) and caller scope resolution at the protocol boundary. Touches milestone 14 issues #611, #612, #614, #617, #618, #621, #622. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 695bdc22a7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
if (existing.Status == command.Status)
    return;

await PersistDomainEventAsync(new ResponseSessionStatusUpdatedEvent
Preserve terminal cancellation status
When a streaming response is cancelled after response.created but before the LLM call finishes, this handler still accepts the later Completed update from WriteStreamResponseAsync because it persists every status change except exact duplicates. That lets the original request overwrite Cancelled with Completed, so /responses/{id}/cancel can report success while the session ultimately appears completed and any open tool calls are not left cancelled; reject transitions out of terminal states such as Cancelled, Failed, and Expired before persisting the new status.
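The guard the reviewer asks for can be sketched as a small transition check; this is a minimal illustration, assuming hypothetical type and member names (the PR's actual handler and status enum may differ):

```csharp
using System;
using System.Collections.Generic;

// Sketch only: status names mirror the ResponseSessionStatus values named in the review;
// IsTerminal/CanTransition shapes are illustrative, not the PR's actual code.
public enum ResponseSessionStatus { InProgress, Completed, Cancelled, Failed, Expired }

public static class ResponseSessionStatusGuard
{
    private static readonly HashSet<ResponseSessionStatus> Terminal = new()
    {
        ResponseSessionStatus.Cancelled,
        ResponseSessionStatus.Failed,
        ResponseSessionStatus.Expired,
    };

    // Returns true when the transition may be persisted.
    public static bool CanTransition(ResponseSessionStatus current, ResponseSessionStatus next)
    {
        if (current == next)
            return false;               // exact duplicate: idempotent no-op, nothing to persist
        if (Terminal.Contains(current))
            return false;               // Cancelled/Failed/Expired never move again
        return true;
    }
}
```

With this shape, the late `Completed` update from `WriteStreamResponseAsync` is rejected once `/responses/{id}/cancel` has persisted `Cancelled`, instead of silently overwriting it.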
}

var token = AgentToolRequestContext.TryGet(LLMRequestMetadataKeys.NyxIdAccessToken) ?? string.Empty;
var result = await _webClient.FetchUrlAsync(token, url, ct);
[codex] severity=blocker, category=security
WebFetch passes the caller's NyxID bearer token to _webClient.FetchUrlAsync(token, url, ct) for an LLM/user-supplied URL. WebApiClient.FetchUrlAsync attaches that token as an Authorization header to the target URL, so a prompt can exfiltrate the user's token to an attacker-controlled host and also SSRF internal addresses. Do not send the bearer token to arbitrary fetch targets; route through a validated NyxID proxy or call fetch with no token, and add URL validation/private-network blocking before making the request.
    return createdAt.Add(ttl);
}

private static string BuildExpiredToolCallResult(string? callId) =>
[mimo-v2.5-pro] severity=blocker, category=arch
System.Text.Json.JsonSerializer.Serialize is used to produce result_json inside a domain actor. This violates the CLAUDE.md serialization constraint: "统一 Protobuf … 禁止 JSON/XML/自定义字符串格式用于 Actor State … 事实存储" (standardize on Protobuf … JSON/XML/custom string formats are forbidden for Actor State … fact storage). The BuildExpiredToolCallResult helper creates a JSON string that is stored in ResponseSessionForwardedToolCall.ResultJson and persisted via event sourcing. Move the JSON construction to the Infrastructure adapter, or define a typed protobuf message for the expired-tool-call payload and serialize only at the host boundary.
@@ -0,0 +1,423 @@
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
[mimo-v2.5-pro] severity=blocker, category=arch
System.Text.Json imported and used extensively in the domain actor for parsing argumentsJson (ParseTodoItems, ExtractQuery, etc.). This is the same serialization-rule violation as in ResponseSessionGAgent. The todo-item parse/normalize logic should either live in the Infrastructure adapter (ResponsesAgentToolStateCommandAdapter) or the arguments_json field should be typed protobuf rather than a raw JSON string that the domain actor deserializes. Same applies to ResponseSessionGAgent.cs:7.
builder.Services.AddChannelIdentityProjectionStores(builder.Configuration);
builder.Services.AddDeviceRegistration(builder.Configuration);
builder.Services.AddScheduledAgents(builder.Configuration);
builder.Services.TryAddSingleton<IResponsesCallerScopeResolver, NyxIdResponsesCallerScopeResolver>();
[Consensus: 2 models] severity=major, category=di
Both reviewers flag captive dependency: IResponsesCallerScopeResolver registered as TryAddSingleton here depends on INyxIdCurrentUserResolver, and the same Singleton-on-possibly-Scoped concern applies to ResponsesAevatarToolProvider -> WebApiClient/WebToolOptions. If any of those resolvers are Scoped (common for HTTP-context-bound services), the cached Singleton holds a stale reference per request. Verify lifetimes and downgrade to Scoped if any dependency is request-scoped.
Per-model verbatim
- v4-pro:
  `IResponsesCallerScopeResolver` is registered as `TryAddSingleton`, but its only implementation `NyxIdResponsesCallerScopeResolver` depends on `INyxIdCurrentUserResolver`. If `INyxIdCurrentUserResolver` is registered as Scoped (common for HTTP-context-bound resolvers), this creates a captive dependency — the singleton holds a scoped reference that will be stale or disposed on subsequent requests. Verify the lifetime of `INyxIdCurrentUserResolver` and either make this registration Scoped or confirm the dependency is also Singleton.
- glm-5.1:
  `ResponsesAevatarToolProvider` is registered as `Singleton` and depends on `WebApiClient` and `WebToolOptions`. The `AddWebTools` call registers these, but their lifetimes cannot be verified from the diff. If `WebApiClient` or `WebToolOptions` are registered as `Scoped`/`Transient`, this is a captive dependency.
Similarly, NyxIdResponsesCallerScopeResolver (Singleton) depends on INyxIdCurrentUserResolver — if that resolver is scoped to the HTTP request, the cached instance will go stale or capture a disposed HttpClient.
Verify: Confirm all transitive dependencies of Singleton services are also Singleton.
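One way to surface captive dependencies like these at startup rather than in production: .NET's built-in container can validate lifetimes when building the provider. A sketch, assuming a standard ASP.NET Core host (not the PR's actual Program.cs):

```csharp
// In Program.cs: make the container fail fast if any Singleton consumes a Scoped service.
// ValidateOnBuild checks the whole graph at startup; ValidateScopes also catches
// scoped services resolved from the root provider at runtime.
var builder = WebApplication.CreateBuilder(args);

builder.Host.UseDefaultServiceProvider((context, options) =>
{
    options.ValidateScopes = true;    // enable in all environments, not just Development
    options.ValidateOnBuild = true;
});
```

With these flags on, a Singleton `NyxIdResponsesCallerScopeResolver` depending on a Scoped `INyxIdCurrentUserResolver` would throw at host build time instead of silently caching a stale reference.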
    "CA1506:Avoid excessive class coupling",
    Justification = "This Minimal API adapter coordinates one external Responses endpoint across HTTP, " +
                    "caller scope, durable session registration, and SSE shaping.")]
internal static async Task<IResult> HandleCreateResponseAsync(
[Consensus: 2 models] severity=major, category=arch
Both reviewers flag the same architectural rule violation: HandleCreateResponseAsync in this 1485-line file owns scope resolution, previous-response validation, session registration, the multi-round LLM tool loop, forwarded-tool persistence, SSE framing, and the status lifecycle. Per CLAUDE.md "API 仅做宿主与组合,不承载业务编排" (the API layer only hosts and composes; it carries no business orchestration), extract this into an Application-layer port/service (e.g. IResponsesOrchestrationPort / ResponsesOrchestrationService) and leave the endpoint to handle only HTTP↔domain shaping (auth context extraction, SSE/JSON framing).
Per-model verbatim
- mimo-v2.5-pro:
  `HandleCreateResponseAsync` (together with the private helpers in this file) orchestrates the full Responses business flow: scope resolution, session registration, tool classification, LLM message construction, tool-aware multi-round completion, tool-call persistence, SSE framing, and status lifecycle. Per CLAUDE.md "API 仅做宿主与组合,不承载业务编排" (the API layer only hosts and composes; it carries no business orchestration), this orchestration logic belongs in an Application-layer service; the endpoint should only extract HTTP context and delegate. The 1485-line file is itself a signal. Consider extracting a `ResponsesService` in the Application layer for the orchestration.
- codex:
  `HandleCreateResponseAsync` is a Host/API endpoint but owns request normalization, previous-response validation, session registration, LLM tool-loop execution, forwarded-tool persistence, and status transitions. That violates CLAUDE.md: API only does hosting/composition and should not carry core business orchestration. Move this flow behind an application-layer port/service (for example a Responses command service) and leave the endpoint responsible only for HTTP auth/binding and SSE/JSON transport shaping.
{
    public const string RequestId = "aevatar.request_id";
    public const string CallId = "aevatar.call_id";
    public const string ScopeId = "aevatar.scope_id";
[kimi] severity=major, category=arch
The PR adds ScopeId, OwnerSubject, and ResponseId as string keys in the LLMRequest.Metadata bag. These are core semantic fields that drive business control flow (tool scoping, session linking, caller identity). CLAUDE.md mandates: "核心语义强类型:影响业务语义、控制流、稳定读取且仓库内可控的数据,必须建模为 proto field / typed option / typed sub-message,禁止塞入通用 bag。" (strongly type core semantics: data that affects business semantics or control flow, is read stably, and is controllable within the repo must be modeled as a proto field / typed option / typed sub-message; stuffing it into a generic bag is forbidden). Add typed fields to LLMRequest (or a typed sub-message) instead of stuffing them into the metadata dictionary.
using Aevatar.AI.ToolProviders.Web;
using Aevatar.GAgentService.Abstractions;
using Aevatar.GAgentService.Abstractions.Ports;
[kimi] severity=major, category=arch
ResponsesAevatarToolProvider implements IAgentTool substitutes (TodoWriteTool, TaskTool, WebFetchTool, WebSearchTool) directly in the Host layer. These tools contain business logic (agent-scoped todo state, task topology, web caching). Per CLAUDE.md dependency-inversion and boundary rules, business capabilities belong in Application/Domain/AI layers, not the API host. Move the provider and its tools out of Host.
{
    private readonly IResponsesAgentToolStateCommandPort _commandPort;
    private readonly IResponsesAgentToolStateQueryPort _queryPort;
    private readonly WebApiClient _webClient;
[kimi] severity=major, category=di
ResponsesAevatarToolProvider injects the concrete class WebApiClient instead of an abstraction. This violates CLAUDE.md dependency inversion: "上层依赖抽象,禁止跨层反向依赖和对具体实现的直接耦合" (upper layers depend on abstractions, no direct coupling to concrete implementations). Introduce and inject an interface such as IWebApiClient.
ArgumentNullException.ThrowIfNull(responseSessionRegistrationPort);
ArgumentNullException.ThrowIfNull(responseSessionQueryPort);
ArgumentNullException.ThrowIfNull(toolProviders);
ArgumentNullException.ThrowIfNull(loggerFactory);
[kimi] severity=major, category=arch
This Minimal API handler (HandleCreateResponseAsync) and its ~1,400-line file contain the full Responses protocol orchestration: request normalization, tool classification, LLM message assembly, SSE streaming, session registration, forwarded-tool-call reconciliation, and response building. This violates CLAUDE.md strict layering: "API 仅做宿主与组合,不承载核心业务编排" (API is only for hosting and composition, no business orchestration). Extract an Application-layer port/service (e.g., IResponsesOrchestrationPort) so the endpoint only deserializes input, invokes the port, and shapes the HTTP response.
catch (OperationCanceledException)
{
    return Results.StatusCode(499);
}
[v4-pro] severity=major, category=security
The raw NyxID bearer access token (from ExtractBearerToken) is placed into the LLMRequest.Metadata dictionary under key LLMRequestMetadataKeys.NyxIdAccessToken. This dictionary is then set on AgentToolRequestContext.CurrentMetadata (a global ambient mutable slot) and passed into the LLM provider via request.Metadata. Any middleware, logging sink, or tool provider that serializes or dumps AgentToolRequestContext.CurrentMetadata will leak the token to logs, telemetry, or tool output. Instead, pass the token through a narrowly-scoped mechanism (e.g., only set it on AgentToolRequestContext inside the tight try/finally blocks and strip it from LLMRequest.Metadata before the dictionary leaves the endpoint handler).
    ResponseSessionStatus.Failed,
    CancellationToken.None);
return ToErrorResult(
    StatusCodes.Status500InternalServerError,
[kimi] severity=major, category=security
The NyxID bearer token is placed into LLMRequest.Metadata with key LLMRequestMetadataKeys.NyxIdAccessToken and passed to the LLM provider. If the provider or its underlying transport logs the request or metadata, the bearer token may be persisted in logs or telemetry. Pass the token through a dedicated secure mechanism (e.g., a typed authentication header field on the provider contract) rather than a general metadata bag.
}

var cancelledAt = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
if (visibleSnapshot.Status != ResponseSessionStatus.Cancelled)
[kimi] severity=major, category=security
The generic exception handler returns ex.Message directly in the HTTP response (ToErrorResult(..., ex.Message)). This can leak internal implementation details (upstream error text, connection info, etc.) to untrusted clients. Return a generic error with a correlation ID, and log the full exception server-side.
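The suggested shape can be sketched as a small helper; the name `ToClientSafeError` and the tuple return are illustrative, not the PR's actual `ToErrorResult` signature:

```csharp
using System;

public static class ErrorShaping
{
    // Sketch: keep the full exception server-side, return only a generic
    // message plus a correlation id the client can quote in support requests.
    public static (int StatusCode, string Code, string Message) ToClientSafeError(
        Exception ex,
        Action<string> logError)
    {
        var correlationId = Guid.NewGuid().ToString("N");
        logError($"correlation={correlationId} {ex}");   // full detail stays in the log sink
        return (500, "internal_error",
            $"An internal error occurred. Correlation id: {correlationId}");
    }
}
```

The client-visible body never contains `ex.Message`, so upstream error text and connection details cannot leak; the correlation id lets operators join the response to the server-side log line.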
private static async Task PersistForwardedToolCallsAsync(
    IResponseSessionRegistrationPort responseSessionRegistrationPort,
    ILogger logger,
[glm-5.1] severity=major, category=arch
The 8-round tool-call loop (for (var round = 0; round < 8; round++)) with classification, execution, and message accumulation is significant business orchestration living in the Host layer. Per AGENTS.md: "API 仅做宿主与组合,不承载核心业务编排" (the API layer only hosts and composes; it carries no core business orchestration). The same loop exists in both CollectToolAwareCompletionAsync and StreamToolAwareCompletionAsync.
Fix: Extract this into an application-level service (e.g. IResponsesCompletionOrchestrator) in the Application layer. The endpoint should only handle HTTP↔domain mapping. The magic number 8 should also become a named constant or configuration value.
    }
}
finally
{
[v4-pro] severity=major, category=concurrency
AgentToolRequestContext.CurrentMetadata is a static/ambient mutable field. The code saves it to previousMetadata, overwrites it with per-request data, and restores in a finally block. This is safe only if all operations within the try block are synchronous or if AgentToolRequestContext uses AsyncLocal<T>. If it uses a plain static field (or ThreadStatic), two concurrent requests sharing a thread-pool thread could observe each other's tokens. Verify AgentToolRequestContext is AsyncLocal-backed; if not, this is a cross-request data leak.
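The safety property to verify can be illustrated with a stand-in: an AsyncLocal-backed slot flows with the async execution context, so two requests interleaving on the same thread-pool thread each see their own value. Names below are illustrative, not the PR's actual AgentToolRequestContext:

```csharp
using System.Collections.Generic;
using System.Threading;

// Illustrative stand-in for AgentToolRequestContext. AsyncLocal<T> gives each
// async flow an isolated view of CurrentMetadata; a plain static field (or
// [ThreadStatic]) would leak values across requests that share a pool thread.
public static class AmbientMetadata
{
    private static readonly AsyncLocal<IReadOnlyDictionary<string, string>?> Slot = new();

    public static IReadOnlyDictionary<string, string>? CurrentMetadata
    {
        get => Slot.Value;
        set => Slot.Value = value;
    }
}
```

With this backing, the save/overwrite/restore-in-finally pattern in the diff is per-request safe; with a plain static field it is not, which is exactly the check the reviewer requests.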
}

private static bool RequiresSubstitution(string toolName)
{
[glm-5.1] severity=major, category=bug
StartsWith("Todo") matches any tool starting with "Todo" (e.g. TodoRead, TodoArchive, TodoDelete), not just TodoWrite. Only TodoWrite is registered as a substitute in ResponsesAevatarToolProvider.GetSubstituteTools(), so other Todo* tools will silently get the ResponsesUnavailableSubstituteTool stub instead of being forwarded to the client.
Fix: Either add explicit names to SubstituteToolNames (e.g. "TodoWrite", "todo_write") or restrict the prefix match to known names.
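The first suggested fix can be sketched with an exact-name set in place of the prefix match; the set contents below are an assumption about which name variants exist, taken from the tool names mentioned in this review:

```csharp
using System;
using System.Collections.Generic;

public static class SubstituteToolMatching
{
    // Exact names only: TodoRead / TodoArchive / TodoDelete no longer match by
    // accident, so they are forwarded to the client instead of stubbed out.
    private static readonly HashSet<string> SubstituteToolNames =
        new(StringComparer.OrdinalIgnoreCase)
        {
            "TodoWrite", "todo_write",
            "Task", "task",
            "WebFetch", "web_fetch",
            "WebSearch", "web_search",
        };

    public static bool RequiresSubstitution(string toolName) =>
        SubstituteToolNames.Contains(toolName);
}
```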
string call_id = 1;
string tool_name = 2;
string schema_hash = 3;
string arguments_json = 4;
[codex] severity=major, category=arch
Persisting arguments_json/result_json in ResponseSessionForwardedToolCall makes JSON strings part of actor state and events; the same pattern is repeated for task/web traces. This violates CLAUDE.md: Serialization, which requires Actor State/domain events/internal persistence to use Protobuf rather than JSON strings. Keep raw external JSON at the Host adapter boundary, or model known payloads as typed proto submessages/Any plus an explicit opaque boundary contract instead of storing *_json fields in state/readmodels.
if (existing.Status == command.Status)
    return;

await PersistDomainEventAsync(new ResponseSessionStatusUpdatedEvent
[codex] severity=major, category=bug
HandleUpdateStatusAsync accepts any status transition once the status differs, so a /cancel can be overwritten later by the still-running create path's Completed update. That makes cancellation non-authoritative and can also move Failed/Expired sessions back to Completed. Add an explicit state machine/CAS guard that treats Cancelled, Expired, Failed, and probably Completed as terminal except for idempotent repeats, and have callers handle rejected transitions.
string? argumentsJson,
string sourceResponseId,
DateTimeOffset observedAt)
{
[glm-5.1] severity=major, category=design
PreviewTodoItems duplicates the todo-item parsing logic from ResponsesAgentToolStateGAgent.ParseTodoItems (lines ~196-260 of that file). The adapter preview and the actor's authoritative parse can diverge, causing the returned ResponsesTodoWriteResult to disagree with what the actor actually persisted.
Fix: Extract a shared ResponsesTodoParser utility in Abstractions, or have the command port return the actor's actual persisted state instead of previewing client-side.
provider.LastRequest.Messages[0].Content.Should().Be("ping");
provider.LastRequest.Metadata.Should().Contain(LLMRequestMetadataKeys.NyxIdAccessToken, "secret-token");
provider.LastRequest.Metadata.Should().ContainKey(LLMRequestMetadataKeys.RequestId);
provider.LastRequest.Metadata.Should().Contain("scope_id", "user-1");
[codex] severity=major, category=test
This assertion uses the literal key scope_id, but production writes the metadata under LLMRequestMetadataKeys.ScopeId (aevatar.scope_id). As written, the new test should fail even when the endpoint behaves as implemented. Change the assertion to Contain(LLMRequestMetadataKeys.ScopeId, "user-1").
private readonly IResponsesAgentToolStateQueryPort _queryPort;
private readonly WebApiClient _webClient;

public WebFetchTool(
[glm-5.1] severity=minor, category=security
WebFetchTool.NormalizeUrl silently upgrades http:// to https:// and prepends https:// to schemeless URLs. There is no SSRF validation — internal/private IP ranges (127.0.0.1, 10.x.x.x, 169.254.x.x) are not rejected. In a production prototype, an LLM could be prompted to fetch internal metadata endpoints (e.g. cloud instance metadata at 169.254.169.254).
Fix: Add URL validation that rejects private/loopback/link-local ranges before fetching.
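A minimal sketch of the requested validation, using hypothetical names (a later commit in this PR addresses the finding with a `WebFetchUrlGuard`; this is not that implementation). It classifies a resolved address against the loopback, private, link-local, and "this network" ranges:

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class FetchUrlGuard
{
    // Returns true when WebFetch should refuse to contact the address.
    public static bool IsBlockedAddress(IPAddress ip)
    {
        if (ip.IsIPv4MappedToIPv6)
            ip = ip.MapToIPv4();                     // treat ::ffff:10.0.0.1 as 10.0.0.1

        if (IPAddress.IsLoopback(ip))
            return true;                             // 127/8 and ::1

        if (ip.AddressFamily == AddressFamily.InterNetwork)
        {
            var b = ip.GetAddressBytes();
            return b[0] == 10                                // 10/8
                || (b[0] == 172 && b[1] >= 16 && b[1] <= 31) // 172.16/12
                || (b[0] == 192 && b[1] == 168)              // 192.168/16
                || (b[0] == 169 && b[1] == 254)              // 169.254/16 (cloud metadata)
                || b[0] == 0;                                // 0/8
        }

        // IPv6: link-local (fe80::/10), site-local, unique-local (fc00::/7)
        return ip.IsIPv6LinkLocal || ip.IsIPv6SiteLocal || ip.IsIPv6UniqueLocal;
    }
}
```

A real guard would additionally resolve the hostname first and check every returned address, so DNS rebinding cannot route around the literal-IP check.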
    forwardedToolCalls,
    DateTimeOffset.UtcNow,
    ct);
await TryResolveIncomingToolResultsAsync(
[mimo-v2.5-pro] severity=minor, category=bug
TryResolveIncomingToolResultsAsync and TryUpdateSessionStatusAsync are called with CancellationToken.None instead of the request-scoped ct. If the client disconnects mid-request, these fire-and-forget operations will still run to completion — wasting resources on a response nobody will receive. Pass ct instead of CancellationToken.None (same at lines 206, 468, 522 in the streaming path).
    return ToErrorResult(StatusCodes.Status401Unauthorized, "authentication_required", ex.Message);
}

var snapshot = await responseSessionQueryPort.GetByResponseIdAsync(responseId, ct);
[kimi] severity=minor, category=bug
OperationCanceledException is mapped to Results.StatusCode(499). 499 is a non-standard nginx-specific code, not defined in HTTP/1.1 or standard ASP.NET Core conventions. Let the framework handle cancellation (it will typically abort the response stream), or return StatusCodes.Status408RequestTimeout if a status code is explicitly required.
{
    var roundRequest = CloneRequestWithMessages(request, messages);
    var toolCalls = new ResponsesToolCallAccumulator();
    var previousMetadata = AgentToolRequestContext.CurrentMetadata;
[kimi] severity=minor, category=design
The endpoint sets AgentToolRequestContext.CurrentMetadata (an AsyncLocal ambient context) before calling the LLM provider and restores it in finally. While AsyncLocal is thread-safe, relying on ambient state in the Host layer is a service-locator anti-pattern. Prefer passing metadata explicitly through the call chain (e.g., as part of LLMRequest or tool call arguments).
    "task",
];

public static ResponsesToolClassification Classify(
[v4-pro] severity=minor, category=design
SubstituteToolNames is a hardcoded string array that must be kept in sync with the tools returned by ResponsesAevatarToolProvider.GetSubstituteTools(). If a new substitute tool is added to the provider but forgotten here, the tool name won't be recognized as needing substitution. Consider deriving this list dynamically from providers.SelectMany(p => p.GetSubstituteTools()).Select(t => t.Name).ToHashSet() instead of maintaining two parallel registries.
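The suggested derivation amounts to building the set from the providers at classification time. A self-contained sketch, where the interface shapes (`IResponsesToolProvider`, `IAgentTool.Name`) are assumptions taken from this review text and the demo provider exists only to make the example runnable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shapes of the interfaces named in the review.
public interface IAgentTool { string Name { get; } }
public interface IResponsesToolProvider { IEnumerable<IAgentTool> GetSubstituteTools(); }

public static class SubstituteNameDerivation
{
    // One registry: whatever GetSubstituteTools() actually returns is exactly
    // what the classifier matches, so the two lists cannot drift apart.
    public static HashSet<string> Derive(IEnumerable<IResponsesToolProvider> providers) =>
        providers
            .SelectMany(p => p.GetSubstituteTools())
            .Select(t => t.Name)
            .ToHashSet(StringComparer.OrdinalIgnoreCase);
}

// Hypothetical provider used only to illustrate the derivation.
public sealed record DemoTool(string Name) : IAgentTool;

public sealed class DemoProvider : IResponsesToolProvider
{
    public IEnumerable<IAgentTool> GetSubstituteTools() =>
        new IAgentTool[] { new DemoTool("TodoWrite"), new DemoTool("WebFetch") };
}
```

Deriving once at startup (or memoizing per provider set) keeps the hot classification path a cheap `HashSet.Contains`.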
@@ -0,0 +1,55 @@
using Aevatar.GAgentService.Abstractions.Queries;
[mimo-v2.5-pro] severity=minor, category=arch
This interface is named a command port but is actually mixed: ApplyTodoWriteAsync, RecordTaskAsync, RecordWebTraceAsync are commands, but the query methods (GetAsync, GetWebCacheEntryAsync) belong on the query port (IResponsesAgentToolStateQueryPort). The consumer (ResponsesAevatarToolProvider) correctly injects both ports separately; the recording test double conflates them. Clean the interface to contain only commands.
string scope_id = 1;
string owner_subject = 2;
google.protobuf.Timestamp created_at = 3;
google.protobuf.Timestamp updated_at = 4;
[v4-pro] severity=minor, category=design
ResponsesAgentToolStateRecord holds scope_id and owner_subject as plain strings, but the actorId is deterministically derived from these via SHA256 in ResponseAgentToolStateIds.BuildActorId. This means two different scope_id/owner_subject combinations that hash-collide would map to the same actor. While SHA256 collision is practically impossible, the architecture rule says actor identity should not be cryptographically opaque — consider using a deterministic, non-hashed key format (e.g., responses-agent-tools-{scopeId}/{ownerSubject}) or at least document the hash truncation trade-off.
    ct);

return new ResponsesTaskDispatchResult(actor.Id, taskId, childActorId, "accepted", resultJson);
}
[v4-pro] severity=minor, category=design
EnsureActorAsync dispatches a RegisterResponsesAgentToolStateRequested command on every call to ensure the actor exists. The GAgent's HandleRegisterAsync is idempotent (returns early if already registered), but this means every TodoWrite/Task/WebTrace operation performs an extra redundant dispatch. Consider separating actor creation from command dispatch: only register on first access, and use an EnsureCreated pattern that skips the dispatch when the actor state already indicates registration.
public static ResponsesToolClassification Classify(
    IReadOnlyList<ResponsesToolDeclaration> declaredTools,
    IEnumerable<IResponsesToolProvider> providers,
    ILogger logger)
[v4-pro] severity=nit, category=style
Missing TodoWrite and Todo variants in SubstituteToolNames — RequiresSubstitution matches toolName.StartsWith("Todo", ...) || toolName.StartsWith("todo", ...) but the static list only has Task/WebFetch/WebSearch variants. This is inconsistent: the list doesn't include all de-facto substitute tool name prefixes.
Add test coverage for ResponseSessionRegistrationAdapter (5 dispatch methods + guard clauses + default field population) and ResponsesAgentToolStateCommandAdapter (TodoWrite parsing variants, task description fallback chain, web trace TraceId generation). Brings branch coverage on the new platform code back above the 72% guard threshold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add tests for the resolver's null-token / empty-userId / cancellation paths and the trimmed-scope happy path. Boosts patch coverage on src/Aevatar.Mainnet.Host.Api/Responses/ResponsesCallerScope.cs from 16% toward full coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tions

Production changes:
- Migrate response_sessions and tool-state protos from `string *_json` to `bytes *_payload`; actors stay Protobuf-only, JSON parsing moves to the host boundary (addresses mimo-v2.5-pro blocker findings #2/#3).
- Extract `ResponsesTodoItemParser` in Abstractions so the host adapter preview and the actor's persisted view share one parser (addresses glm-5.1 finding on duplicated todo parsing).
- Introduce `IWebApiClient` abstraction so `ResponsesAevatarToolProvider` depends on an interface (addresses kimi finding on DI inversion).
- Add `WebFetchUrlGuard` to block loopback, private (10/8, 172.16/12, 192.168/16, 169.254/16, 127/8, 0/8), and link-local addresses before WebFetch emits an HTTP request (addresses codex blocker #1 on SSRF / bearer token exfiltration via attacker-controlled URLs).

Test changes:
- Update existing Response* tests to track the new proto shape.
- Add `WebFetchUrlGuardTests` covering scheme rejection, IPv4/IPv6 private ranges, IPv4-mapped IPv6, hostname loopback variants, and the public-host accept path.
- Add `ResponsesTodoItemParserTests` covering todos-array, single string/object, content-key fallback chain, numeric id, malformed JSON, and whitespace normalization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit adds a boundary response-JSON handling utility class and a response-completion application service, covering tool-call classification, streaming-call result accumulation, local tool execution, and forwarding logic, together with comprehensive unit tests covering all core functional scenarios.
- ConversationGAgent now strips per-call NyxID credentials (nyxid.access_token / nyxid.org_token / nyxid.sender_access_token) from NeedsLlmReplyEvent.Metadata before PersistDomainEventAsync, so short-lived sender tokens never reach the event store / projection / read model. The credential key list lives in the new LlmReplyCredentialMetadataKeys helper next to the existing strip-on-persist site for reply_token.
- AgentRunGAgent persists the produced reply payload (reply_text + outbound + terminal_state) BEFORE dispatching LlmReplyReadyEvent, so output-dispatch retries re-deliver from state via the new ReDispatchProducedReplyAsync path instead of re-entering the LLM / tool chain. The new AgentRunReplyDispatchedEvent marks final delivery and gates terminal cleanup. FailAfterUnexpectedException follows the same persist-before-dispatch ordering.
- ConversationGAgent.LarkCardStreaming wraps RunCardCreate / RunCardStream / RunCardFinalize in CancellationTokenSource(StreamingFailureUpdateTimeout) so a stuck CardKit upstream can no longer pin the actor turn forever, matching the per-call cap already used by the legacy text-edit streaming path.
- Mainnet appsettings.json bumps Aevatar.NyxId.Relay.ResponseTimeoutSeconds 120 → 300 so the recent default change actually takes effect after configuration binding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-prototype

Conflict in agents/Aevatar.GAgents.NyxidChat/AgentRunGAgent.cs reconciled between feature/lark-bot's "Harden NyxId chat dispatch" hardening and this branch's persist-before-dispatch refactor (Address review findings on credential leakage and dispatch retry safety).

Resolution:
- ProcessAsync end + metadata-timeout fallback: keep persist-before-dispatch ordering via ProduceAndDispatchAsync, but invoke FinalizeFailureStreamingSinkAsync first when terminalState=Failed so the live streaming card / edit message surfaces the fallback text before the LlmReplyReadyEvent lands (defensive hardening carried over from feature/lark-bot).
- Drop FailAndDispatchReadyAsync (superseded by ProduceAndDispatchAsync).
- FailAfterUnexpectedExceptionAsync: keep my null/empty early Failed return, and wrap ProduceAndDispatchAsync in try/catch with feature/lark-bot's defensive handling — schedule a re-dispatch retry if possible, otherwise persist Failed with the dispatch error appended to errorSummary so the original exception context is not lost.

Auto-merged paths kept both sides' intent:
- ConversationGAgent.cs: credential metadata strip-on-persist preserved alongside feature/lark-bot's surrounding changes.
- ConversationGAgent.LarkCardStreaming.cs: CardKit per-call timeouts (StreamingFailureUpdateTimeout) preserved.
- AgentRunGAgentTests.cs: persist-before-dispatch retry assertions (replyGenerator.CallCount == 1, ReplyDispatched, ProducedReplyText) preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two issues surfaced when the persist-before-dispatch refactor meets live production state and partial-failure paths:

1. Backward-compat: pre-refactor AgentRunReplyProducedEvents have no reply_text / outbound / terminal_state fields (proto3 defaults on deserialize). The new state machine would interpret them as "produced but not yet dispatched" with an empty payload, causing ReDispatchProducedReplyAsync to deliver a blank reply and HandleCleanupAsync to refuse to destroy the actor. The old codepath only wrote AgentRunReplyProducedEvent AFTER a successful dispatch, so ApplyReplyProduced now treats events with an empty reply_text as ReplyDispatched=true. New events always populate reply_text (empty replies fall back to a non-empty user message before persisting), so this is a reliable discriminator.

2. Post-dispatch persistence isolation: once DispatchReadyEventAsync delivers the LlmReplyReadyEvent, the user has the reply. If PersistReplyDispatchedAsync then throws, the exception would bubble to HandleStartAsync's outer `catch (Exception)`, which would call FailAfterUnexpectedExceptionAsync and re-enter ProduceAndDispatchAsync with the "Sorry, I couldn't complete this reply" fallback — a second user-visible message on top of the real reply. Introduced TryFinalizeAfterDispatchAsync that wraps the post-dispatch persist and cleanup-scheduling steps in their own logging-only catch so the actor never propagates these errors. The trade-off: state may stay at ReplyProduced+!ReplyDispatched until the next reconciliation, accepting a lingering actor instead of a duplicate reply.

Regression coverage:
- ApplyReplyProduced_HistoricalEventWithoutReplyText_MarksAsAlreadyDispatched
- ApplyReplyProduced_NewEventWithReplyText_LeavesReplyAsNotYetDispatched
- ProduceAndDispatch_WhenPersistDispatchedFails_DoesNotDeliverDuplicateFallbackReply (uses a new FailOnEventTypeSourcing helper to inject a persistence failure on AgentRunReplyDispatchedEvent only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up 00f99ce which fixes two production-blocking regressions in feature/lark-bot itself:

1. Lark typing reaction emoji_type case: revert "TYPING" -> "Typing". Lark treats this status-name reaction as case-sensitive (a proper noun), unlike the generic uppercase emoji aliases (OK, THUMBSUP, DONE). The uppercase form returned Lark code 231001 "reaction type is invalid" in production. The constant in this branch is already "Typing" (from ba93274); this merge brings in the matching test fixture updates (5 assertions in ChannelConversationTurnRunnerTests now expect "Typing").

2. AgentRunDispatcher reverted from IActorDispatchPort.DispatchAsync back to IStreamProvider.GetStream().ProduceAsync. The DispatchAsync swap made the conversation actor synchronously await the entire LLM round-trip inside HandleEnvelopeAsync, blowing Orleans' 30s request budget and cascading timeouts through the streaming sink and relay webhook. ProduceAsync hands off to the stream pipeline, restoring the asynchronous dispatch contract that AgentRunGAgent was designed against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…replies
The previous ApplyReplyProduced backward-compat check —
string.IsNullOrEmpty(evt.ReplyText) => ReplyDispatched = true
— misclassifies a legitimate NEW event shape: interactive-only turns
(reply_with_interaction, card / button intents) produce an empty
reply_text alongside a non-null outbound. Those events would be marked
as already-dispatched on replay, skipping the dispatch retry path and
silently dropping the user's interactive reply.
The discriminator now also requires evt.Outbound to be null. Legacy
pre-refactor events have BOTH fields default (proto3 zero-values on
deserialize); new events always have at least one of them populated:
- text-only replies: non-empty reply_text
- interactive-only replies: non-null outbound
- empty-reply fallback: replyText replaced with a non-empty user
message before persisting
Regression coverage:
- ApplyReplyProduced_NewInteractiveOnlyEvent_EmptyReplyText_ButNonNullOutbound_IsNotMisclassifiedAsHistorical
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
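Under the refined rule, the discriminator might look like the sketch below; the field names are from the commit message, the rest is assumed:

```csharp
private void ApplyReplyProduced(AgentRunReplyProducedEvent evt)
{
    // Legacy pre-refactor events deserialize with proto3 zero-values for
    // BOTH fields; every new event shape populates at least one of them:
    //   text-only        -> non-empty ReplyText
    //   interactive-only -> non-null Outbound
    //   empty-reply      -> ReplyText replaced with a fallback message
    var isLegacy = string.IsNullOrEmpty(evt.ReplyText) && evt.Outbound is null;

    State.Reply = evt;
    // Legacy events were only written AFTER a successful dispatch, so mark
    // them delivered; new events still need the dispatch (or retry) path.
    State.ReplyDispatched = isLegacy;
}
```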
The /v1/responses and /v1/responses/{id}/cancel handlers extract the
inbound Authorization bearer themselves and resolve the caller via
NyxID /me. Without .AllowAnonymous(), the host's FallbackPolicy
(installed by AddAevatarAuthentication) rejected opaque NyxID API
keys with an empty 401 from JwtBearer's default challenge before the
handler ever ran, breaking the issue #629 user flow behind the NyxID
proxy. PR #625's integration tests missed it because CreateAppAsync
uses a bare WebApplication.CreateBuilder and never wires the host's
auth pipeline. Add a regression test that uses AddAevatarAuthentication
to prove the handler is reached (structured authentication_required
JSON body, not an empty 401).
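A minimal sketch of the fix, assuming minimal-API style endpoint registration (handler names are illustrative):

```csharp
// These handlers extract the Authorization bearer themselves and resolve
// the caller via NyxID /me, so they must opt out of the host FallbackPolicy
// installed by AddAevatarAuthentication. Without AllowAnonymous, opaque
// NyxID API keys are rejected with an empty 401 by the JwtBearer challenge
// before the handler ever runs.
app.MapPost("/v1/responses", HandleCreateResponseAsync)
    .AllowAnonymous();

app.MapPost("/v1/responses/{id}/cancel", HandleCancelResponseAsync)
    .AllowAnonymous();
```

The regression test's job is then to prove the request reaches the handler under the real auth pipeline, which a bare WebApplication.CreateBuilder host cannot exercise.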
PR #625's ResponsesEndpoints intentionally kept the inbound NyxID bearer out of LLMRequest.Metadata to avoid leaking it through downstream logging and telemetry sinks, but NyxIdLLMProvider's accessTokenAccessor is hardcoded to () => null in BuildNyxIdFactory with the comment "token comes exclusively from per-request metadata". The two design intents contradicted each other: prod /v1/responses calls reached the LLM step and then failed with NyxIdAuthenticationRequiredException because the bearer was nowhere the provider could read it. RecordingLLMProvider does not authenticate, so the integration tests passed while prod broke.
Fix: extend LLMRequestCallerContext with an optional LLMRequestCallerCredentials sub-record (currently NyxIdBearer only; designed so future delegation tokens / identity JWTs become field additions, not new Metadata keys), and let NyxIdLLMProvider.ResolveAccessToken read it first, fall back to the existing Metadata key (kept so workflow / studio / LLMCallModule callers keep working without a simultaneous migration), then the host accessor. This honors the author's logging-safety stance (Metadata stays string-bag clean) while satisfying CLAUDE.md's "strongly type the core semantics" rule for per-request auth.
NormalizeRequest in NyxIdLLMProvider was dropping CallerContext when rebuilding the LLMRequest; the new typed-bearer test caught that during revert-and-rerun, fixed by carrying it through.
Tests:
- MainnetResponsesEndpointsTests asserts CallerContext.Credentials.NyxIdBearer carries the inbound bearer AND Metadata.NotContainKey(NyxIdAccessToken) is preserved (the typed channel vindicates the original intent).
- NyxIdLLMProviderRoutingTests gains two cases locking the resolution priority: typed wins; typed beats metadata.
Closes #629
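The three-tier resolution order described above might be sketched as follows; the member names follow the commit text, and the exact signatures are assumed:

```csharp
// Resolution priority inside NyxIdLLMProvider: typed credentials win,
// then the legacy Metadata key, then the host-level accessor.
private string? ResolveAccessToken(LLMRequest request)
{
    return request.CallerContext?.Credentials?.NyxIdBearer        // typed channel (new)
        ?? request.Metadata?.GetValueOrDefault(
               LLMRequestMetadataKeys.NyxIdAccessToken)           // legacy string-bag key
        ?? _accessTokenAccessor();                                // host accessor fallback
}
```

Keeping the Metadata fallback means existing workflow / studio / LLMCallModule callers need no simultaneous migration, while new callers can stay out of the loggable string bag entirely.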
Summary
POST `/v1/responses` ingress, opaque `response.id` ownership actor (ResponseSessionGAgent), forwarded tool call state actor (ResponsesAgentToolStateGAgent), and current-state projection plumbing for both
Test plan
- `dotnet build aevatar.slnx --nologo`
- `dotnet test aevatar.slnx --nologo`
- `bash tools/ci/architecture_guards.sh`
- `POST /v1/responses` with crafted `X-NyxID-Identity-Token`, verify SSE returns
- `ResponseSession` + `ResponsesAgentToolState` current-state readmodels populate after a request
🤖 Generated with Claude Code