Scaffold Responses API v1 prototype #625
Conversation
Add /v1/responses ingress, ResponseSessionGAgent (opaque response.id ownership + lifecycle), ResponsesAgentToolStateGAgent (forwarded tool call state), and current-state projection plumbing for both. Wires tool classifier (forward/substitute/additive) and caller scope resolution at the protocol boundary. Touches milestone 14 issues #611, #612, #614, #617, #618, #621, #622. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 695bdc22a7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
if (existing.Status == command.Status)
    return;

await PersistDomainEventAsync(new ResponseSessionStatusUpdatedEvent
Preserve terminal cancellation status
When a streaming response is cancelled after response.created but before the LLM call finishes, this handler still accepts the later Completed update from WriteStreamResponseAsync because it persists every status change except exact duplicates. That lets the original request overwrite Cancelled with Completed, so /responses/{id}/cancel can report success while the session ultimately appears completed and any open tool calls are not left cancelled; reject transitions out of terminal states such as Cancelled, Failed, and Expired before persisting the new status.
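The guard the reviewer asks for can be sketched as a small transition check; this is a minimal illustration, assuming hypothetical type and member names (the PR's actual handler and status enum may differ):

```csharp
using System;
using System.Collections.Generic;

// Sketch only: status names mirror the ResponseSessionStatus values named in the review;
// IsTerminal/CanTransition shapes are illustrative, not the PR's actual code.
public enum ResponseSessionStatus { InProgress, Completed, Cancelled, Failed, Expired }

public static class ResponseSessionStatusGuard
{
    private static readonly HashSet<ResponseSessionStatus> Terminal = new()
    {
        ResponseSessionStatus.Cancelled,
        ResponseSessionStatus.Failed,
        ResponseSessionStatus.Expired,
    };

    // Returns true when the transition may be persisted.
    public static bool CanTransition(ResponseSessionStatus current, ResponseSessionStatus next)
    {
        if (current == next)
            return false;               // exact duplicate: idempotent no-op, nothing to persist
        if (Terminal.Contains(current))
            return false;               // Cancelled/Failed/Expired never move again
        return true;
    }
}
```

With this shape, the late `Completed` update from `WriteStreamResponseAsync` is rejected once `/responses/{id}/cancel` has persisted `Cancelled`, instead of silently overwriting it.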
}

var token = AgentToolRequestContext.TryGet(LLMRequestMetadataKeys.NyxIdAccessToken) ?? string.Empty;
var result = await _webClient.FetchUrlAsync(token, url, ct);
[codex] severity=blocker, category=security
WebFetch passes the caller's NyxID bearer token to _webClient.FetchUrlAsync(token, url, ct) for an LLM/user-supplied URL. WebApiClient.FetchUrlAsync attaches that token as an Authorization header to the target URL, so a prompt can exfiltrate the user's token to an attacker-controlled host and also SSRF internal addresses. Do not send the bearer token to arbitrary fetch targets; route through a validated NyxID proxy or call fetch with no token, and add URL validation/private-network blocking before making the request.
    return createdAt.Add(ttl);
}

private static string BuildExpiredToolCallResult(string? callId) =>
[mimo-v2.5-pro] severity=blocker, category=arch
System.Text.Json.JsonSerializer.Serialize is used to produce result_json inside a domain actor. This violates the CLAUDE.md serialization constraint: "统一 Protobuf … 禁止 JSON/XML/自定义字符串格式用于 Actor State … 事实存储" (standardize on Protobuf … JSON/XML/custom string formats are forbidden for Actor State … fact storage). The BuildExpiredToolCallResult helper creates a JSON string that is stored in ResponseSessionForwardedToolCall.ResultJson and persisted via event sourcing. Move the JSON construction to the Infrastructure adapter, or define a typed protobuf message for the expired-tool-call payload and serialize only at the host boundary.
@@ -0,0 +1,423 @@
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
[mimo-v2.5-pro] severity=blocker, category=arch
System.Text.Json imported and used extensively in the domain actor for parsing argumentsJson (ParseTodoItems, ExtractQuery, etc.). This is the same serialization-rule violation as in ResponseSessionGAgent. The todo-item parse/normalize logic should either live in the Infrastructure adapter (ResponsesAgentToolStateCommandAdapter) or the arguments_json field should be typed protobuf rather than a raw JSON string that the domain actor deserializes. Same applies to ResponseSessionGAgent.cs:7.
builder.Services.AddChannelIdentityProjectionStores(builder.Configuration);
builder.Services.AddDeviceRegistration(builder.Configuration);
builder.Services.AddScheduledAgents(builder.Configuration);
builder.Services.TryAddSingleton<IResponsesCallerScopeResolver, NyxIdResponsesCallerScopeResolver>();
[Consensus: 2 models] severity=major, category=di
Both reviewers flag captive dependency: IResponsesCallerScopeResolver registered as TryAddSingleton here depends on INyxIdCurrentUserResolver, and the same Singleton-on-possibly-Scoped concern applies to ResponsesAevatarToolProvider -> WebApiClient/WebToolOptions. If any of those resolvers are Scoped (common for HTTP-context-bound services), the cached Singleton holds a stale reference per request. Verify lifetimes and downgrade to Scoped if any dependency is request-scoped.
Per-model verbatim
- v4-pro:
  `IResponsesCallerScopeResolver` is registered as `TryAddSingleton`, but its only implementation `NyxIdResponsesCallerScopeResolver` depends on `INyxIdCurrentUserResolver`. If `INyxIdCurrentUserResolver` is registered as Scoped (common for HTTP-context-bound resolvers), this creates a captive dependency — the singleton holds a scoped reference that will be stale or disposed on subsequent requests. Verify the lifetime of `INyxIdCurrentUserResolver` and either make this registration Scoped or confirm the dependency is also Singleton.
- glm-5.1:
  `ResponsesAevatarToolProvider` is registered as `Singleton` and depends on `WebApiClient` and `WebToolOptions`. The `AddWebTools` call registers these, but their lifetimes cannot be verified from the diff. If `WebApiClient` or `WebToolOptions` are registered as `Scoped`/`Transient`, this is a captive dependency.
Similarly, NyxIdResponsesCallerScopeResolver (Singleton) depends on INyxIdCurrentUserResolver — if that resolver is scoped to the HTTP request, the cached instance will go stale or capture a disposed HttpClient.
Verify: Confirm all transitive dependencies of Singleton services are also Singleton.
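One way to surface captive dependencies like these at startup rather than in production: .NET's built-in container can validate lifetimes when building the provider. A sketch, assuming a standard ASP.NET Core host (not the PR's actual Program.cs):

```csharp
// In Program.cs: make the container fail fast if any Singleton consumes a Scoped service.
// ValidateOnBuild checks the whole graph at startup; ValidateScopes also catches
// scoped services resolved from the root provider at runtime.
var builder = WebApplication.CreateBuilder(args);

builder.Host.UseDefaultServiceProvider((context, options) =>
{
    options.ValidateScopes = true;    // enable in all environments, not just Development
    options.ValidateOnBuild = true;
});
```

With these flags on, a Singleton `NyxIdResponsesCallerScopeResolver` depending on a Scoped `INyxIdCurrentUserResolver` would throw at host build time instead of silently caching a stale reference.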
    "CA1506:Avoid excessive class coupling",
    Justification = "This Minimal API adapter coordinates one external Responses endpoint across HTTP, " +
                    "caller scope, durable session registration, and SSE shaping.")]
internal static async Task<IResult> HandleCreateResponseAsync(
[Consensus: 2 models] severity=major, category=arch
Both reviewers flag the same architectural rule violation: HandleCreateResponseAsync in this 1485-line file owns scope resolution, previous-response validation, session registration, the multi-round LLM tool loop, forwarded-tool persistence, SSE framing, and the status lifecycle. Per CLAUDE.md "API 仅做宿主与组合,不承载业务编排" (the API layer only hosts and composes; it carries no business orchestration), extract this into an Application-layer port/service (e.g. IResponsesOrchestrationPort / ResponsesOrchestrationService) and leave the endpoint to handle only HTTP↔domain shaping (auth context extraction, SSE/JSON framing).
Per-model verbatim
- mimo-v2.5-pro:
  `HandleCreateResponseAsync` (together with the private helpers in this file) orchestrates the full Responses business flow: scope resolution, session registration, tool classification, LLM message construction, tool-aware multi-round completion, tool-call persistence, SSE framing, and status lifecycle. Per CLAUDE.md "API 仅做宿主与组合,不承载业务编排" (the API layer only hosts and composes; it carries no business orchestration), this orchestration logic belongs in an Application-layer service; the endpoint should only extract HTTP context and delegate. The 1485-line file is itself a signal. Consider extracting a `ResponsesService` in the Application layer for the orchestration.
- codex:
  `HandleCreateResponseAsync` is a Host/API endpoint but owns request normalization, previous-response validation, session registration, LLM tool-loop execution, forwarded-tool persistence, and status transitions. That violates CLAUDE.md: API only does hosting/composition and should not carry core business orchestration. Move this flow behind an application-layer port/service (for example a Responses command service) and leave the endpoint responsible only for HTTP auth/binding and SSE/JSON transport shaping.
{
    public const string RequestId = "aevatar.request_id";
    public const string CallId = "aevatar.call_id";
    public const string ScopeId = "aevatar.scope_id";
[kimi] severity=major, category=arch
The PR adds ScopeId, OwnerSubject, and ResponseId as string keys in the LLMRequest.Metadata bag. These are core semantic fields that drive business control flow (tool scoping, session linking, caller identity). CLAUDE.md mandates: "核心语义强类型:影响业务语义、控制流、稳定读取且仓库内可控的数据,必须建模为 proto field / typed option / typed sub-message,禁止塞入通用 bag。" (strongly type core semantics: data that affects business semantics or control flow, is read stably, and is controllable within the repo must be modeled as a proto field / typed option / typed sub-message; stuffing it into a generic bag is forbidden). Add typed fields to LLMRequest (or a typed sub-message) instead of stuffing them into the metadata dictionary.
using Aevatar.AI.ToolProviders.Web;
using Aevatar.GAgentService.Abstractions;
using Aevatar.GAgentService.Abstractions.Ports;
[kimi] severity=major, category=arch
ResponsesAevatarToolProvider implements IAgentTool substitutes (TodoWriteTool, TaskTool, WebFetchTool, WebSearchTool) directly in the Host layer. These tools contain business logic (agent-scoped todo state, task topology, web caching). Per CLAUDE.md dependency-inversion and boundary rules, business capabilities belong in Application/Domain/AI layers, not the API host. Move the provider and its tools out of Host.
{
    private readonly IResponsesAgentToolStateCommandPort _commandPort;
    private readonly IResponsesAgentToolStateQueryPort _queryPort;
    private readonly WebApiClient _webClient;
[kimi] severity=major, category=di
ResponsesAevatarToolProvider injects the concrete class WebApiClient instead of an abstraction. This violates CLAUDE.md dependency inversion: "上层依赖抽象,禁止跨层反向依赖和对具体实现的直接耦合" (upper layers depend on abstractions, no direct coupling to concrete implementations). Introduce and inject an interface such as IWebApiClient.
ArgumentNullException.ThrowIfNull(responseSessionRegistrationPort);
ArgumentNullException.ThrowIfNull(responseSessionQueryPort);
ArgumentNullException.ThrowIfNull(toolProviders);
ArgumentNullException.ThrowIfNull(loggerFactory);
[kimi] severity=major, category=arch
This Minimal API handler (HandleCreateResponseAsync) and its ~1,400-line file contain the full Responses protocol orchestration: request normalization, tool classification, LLM message assembly, SSE streaming, session registration, forwarded-tool-call reconciliation, and response building. This violates CLAUDE.md strict layering: "API 仅做宿主与组合,不承载核心业务编排" (API is only for hosting and composition, no business orchestration). Extract an Application-layer port/service (e.g., IResponsesOrchestrationPort) so the endpoint only deserializes input, invokes the port, and shapes the HTTP response.
catch (OperationCanceledException)
{
    return Results.StatusCode(499);
}
[v4-pro] severity=major, category=security
The raw NyxID bearer access token (from ExtractBearerToken) is placed into the LLMRequest.Metadata dictionary under key LLMRequestMetadataKeys.NyxIdAccessToken. This dictionary is then set on AgentToolRequestContext.CurrentMetadata (a global ambient mutable slot) and passed into the LLM provider via request.Metadata. Any middleware, logging sink, or tool provider that serializes or dumps AgentToolRequestContext.CurrentMetadata will leak the token to logs, telemetry, or tool output. Instead, pass the token through a narrowly-scoped mechanism (e.g., only set it on AgentToolRequestContext inside the tight try/finally blocks and strip it from LLMRequest.Metadata before the dictionary leaves the endpoint handler).
    ResponseSessionStatus.Failed,
    CancellationToken.None);
return ToErrorResult(
    StatusCodes.Status500InternalServerError,
[kimi] severity=major, category=security
The NyxID bearer token is placed into LLMRequest.Metadata with key LLMRequestMetadataKeys.NyxIdAccessToken and passed to the LLM provider. If the provider or its underlying transport logs the request or metadata, the bearer token may be persisted in logs or telemetry. Pass the token through a dedicated secure mechanism (e.g., a typed authentication header field on the provider contract) rather than a general metadata bag.
}

var cancelledAt = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
if (visibleSnapshot.Status != ResponseSessionStatus.Cancelled)
[kimi] severity=major, category=security
The generic exception handler returns ex.Message directly in the HTTP response (ToErrorResult(..., ex.Message)). This can leak internal implementation details (upstream error text, connection info, etc.) to untrusted clients. Return a generic error with a correlation ID, and log the full exception server-side.
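The suggested shape can be sketched as a small helper; the name `ToClientSafeError` and the tuple return are illustrative, not the PR's actual `ToErrorResult` signature:

```csharp
using System;

public static class ErrorShaping
{
    // Sketch: keep the full exception server-side, return only a generic
    // message plus a correlation id the client can quote in support requests.
    public static (int StatusCode, string Code, string Message) ToClientSafeError(
        Exception ex,
        Action<string> logError)
    {
        var correlationId = Guid.NewGuid().ToString("N");
        logError($"correlation={correlationId} {ex}");   // full detail stays in the log sink
        return (500, "internal_error",
            $"An internal error occurred. Correlation id: {correlationId}");
    }
}
```

The client-visible body never contains `ex.Message`, so upstream error text and connection details cannot leak; the correlation id lets operators join the response to the server-side log line.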
private static async Task PersistForwardedToolCallsAsync(
    IResponseSessionRegistrationPort responseSessionRegistrationPort,
    ILogger logger,
[glm-5.1] severity=major, category=arch
The 8-round tool-call loop (for (var round = 0; round < 8; round++)) with classification, execution, and message accumulation is significant business orchestration living in the Host layer. Per AGENTS.md: "API 仅做宿主与组合,不承载核心业务编排" (the API layer only hosts and composes; it carries no core business orchestration). The same loop exists in both CollectToolAwareCompletionAsync and StreamToolAwareCompletionAsync.
Fix: Extract this into an application-level service (e.g. IResponsesCompletionOrchestrator) in the Application layer. The endpoint should only handle HTTP↔domain mapping. The magic number 8 should also become a named constant or configuration value.
    }
}
finally
{
[v4-pro] severity=major, category=concurrency
AgentToolRequestContext.CurrentMetadata is a static/ambient mutable field. The code saves it to previousMetadata, overwrites it with per-request data, and restores in a finally block. This is safe only if all operations within the try block are synchronous or if AgentToolRequestContext uses AsyncLocal<T>. If it uses a plain static field (or ThreadStatic), two concurrent requests sharing a thread-pool thread could observe each other's tokens. Verify AgentToolRequestContext is AsyncLocal-backed; if not, this is a cross-request data leak.
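The safety property to verify can be illustrated with a stand-in: an AsyncLocal-backed slot flows with the async execution context, so two requests interleaving on the same thread-pool thread each see their own value. Names below are illustrative, not the PR's actual AgentToolRequestContext:

```csharp
using System.Collections.Generic;
using System.Threading;

// Illustrative stand-in for AgentToolRequestContext. AsyncLocal<T> gives each
// async flow an isolated view of CurrentMetadata; a plain static field (or
// [ThreadStatic]) would leak values across requests that share a pool thread.
public static class AmbientMetadata
{
    private static readonly AsyncLocal<IReadOnlyDictionary<string, string>?> Slot = new();

    public static IReadOnlyDictionary<string, string>? CurrentMetadata
    {
        get => Slot.Value;
        set => Slot.Value = value;
    }
}
```

With this backing, the save/overwrite/restore-in-finally pattern in the diff is per-request safe; with a plain static field it is not, which is exactly the check the reviewer requests.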
}

private static bool RequiresSubstitution(string toolName)
{
[glm-5.1] severity=major, category=bug
StartsWith("Todo") matches any tool starting with "Todo" (e.g. TodoRead, TodoArchive, TodoDelete), not just TodoWrite. Only TodoWrite is registered as a substitute in ResponsesAevatarToolProvider.GetSubstituteTools(), so other Todo* tools will silently get the ResponsesUnavailableSubstituteTool stub instead of being forwarded to the client.
Fix: Either add explicit names to SubstituteToolNames (e.g. "TodoWrite", "todo_write") or restrict the prefix match to known names.
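The first suggested fix can be sketched with an exact-name set in place of the prefix match; the set contents below are an assumption about which name variants exist, taken from the tool names mentioned in this review:

```csharp
using System;
using System.Collections.Generic;

public static class SubstituteToolMatching
{
    // Exact names only: TodoRead / TodoArchive / TodoDelete no longer match by
    // accident, so they are forwarded to the client instead of stubbed out.
    private static readonly HashSet<string> SubstituteToolNames =
        new(StringComparer.OrdinalIgnoreCase)
        {
            "TodoWrite", "todo_write",
            "Task", "task",
            "WebFetch", "web_fetch",
            "WebSearch", "web_search",
        };

    public static bool RequiresSubstitution(string toolName) =>
        SubstituteToolNames.Contains(toolName);
}
```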
string call_id = 1;
string tool_name = 2;
string schema_hash = 3;
string arguments_json = 4;
[codex] severity=major, category=arch
Persisting arguments_json/result_json in ResponseSessionForwardedToolCall makes JSON strings part of actor state and events; the same pattern is repeated for task/web traces. This violates CLAUDE.md: Serialization, which requires Actor State/domain events/internal persistence to use Protobuf rather than JSON strings. Keep raw external JSON at the Host adapter boundary, or model known payloads as typed proto submessages/Any plus an explicit opaque boundary contract instead of storing *_json fields in state/readmodels.
if (existing.Status == command.Status)
    return;

await PersistDomainEventAsync(new ResponseSessionStatusUpdatedEvent
[codex] severity=major, category=bug
HandleUpdateStatusAsync accepts any status transition once the status differs, so a /cancel can be overwritten later by the still-running create path's Completed update. That makes cancellation non-authoritative and can also move Failed/Expired sessions back to Completed. Add an explicit state machine/CAS guard that treats Cancelled, Expired, Failed, and probably Completed as terminal except for idempotent repeats, and have callers handle rejected transitions.
string? argumentsJson,
string sourceResponseId,
DateTimeOffset observedAt)
{
[glm-5.1] severity=major, category=design
PreviewTodoItems duplicates the todo-item parsing logic from ResponsesAgentToolStateGAgent.ParseTodoItems (lines ~196-260 of that file). The adapter preview and the actor's authoritative parse can diverge, causing the returned ResponsesTodoWriteResult to disagree with what the actor actually persisted.
Fix: Extract a shared ResponsesTodoParser utility in Abstractions, or have the command port return the actor's actual persisted state instead of previewing client-side.
provider.LastRequest.Messages[0].Content.Should().Be("ping");
provider.LastRequest.Metadata.Should().Contain(LLMRequestMetadataKeys.NyxIdAccessToken, "secret-token");
provider.LastRequest.Metadata.Should().ContainKey(LLMRequestMetadataKeys.RequestId);
provider.LastRequest.Metadata.Should().Contain("scope_id", "user-1");
[codex] severity=major, category=test
This assertion uses the literal key scope_id, but production writes the metadata under LLMRequestMetadataKeys.ScopeId (aevatar.scope_id). As written, the new test should fail even when the endpoint behaves as implemented. Change the assertion to Contain(LLMRequestMetadataKeys.ScopeId, "user-1").
private readonly IResponsesAgentToolStateQueryPort _queryPort;
private readonly WebApiClient _webClient;

public WebFetchTool(
[glm-5.1] severity=minor, category=security
WebFetchTool.NormalizeUrl silently upgrades http:// to https:// and prepends https:// to schemeless URLs. There is no SSRF validation — internal/private IP ranges (127.0.0.1, 10.x.x.x, 169.254.x.x) are not rejected. In a production prototype, an LLM could be prompted to fetch internal metadata endpoints (e.g. cloud instance metadata at 169.254.169.254).
Fix: Add URL validation that rejects private/loopback/link-local ranges before fetching.
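A minimal sketch of the requested validation, using hypothetical names (a later commit in this PR addresses the finding with a `WebFetchUrlGuard`; this is not that implementation). It classifies a resolved address against the loopback, private, link-local, and "this network" ranges:

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class FetchUrlGuard
{
    // Returns true when WebFetch should refuse to contact the address.
    public static bool IsBlockedAddress(IPAddress ip)
    {
        if (ip.IsIPv4MappedToIPv6)
            ip = ip.MapToIPv4();                     // treat ::ffff:10.0.0.1 as 10.0.0.1

        if (IPAddress.IsLoopback(ip))
            return true;                             // 127/8 and ::1

        if (ip.AddressFamily == AddressFamily.InterNetwork)
        {
            var b = ip.GetAddressBytes();
            return b[0] == 10                                // 10/8
                || (b[0] == 172 && b[1] >= 16 && b[1] <= 31) // 172.16/12
                || (b[0] == 192 && b[1] == 168)              // 192.168/16
                || (b[0] == 169 && b[1] == 254)              // 169.254/16 (cloud metadata)
                || b[0] == 0;                                // 0/8
        }

        // IPv6: link-local (fe80::/10), site-local, unique-local (fc00::/7)
        return ip.IsIPv6LinkLocal || ip.IsIPv6SiteLocal || ip.IsIPv6UniqueLocal;
    }
}
```

A real guard would additionally resolve the hostname first and check every returned address, so DNS rebinding cannot route around the literal-IP check.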
    forwardedToolCalls,
    DateTimeOffset.UtcNow,
    ct);
await TryResolveIncomingToolResultsAsync(
[mimo-v2.5-pro] severity=minor, category=bug
TryResolveIncomingToolResultsAsync and TryUpdateSessionStatusAsync are called with CancellationToken.None instead of the request-scoped ct. If the client disconnects mid-request, these fire-and-forget operations will still run to completion — wasting resources on a response nobody will receive. Pass ct instead of CancellationToken.None (same at lines 206, 468, 522 in the streaming path).
    return ToErrorResult(StatusCodes.Status401Unauthorized, "authentication_required", ex.Message);
}

var snapshot = await responseSessionQueryPort.GetByResponseIdAsync(responseId, ct);
[kimi] severity=minor, category=bug
OperationCanceledException is mapped to Results.StatusCode(499). 499 is a non-standard nginx-specific code, not defined in HTTP/1.1 or standard ASP.NET Core conventions. Let the framework handle cancellation (it will typically abort the response stream), or return StatusCodes.Status408RequestTimeout if a status code is explicitly required.
{
    var roundRequest = CloneRequestWithMessages(request, messages);
    var toolCalls = new ResponsesToolCallAccumulator();
    var previousMetadata = AgentToolRequestContext.CurrentMetadata;
[kimi] severity=minor, category=design
The endpoint sets AgentToolRequestContext.CurrentMetadata (an AsyncLocal ambient context) before calling the LLM provider and restores it in finally. While AsyncLocal is thread-safe, relying on ambient state in the Host layer is a service-locator anti-pattern. Prefer passing metadata explicitly through the call chain (e.g., as part of LLMRequest or tool call arguments).
    "task",
];

public static ResponsesToolClassification Classify(
[v4-pro] severity=minor, category=design
SubstituteToolNames is a hardcoded string array that must be kept in sync with the tools returned by ResponsesAevatarToolProvider.GetSubstituteTools(). If a new substitute tool is added to the provider but forgotten here, the tool name won't be recognized as needing substitution. Consider deriving this list dynamically from providers.SelectMany(p => p.GetSubstituteTools()).Select(t => t.Name).ToHashSet() instead of maintaining two parallel registries.
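The suggested derivation amounts to building the set from the providers at classification time. A self-contained sketch, where the interface shapes (`IResponsesToolProvider`, `IAgentTool.Name`) are assumptions taken from this review text and the demo provider exists only to make the example runnable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shapes of the interfaces named in the review.
public interface IAgentTool { string Name { get; } }
public interface IResponsesToolProvider { IEnumerable<IAgentTool> GetSubstituteTools(); }

public static class SubstituteNameDerivation
{
    // One registry: whatever GetSubstituteTools() actually returns is exactly
    // what the classifier matches, so the two lists cannot drift apart.
    public static HashSet<string> Derive(IEnumerable<IResponsesToolProvider> providers) =>
        providers
            .SelectMany(p => p.GetSubstituteTools())
            .Select(t => t.Name)
            .ToHashSet(StringComparer.OrdinalIgnoreCase);
}

// Hypothetical provider used only to illustrate the derivation.
public sealed record DemoTool(string Name) : IAgentTool;

public sealed class DemoProvider : IResponsesToolProvider
{
    public IEnumerable<IAgentTool> GetSubstituteTools() =>
        new IAgentTool[] { new DemoTool("TodoWrite"), new DemoTool("WebFetch") };
}
```

Deriving once at startup (or memoizing per provider set) keeps the hot classification path a cheap `HashSet.Contains`.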
@@ -0,0 +1,55 @@
using Aevatar.GAgentService.Abstractions.Queries;
[mimo-v2.5-pro] severity=minor, category=arch
This interface is named a command port but is actually mixed: ApplyTodoWriteAsync, RecordTaskAsync, RecordWebTraceAsync are commands, but the query methods (GetAsync, GetWebCacheEntryAsync) belong on the query port (IResponsesAgentToolStateQueryPort). The consumer (ResponsesAevatarToolProvider) correctly injects both ports separately; the recording test double conflates them. Clean the interface to contain only commands.
string scope_id = 1;
string owner_subject = 2;
google.protobuf.Timestamp created_at = 3;
google.protobuf.Timestamp updated_at = 4;
[v4-pro] severity=minor, category=design
ResponsesAgentToolStateRecord holds scope_id and owner_subject as plain strings, but the actorId is deterministically derived from these via SHA256 in ResponseAgentToolStateIds.BuildActorId. This means two different scope_id/owner_subject combinations that hash-collide would map to the same actor. While SHA256 collision is practically impossible, the architecture rule says actor identity should not be cryptographically opaque — consider using a deterministic, non-hashed key format (e.g., responses-agent-tools-{scopeId}/{ownerSubject}) or at least document the hash truncation trade-off.
    ct);

return new ResponsesTaskDispatchResult(actor.Id, taskId, childActorId, "accepted", resultJson);
}
[v4-pro] severity=minor, category=design
EnsureActorAsync dispatches a RegisterResponsesAgentToolStateRequested command on every call to ensure the actor exists. The GAgent's HandleRegisterAsync is idempotent (returns early if already registered), but this means every TodoWrite/Task/WebTrace operation performs an extra redundant dispatch. Consider separating actor creation from command dispatch: only register on first access, and use an EnsureCreated pattern that skips the dispatch when the actor state already indicates registration.
public static ResponsesToolClassification Classify(
    IReadOnlyList<ResponsesToolDeclaration> declaredTools,
    IEnumerable<IResponsesToolProvider> providers,
    ILogger logger)
[v4-pro] severity=nit, category=style
Missing TodoWrite and Todo variants in SubstituteToolNames — RequiresSubstitution matches toolName.StartsWith("Todo", ...) || toolName.StartsWith("todo", ...) but the static list only has Task/WebFetch/WebSearch variants. This is inconsistent: the list doesn't include all de-facto substitute tool name prefixes.
Add test coverage for ResponseSessionRegistrationAdapter (5 dispatch methods + guard clauses + default field population) and ResponsesAgentToolStateCommandAdapter (TodoWrite parsing variants, task description fallback chain, web trace TraceId generation). Brings branch coverage on the new platform code back above the 72% guard threshold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add tests for the resolver's null-token / empty-userId / cancellation paths and the trimmed-scope happy path. Boosts patch coverage on src/Aevatar.Mainnet.Host.Api/Responses/ResponsesCallerScope.cs from 16% toward full coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tions

Production changes:
- Migrate response_sessions and tool-state protos from `string *_json` to `bytes *_payload`; actors stay Protobuf-only, JSON parsing moves to the host boundary (addresses mimo-v2.5-pro blocker findings #2/#3).
- Extract `ResponsesTodoItemParser` in Abstractions so the host adapter preview and the actor's persisted view share one parser (addresses glm-5.1 finding on duplicated todo parsing).
- Introduce `IWebApiClient` abstraction so `ResponsesAevatarToolProvider` depends on an interface (addresses kimi finding on DI inversion).
- Add `WebFetchUrlGuard` to block loopback, private (10/8, 172.16/12, 192.168/16, 169.254/16, 127/8, 0/8), and link-local addresses before WebFetch emits an HTTP request (addresses codex blocker #1 on SSRF / bearer token exfiltration via attacker-controlled URLs).

Test changes:
- Update existing Response* tests to track the new proto shape.
- Add `WebFetchUrlGuardTests` covering scheme rejection, IPv4/IPv6 private ranges, IPv4-mapped IPv6, hostname loopback variants, and the public-host accept path.
- Add `ResponsesTodoItemParserTests` covering todos-array, single string/object, content-key fallback chain, numeric id, malformed JSON, and whitespace normalization.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit adds a boundary response-JSON handling utility class and a response-completion application service, covering tool-call classification, streaming-call result accumulation, local tool execution, and forwarding logic, together with comprehensive unit tests covering all core functional scenarios.
- ConversationGAgent now strips per-call NyxID credentials (nyxid.access_token / nyxid.org_token / nyxid.sender_access_token) from NeedsLlmReplyEvent.Metadata before PersistDomainEventAsync, so short-lived sender tokens never reach the event store / projection / read model. The credential key list lives in the new LlmReplyCredentialMetadataKeys helper next to the existing strip-on-persist site for reply_token.
- AgentRunGAgent persists the produced reply payload (reply_text + outbound + terminal_state) BEFORE dispatching LlmReplyReadyEvent, so output-dispatch retries re-deliver from state via the new ReDispatchProducedReplyAsync path instead of re-entering the LLM / tool chain. The new AgentRunReplyDispatchedEvent marks final delivery and gates terminal cleanup. FailAfterUnexpectedException follows the same persist-before-dispatch ordering.
- ConversationGAgent.LarkCardStreaming wraps RunCardCreate / RunCardStream / RunCardFinalize in CancellationTokenSource(StreamingFailureUpdateTimeout) so a stuck CardKit upstream can no longer pin the actor turn forever, matching the per-call cap already used by the legacy text-edit streaming path.
- Mainnet appsettings.json bumps Aevatar.NyxId.Relay.ResponseTimeoutSeconds 120 → 300 so the recent default change actually takes effect after configuration binding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-prototype

Conflict in agents/Aevatar.GAgents.NyxidChat/AgentRunGAgent.cs reconciled between feature/lark-bot's "Harden NyxId chat dispatch" hardening and this branch's persist-before-dispatch refactor (Address review findings on credential leakage and dispatch retry safety).

Resolution:
- ProcessAsync end + metadata-timeout fallback: keep persist-before-dispatch ordering via ProduceAndDispatchAsync, but invoke FinalizeFailureStreamingSinkAsync first when terminalState=Failed so the live streaming card / edit message surfaces the fallback text before the LlmReplyReadyEvent lands (defensive hardening carried over from feature/lark-bot).
- Drop FailAndDispatchReadyAsync (superseded by ProduceAndDispatchAsync).
- FailAfterUnexpectedExceptionAsync: keep my null/empty early Failed return, and wrap ProduceAndDispatchAsync in try/catch with feature/lark-bot's defensive handling — schedule a re-dispatch retry if possible, otherwise persist Failed with the dispatch error appended to errorSummary so the original exception context is not lost.

Auto-merged paths kept both sides' intent:
- ConversationGAgent.cs: credential metadata strip-on-persist preserved alongside feature/lark-bot's surrounding changes.
- ConversationGAgent.LarkCardStreaming.cs: CardKit per-call timeouts (StreamingFailureUpdateTimeout) preserved.
- AgentRunGAgentTests.cs: persist-before-dispatch retry assertions (replyGenerator.CallCount == 1, ReplyDispatched, ProducedReplyText) preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two issues surfaced when the persist-before-dispatch refactor meets live production state and partial-failure paths:

1. Backward-compat: pre-refactor AgentRunReplyProducedEvents have no reply_text / outbound / terminal_state fields (proto3 defaults on deserialize). The new state machine would interpret them as "produced but not yet dispatched" with an empty payload, causing ReDispatchProducedReplyAsync to deliver a blank reply and HandleCleanupAsync to refuse to destroy the actor. The old codepath only wrote AgentRunReplyProducedEvent AFTER a successful dispatch, so ApplyReplyProduced now treats events with an empty reply_text as ReplyDispatched=true. New events always populate reply_text (empty replies fall back to a non-empty user message before persisting), so this is a reliable discriminator.

2. Post-dispatch persistence isolation: once DispatchReadyEventAsync delivers the LlmReplyReadyEvent, the user has the reply. If PersistReplyDispatchedAsync then throws, the exception would bubble to HandleStartAsync's outer `catch (Exception)`, which would call FailAfterUnexpectedExceptionAsync and re-enter ProduceAndDispatchAsync with the "Sorry, I couldn't complete this reply" fallback — a second user-visible message on top of the real reply. Introduced TryFinalizeAfterDispatchAsync that wraps the post-dispatch persist and cleanup-scheduling steps in their own logging-only catch so the actor never propagates these errors. The trade-off: state may stay at ReplyProduced+!ReplyDispatched until the next reconciliation, accepting a lingering actor instead of a duplicate reply.

Regression coverage:
- ApplyReplyProduced_HistoricalEventWithoutReplyText_MarksAsAlreadyDispatched
- ApplyReplyProduced_NewEventWithReplyText_LeavesReplyAsNotYetDispatched
- ProduceAndDispatch_WhenPersistDispatchedFails_DoesNotDeliverDuplicateFallbackReply (uses a new FailOnEventTypeSourcing helper to inject a persistence failure on AgentRunReplyDispatchedEvent only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up 00f99ce which fixes two production-blocking regressions in feature/lark-bot itself:

1. Lark typing reaction emoji_type case: revert "TYPING" -> "Typing". Lark treats this status-name reaction as case-sensitive (a proper noun), unlike the generic uppercase emoji aliases (OK, THUMBSUP, DONE). The uppercase form returned Lark code 231001 "reaction type is invalid" in production. The constant in this branch is already "Typing" (from ba93274); this merge brings in the matching test fixture updates (5 assertions in ChannelConversationTurnRunnerTests now expect "Typing").

2. AgentRunDispatcher reverted from IActorDispatchPort.DispatchAsync back to IStreamProvider.GetStream().ProduceAsync. The DispatchAsync swap made the conversation actor synchronously await the entire LLM round-trip inside HandleEnvelopeAsync, blowing Orleans' 30s request budget and cascading timeouts through the streaming sink and relay webhook. ProduceAsync hands off to the stream pipeline, restoring the asynchronous dispatch contract that AgentRunGAgent was designed against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…replies
The previous ApplyReplyProduced backward-compat check —
string.IsNullOrEmpty(evt.ReplyText) => ReplyDispatched = true
— misclassifies a legitimate NEW event shape: interactive-only turns
(reply_with_interaction, card / button intents) produce an empty
reply_text alongside a non-null outbound. Those events would be marked
as already-dispatched on replay, skipping the dispatch retry path and
silently dropping the user's interactive reply.
The discriminator now also requires evt.Outbound to be null. Legacy
pre-refactor events have BOTH fields default (proto3 zero-values on
deserialize); new events always have at least one of them populated:
- text-only replies: non-empty reply_text
- interactive-only replies: non-null outbound
- empty-reply fallback: replyText replaced with a non-empty user
message before persisting
Regression coverage:
- ApplyReplyProduced_NewInteractiveOnlyEvent_EmptyReplyText_ButNonNullOutbound_IsNotMisclassifiedAsHistorical
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
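Under the refined rule, the discriminator might look like the sketch below; the field names are from the commit message, the rest is assumed:

```csharp
private void ApplyReplyProduced(AgentRunReplyProducedEvent evt)
{
    // Legacy pre-refactor events deserialize with proto3 zero-values for
    // BOTH fields; every new event shape populates at least one of them:
    //   text-only        -> non-empty ReplyText
    //   interactive-only -> non-null Outbound
    //   empty-reply      -> ReplyText replaced with a fallback message
    var isLegacy = string.IsNullOrEmpty(evt.ReplyText) && evt.Outbound is null;

    State.Reply = evt;
    // Legacy events were only written AFTER a successful dispatch, so mark
    // them delivered; new events still need the dispatch (or retry) path.
    State.ReplyDispatched = isLegacy;
}
```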
The /v1/responses and /v1/responses/{id}/cancel handlers extract the
inbound Authorization bearer themselves and resolve the caller via
NyxID /me. Without .AllowAnonymous(), the host's FallbackPolicy
(installed by AddAevatarAuthentication) rejected opaque NyxID API
keys with an empty 401 from JwtBearer's default challenge before the
handler ever ran, breaking the issue #629 user flow behind the NyxID
proxy. PR #625's integration tests missed it because CreateAppAsync
uses a bare WebApplication.CreateBuilder and never wires the host's
auth pipeline. Add a regression test that uses AddAevatarAuthentication
to prove the handler is reached (structured authentication_required
JSON body, not an empty 401).
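A minimal sketch of the fix, assuming minimal-API style endpoint registration (handler names are illustrative):

```csharp
// These handlers extract the Authorization bearer themselves and resolve
// the caller via NyxID /me, so they must opt out of the host FallbackPolicy
// installed by AddAevatarAuthentication. Without AllowAnonymous, opaque
// NyxID API keys are rejected with an empty 401 by the JwtBearer challenge
// before the handler ever runs.
app.MapPost("/v1/responses", HandleCreateResponseAsync)
    .AllowAnonymous();

app.MapPost("/v1/responses/{id}/cancel", HandleCancelResponseAsync)
    .AllowAnonymous();
```

The regression test's job is then to prove the request reaches the handler under the real auth pipeline, which a bare WebApplication.CreateBuilder host cannot exercise.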
PR #625's ResponsesEndpoints intentionally kept the inbound NyxID bearer out of LLMRequest.Metadata to avoid leaking it through downstream logging and telemetry sinks, but NyxIdLLMProvider's accessTokenAccessor is hardcoded to () => null in BuildNyxIdFactory with the comment "token comes exclusively from per-request metadata". The two design intents contradicted each other: prod /v1/responses calls reached the LLM step and then failed with NyxIdAuthenticationRequiredException because the bearer was nowhere the provider could read it. RecordingLLMProvider does not authenticate, so the integration tests passed while prod broke.
Fix: extend LLMRequestCallerContext with an optional LLMRequestCallerCredentials sub-record (currently NyxIdBearer only; designed so future delegation tokens / identity JWTs become field additions, not new Metadata keys), and let NyxIdLLMProvider.ResolveAccessToken read it first, fall back to the existing Metadata key (kept so workflow / studio / LLMCallModule callers keep working without a simultaneous migration), then the host accessor. This honors the author's logging-safety stance (Metadata stays string-bag clean) while satisfying CLAUDE.md's "strongly type the core semantics" rule for per-request auth.
NormalizeRequest in NyxIdLLMProvider was dropping CallerContext when rebuilding the LLMRequest; the new typed-bearer test caught that during revert-and-rerun, fixed by carrying it through.
Tests:
- MainnetResponsesEndpointsTests asserts CallerContext.Credentials.NyxIdBearer carries the inbound bearer AND Metadata.NotContainKey(NyxIdAccessToken) is preserved (the typed channel vindicates the original intent).
- NyxIdLLMProviderRoutingTests gains two cases locking the resolution priority: typed wins; typed beats metadata.
Closes #629
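The three-tier resolution order described above might be sketched as follows; the member names follow the commit text, and the exact signatures are assumed:

```csharp
// Resolution priority inside NyxIdLLMProvider: typed credentials win,
// then the legacy Metadata key, then the host-level accessor.
private string? ResolveAccessToken(LLMRequest request)
{
    return request.CallerContext?.Credentials?.NyxIdBearer        // typed channel (new)
        ?? request.Metadata?.GetValueOrDefault(
               LLMRequestMetadataKeys.NyxIdAccessToken)           // legacy string-bag key
        ?? _accessTokenAccessor();                                // host accessor fallback
}
```

Keeping the Metadata fallback means existing workflow / studio / LLMCallModule callers need no simultaneous migration, while new callers can stay out of the loggable string bag entirely.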
Summary
POST `/v1/responses` ingress, opaque `response.id` ownership actor (ResponseSessionGAgent), forwarded tool call state actor (ResponsesAgentToolStateGAgent), and current-state projection plumbing for both
Test plan
- `dotnet build aevatar.slnx --nologo`
- `dotnet test aevatar.slnx --nologo`
- `bash tools/ci/architecture_guards.sh`
- `POST /v1/responses` with crafted `X-NyxID-Identity-Token`, verify SSE returns
- `ResponseSession` + `ResponsesAgentToolState` current-state readmodels populate after a request
🤖 Generated with Claude Code