id: cluster-triage-backend-async-cqrs-write-contract
severity: high
requires_design: true
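Source
This issue was opened manually by a maintainer; triage codex added the evidence through in-depth investigation.
Core problem
Backend write APIs already follow 202 Accepted in places, but there is no unified contract for the accepted / committed / observed / ready / failed stages across Workflow, Scripting, GAgentService, and Studio. As a result, the CQRS Core accepted-only skeleton, capability-specific readiness queries, the Studio observation endpoint, and the service run status URL each exist in isolation. A solver adding a new write path can easily stitch readmodel visibility, invoke readiness, or post-acceptance failure back into a synchronous API.
The problem is not a missing field as such. The real problem is that nobody owns the stage semantics of a backend operation. A synchronous ACK can only prove dispatch admission; readmodel freshness, readiness, and terminal failure must be exposed through an explicit observe/readiness/readmodel surface.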
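To make the ownership rule concrete, here is a minimal sketch that assigns each stage to the only surface allowed to report it. `OperationStage`, `StageSurface`, and `StageOwnership` are hypothetical names used for illustration, not existing types. Mapping committed to continuation events and failure to a status query follows this issue's own wording and is an assumption about the eventual design.

```csharp
using System;

// Hypothetical stage vocabulary for a backend write operation (not an existing type).
public enum OperationStage { Accepted, Committed, Observed, Ready, Failed }

// Hypothetical surfaces that may report a stage.
public enum StageSurface { SynchronousAck, ContinuationEvent, ReadModelQuery, ReadinessQuery, StatusQuery }

public static class StageOwnership
{
    // Only Accepted may be reported by the synchronous ACK; every stronger stage,
    // including terminal failure, is owned by an explicit observation surface.
    public static StageSurface OwnerOf(OperationStage stage) => stage switch
    {
        OperationStage.Accepted => StageSurface.SynchronousAck,
        OperationStage.Committed => StageSurface.ContinuationEvent,
        OperationStage.Observed => StageSurface.ReadModelQuery,
        OperationStage.Ready => StageSurface.ReadinessQuery,
        OperationStage.Failed => StageSurface.StatusQuery,
        _ => throw new ArgumentOutOfRangeException(nameof(stage)),
    };
}
```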
Evidence
1. IActorDispatchPort promises only accepted, but the operation contract above it is not yet unified
src/Aevatar.Foundation.Abstractions/IActorDispatchPort.cs:67
// Refactor (iter149/issue1132): Old pattern: handled-dispatch side contract implied actor-turn completion. New principle: IActorDispatchPort exposes accepted-only runtime/inbox admission.
public interface IActorDispatchPort
{
    /// <summary>
    /// Admits an envelope to the specified actor runtime/inbox boundary.
    /// Completion only means accepted-for-dispatch with a stable command id; it does not mean handled,
    /// committed, or observed by a read model.
    /// </summary>
    Task<DispatchAdmission> DispatchAsync(string actorId, EventEnvelope envelope, CancellationToken ct = default);
}
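Violation: the foundation layer has already narrowed ACK semantics to accepted-only, but capability HTTP/API response shapes have no uniform way to express the follow-up observeUrl / readinessUrl / readModelUrl / terminal failure. Each capability entry point therefore keeps interpreting the stronger stages on its own.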
2. CQRS Core has deleted the stream-RPC outcome contract, but there is no cross-capability operation response standard
src/Aevatar.CQRS.Core.Abstractions/Commands/ICommandDispatchService.cs:3
// Refactor (iter158/cluster-001-stream-actor-outcome-rpc):
// Old: outcome dispatch used stream subscribe + TCS to treat the first outcome as an RPC reply, violating stream request-reply boundaries and honest ACK semantics.
// New: Delete the outcome-dispatch contract with no replacement; callers use accepted receipts plus readmodel queries or typed continuation events.
public interface ICommandDispatchService<in TCommand, TReceipt, TError>
{
    Task<CommandDispatchResult<TReceipt, TError>> DispatchAsync(
        TCommand command,
        CancellationToken ct = default);
}
Violation: "accepted receipts plus readmodel queries or typed continuation events" is the right direction, but backend write APIs share no common minimal response shape, and post-acceptance failure has no common observation contract either.
3. The binding command path is a transition point: the old readmodel polling has been removed, but readiness has not yet entered a shared contract
src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingCommandApplicationService.cs:56
// Refactor (iter2/cluster-006):
// Old pattern: Upsert dispatched lifecycle commands then polled service catalog and serving readmodels before ACK.
// New principle: Upsert returns accepted lifecycle ids; readmodel freshness is observed through explicit read paths.
public async Task<ScopeBindingUpsertResult> UpsertAsync(
    ScopeBindingUpsertRequest request,
    CancellationToken ct = default)
src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingCommandApplicationService.cs:126
var expectedDeploymentId = $"{ServiceActorIds.Deployment(identity)}:{revisionId}";
// TODO(iter2/cluster-006): If callers need "invoke safe now", add an explicit read/projection
// observation path in a separate PR rather than blocking this command path on readmodels.
return desiredBinding.BuildResult(normalizedScopeId, identity.ServiceId, revisionId, expectedDeploymentId);
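Violation: the code already acknowledges that "invoke safe now" must not block the command path. This issue needs to promote that TODO into a unified backend operation contract; otherwise the next write API will reinvent its own polling/readiness shape.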
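The following is a hedged illustration of what the TODO's "explicit read/projection observation path" could look like from the caller's side. A caller that truly needs "invoke safe now" waits against a readiness query with its own deadline, while the command path returns immediately. `ReadinessWait`, `ReadinessWaitOutcome`, and the `isReady` delegate (standing in for a call to a readiness endpoint) are hypothetical. This is a sketch, not the planned implementation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical caller-side outcome of waiting for readiness (not an existing type).
public enum ReadinessWaitOutcome { Ready, TimedOut }

public static class ReadinessWait
{
    // Polls an explicit readiness query until it reports ready or the caller's deadline passes.
    // The command path never blocks on this; only callers that need "invoke safe now" use it.
    public static async Task<ReadinessWaitOutcome> WaitUntilReadyAsync(
        Func<CancellationToken, Task<bool>> isReady,
        TimeSpan timeout,
        TimeSpan pollInterval,
        CancellationToken ct = default)
    {
        var deadline = DateTimeOffset.UtcNow + timeout;
        while (true)
        {
            if (await isReady(ct).ConfigureAwait(false))
                return ReadinessWaitOutcome.Ready;

            var remaining = deadline - DateTimeOffset.UtcNow;
            if (remaining <= TimeSpan.Zero)
                return ReadinessWaitOutcome.TimedOut;

            // Never sleep past the deadline.
            await Task.Delay(remaining < pollInterval ? remaining : pollInterval, ct).ConfigureAwait(false);
        }
    }
}
```

The deadline and the timed-out outcome belong to the caller, so the write API never needs to know that anyone is waiting.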
4. Readiness is already an explicit query surface, but its status vocabulary is too capability-specific
src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingReadinessQueryService.cs:26
public async Task<ScopeBindingReadinessSnapshot> GetReadinessAsync(
    ScopeBindingReadinessRequest request,
    CancellationToken ct = default)
src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingReadinessQueryService.cs:44
var service = await _serviceLifecycleQueryPort.GetServiceAsync(identity, ct).ConfigureAwait(false);
if (service == null)
{
    return new ScopeBindingReadinessSnapshot(
        normalizedScopeId,
        normalizedServiceId,
        ScopeBindingReadinessStatus.ServiceCatalogMissing,
        ServiceCatalogVisible: false,
        ServingSetVisible: false,
        EligibleServingTargetVisible: false,
        InvokeReady: false,
        ObservedAtUtc: observedAtUtc);
}
Violation: the readiness query is the right boundary, but there is no shared operation-level status vocabulary (for example pending / ready / failed / timed_out, tied to a command/correlation id). The frontend or caller can therefore only interpret capability-specific missing states.
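Here is a minimal sketch of how the existing snapshot flags could be projected onto a shared operation-level status while keeping the fine-grained reason. `OperationStatus`, `OperationReadiness`, and `ScopeBindingReadinessMapping` are hypothetical. The boolean inputs mirror the `ScopeBindingReadinessSnapshot` fields shown above, but `failureReason`, `deadlineUtc`, and every reason string except `ServiceCatalogMissing` are assumptions.

```csharp
using System;

// Hypothetical shared operation-level status (the vocabulary proposed in this issue).
public enum OperationStatus { Pending, Ready, Failed, TimedOut }

// Hypothetical shared readiness view: a top-level status tied to the command id,
// with the capability-specific reason kept for diagnostics.
public sealed record OperationReadiness(string CommandId, OperationStatus Status, string? Reason);

public static class ScopeBindingReadinessMapping
{
    // The four visibility flags mirror ScopeBindingReadinessSnapshot;
    // failureReason and deadlineUtc are assumed inputs, not existing fields.
    public static OperationReadiness Map(
        string commandId,
        bool serviceCatalogVisible,
        bool servingSetVisible,
        bool eligibleServingTargetVisible,
        bool invokeReady,
        string? failureReason,
        DateTimeOffset observedAtUtc,
        DateTimeOffset deadlineUtc)
    {
        if (failureReason is not null)
            return new(commandId, OperationStatus.Failed, failureReason);
        if (invokeReady)
            return new(commandId, OperationStatus.Ready, null);

        // Keep the first missing capability-specific state as the fine-grained reason.
        // Only "ServiceCatalogMissing" comes from the code above; the other strings are placeholders.
        var reason = !serviceCatalogVisible ? "ServiceCatalogMissing"
            : !servingSetVisible ? "ServingSetMissing"
            : !eligibleServingTargetVisible ? "EligibleServingTargetMissing"
            : "InvokeNotReady";
        var status = observedAtUtc >= deadlineUtc ? OperationStatus.TimedOut : OperationStatus.Pending;
        return new(commandId, status, reason);
    }
}
```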
5. Workflow upsert dispatches lifecycle commands, then reads the readmodel (or builds a fallback) inside the write path
src/platform/Aevatar.GAgentService.Application/Workflows/ScopeWorkflowCommandApplicationService.cs:108
await _serviceCommandPort.CreateRevisionAsync(new CreateServiceRevisionCommand { Spec = revisionSpec }, ct);
await _serviceCommandPort.PrepareRevisionAsync(new PrepareServiceRevisionCommand
{
    Identity = identity.Clone(),
    RevisionId = revisionId,
}, ct);
await _serviceCommandPort.PublishRevisionAsync(new PublishServiceRevisionCommand
{
    Identity = identity.Clone(),
    RevisionId = revisionId,
}, ct);
await _serviceCommandPort.SetDefaultServingRevisionAsync(new SetDefaultServingRevisionCommand
{
    Identity = identity.Clone(),
    RevisionId = revisionId,
}, ct);
await _serviceCommandPort.ActivateServiceRevisionAsync(new ActivateServiceRevisionCommand
{
    Identity = identity.Clone(),
    RevisionId = revisionId,
}, ct);
src/platform/Aevatar.GAgentService.Application/Workflows/ScopeWorkflowCommandApplicationService.cs:132
var workflowSummary =
    await _scopeWorkflowQueryPort.GetByWorkflowIdAsync(normalizedScopeId, normalizedWorkflowId, ct) ??
    new ScopeWorkflowSummary(
        normalizedScopeId,
        normalizedWorkflowId,
        desiredDisplayName,
        ServiceKeys.Build(identity),
        ScopeWorkflowCapabilityConventions.NormalizeOptional(request.WorkflowName),
        expectedActorId,
        revisionId,
        expectedDeploymentId,
        "active",
        DateTimeOffset.UtcNow);
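Violation: the write-side upsert enriches its result from the readmodel and assembles a fallback summary on the spot when the readmodel entry is missing. The fallback even reports a status of "active" and the current time, although neither has been observed. Even though this is not polling, it still makes it hard for callers to distinguish command accepted, readmodel observed, and invoke ready.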
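One option for the ScopeWorkflow decision question is an accepted-only handle built purely from command inputs and dispatch ids. The sketch below assumes a hypothetical `ScopeWorkflowUpsertAccepted` record, a hypothetical `ScopeWorkflowUpsertHandles.Build` helper, and a hypothetical readiness route; only the `{deploymentActorId}:{revisionId}` convention is taken from the binding code in evidence 3.

```csharp
using System;

// Hypothetical accepted-only handle for a scoped workflow upsert (not an existing type).
// Every field comes from command inputs and dispatch ids; nothing is read from a readmodel,
// so the handle cannot claim the workflow is already "active".
public sealed record ScopeWorkflowUpsertAccepted(
    string CommandId,
    string ScopeId,
    string WorkflowId,
    string RevisionId,
    string ExpectedActorId,
    string ExpectedDeploymentId,
    string ReadinessUrl);

public static class ScopeWorkflowUpsertHandles
{
    public static ScopeWorkflowUpsertAccepted Build(
        string commandId,
        string scopeId,
        string workflowId,
        string revisionId,
        string expectedActorId,
        string deploymentActorId)
    {
        // Same "{deployment actor id}:{revisionId}" convention as the binding command path.
        var expectedDeploymentId = $"{deploymentActorId}:{revisionId}";

        // Hypothetical readiness route; the real path is still an open decision in this issue.
        var readinessUrl =
            $"/api/scopes/{Uri.EscapeDataString(scopeId)}/workflows/{Uri.EscapeDataString(workflowId)}/readiness" +
            $"?revisionId={Uri.EscapeDataString(revisionId)}";

        return new(commandId, scopeId, workflowId, revisionId, expectedActorId, expectedDeploymentId, readinessUrl);
    }
}
```

Callers that need to know whether the workflow is actually active follow `ReadinessUrl` instead of trusting a summary assembled inside the write path.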
6. Existing endpoints show local patterns but not one contract
src/platform/Aevatar.GAgentService.Hosting/Endpoints/ServiceEndpoints.cs:410
// Refactor (iter56/cluster-891-endpoint-ack-honesty): old=200-shaped accepted, new=202 + Location
// Service invoke is accepted for dispatch; the run resource is the status surface for outcome.
// Never point Location at the service definition root because that is not the command/run status.
receipt.StatusUrl = BuildServiceRunStatusUrl(identity, receipt);
return Results.Accepted(receipt.StatusUrl, receipt);
src/Aevatar.Studio.Hosting/Endpoints/StudioEndpoints.cs:760
var accepted = await service.SaveAsync(scopeContext.ScopeId, request, ct);
return Results.Accepted(
    uri: $"/api/app/scripts/{Uri.EscapeDataString(accepted.ScriptId)}/save-observation",
    value: accepted);
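Violation: service invoke uses a run status URL, while Studio script save uses a save-observation URL. Each local pattern is reasonable on its own, but without a unified operation response vocabulary, future capabilities cannot tell which of statusUrl, observeUrl, readinessUrl, and readModelUrl they should expose.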
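Here is a minimal sketch of a shared receipt that covers both local patterns, assuming the "field vocabulary" option from the decision questions. `OperationReceipt` and the precedence used by `LocationUrl` (status, then observe, then readiness, then readmodel) are hypothetical choices for illustration.

```csharp
using System;

// Hypothetical minimal operation receipt vocabulary (not an existing type).
// Only CommandId and CorrelationId are mandatory; each capability fills in the URLs it actually serves.
public sealed record OperationReceipt(
    string CommandId,
    string CorrelationId,
    string? ActorId = null,
    string? StatusUrl = null,     // run/command status, e.g. the service run resource
    string? ObserveUrl = null,    // observation of one specific write, e.g. Studio save-observation
    string? ReadinessUrl = null,  // capability readiness, e.g. scope binding readiness
    string? ReadModelUrl = null)  // the read model that will eventually reflect the write
{
    // The 202 Location points at the most specific status surface available,
    // never at a resource root such as the service definition.
    public string? LocationUrl => StatusUrl ?? ObserveUrl ?? ReadinessUrl ?? ReadModelUrl;
}
```

An endpoint could then return `Results.Accepted(receipt.LocationUrl, receipt)`, the same minimal-API call shape ServiceEndpoints.cs and StudioEndpoints.cs already use, so service invoke and Studio script save would differ only in which URL fields they fill in.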
Violated rules
CLAUDE.md:
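- Read/write separation: Command -> Event, Query -> ReadModel; asynchronous completion is signaled through events, and flows are not stitched together within a session.
- Honest ACK: a synchronous return promises only the stage already reached (by default accepted + stable command id); stronger guarantees such as committed or read-model observed must be obtained through a separate contract or asynchronous observation.
- Cohesive command skeleton: the standard lifecycle is Normalize -> Resolve Target -> Build Context -> Build Envelope -> Dispatch -> Receipt -> Observe; business modules handle only target resolution and payload/result mapping.
- Layered business and query consistency: actor-to-actor links are responsible for "message received / event committed / protocol advanced"; the readmodel is responsible for "a given StateVersion has been materialized and is visible"; the two must not be mixed.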
AGENTS.md:
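- ACK semantics must be honest: a synchronous return may only promise the stage that has actually been reached, which by default is accepted for dispatch + stable command id. Stronger guarantees such as committed or read-model observed must be obtained through a separate contract or asynchronous observation, and a weak-semantics ACK must never imply a strong guarantee.
- The command skeleton must be cohesive: the standard command lifecycle should converge on Normalize -> Resolve Target -> Build Context -> Build Envelope -> Dispatch -> Receipt -> Observe. Business modules handle only target resolution, payload mapping, and result mapping; capability entry points must not each assemble their own flow.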
docs/canon/cqrs-projection.md:
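Command execution must go through the CQRS Core standard command skeleton. No capability may privately assemble its own resolve/ack/observe/finalize lifecycle, and no command-bus shell parallel to the runtime may be introduced.
Always return Accepted + commandId (+ actorId/correlationId), promising only traceability, not committed / observed.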
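The standard lifecycle quoted above can be sketched as one shared skeleton in which the business module contributes only delegates for normalization, target resolution, and payload mapping. `CommandMapping`, `CommandSkeleton`, `Envelope`, `CommandContext`, and `AcceptedReceipt` are hypothetical stand-ins, not the CQRS Core types, and the dispatch delegate stands in for `IActorDispatchPort`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes for illustration only; the real CQRS Core types differ.
public sealed record CommandContext(string CommandId, string CorrelationId, string TargetActorId);
public sealed record Envelope(CommandContext Context, object Payload);
public sealed record AcceptedReceipt(string CommandId, string CorrelationId, string ActorId);

// The only parts a business module supplies.
public sealed record CommandMapping<TCommand>(
    Func<TCommand, TCommand> Normalize,
    Func<TCommand, string> ResolveTarget,
    Func<TCommand, object> MapPayload);

// Shared skeleton: Normalize -> Resolve Target -> Build Context -> Build Envelope -> Dispatch -> Receipt.
// Observe is deliberately absent: it happens later through an observe/readiness/readmodel surface.
public sealed class CommandSkeleton<TCommand>
{
    private readonly CommandMapping<TCommand> _mapping;
    private readonly Func<string, Envelope, CancellationToken, Task> _dispatch; // stands in for IActorDispatchPort

    public CommandSkeleton(CommandMapping<TCommand> mapping, Func<string, Envelope, CancellationToken, Task> dispatch)
    {
        _mapping = mapping;
        _dispatch = dispatch;
    }

    public async Task<AcceptedReceipt> ExecuteAsync(TCommand command, CancellationToken ct = default)
    {
        var normalized = _mapping.Normalize(command);
        var actorId = _mapping.ResolveTarget(normalized);
        var context = new CommandContext(Guid.NewGuid().ToString("N"), Guid.NewGuid().ToString("N"), actorId);
        var envelope = new Envelope(context, _mapping.MapPayload(normalized));
        await _dispatch(actorId, envelope, ct).ConfigureAwait(false);

        // The receipt promises only admission plus traceable ids, never committed or observed.
        return new AcceptedReceipt(context.CommandId, context.CorrelationId, actorId);
    }
}
```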
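New principles
A backend write operation response may synchronously express only the accepted stage and a traceable identity; stronger semantics must be exposed through an explicit observe/readiness/readmodel URL or a typed stream frame.
All capability write APIs reuse the same minimal operation receipt vocabulary. Business modules supply only target identity, payload mapping, and domain-specific readiness requirements, and no longer assemble their own accepted/observe/ready/failure lifecycle.
Post-acceptance execution failure is a first-class state of the operation contract: it must be queryable by commandId/correlationId or through a run/readmodel query, and must not be lost after 202 Accepted.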
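Here is a minimal sketch of post-acceptance failure as a first-class, queryable state. The operation becomes visible as pending at 202 time, and a later failure is recorded against the same commandId and stays reachable by correlationId. `InMemoryOperationStatusStore` and its methods are hypothetical; in the real system this state would be a projection fed by typed continuation events, not an in-process dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shared status vocabulary (not an existing type).
public enum OperationStatus { Pending, Ready, Failed, TimedOut }

public sealed record OperationState(
    string CommandId,
    string CorrelationId,
    OperationStatus Status,
    string? FailureReason,
    DateTimeOffset UpdatedAtUtc);

// Single-threaded illustration; a real store would be a projection applying continuation events in order.
public sealed class InMemoryOperationStatusStore
{
    private readonly Dictionary<string, OperationState> _byCommandId = new();
    private readonly Dictionary<string, string> _commandIdByCorrelationId = new();

    // At 202 time the operation is already queryable as Pending.
    public void RecordAccepted(string commandId, string correlationId, DateTimeOffset atUtc)
    {
        _byCommandId[commandId] = new OperationState(commandId, correlationId, OperationStatus.Pending, null, atUtc);
        _commandIdByCorrelationId[correlationId] = commandId;
    }

    // A failure after acceptance is recorded against the same command id instead of being dropped.
    public bool RecordFailed(string commandId, string reason, DateTimeOffset atUtc)
    {
        if (!_byCommandId.TryGetValue(commandId, out var current))
            return false;
        _byCommandId[commandId] = current with { Status = OperationStatus.Failed, FailureReason = reason, UpdatedAtUtc = atUtc };
        return true;
    }

    public OperationState? GetByCommandId(string commandId) =>
        _byCommandId.TryGetValue(commandId, out var state) ? state : null;

    public OperationState? GetByCorrelationId(string correlationId) =>
        _commandIdByCorrelationId.TryGetValue(correlationId, out var commandId) ? GetByCommandId(commandId) : null;
}
```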
Fix boundary
scope_paths:
src/Aevatar.Foundation.Abstractions/IActorDispatchPort.cs
src/Aevatar.CQRS.Core.Abstractions/Commands/ICommandDispatchService.cs
src/Aevatar.CQRS.Core.Abstractions/Commands/CommandDispatchResult.cs
src/Aevatar.CQRS.Core/Commands/DefaultCommandDispatchService.cs
src/Aevatar.CQRS.Core/Interactions/DefaultCommandInteractionService.cs
src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingCommandApplicationService.cs
src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingReadinessQueryService.cs
src/platform/Aevatar.GAgentService.Application/Workflows/ScopeWorkflowCommandApplicationService.cs
src/platform/Aevatar.GAgentService.Application/Scripts/ScopeScriptCommandApplicationService.cs
src/platform/Aevatar.GAgentService.Hosting/Endpoints/ServiceEndpoints.cs
src/platform/Aevatar.GAgentService.Hosting/Endpoints/ScopeWorkflowEndpoints.cs
src/platform/Aevatar.GAgentService.Hosting/Endpoints/ScopeScriptEndpoints.cs
src/Aevatar.Studio.Hosting/Endpoints/StudioEndpoints.cs
docs/canon/cqrs-projection.md
docs/canon/chat-api.md
Decision questions
- Should the shared backend receipt be a new standalone DTO/abstraction, or should we only define a field vocabulary and have the existing receipt types implement a consistent shape?
- Must ready be unified as an operation status (pending/ready/failed/timed_out), or is it enough for capability-specific readiness endpoints to return these top-level states while keeping fine-grained reasons?
- Is the canonical query key for post-acceptance failure commandId, correlationId, or run/resource id, or must all three be exposed in the receipt?
- Should the readmodel summary fallback in ScopeWorkflow upsert be deleted in favor of an accepted-only handle, or moved to an explicit observe/readiness endpoint?
- Which endpoints must be included in the first batch: Service lifecycle writes, Service invoke, Scoped script save, Scoped workflow upsert, User config LLM save?
original_authors
src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingCommandApplicationService.cs: eanzhao, louis.li, loning
src/platform/Aevatar.GAgentService.Application/Workflows/ScopeWorkflowCommandApplicationService.cs: eanzhao
src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingReadinessQueryService.cs: louis.li
src/Aevatar.Foundation.Abstractions/IActorDispatchPort.cs: loning
src/Aevatar.CQRS.Core.Abstractions/Commands/ICommandDispatchService.cs: loning
📢 cc @loning @eanzhao @louis4li
⟦AI:AUTO-LOOP⟧