Define backend async operation contract for CQRS writes #592

@louis4li

id: cluster-triage-backend-async-cqrs-write-contract
severity: high
requires_design: true

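Source

This issue was opened manually by a maintainer; the triage codex supplied the evidence through an in-depth investigation.

Core problem

Backend write APIs already partially follow 202 Accepted, but there is no single contract for accepted / committed / observed / ready / failed that spans Workflow, Scripting, GAgentService, and Studio. As a result, the CQRS Core accepted-only skeleton, the capability-specific readiness query, the Studio observation endpoint, and the service run status URL each exist in isolation, and a solver adding a new write path can easily fold readmodel visibility, invoke readiness, or post-acceptance failure back into a synchronous API.

The problem is not a "missing field" as such. It is that the stage semantics of a backend operation have no unified ownership: a synchronous ACK can only prove dispatch admission, while readmodel freshness, readiness, and terminal failure must be exposed through an explicit observe/readiness/readmodel surface.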

Evidence

1. IActorDispatchPort promises only accepted, but the operation contract above it is not yet unified

src/Aevatar.Foundation.Abstractions/IActorDispatchPort.cs:67

// Refactor (iter149/issue1132): Old pattern: handled-dispatch side contract implied actor-turn completion.  New principle: IActorDispatchPort exposes accepted-only runtime/inbox admission.
public interface IActorDispatchPort
{
    /// <summary>
    /// Admits an envelope to the specified actor runtime/inbox boundary.
    /// Completion only means accepted-for-dispatch with a stable command id; it does not mean handled,
    /// committed, or observed by a read model.
    /// </summary>
    Task<DispatchAdmission> DispatchAsync(string actorId, EventEnvelope envelope, CancellationToken ct = default);
}

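Violation: The foundation layer has already tightened ACK semantics to accepted-only, but the capability HTTP/API response shape has no unified way to express the follow-up observeUrl / readinessUrl / readModelUrl / terminal failure, so each capability entry point keeps interpreting the stronger stages on its own.

2. CQRS Core has removed the stream-RPC outcome contract, but lacks a cross-capability operation response standard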

src/Aevatar.CQRS.Core.Abstractions/Commands/ICommandDispatchService.cs:3

// Refactor (iter158/cluster-001-stream-actor-outcome-rpc):
// Old: outcome dispatch used stream subscribe + TCS to treat the first outcome as an RPC reply, violating stream request-reply boundaries and honest ACK semantics.
// New: Delete the outcome-dispatch contract with no replacement; callers use accepted receipts plus readmodel queries or typed continuation events.
public interface ICommandDispatchService<in TCommand, TReceipt, TError>
{
    Task<CommandDispatchResult<TReceipt, TError>> DispatchAsync(
        TCommand command,
        CancellationToken ct = default);
}

Violation: "accepted receipts plus readmodel queries or typed continuation events" is the direction, but the backend write APIs share no minimal response shape, and post-acceptance failure has no unified observation contract.

3. The binding command path is a transition point: the old readmodel polling has been removed, but readiness has not yet entered a shared contract

src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingCommandApplicationService.cs:56

// Refactor (iter2/cluster-006):
//   Old pattern: Upsert dispatched lifecycle commands then polled service catalog and serving readmodels before ACK.
//   New principle: Upsert returns accepted lifecycle ids; readmodel freshness is observed through explicit read paths.
public async Task<ScopeBindingUpsertResult> UpsertAsync(
    ScopeBindingUpsertRequest request,
    CancellationToken ct = default)

src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingCommandApplicationService.cs:126

var expectedDeploymentId = $"{ServiceActorIds.Deployment(identity)}:{revisionId}";
// TODO(iter2/cluster-006): If callers need "invoke safe now", add an explicit read/projection
// observation path in a separate PR rather than blocking this command path on readmodels.
return desiredBinding.BuildResult(normalizedScopeId, identity.ServiceId, revisionId, expectedDeploymentId);

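Violation: The code already recognizes that "invoke safe now" must not block the command path, but this issue needs to promote that TODO to a unified backend operation contract; otherwise the next write API will reinvent its own polling/readiness shape.

4. Readiness is already an explicit query surface, but its status vocabulary is too capability-specific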

src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingReadinessQueryService.cs:26

public async Task<ScopeBindingReadinessSnapshot> GetReadinessAsync(
    ScopeBindingReadinessRequest request,
    CancellationToken ct = default)

src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingReadinessQueryService.cs:44

var service = await _serviceLifecycleQueryPort.GetServiceAsync(identity, ct).ConfigureAwait(false);
if (service == null)
{
    return new ScopeBindingReadinessSnapshot(
        normalizedScopeId,
        normalizedServiceId,
        ScopeBindingReadinessStatus.ServiceCatalogMissing,
        ServiceCatalogVisible: false,
        ServingSetVisible: false,
        EligibleServingTargetVisible: false,
        InvokeReady: false,
        ObservedAtUtc: observedAtUtc);
}

Violation: The readiness query is the right boundary, but there is no shared operation-level status vocabulary (for example pending / ready / failed / timed_out, tied to a command/correlation id), so a frontend or caller can only interpret capability-specific missing states.
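To make the missing vocabulary concrete, here is a minimal sketch of how a capability-specific snapshot could be projected onto shared top-level statuses while keeping its fine-grained reason. Everything in it is an assumption for illustration: OperationStatus, CapabilityReadiness, OperationReadiness, and OperationReadinessMapper are hypothetical names, the deadline rule for timed_out is invented, and CapabilityReadiness is only a stand-in for a type such as ScopeBindingReadinessSnapshot, not its real shape.

```csharp
#nullable enable
using System;

// Hypothetical shared vocabulary; the four values mirror the list proposed in this issue.
public enum OperationStatus { Pending, Ready, Failed, TimedOut }

// Hypothetical stand-in for a capability-specific readiness snapshot.
public sealed record CapabilityReadiness(
    string CommandId,
    bool InvokeReady,
    string Reason,               // fine-grained reason, e.g. "ServiceCatalogMissing"
    bool TerminalFailure,
    DateTimeOffset AcceptedAtUtc,
    DateTimeOffset ObservedAtUtc);

// Hypothetical shared view: top-level status plus the preserved capability reason.
public sealed record OperationReadiness(string CommandId, OperationStatus Status, string Reason);

public static class OperationReadinessMapper
{
    // Assumed precedence: terminal failure, then ready, then deadline exceeded, else pending.
    public static OperationReadiness ToShared(CapabilityReadiness snapshot, TimeSpan deadline)
    {
        OperationStatus status;
        if (snapshot.TerminalFailure)
            status = OperationStatus.Failed;
        else if (snapshot.InvokeReady)
            status = OperationStatus.Ready;
        else if (snapshot.ObservedAtUtc - snapshot.AcceptedAtUtc > deadline)
            status = OperationStatus.TimedOut;
        else
            status = OperationStatus.Pending;

        return new OperationReadiness(snapshot.CommandId, status, snapshot.Reason);
    }
}
```

Under this sketch a caller branches only on the four shared values, and the capability reason remains available for diagnostics. Whether timed_out should be computed at query time or recorded by the backend is left open by this issue.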

5. Workflow upsert dispatches lifecycle commands then reads readmodel/fallbacks inside write path

src/platform/Aevatar.GAgentService.Application/Workflows/ScopeWorkflowCommandApplicationService.cs:108

await _serviceCommandPort.CreateRevisionAsync(new CreateServiceRevisionCommand { Spec = revisionSpec }, ct);
await _serviceCommandPort.PrepareRevisionAsync(new PrepareServiceRevisionCommand
{
    Identity = identity.Clone(),
    RevisionId = revisionId,
}, ct);
await _serviceCommandPort.PublishRevisionAsync(new PublishServiceRevisionCommand
{
    Identity = identity.Clone(),
    RevisionId = revisionId,
}, ct);
await _serviceCommandPort.SetDefaultServingRevisionAsync(new SetDefaultServingRevisionCommand
{
    Identity = identity.Clone(),
    RevisionId = revisionId,
}, ct);
await _serviceCommandPort.ActivateServiceRevisionAsync(new ActivateServiceRevisionCommand
{
    Identity = identity.Clone(),
    RevisionId = revisionId,
}, ct);

src/platform/Aevatar.GAgentService.Application/Workflows/ScopeWorkflowCommandApplicationService.cs:132

var workflowSummary =
    await _scopeWorkflowQueryPort.GetByWorkflowIdAsync(normalizedScopeId, normalizedWorkflowId, ct) ??
    new ScopeWorkflowSummary(
        normalizedScopeId,
        normalizedWorkflowId,
        desiredDisplayName,
        ServiceKeys.Build(identity),
        ScopeWorkflowCapabilityConventions.NormalizeOptional(request.WorkflowName),
        expectedActorId,
        revisionId,
        expectedDeploymentId,
        "active",
        DateTimeOffset.UtcNow);

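Violation: The write-side upsert uses the readmodel to enrich its result and, when the entry is missing, builds a fallback summary on the spot. Even though this is not polling, it still makes it hard for the caller to tell command accepted, readmodel observed, and invoke ready apart.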

6. Existing endpoints show local patterns but not one contract

src/platform/Aevatar.GAgentService.Hosting/Endpoints/ServiceEndpoints.cs:410

// Refactor (iter56/cluster-891-endpoint-ack-honesty): old=200-shaped accepted, new=202 + Location
//   Service invoke is accepted for dispatch; the run resource is the status surface for outcome.
//   Never point Location at the service definition root because that is not the command/run status.
receipt.StatusUrl = BuildServiceRunStatusUrl(identity, receipt);
return Results.Accepted(receipt.StatusUrl, receipt);

src/Aevatar.Studio.Hosting/Endpoints/StudioEndpoints.cs:760

var accepted = await service.SaveAsync(scopeContext.ScopeId, request, ct);
return Results.Accepted(
    uri: $"/api/app/scripts/{Uri.EscapeDataString(accepted.ScriptId)}/save-observation",
    value: accepted);

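Violation: Service invoke uses a run status URL, while Studio script save uses a save-observation URL. Each local pattern is reasonable, but there is no unified operation response vocabulary, so later capabilities cannot tell which of statusUrl, observeUrl, readinessUrl, and readModelUrl they should expose.

Violated clauses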
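As a sketch of what one shared vocabulary could look like for the two endpoints above, the following defines a single receipt shape with optional URL fields. OperationReceipt, OperationReceipts, PrimaryLocation, and the fallback order for the Location header are hypothetical; only the field names statusUrl / observeUrl / readinessUrl / readModelUrl and the save-observation path come from this issue. Whether this should be a new DTO or just a field convention is one of the open decision questions.

```csharp
#nullable enable
using System;

// Hypothetical shared receipt: accepted stage, traceable identity, optional follow-up surfaces.
public sealed record OperationReceipt(
    string CommandId,
    string? CorrelationId = null,
    string? StatusUrl = null,
    string? ObserveUrl = null,
    string? ReadinessUrl = null,
    string? ReadModelUrl = null)
{
    // The synchronous response only ever claims the accepted stage.
    public string Stage => "accepted";

    // Assumed order for choosing the 202 Location header value.
    public string? PrimaryLocation => StatusUrl ?? ObserveUrl ?? ReadinessUrl ?? ReadModelUrl;
}

public static class OperationReceipts
{
    // Service invoke: the run resource is the status surface.
    public static OperationReceipt ForServiceInvoke(string commandId, string runStatusUrl) =>
        new(commandId, StatusUrl: runStatusUrl);

    // Studio script save: the save-observation endpoint is the observe surface.
    public static OperationReceipt ForScriptSave(string commandId, string scriptId) =>
        new(commandId, ObserveUrl: $"/api/app/scripts/{Uri.EscapeDataString(scriptId)}/save-observation");
}
```

With a shape like this, an endpoint could return Results.Accepted(receipt.PrimaryLocation, receipt), and a caller would know from which fields are populated which stronger stages the capability exposes.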

CLAUDE.md:

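  • Read/write separation: Command -> Event, Query -> ReadModel; asynchronous completion is signaled through events, and flows are not assembled inside the session.
  • Honest ACK: a synchronous return promises only the stage actually reached (by default accepted + stable command id); stronger guarantees such as committed or read-model observed must be obtained through a separate contract or asynchronous observation.
  • Cohesive command skeleton: the standard lifecycle is Normalize -> Resolve Target -> Build Context -> Build Envelope -> Dispatch -> Receipt -> Observe; business modules are responsible only for target resolution and payload/result mapping.
  • Layering of business consistency and query consistency: the actor-to-actor chain is responsible for "message received / event committed / protocol advanced"; the readmodel is responsible for "a given StateVersion is materialized and visible"; the two must not be mixed.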

AGENTS.md:

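  • ACK semantics must be honest: a synchronous return may promise only the stage that has actually been reached, which by default is accepted for dispatch + stable command id; committed, read-model observed, and other stronger guarantees must be obtained through a separate contract or asynchronous observation, and a weak-semantics ACK must never imply a strong guarantee.
  • The command skeleton must be cohesive: the standard command lifecycle converges on Normalize -> Resolve Target -> Build Context -> Build Envelope -> Dispatch -> Receipt -> Observe; business modules are responsible only for target resolution, payload mapping, and result mapping, and capability entry points must not each assemble their own flow.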

docs/canon/cqrs-projection.md:

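Command execution must go through the standard CQRS Core command skeleton; no capability may privately assemble its own resolve/ack/observe/finalize lifecycle, and no command-bus shell parallel to the runtime is introduced.

Uniformly return Accepted + commandId (+ actorId/correlationId), promising only traceability, not committed / observed.

New principles

A backend write operation response may synchronously express only the accepted stage and a traceable identity; stronger semantics must be exposed through an explicit observe/readiness/readmodel URL or a typed stream frame.

All capability write APIs reuse the same minimal operation receipt vocabulary; business modules supply only target identity, payload mapping, and domain-specific readiness requirements, and no longer assemble their own accepted/observe/ready/failure lifecycle.

Post-acceptance execution failure is a first-class state of the operation contract. It must be queryable by commandId/correlationId or through a run/readmodel query, and must not be lost after 202 Accepted.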
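The third principle can be illustrated with a small pure function that derives an operation's current stage from the events observed for its commandId, treating failure as terminal. This is a sketch under stated assumptions, not the project's design: OperationStage, OperationEvent, OperationSnapshot, and OperationFold are hypothetical names, and the linear ordering accepted < committed < observed < ready is assumed here rather than defined by the issue.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical stage ordering; Failed is terminal and is handled separately in the fold.
public enum OperationStage { Accepted, Committed, Observed, Ready, Failed }

// Hypothetical typed continuation event keyed by the stable command id.
public sealed record OperationEvent(string CommandId, OperationStage Stage, string? Reason = null);

public sealed record OperationSnapshot(string CommandId, OperationStage Stage, string? FailureReason);

public static class OperationFold
{
    // Starts from accepted (all the 202 response proved) and advances on later events.
    public static OperationSnapshot Fold(string commandId, IEnumerable<OperationEvent> events)
    {
        var snapshot = new OperationSnapshot(commandId, OperationStage.Accepted, null);
        foreach (var e in events)
        {
            if (e.CommandId != commandId)
                continue; // events for other operations are ignored

            if (e.Stage == OperationStage.Failed)
            {
                // Failure after acceptance stays visible; later events cannot overwrite it.
                return snapshot with { Stage = OperationStage.Failed, FailureReason = e.Reason };
            }

            if (e.Stage > snapshot.Stage)
                snapshot = snapshot with { Stage = e.Stage };
        }
        return snapshot;
    }
}
```

A status or readmodel query keyed by commandId could return such a snapshot, so a caller that received 202 Accepted can still discover a later failure and its reason.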

Fix boundary

scope_paths:

  • src/Aevatar.Foundation.Abstractions/IActorDispatchPort.cs
  • src/Aevatar.CQRS.Core.Abstractions/Commands/ICommandDispatchService.cs
  • src/Aevatar.CQRS.Core.Abstractions/Commands/CommandDispatchResult.cs
  • src/Aevatar.CQRS.Core/Commands/DefaultCommandDispatchService.cs
  • src/Aevatar.CQRS.Core/Interactions/DefaultCommandInteractionService.cs
  • src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingCommandApplicationService.cs
  • src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingReadinessQueryService.cs
  • src/platform/Aevatar.GAgentService.Application/Workflows/ScopeWorkflowCommandApplicationService.cs
  • src/platform/Aevatar.GAgentService.Application/Scripts/ScopeScriptCommandApplicationService.cs
  • src/platform/Aevatar.GAgentService.Hosting/Endpoints/ServiceEndpoints.cs
  • src/platform/Aevatar.GAgentService.Hosting/Endpoints/ScopeWorkflowEndpoints.cs
  • src/platform/Aevatar.GAgentService.Hosting/Endpoints/ScopeScriptEndpoints.cs
  • src/Aevatar.Studio.Hosting/Endpoints/StudioEndpoints.cs
  • docs/canon/cqrs-projection.md
  • docs/canon/chat-api.md

Decision questions

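  1. Should the shared backend receipt be a new standalone DTO/abstraction, or should we only define a field vocabulary and have the existing receipt types implement a consistent shape?
  2. Must ready be unified as an operation status (pending/ready/failed/timed_out), or is it enough to require each capability-specific readiness endpoint to return these top-level statuses while keeping its fine-grained reason?
  3. Is the canonical query key for post-acceptance failure the commandId, the correlationId, or the run/resource id, or must all three be exposed in the receipt?
  4. Should the readmodel summary fallback in ScopeWorkflow upsert be removed in favor of an accepted-only handle, or moved to an explicit observe/readiness endpoint?
  5. Which endpoints must be included in the first batch: Service lifecycle writes, Service invoke, Scoped script save, Scoped workflow upsert, User config LLM save?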

original_authors

  • src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingCommandApplicationService.cs: eanzhao, louis.li, loning
  • src/platform/Aevatar.GAgentService.Application/Workflows/ScopeWorkflowCommandApplicationService.cs: eanzhao
  • src/platform/Aevatar.GAgentService.Application/Bindings/ScopeBindingReadinessQueryService.cs: louis.li
  • src/Aevatar.Foundation.Abstractions/IActorDispatchPort.cs: loning
  • src/Aevatar.CQRS.Core.Abstractions/Commands/ICommandDispatchService.cs: loning

📢 cc @loning @eanzhao @louis4li

⟦AI:AUTO-LOOP⟧
