Two production failure modes observed 2026-05-03 leave a Lark user visibly stuck even though the cluster is healthy. Both are healed locally, with no NyxID changes.

(1) /llm /route /model says "binding 已失效, 请发送 /init" ("binding expired, please send /init") while /init says "已绑定, 请先 /unbind" ("already bound, please /unbind first"). After PR #558 re-DCR'd the cluster's OAuth client to add the `proxy` scope, the user's existing NyxID binding (issued for the previous client_id) is rejected at broker token exchange. ModelChannelSlashCommandHandler caught the rejection and pointed the user at /init — which then refused because the local readmodel still holds the (now-dead) binding_id, looping the user forever. Self-heal: when the broker throws BindingRevokedException / BindingNotFoundException / BindingScopeMismatchException, dispatch RevokeBindingCommand to the local ExternalIdentityBindingGAgent (reason="auto_self_heal_*") so the readmodel flips to revoked, and tell the user the binding was cleared and /init will work now. This mirrors the dispatch shape UnbindChannelSlashCommandHandler uses for explicit /unbind, but skips the NyxID-side revoke (NyxID is the party that just told us the binding is gone).

(2) The bot replies as "..." forever when the LLM call fails. The streaming sink fires the first chunk via channel-relay/reply, consuming the single-use reply token and creating a placeholder message. If the LLM then fails (e.g. upstream 429), pre-fix the runtime fell through to RunLlmReplyAsync, which issued a fresh /reply against the dead token and got `401 Reply token already used` — leaving the user staring at "..." with no error explanation. Self-heal: ConversationGAgent.TryCompleteStreamedReplyAsync now takes the Failed branch when streaming has already committed the placeholder. It edits the existing message in place via RunStreamChunkAsync (channel-relay/reply/update — no reply token needed) with the classified failure text, then persists ConversationTurnCompletedEvent so the runtime envelope retry loop does not refire and consume the dead token again.
If the edit also fails (rare: Lark may refuse stale-message edits), persist the last flushed partial as terminal — the same defence-in-depth pattern the existing PR #374 fix uses for the Completed path.

Tests:
- 4 new ModelSlashCommandHandlerTests pinning the binding self-heal for each rejection shape, plus the degraded path when IActorRuntime is missing.
- 2 new ConversationGAgentDedupTests pinning that the streaming-Failed branch edits the placeholder and falls through to "persist last flushed partial" when the edit also fails.

Verification:
- dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests --no-build (851/851)
- dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests --no-build (36/36 in dedup suite)
- dotnet test test/Aevatar.Foundation.Core.Tests (230/230)
- bash tools/ci/test_stability_guards.sh (passed)
- bash tools/ci/query_projection_priming_guard.sh (passed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
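The binding self-heal described above reduces to a narrow catch-and-revoke around the broker call. A minimal sketch, assuming hypothetical stand-ins for the project's internal plumbing (the exception names and RevokeBindingCommand come from the commit; `IBindingRevoker` and the message strings are illustrative):

```csharp
// Hypothetical stand-ins; the real handler dispatches through project plumbing.
public abstract class BrokerBindingException : Exception { }
public sealed class BindingRevokedException : BrokerBindingException { }
public sealed class BindingNotFoundException : BrokerBindingException { }
public sealed class BindingScopeMismatchException : BrokerBindingException { }

public interface IBindingRevoker
{
    // Dispatches RevokeBindingCommand to the local ExternalIdentityBindingGAgent.
    Task RevokeLocalAsync(string actorId, string reason, CancellationToken ct);
}

public sealed class ModelSelfHealSketch
{
    private readonly IBindingRevoker _revoker;
    public ModelSelfHealSketch(IBindingRevoker revoker) => _revoker = revoker;

    public async Task<string> ListModelsAsync(
        Func<Task<string>> brokerCall, string actorId, CancellationToken ct)
    {
        try
        {
            return await brokerCall();
        }
        catch (BrokerBindingException ex)
        {
            // NyxID just told us the binding is gone, so only the local
            // readmodel is revoked; no NyxID-side revoke is attempted.
            var reason = $"auto_self_heal_{ex.GetType().Name}";
            await _revoker.RevokeLocalAsync(actorId, reason, ct);
            return "Stale binding cleared locally; /init will work now.";
        }
    }
}
```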
… path

Both reviewers caught the same bug: pre-fix, the self-heal returned the "本地已自动清理" ("cleaned up locally") message even when IActorRuntime was unregistered or when CreateAsync / HandleEventAsync threw — the local readmodel was NOT actually cleaned in those paths, so the user follows the message to /init, /init still sees the stale binding and refuses, recreating the exact loop this PR exists to break.

Split the self-heal API into cleanedMessage + degradedMessage. The cleaned message is returned ONLY when the local revoke envelope was actually dispatched to the binding actor; otherwise the degraded message points the user at /unbind explicitly. Also add a single retry on the dispatch path (mirroring UnbindChannelSlashCommandHandler's PR #521 v4-pro review fix) so a one-off Orleans hiccup doesn't downgrade an otherwise self-healable binding to manual /unbind guidance.

Tests:
- List_DegradesToUnbindGuidance_WhenSelfHealActorRuntimeMissing now asserts the reply contains "/unbind" and DOES NOT contain "已自动清理" — pinning the no-runtime degraded path.
- List_DegradesToUnbindGuidance_WhenSelfHealDispatchKeepsThrowing is a new test exercising ThrowingActorRuntime (every CreateAsync throws) and asserts AttemptCount == 2 (retry-once contract) plus the degraded reply shape.

Verification:
- dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests --no-build (852/852)
- bash tools/ci/test_stability_guards.sh (passed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
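The cleanedMessage / degradedMessage split plus the retry-once contract might look roughly like this (a sketch; the method shape and message strings are assumptions, not the PR's exact API):

```csharp
public sealed record SelfHealOutcome(bool Cleaned, string Message);

public static class SelfHealDispatchSketch
{
    // Returns the "cleaned" message ONLY when the revoke envelope was actually
    // dispatched; retries once so a one-off runtime hiccup does not downgrade
    // an otherwise self-healable binding to manual /unbind guidance.
    public static async Task<SelfHealOutcome> TryDispatchAsync(Func<Task>? dispatch)
    {
        if (dispatch is null) // e.g. IActorRuntime not registered in this host
            return new(false, "Could not clean up automatically; please run /unbind.");

        for (var attempt = 1; attempt <= 2; attempt++)
        {
            try
            {
                await dispatch();
                return new(true, "Stale binding cleared; /init will work now.");
            }
            catch (Exception) when (attempt < 2)
            {
                // swallow once and retry
            }
            catch (Exception)
            {
                break; // second failure: fall through to degraded guidance
            }
        }
        return new(false, "Could not clean up automatically; please run /unbind.");
    }
}
```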
…ly-self-heal Self-heal binding rejections + streaming-failed reply-token loop
DeepSeek v4-pro with thinking mode rejects requests when reasoning_content from prior assistant turns is not echoed back. This change:
- Adds ReasoningContent to ChatMessage, LLMResponse, and LLMStreamChunk
- Propagates reasoning content through ChatRuntime streaming rounds
- Appends reasoning_content to conversation history for multi-turn
- Implements ExtractReasoningContent in MEAILLMProvider
- Wires reasoning content into non-streaming ConvertResponse

Closes #563
Channel conversation LLM runs inside Orleans actors with no HTTP auth context. The scope resolver returns null, causing ActorBackedUserMemoryStore to throw InvalidOperationException on every turn and log a warning.
- Add TryResolveScopeId/TryResolveWriteActorId that return null instead of throwing
- ReadProjectedStateAsync returns null when no scope is available
- GetAsync returns UserMemoryDocument.Empty, BuildPromptSectionAsync returns empty string — no warning logged
- Write operations (Save/Add/Remove) still throw since they only run from Studio API with auth context

Closes #564
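The no-throw degradation above follows the standard Try-pattern: reads degrade to an empty document, writes still throw. A minimal sketch under assumed member shapes (only the type names come from the commit):

```csharp
public sealed record UserMemoryDocument
{
    public static readonly UserMemoryDocument Empty = new();
}

public sealed class UserMemoryStoreSketch
{
    private readonly Func<string?> _tryResolveScopeId;

    public UserMemoryStoreSketch(Func<string?> tryResolveScopeId)
        => _tryResolveScopeId = tryResolveScopeId;

    // Read path: degrade silently when there is no auth scope (e.g. the LLM
    // turn runs inside an Orleans actor with no HTTP context).
    public UserMemoryDocument Get()
        => _tryResolveScopeId() is null ? UserMemoryDocument.Empty : Load();

    public string BuildPromptSection()
        => _tryResolveScopeId() is null ? string.Empty : "memory section";

    // Write path: still throws, because writes only run from the Studio API
    // where an auth context is guaranteed.
    public void Save()
    {
        _ = _tryResolveScopeId()
            ?? throw new InvalidOperationException("No scope available for write.");
    }

    private UserMemoryDocument Load() => UserMemoryDocument.Empty;
}
```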
…ths, history persistence
…, remove dead try-catch
Gracefully handle missing scope in UserMemoryStore (fixes #564)
…ing-content Propagate reasoning_content through LLM pipeline (fixes #563)
    "/model encountered NyxID-side binding rejection ({Reason}) but IProjectionReadinessPort is not registered; cannot verify local readmodel cleanup. actor={ActorId}, subject={Platform}:{Tenant}:{User}",
    reason,
    actorId,
    context.Subject.Platform, context.Subject.Tenant, context.Subject.ExternalUserId);
[Consensus: 3 models] severity=blocker, category=arch
The new TryDispatchLocalBindingRevokeAsync runs EnsureProjectionForActorAsync + WaitForBindingStateAsync inside the slash-command request path. Three reviewers flag this as a CLAUDE.md violation: query-time projection priming is forbidden — the application/query layer must not activate a projection scope and synchronously wait for read-model materialization on the request hot path. IProjectionReadinessPort is a write-side completion port, not a turn/query dependency. Refactor self-heal so the revocation is fire-and-forget (publish a continuation event, return early to the user with a 'binding cleanup in progress' message), or perform the projection wait off the request path. CLAUDE.md ref: 「禁止 query-time replay/priming」 ("query-time replay/priming is forbidden") / 「QueryPort/QueryService/ApplicationService 不得在请求路径...同步补投影」 ("QueryPort/QueryService/ApplicationService must not synchronously backfill projections on the request path...").
Per-model verbatim
- codex: This makes the `/model` read/list path activate a projection scope and then synchronously wait for read-model cleanup. That violates the CLAUDE.md 禁止 query-time replay/priming rule and the ADR note that `IProjectionReadinessPort` is a write-side completion port, not a turn/query dependency. Move this repair behind an explicit write-side command/port that owns projection activation, and keep `/model` honest if the cleanup cannot already be observed.
- v4-pro: `TryDispatchLocalBindingRevokeAsync` calls `EnsureProjectionForActorAsync` + `WaitForBindingStateAsync` synchronously in the slash-command handler path. This is projection priming in the request path: activating a projection and waiting for the readmodel to refresh within the request execution. Violates CLAUDE.md's ban on query-time priming: if a query first needs to "ensure the projection exists / refresh the read model", that action must happen in an explicit activation, lease, binder, or background materialization flow; synchronously backfilling the projection inside a query method is forbidden. Move projection activation/binder setup before the slash-command path (e.g. upstream in the conversation pipeline when the inbound message arrives), or make the self-heal fire-and-forget (dispatch the revoke without synchronously waiting on the readmodel). The degraded-message fallback already handles cases where the readmodel isn't cleaned within the request.
- kimi: `TryDispatchLocalBindingRevokeAsync` calls `_bindingProjectionPort.EnsureProjectionForActorAsync(actorId, ct)` synchronously inside the slash command handler request path. This is projection priming in the application layer, which CLAUDE.md forbids (same rule quoted above). While the OAuth endpoint does similar priming, endpoints are Host layer; a per-user-message slash handler is not. Consider activating binding projection scopes eagerly at startup (e.g. a hosted service that primes all known binding actors) instead of per-request.
var actor = await _actorRuntime
    .CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct)
    .ConfigureAwait(false);
var envelope = new EventEnvelope
[Consensus: 4 models] severity=blocker, category=arch
actor.HandleEventAsync(envelope, ct) dispatches RevokeBindingCommand directly through IActorRuntime, bypassing IActorDispatchPort and the actor's inbox / serialization guarantees. Four reviewers flag this as a CLAUDE.md violation of Runtime 与 Dispatch 分责 (runtime vs. dispatch separation of duties): IActorRuntime may resolve lifecycle/topology, but command delivery must go through the runtime-neutral publisher (publish/send). Inline HandleEventAsync reduces to direct method invocation on the target actor — exactly the pattern CLAUDE.md forbids: 「禁止在基类、业务层或中间适配层绕过标准 publisher 直操底层传输对象」 ("base classes, the business layer, and intermediate adapters must not bypass the standard publisher to operate directly on the underlying transport object"). Replace with the runtime-neutral dispatch port (IActorDispatchPort.PublishAsync(...) or the equivalent) and let the actor's inbox process the revoke command.
Per-model verbatim
- codex: Calling `actor.HandleEventAsync` directly bypasses `IActorDispatchPort` and manually inlines delivery into the target actor. This violates CLAUDE.md's Runtime 与 Dispatch 分责 rule (delivery semantics must be runtime-neutral): `IActorRuntime` should only resolve lifecycle/topology, while command delivery should go through `IActorDispatchPort.DispatchAsync` with `EnvelopeRouteSemantics.CreateDirect(...)` or the existing command-dispatch helper.
- v4-pro: `actor.HandleEventAsync(envelope, ct)` dispatches the `RevokeBindingCommand` directly to the actor via inline method call, bypassing the actor's inbox queue and serialization guarantees. Violates CLAUDE.md: delivery semantics must be runtime-neutral — publish/send uniformly means "enters the target actor's inbox to await processing" and must not degrade to inline dispatch because the target is self or because the underlying runtime differs. The comment at L126 says this mirrors UnbindChannelSlashCommandHandler — but that handler also uses the same pattern and should be fixed too. Use IActorDispatchPort (or an equivalent standard dispatch mechanism) instead of CreateAsync + HandleEventAsync.
- kimi: The handler directly calls `actor.HandleEventAsync(envelope, ct)` via `IActorRuntime`, bypassing the standard `IActorDispatchPort`. CLAUDE.md requires runtime-neutral dispatch: bypassing the standard publisher to operate directly on the underlying transport object is forbidden. While `UnbindChannelSlashCommandHandler` uses the same pattern, the new self-heal path amplifies the violation by coupling it with projection priming and retry logic. Refactor to use `IActorDispatchPort` or a dedicated binding command port so the slash handler does not depend on runtime internals.
- mimo-v2.5-pro: The slash-command handler now directly orchestrates actor dispatch via `IActorRuntime`, `EventEnvelope`, and `RevokeBindingCommand`. Per CLAUDE.md, slash handlers are channel-routed application services — they should not construct and dispatch actor envelopes. Extract `SelfHealRevokedBindingAsync` / `TryDispatchLocalBindingRevokeAsync` into a dedicated `IBindingSelfHealService` (application layer) and inject that instead. This also simplifies testing: the current test file has to mock `IActorRuntime`, `ExternalIdentityBindingProjectionPort`, AND `IProjectionReadinessPort` just to cover one catch branch.
private readonly ILogger<ModelChannelSlashCommandHandler> _logger;

public ModelChannelSlashCommandHandler(
    ILogger<ModelChannelSlashCommandHandler> logger,
[Consensus: 2 models] severity=major, category=di
Constructor injects the concrete ExternalIdentityBindingProjectionPort (and accepts nullable IActorRuntime + IProjectionReadinessPort). Two issues: (a) DI: the CLAUDE.md 依赖反转 (dependency inversion) rule requires depending on abstractions — define IExternalIdentityBindingProjectionPort and inject the interface (the current registration in IdentityServiceCollectionExtensions.cs is services.TryAddSingleton<ExternalIdentityBindingProjectionPort>() against the concrete type). (b) silent degradation: nullable self-heal collaborators mean that if any registration is missing in production DI, the self-heal — the user-facing fix this PR delivers — silently disappears with no error. Either make these required (non-nullable), or add a startup validation that logs a warning when self-heal will be inactive.
Per-model verbatim
- glm-5.1: `IActorRuntime`, `ExternalIdentityBindingProjectionPort`, and `IProjectionReadinessPort` are all nullable and degrade gracefully, but the self-heal feature is the primary user-facing fix in this PR. If any of these aren't registered in the production DI container, the handler silently falls back to the degraded message and the binding loop is unbroken. Verify registrations exist in the host composition root — if they're not yet wired, this should be a blocker.
- kimi: Constructor injects the concrete `ExternalIdentityBindingProjectionPort` instead of an abstraction. CLAUDE.md: "Dependency inversion: upper layers depend on abstractions; cross-layer reverse dependencies and direct coupling to concrete implementations are forbidden." The port is registered as a singleton concrete class (IdentityServiceCollectionExtensions.cs:86, `services.TryAddSingleton<ExternalIdentityBindingProjectionPort>()`). Define `IExternalIdentityBindingProjectionPort`, register against it, and inject the interface.
public async Task<bool> RemoveEntryAsync(string id, CancellationToken ct = default)
{
    var actor = await EnsureWriteActorAsync(ct);
[Consensus: 3 models] severity=major, category=concurrency
RemoveEntryAsync was reordered so EnsureWriteActorAsync runs before ReadProjectedStateAsync. Three reviewers flag this side-effect change: previously, calling RemoveEntryAsync("missing") was a no-op that returned false without materializing any actor; now it always activates the user-memory actor first (state hydration, reminder registration, storage I/O), even when the entry does not exist (returns false at the existence check). This violates the CLAUDE.md 默认路径须定义资源语义 rule (default paths must define resource semantics) — a no-op delete now implicitly creates a resource, which has cost (cluster activation, cache pressure) and changes the resource lifecycle. Move actor resolution after the existence check; only resolve the scope/actor when there is real work to do.
Per-model verbatim
- codex: `RemoveEntryAsync("missing")` now creates or materializes the user-memory actor before checking whether the entry exists. That turns a no-op delete into implicit resource creation, violating CLAUDE.md 默认路径须定义资源语义; keep the projected read first, and if the no-scope case must throw, resolve the scope ID explicitly without calling `EnsureWriteActorAsync` until a matching entry is found.
- kimi: `RemoveEntryAsync` was reordered to call `EnsureWriteActorAsync` before `ReadProjectedStateAsync`. This causes unnecessary actor activation when the entry does not exist. Actor creation has side effects (state hydration, reminder registration, storage reads). Move actor resolution back after the existence check, or document why eager activation is required.
- mimo-v2.5-pro: The reorder of `EnsureWriteActorAsync` before `ReadProjectedStateAsync` in `RemoveEntryAsync` changes the side-effect ordering. Previously the actor was only created when the entry existed. Now every `RemoveEntryAsync` call creates the actor even when the entry is missing (returns `false` at line 133). The actor creation is idempotent, but it's an unnecessary round-trip in the common "entry not found" case. Consider moving `EnsureWriteActorAsync` back after the existence check, or documenting that the actor must exist before the read (if there's a race-condition fix reason for the reorder).
}

public async Task<string> BuildPromptSectionAsync(int maxChars = 2000, CancellationToken ct = default)
{
[codex] severity=major, category=bug
Removing the previous try/catch means a transient projection/read-model failure in BuildPromptSectionAsync will now fail the whole chat/prompt construction path instead of degrading by omitting memory. Restore the best-effort behavior here: catch non-cancellation exceptions from GetAsync, log the warning, and return string.Empty.
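The best-effort behavior codex asks to restore is a narrow catch that omits memory instead of failing the turn. A minimal sketch, assuming delegate stand-ins for the real GetAsync call and logger:

```csharp
public static class PromptSectionSketch
{
    public static async Task<string> BuildPromptSectionAsync(
        Func<Task<string>> getMemoryAsync,
        Action<Exception> logWarning)
    {
        try
        {
            return await getMemoryAsync();
        }
        catch (OperationCanceledException)
        {
            throw; // cancellation must still propagate to the caller
        }
        catch (Exception ex)
        {
            // Transient projection/read-model failure: degrade by omitting
            // the memory section rather than failing prompt construction.
            logWarning(ex);
            return string.Empty;
        }
    }
}
```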
runtime.Actors.Should().NotContainKey("user-memory-user-1",
    "no actor should be created when entry is missing");
var actor = runtime.Actors["user-memory-user-1"];
actor.ReceivedEnvelopes.Should().BeEmpty("no remove command should be dispatched when entry is missing");
[v4-pro] severity=major, category=test
RemoveEntryAsync_MissingEntry_ReturnsFalse changed from asserting runtime.Actors.Should().NotContainKey(...) (actor NOT created) to asserting runtime.Actors["user-memory-user-1"] EXISTS with empty envelopes. But the new RemoveEntryAsync reads state first and returns false before calling EnsureWriteActorAsync when the entry is missing — the actor should not be created at all. Either the test is wrong or the test infrastructure (EmptyReader, FakeActorRuntime) triggers an unexpected actor creation. Verify the test passes against the actual code change.
AttachOpenAIRawRepresentationForReasoning(meaiMsg, msg);
result.Add(meaiMsg);
}
[Consensus: 3 models] severity=major, category=arch
AttachOpenAIRawRepresentationForReasoning injects reasoning_content into the OpenAI SDK's serialized JSON via rawMessage.Patch.Set("$.reasoning_content"u8, ...) under #pragma warning disable SCME0001. Three reviewers flag the same fragility: Patch is an SDK-internal experimental surface (SCME0001), reasoning_content is not a documented field on AssistantChatMessage, and the JSONPath relies on the SDK's current serialization shape. Any SDK update that changes the patch model, the serializer, or the message structure will silently break or drop reasoning content propagation — there is no compile-time guarantee. Mitigations: (a) pin the OpenAI SDK version, (b) add an integration test that asserts the serialized JSON contains reasoning_content, and (c) file a feature request upstream for first-class reasoning content support so the workaround can be retired.
Per-model verbatim
- glm-5.1: `rawMessage.Patch.Set("$.reasoning_content"u8, ...)` relies on the OpenAI SDK's internal `Patch` / SCME0001 surface to inject `reasoning_content` into the serialized JSON. This is fragile — any SDK update that changes the patch model or serialization pipeline will silently drop reasoning content or throw at runtime. Consider isolating this behind a version-checked helper (or a feature flag) so a future SDK bump doesn't cause a silent regression.
- v4-pro: `AttachOpenAIRawRepresentationForReasoning` injects `reasoning_content` into the OpenAI SDK's `AssistantChatMessage` via `rawMessage.Patch.Set("$.reasoning_content"u8, ...)`. This hardcoded JSON path depends on the SDK's internal serialization structure (the `AssistantChatMessage` `ModelReaderWriter.Write` output), which could change between versions and silently break reasoning propagation. Consider a custom `AssistantChatMessage` subclass or an explicit `BinaryContent` wrapper if the SDK doesn't natively support reasoning_content.
- mimo-v2.5-pro: `AttachOpenAIRawRepresentationForReasoning` uses `rawMessage.Patch.Set("$.reasoning_content"u8, ...)` with `#pragma warning disable SCME0001`. This is a fragile internal API — the OpenAI SDK doesn't document `reasoning_content` as a supported field on `AssistantChatMessage`, and the `Patch` property is an implementation detail. If the SDK changes wire format or removes the Patch escape hatch, this will silently break reasoning round-tripping. Add a fallback: if the patch throws (or if `RawRepresentation` doesn't survive a serialize-then-deserialize cycle), log a warning and degrade gracefully rather than crashing the entire chat call.
CancellationToken ct)
{
    try
    {
[v4-pro] severity=minor, category=perf
MergeProxyRouteCandidatesAsync calls _nyxClient.DiscoverProxyServicesAsync synchronously within GetServicesAsync. If the /api/v1/proxy/services endpoint is slow or unreachable (and the catch swallows the exception with a warning), the catalog load time doubles on every /model or /route invocation. Consider making the proxy-service discovery happen on a background refresh, or caching with a TTL, rather than per-request.
NyxIdLlmServicesResult result,
string bearerToken,
CancellationToken ct)
{
[mimo-v2.5-pro] severity=minor, category=design
The proxy endpoint path /api/v1/proxy/services?per_page=100 is hardcoded here AND in NyxIdLlmServiceCatalogClient.cs:63. Two independent HTTP clients call the same endpoint. Extract the path into a shared constant (or better, into NyxIdLlmProviderSource / a config key) so the two call-sites stay in sync if the path changes.
/// Registers the Ornn skill tool system. The ornn_search_skills tool is always registered with the
/// LLM; remote skills are fetched on demand via IRemoteSkillFetcher and integrated into the unified
/// use_skill tool. All Ornn API calls go through NyxID's proxy route, so callers must register
/// NyxIdApiClient first (typically via AddNyxIdTools).
/// </summary>
[kimi] severity=minor, category=di
AddOrnnSkills registers OrnnSkillClient which now requires NyxIdApiClient, but the extension does not register or assert that dependency. Hosts that call AddOrnnSkills without also calling AddNyxIdTools will hit a runtime DI resolution failure. Add an explicit services.TryAddSingleton<NyxIdApiClient>() guard here, or document that callers must register NyxID tools first and add a runtime guard that throws a clear InvalidOperationException if the client is missing.
configure?.Invoke(options);
services.TryAddSingleton(options);
services.TryAddSingleton<OrnnSkillClient>();
services.TryAddSingleton<IRemoteSkillFetcher, OrnnRemoteSkillFetcher>();
[v4-pro] severity=minor, category=di
DI contract change: OrnnSkillClient now REQUIRES NyxIdApiClient (was optional HttpClient). services.TryAddSingleton<OrnnSkillClient>() at line 23 will fail at resolution time if NyxIdApiClient is not already registered. The comment at line 12-13 documents the requirement but there's no runtime guard. If EnableOrnnSkills=true but AddNyxIdTools() was not called (unlikely in production, possible in test harnesses), the DI container throws InvalidOperationException. Consider adding a services.AddOrnnSkills guard that checks whether NyxIdApiClient is already registered.
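The suggested registration-time guard can exploit the fact that IServiceCollection is an inspectable list of ServiceDescriptors. AddOrnnSkills and NyxIdApiClient are the PR's names; the guard itself (and the by-name type check, used here because the real type isn't referenced) is an assumption:

```csharp
using System.Linq;
using Microsoft.Extensions.DependencyInjection;

public static class OrnnSkillsGuardSketch
{
    public static IServiceCollection AddOrnnSkillsGuarded(this IServiceCollection services)
    {
        // Fail fast with an actionable message instead of a generic DI
        // resolution error at first OrnnSkillClient activation.
        var hasNyxIdClient = services.Any(d => d.ServiceType.Name == "NyxIdApiClient");
        if (!hasNyxIdClient)
        {
            throw new InvalidOperationException(
                "AddOrnnSkills requires NyxIdApiClient; call AddNyxIdTools() first.");
        }
        return services;
    }
}
```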
@@ -401,6 +414,18 @@ private async Task DispatchOutputChunksAsync(
/// </summary>
[mimo] severity=nit, category=other
The comment references "PR #569 review, codex P1 on SkillRunnerGAgent.cs:351" — the line number will drift as the file evolves. Prefer referencing the commit hash or a stable anchor rather than a line number.
<ImplicitUsings>enable</ImplicitUsings>
<Nullable>enable</Nullable>
<IsPackable>false</IsPackable>
<IsTestProject>true</IsTestProject>
[v4-pro] severity=nit, category=test
New test project lacks Aevatar.AI.ToolProviders.NyxId project reference. OrnnSkillClientTests.cs instantiates NyxIdApiClient directly (line: new Aevatar.AI.ToolProviders.NyxId.NyxIdApiClient(...)) via the using statement in OrnnSkillClientTests.cs:4, but the .csproj doesn't have a <ProjectReference> to Aevatar.AI.ToolProviders.NyxId. If the project builds, it's because Aevatar.AI.ToolProviders.Ornn transitively references it (line 11: <ProjectReference Include="..\Aevatar.AI.ToolProviders.NyxId\..." />). This works but is fragile — make the reference explicit.
Codecov flagged 0% coverage on LarkCardKitClient (81 lines) and
ILarkCardKitClient (17 lines). Mirror the LarkNyxClient
HTTP-recording pattern from LarkCoverageTests so the four CardKit
methods exercise the real transport: a fake HttpMessageHandler
captures the request, the test asserts URL + method + body shape.
Coverage:
- CreateCardAsync: POST /cards, body embeds DataJson as a nested
JSON object (load-bearing — Lark rejects double-encoded payloads).
- StreamElementContentAsync: PUT to .../elements/{element_id}/
content, sequence + uuid wired through.
- StreamElementContentAsync omits `uuid` when the IdempotencyKey is
blank (Lark rejects empty uuids on some endpoints).
- StreamElementContentAsync runs ids through Uri.EscapeDataString
(verified via AbsoluteUri so percent-encoded chars survive — the
default Uri.ToString() decodes %20 back to a literal space).
- SetCardSettingsAsync: PATCH on settings, settings JSON inline.
- UpdateCardAsync: PUT card-as-object, sequence carried, no uuid.
- ParseJsonObject rejects blank/malformed JSON at the boundary
rather than letting it surface as a 400 from Lark.
- AddLarkTools registers ILarkCardKitClient as a singleton.
54/54 Aevatar.AI.ToolProviders.Lark.Tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
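The HTTP-recording pattern referenced above — a fake HttpMessageHandler that captures the outgoing request so URL, method, and body shape can be asserted against the real transport — can be reproduced in a few lines. This is a generic sketch, not the LarkCoverageTests source:

```csharp
using System.Net;

// Captures every outgoing request so tests can assert on the real transport.
public sealed class RecordingHandler : HttpMessageHandler
{
    public readonly List<(HttpMethod Method, string Url, string? Body)> Requests = new();

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken ct)
    {
        var body = request.Content is null
            ? null
            : await request.Content.ReadAsStringAsync(ct);
        // AbsoluteUri keeps percent-encoding intact; Uri.ToString() would
        // decode %20 back to a literal space and hide an escaping bug.
        Requests.Add((request.Method, request.RequestUri!.AbsoluteUri, body));
        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("{\"code\":0}"),
        };
    }
}
```

Usage: wrap it in `new HttpClient(handler) { BaseAddress = ... }`, call the client under test, then assert on `handler.Requests[0]` — e.g. that an element id run through `Uri.EscapeDataString("id with space")` still appears as `id%20with%20space` in the recorded AbsoluteUri.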
Address review batch on PR #562 (10 inline comments). All are in files I have recent ownership of and require no architectural shifts:
- #16 (blocker, security): ssh_exec is now opt-in via NyxIdToolOptions.EnableSshExecTool. Hosts that haven't wired the approval middleware no longer see the tool by default. The mainnet host opts in (the Lark bot needs it).
- #21 (major, bug): code_execute keeps the modern /execute + {language, script} contract, but on a NyxID-proxy upstream 404 it retries the legacy /run + {language, code} contract so deployments still pinned to old chrono-sandbox-service builds keep working.
- #22 (major, bug): SkillRegistry.IsFresh now exempts SkillSource != Remote from TTL — local skills are baked in at registration and don't need expiring; the prior behavior dropped them from use_skill after the first 5 min.
- #18 (major, bug): TurnRunner.TryResolveSenderBindingAsync narrows the catch to transient infra errors (Http/Timeout/IO/JSON) and surfaces non-transient ones (logic, NRE, serialization) at Error level so ops can distinguish "sender unbound" from "binding store broken".
- #19 (major, bug): ConversationReplyGenerator narrows the sender-route-fallback catch to transient errors via IsRetryableSenderRouteFailure. Programmer errors no longer cost an LLM round on retry.
- #29 + #30 (minor): the inbox runtime gives metadata enrichment its own 15s budget separate from the LLM run, surfacing errorCode=llm_reply_metadata_timeout when the scope/UserConfig lookup is slow. ResolveFallbackTimeout treats ResponseTimeoutSeconds <= 0 as "no timeout" rather than silently snapping back to 120s.
- #12 (minor): ConversationGAgent's stream-chunk and final-stream-chunk edits now run under a 10s CTS; the failure path already uses one. A hung relay can no longer pin the actor turn forever.
- #27 (minor, security): ConstantTimeEquals docstring tightened — removed the "for future callers" line and added a SCOPE comment that this helper is rebuild-admin-only and shouldn't be promoted to internal/public without replacing its length leak with a length-padding scheme.
- #23 (major, bug): CLI ornn skills slug default → ornn-api (matches the registered slug; bare "ornn" is the SPA frontend, which returns HTML).

Build clean (NyxId / Skills / NyxidChat / Mainnet hosts), 30 AI tests + 15 inbox runtime tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- #15 (major, di): AddOrnnSkills now self-registers NyxIdApiClient and NyxIdToolOptions via TryAdd safety nets so workflow hosts that enable Ornn without AddNyxIdTools get a clean 404 path on a missing BaseUrl rather than a confusing DI resolution failure.
- #3 (major, di): Introduce IExternalIdentityBindingProjectionPort and inject the interface in IdentityOAuthEndpoints. The concrete class remains for runtime composition; the registration also publishes the interface as a forwarder so existing tests that hold the concrete type keep working.
- #14 + #17 (major, concurrency): the TurnStreamingReplySink TOCTOU concern was based on a misread — drainSignal is captured inside the same lock acquisition as _dispatchInProgress=false in the cap branch, and the throttle gate's nextIsFinal=false invariant makes _drainTcs guaranteed null on that path. Document the invariants so future readers don't re-flag this.
- #20 (major, arch): Replace the singleton Dictionary cache in NyxIdLlmServiceCatalogClient with IMemoryCache. Per the CLAUDE.md "中间层状态约束" (middle-layer state constraints) rule, per-caller state lives in a host-owned cache, not a service field. The AbsoluteExpiration policy is preserved (30s).
- #9 (minor): /api/v1/proxy/services?per_page=100 was already extracted into NyxIdLlmCatalogRoutes.ProxyServicesPath — both call sites already use the constant.
- #10 (minor): LooksLikeLlmRouteCandidate already has negative-signal filtering plus a two-distinct-weak-signals threshold. No change.
- #11 (minor): ExternalIdentityBindingProjector already logs a Warning with DocumentId/EventId/Version on the empty-binding delete path.

Build clean (Ornn / Identity / NyxidChat / Channel.Runtime), updated catalog client test passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- #13 (major, arch): /api/oauth/aevatar-client/rebuild now dispatches ProvisionAevatarOAuthClientCommand via IActorDispatchPort.DispatchAsync, matching /unbind. The inline actor.HandleEventAsync was a known violation of the CLAUDE.md "投递语义必须 runtime-neutral" (delivery semantics must be runtime-neutral) rule; aligning the two endpoints removes the inconsistency that any future inbox middleware would silently bypass on rebuild.
- #24 (minor, design): the callback endpoint accepts ?format=json on the URL to opt back into the {status:"bound", already_bound, display_name} envelope that programmatic CLI/SDK consumers used pre-HTML-render. The default stays HTML for browser callbacks.
- #26 (minor, arch): /rebuild now sits behind a RebuildAuthEndpointFilter that enforces the admin-token check before model binding and per-request DI activation kick in. The filter + the inline check in the handler are redundant by design (defense in depth) — the filter rejects unauthenticated posts before deserialization runs, and the handler still validates so hand-rolled tests/integration scenarios cannot bypass it.
- #28 (minor, design): document the readmodel-deletion contract in the ExternalIdentityBindingProjector header — an empty BindingId deletes the document instead of upserting an inactive record; downstream audit consumers must read the committed-event log directly.
- #1 + #2 (blocker, arch): no change needed. Earlier commits in this PR already moved the /model self-heal to IActorDispatchPort.DispatchAsync and removed the EnsureProjectionForActorAsync call from the slash-command request path. Verified by reading the current handler.
- #25 (minor, test): documented in the rebuild handler comment — concurrent /rebuild calls would race on the same actor, but this is operator-grade break-glass and de-duping concurrent rebuilds is out of scope.

Build clean (Identity), 34 OAuth-path tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- #7 (major, arch): wrap MEAILLMProvider's Patch.Set("$.reasoning_content") in try/catch so a future OpenAI SDK contract break (Patch surface change, serializer rewrite, field rename) degrades to "no reasoning replay" instead of crashing the chat call. The OpenAI package is already pinned to 2.9.1 in Directory.Packages.props, and the existing AIComponentCoverageTests already pin the serialized JSON shape, so any drift fails the build the moment Patch stops landing in the payload.
- Test fix for #22 (skill registry TTL): the old TryGet_BeyondTtl_ReturnsFalseSoCallerCanRefetch test relied on TTL expiring SkillSource.Local entries — that's the bug #22 flagged. Updated the stale-entry tests to use SkillSource.Remote (via remoteId), which is the realistic stale scenario. Added a new TryGet_LocalSkillBeyondTtl_StillFresh test pinning the new behaviour.

538 AI tests + 897 ChannelRuntime tests + 16 Ornn tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov flagged 1 missing + 2 partial branches after the previous coverage commit (4c537ad). Add three targeted tests:

* CreateCardAsync_RejectsLiteralNullJson: ParseJsonObject's `?? throw` branch only fires when JsonNode.Parse returns null without throwing — and the only input that does that is the literal JSON `"null"`. Without the throw, the request would serialize as `"data": null` and Lark would reject it as a missing field; the boundary check turns the silent null into a clear ArgumentException at the caller's call site.
* SetCardSettingsAsync_OmitsUuid_WhenIdempotencyKeyIsBlank: covers the false branch of `if (!string.IsNullOrWhiteSpace(...))` for the settings endpoint. Mirrors the existing StreamElementContent variant.
* UpdateCardAsync_PassesIdempotencyKey_WhenProvided: covers the true branch of the same conditional for UpdateCardAsync, plus asserts the idempotency key is .Trim()'d before emission so accidental whitespace does not defeat Lark's dedup.

13/13 LarkCardKitClient tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
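The literal-null boundary generalizes beyond the C# client: most JSON parsers return a null value (rather than throwing) for the input `null`. A Python sketch of the guard (parse_json_object is a hypothetical name, not the repo's API):

```python
import json

def parse_json_object(raw: str) -> dict:
    """Parse `raw` and reject the one input that parses to null.

    json.loads("null") returns None without raising -- the Python
    analogue of JsonNode.Parse returning null in C#. Without the guard,
    a literal "null" would flow downstream, serialize as `"data": null`,
    and surface only as a confusing remote-side rejection.
    """
    parsed = json.loads(raw)   # raises ValueError on malformed JSON
    if parsed is None:         # only the literal input "null" lands here
        raise ValueError("payload must be a JSON object, got literal null")
    if not isinstance(parsed, dict):
        raise ValueError(f"payload must be a JSON object, got {type(parsed).__name__}")
    return parsed
```

The point of the boundary check is to move the failure from the remote API's vague error to a precise exception at the caller's call site.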
Multi-model fix-check verdict (3/5 partial, 2/5 unresolved on
discussion_r3201406794) flagged the residual structural risk: the
runtime-only `card_mode` boolean still lived on
LlmReplyStreamChunkEvent — a domain-event-shaped envelope — so a
misbehaving persistence layer could replay it and silently re-route
to the card sink. Comment-only mitigation was acceptable to three
reviewers but not to glm-5.1 / codex.
Eliminate the risk by construction:
* New proto LlmReplyCardStreamChunkEvent in conversation_events.proto
carries the same fields as LlmReplyStreamChunkEvent, dispatched by
the card sink path. The proto type itself signals routing — there
is no boolean to flip.
* LlmReplyStreamChunkEvent.card_mode is removed and the field number
is `reserved` along with its name so any stale serializer or
accidental reuse fails loudly instead of silently re-enabling
card mode.
* TurnStreamingReplySink.DispatchOneAsync now constructs the correct
proto type based on the existing `cardMode` constructor flag.
* ConversationGAgent grows a second [EventHandler]
HandleLlmReplyCardStreamChunkAsync. The legacy
HandleLlmReplyStreamChunkAsync delegates to a private
HandleNyxRelayStreamingChunkCoreAsync helper; the card handler runs
the card-streaming machine and, on CreationFailed, synthesizes an
edit-message chunk and falls through to the same helper so the
user still gets a reply when card creation fails.
* IConversationCardTurnRunner / ChannelCardConversationTurnRunner /
HandleLarkCardStreamingChunkCoreAsync take
LlmReplyCardStreamChunkEvent. Tests update RecordingCardTurnRunner
factories and CreateCardStreamChunk helper accordingly; tests now
exercise HandleLlmReplyCardStreamChunkAsync instead of relying on
a CardMode flag on the legacy event.
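The reservation mechanics in the second bullet look roughly like this in proto syntax (the field numbers and the surviving fields are illustrative — the real layout lives in conversation_events.proto):

```proto
// Illustrative sketch only; actual numbering per conversation_events.proto.
message LlmReplyStreamChunkEvent {
  // card_mode is removed. Reserving both its number and its name makes
  // protoc reject any later definition that reuses either, so the flag
  // cannot quietly come back on the legacy event.
  reserved 7;
  reserved "card_mode";

  string turn_id = 1;     // hypothetical surviving fields
  string chunk_text = 2;
}
```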
Spot-checks for the two manual-eyeball items in the verdict:
* `git grep ConfigureAwait agents/Aevatar.GAgents.Channel.Runtime/
Conversation/ConversationGAgent*.cs` finds only a comment mention
("no ConfigureAwait(false)") — no actual usage. mimo's dissent on
this point was a misread.
* The three classifier helpers
(ClassifyCreateFailure / ClassifyPostSendFailure /
ClassifyStreamFailure) consistently map 11310 → IsTableLimitExceeded
and 230099 / 230100 → IsCardUnavailable. No 230099 lands in any
IsTableLimitExceeded path. codex's "neither classifier sets it"
read is the accurate one.
Tests: 133 protocol + 36 ChannelRuntime streaming + 57 Lark tools,
all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve the items the /opencode-pr-fix-check verdict flagged after the first review pass:

- TurnStreamingReplySink #14 (cap branch): clear `_pendingText` when the cap is hit so the reviewer-asked invariant "pending text is never left behind when we release `_dispatchInProgress`" actually holds. The earlier "preserve _pendingText for FinalizeAsync" comment was wrong: FinalizeAsync uses its own `text` parameter, not the stash, so clearing is harmless.
- TurnStreamingReplySink #17 (throttle gate): arm the deferred-flush timer inside the same `lock` acquisition that releases `_dispatchInProgress = false`. A concurrent OnDeltaAsync can no longer observe the (no-timer + not-dispatching) gap. The trailing outside-lock `armTimerDelay` block is gone.
- ConversationReplyGenerator: log a Warning at construction when SkillRegistry is wired but IRemoteSkillFetcher is missing — that config silently advertises use_skill yet can't pull remote skills. A single grepable line for ops.
- SkillRunnerGAgent: replace the drifting `SkillRunnerGAgent.cs:351` reference in the streaming-sink comment with the stable anchor `EnsureToolStatusAllowsCompletion`.
- Ornn.Tests.csproj: add an explicit ProjectReference to Aevatar.AI.ToolProviders.NyxId — the tests instantiate NyxIdApiClient directly, so depending on transitive flow-through from Ornn was fragile.
- MEAILLMProvider Patch.Set: documentation upgrade — call out the three layered mitigations (SDK pin in Directory.Packages.props, the AIComponentCoverageTests JSON-shape integration assertion, try/catch graceful degradation) and the upstream tracking path (file an openai-dotnet issue and retire the SCME0001 hack when a typed reasoning_content lands).

Build clean (Channel.Runtime, NyxidChat, Scheduled, MEAI, Ornn.Tests), 30 streaming-sink tests + 16 Ornn tests + 18 MEAI/coverage tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
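The #17 invariant — arm the timer before the in-progress flag is observably false — can be sketched with plain threading primitives (ThrottledSink and its members are illustrative names, not the C# sink's real API):

```python
import threading

class ThrottledSink:
    """Sketch of the #17 fix: the deferred-flush timer is armed inside
    the same lock region that clears the dispatch-in-progress flag, so
    a concurrent delta handler can never observe the gap state
    (no timer armed AND no dispatch running) and stall a pending flush."""

    def __init__(self, flush):
        self._lock = threading.Lock()
        self._dispatch_in_progress = False
        self._timer = None
        self._flush = flush

    def finish_dispatch(self, pending_left: bool):
        with self._lock:
            self._dispatch_in_progress = False
            # Armed before the lock is released: any concurrent reader
            # sees either a live timer or an active dispatch, never neither.
            if pending_left and self._timer is None:
                self._timer = threading.Timer(60.0, self._flush)
                self._timer.start()
```

The pre-fix bug shape was arming the timer in a trailing block outside the lock, leaving a window between releasing the lock (flag now false) and starting the timer.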
The "safety net" I added in PR #562 review #15 (commit 673809a) was itself the cause of a production outage on 2026-05-08:

    System.InvalidOperationException: NyxID base URL is not configured.
    at NyxIdRelayAuthValidator.ResolveDiscoveryUrl()
    at NyxIdChatEndpoints.HandleRelayWebhookAsync
    ...

`AddAevatarAIFeatures` runs `RegisterOrnnSkills` BEFORE the mainnet host calls `AddNyxIdTools`, so the safety-net `services.TryAddSingleton<NyxIdToolOptions>()` registered an empty default first; the host's real `AddNyxIdTools(o => { o.BaseUrl = ... })` then no-op'd because `TryAddSingleton(options)` saw the empty instance already registered. Every NyxID call (relay JWT validation, channel-relay/reply, proxy) exploded on a null BaseUrl, and Lark stopped replying entirely.

The right behaviour: hosts that enable Ornn skills MUST call `AddNyxIdTools` themselves. If they don't, DI resolution fails fast at startup with a clear "no service for NyxIdApiClient" — which is exactly the signal that needs to surface, not be papered over with empty placeholders.

Updated the docstring with a remarks block so the next person tempted to add this safety net sees the production-incident breadcrumb first. The reviewer concern that triggered #15 (a workflow host enabling Ornn without NyxId) is real but theoretical for this repo today; the cost of the wrong fix was concrete and severe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
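The ordering trap generalizes to any try-add style container: the first registration wins, however useless it is. A Python sketch (ServiceRegistry and the string keys are illustrative, not the .NET DI API):

```python
class ServiceRegistry:
    """Minimal stand-in for a DI container with try-add semantics:
    try_add_singleton no-ops when the key is already registered,
    mirroring TryAddSingleton."""

    def __init__(self):
        self._services = {}

    def try_add_singleton(self, key, instance):
        # setdefault only inserts when the key is absent -- the
        # first registration wins, all later ones silently no-op.
        self._services.setdefault(key, instance)

    def resolve(self, key):
        return self._services[key]

registry = ServiceRegistry()

# 1. The "safety net" runs first and plants an empty default.
registry.try_add_singleton("NyxIdToolOptions", {"base_url": None})

# 2. The host's real configuration then silently no-ops.
registry.try_add_singleton("NyxIdToolOptions", {"base_url": "https://nyxid.example"})

# Every consumer now gets the empty placeholder.
assert registry.resolve("NyxIdToolOptions")["base_url"] is None
```

Removing the first registration makes resolution fail fast at startup when the host forgets its own configuration — the loud signal the commit argues for.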
…ault

PR #590 shipped CardKit streaming behind a flag (default false), and the production cluster's ConfigMap doesn't carry an Aevatar:NyxId:Relay override (verified earlier — the deployed appsettings.Distributed.json has no Relay node), so flipping the flag in repo configuration was a no-op. Move the default to true in the C# field initializer so the aevatar Lark bot uses the card path out of the box.

Risk: low. PR #590's failure-routing already handles missing CardKit scopes — card.create returns the scope-error / rate-limit envelope, ConversationGAgent.HandleLarkCardStreamingChunkCoreAsync transitions to CreationFailed, and the turn falls through to the legacy edit-message sink for the rest of the chunks. The worst case for a deployment without cardkit:card:read + cardkit:card:write granted is one extra POST per turn before the fallback kicks in.

Deployments that explicitly want the legacy path (no Lark bot, or wanting to skip the card.create round-trip) can opt out via Aevatar:NyxId:Relay:StreamingCardKitEnabled=false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lt test

The last commit flipped the StreamingCardKitEnabled default to true, which made ProcessAsync_StreamingEnabled_DispatchesChunkEventAndReadyEvent fail: the inbox now stamps card-mode chunks with the structurally distinct LlmReplyCardStreamChunkEvent proto type (the PR #590 routing scheme), while the test asserts the legacy LlmReplyStreamChunkEvent descriptor.

Two changes:
- The existing test sets StreamingCardKitEnabled=false explicitly; a comment notes that it deliberately exercises the legacy text-edit chunk shape.
- A new test, ProcessAsync_StreamingEnabledWithDefaultCardMode_DispatchesCardChunkEvent, pins the new default behaviour: omit the flag, observe LlmReplyCardStreamChunkEvent on the wire.

898 ChannelRuntime tests + 94 NyxId AI tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lark CardKit 2.0's open-apis/cardkit/v1/cards endpoints want the type-tagged `data` / `settings` / `card.data` payloads as JSON-encoded STRINGS, not inline objects. The merged PR #590 implementation always embedded those fields as inline JsonNode trees, which Lark rejects with code 9499 "Invalid parameter type in json: Data | Settings | Card". The production rollout (commit 36b8b5c flipped the StreamingCardKitEnabled default to true) made every Lark turn run card.create, hit the 400, log "Card create failed; falling back to text-edit", and silently spend the rest of the turn on the legacy edit-message path — so users still saw text-edit replies and concluded card mode was off (it was on, but the client was sending the wrong shape).

Verified each shape against the live endpoint via `nyxid proxy request`:

    POST  /cards                  data: STRING when type=card_json|card_id
                                  data: INLINE OBJECT when type=template
    PATCH /cards/{id}/settings    settings: STRING
    PUT   /cards/{id}             card: INLINE { type, data } where data
                                  follows the same string rule

Tests:
- Updated the existing pinning tests that asserted the wrong shape (the inline-object assertion was the contract bug).
- Added explicit cases for card_id (string) and template (inline object).
- Round-trip the card_json string back through JsonDocument.Parse so the test still pins the inner schema/streaming_mode fields.
- UpdateCardAsync now validates that CardJson parses (defensive client-side) and wraps it in `card.{type:"card_json", data:<string>}` automatically.
- All four CardKit methods reject blank / null DataJson upfront.

59 Lark CardKit tests + 37 ChannelRuntime card tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
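The string-vs-inline distinction is easy to miss because both shapes are valid JSON. A Python sketch of the two encodings for the type=card_json case (the inner card body here is invented; only the wrapping rule comes from the commit):

```python
import json

# Hypothetical inner card body; field names follow the commit's examples.
card_body = {"schema": "2.0", "config": {"streaming_mode": True}}

# WRONG (pre-fix): `data` embedded as an inline object.
# This is the shape Lark rejects with code 9499.
rejected = {"type": "card_json", "data": card_body}

# RIGHT: for type=card_json, `data` is the card body serialized to a
# JSON-encoded string -- JSON nested inside JSON.
accepted = {"type": "card_json", "data": json.dumps(card_body)}

assert isinstance(rejected["data"], dict)
assert isinstance(accepted["data"], str)

# Round-tripping the string restores the inner fields, which is how the
# updated tests keep pinning schema / streaming_mode despite the wrapper.
assert json.loads(accepted["data"])["config"]["streaming_mode"] is True
```

The subtlety the table above captures is that the rule is per-type: template payloads stay inline while card_json / card_id payloads must be strings, so a single serializer path cannot handle all four endpoints.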
Production incident 2026-05-08: a Lark turn issued an `ssh_exec` tool
call to sg-office-network. NyxID accepted the POST to
`/api/v1/ssh/{catalog_id}/exec` at 07:17:47 UTC and never returned a
response. Direct curl reproduces the call in ~2s (NyxID side
duration_ms:1666), so the hang is something between aevatar's HttpClient
and NyxID's SSH gateway / NodeAgent — most likely the per-user single-
session SSH lock with a stale prior session. With no client-side cap,
the call sat past the inbox runtime's 120s timeout, the user got the
generic `llm_reply_timeout` fallback ("Sorry, this took too long..."),
and the actual SSH attempt was never observable to the model.
Two layered fixes:
1. NyxIdSshExecTool now wraps the SshExecAsync call in a linked CTS that
cancels at `timeout_secs + 15s` (the +15s gives NyxID's own server-
side timeout a margin to respond with `timed_out: true` before we
give up). On timeout the tool returns
{"error":"ssh_timeout","detail":"NyxID did not return ... within Ns"}
so the LLM sees a real tool error and can either retry against a
different host or summarize "couldn't reach the SSH gateway" instead
of dragging the run.
2. NyxIdApiClient.SendAsync now lets `OperationCanceledException`
propagate. Previously the catch-all wrapped it as
{"error":true,"message":"A task was canceled."}
which silently shadowed any per-call CTS the caller installed on top
of the LLM run's CT (the test for this fix caught it: tool's catch
never fired). Cancellation is a control-flow signal, not an HTTP
error envelope. Other callers of NyxIdApiClient already work with
either path because the LLM run's CT cancellation also unwinds the
inbox runtime via OperationCanceledException.
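The two fixes compose into one pattern: cap the transport call with a margin, convert only the timeout into a structured tool error, and let genuine cancellation propagate. A Python asyncio sketch (ssh_exec_with_cap and its parameters are illustrative names, not the C# tool's API):

```python
import asyncio
import json

async def ssh_exec_with_cap(call, timeout_secs: int, margin_secs: int = 15):
    """Sketch of fix 1: cap the transport call at timeout_secs + margin
    so the server-side timeout gets a chance to answer with
    timed_out:true before the client gives up. `call` stands in for the
    real SshExecAsync."""
    cap = timeout_secs + margin_secs
    try:
        return await asyncio.wait_for(call(), timeout=cap)
    except asyncio.TimeoutError:
        # A structured tool error the LLM can act on (retry another
        # host, or summarize the failure) instead of dragging the run
        # into the inbox watchdog's generic fallback.
        return json.dumps({"error": "ssh_timeout",
                           "detail": f"NyxID did not return within {cap}s"})
    # Deliberately no `except asyncio.CancelledError`: per fix 2, the
    # caller's own cancellation propagates as control flow, not as an
    # error envelope that shadows it.

async def _demo():
    async def hangs():
        await asyncio.sleep(3600)   # simulates the stuck SSH gateway
    # Tiny cap so the sketch finishes quickly.
    return await ssh_exec_with_cap(hangs, timeout_secs=1, margin_secs=0)

result = json.loads(asyncio.run(_demo()))
assert result["error"] == "ssh_timeout"
```

The key property is asymmetry: timeout becomes data the model can read, while cancellation stays an exception that unwinds the run.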
Tests:
- New ExecuteAsync_HardTimesOut_WhenNyxIdHangsOnSshPost exercises the
full fail mode via a `MapHanging` route on the test PathHandler that
awaits Timeout.Infinite until the cancellation token fires.
- Existing 44 SSH + ApiClient tests still pass; 539-test AI suite green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Production observation 2026-05-08: a "pull the sg office device IPs"
turn generated a perfectly-shaped streaming-card reply but ran out of
LLM turn budget mid-flush — the user saw the generic "took too long"
fallback instead of the real answer. The bot did the work; we just
cut it off too early.
Multi-step Lark turns realistically need:
- ornn skill-search / fetch SKILL.md        ~10s
- ssh_exec to a remote host                 ~30-60s (45s is the new
  client-side hard cap on a stuck NyxID SSH gateway)
- LLM thinking + final streaming dispatch   ~30-90s
- per-tool jitter + connection setup        ~10-20s
                                            ──────
                                            ~80-180s typical
120s left no margin once anything but the happy path happened. 300s
buys headroom without giving up the watchdog: a true infinite hang
still resolves to the "took too long" fallback so the user is not
stuck on a "..." reaction forever.
Bumped both knobs to keep them in sync:
- NyxIdRelayOptions.ResponseTimeoutSeconds default 120 → 300
- ChannelLlmReplyInboxRuntime.FallbackTimeoutSecondsDefault 120 → 300
Deployments that want a tighter / looser cap still set
Aevatar:NyxId:Relay:ResponseTimeoutSeconds in config; 0 / negative
still means "no cap" (host has its own watchdog) per the existing
ResolveFallbackTimeout contract.
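The existing contract the paragraph above references can be sketched as a small resolver (resolve_fallback_timeout is an illustrative Python stand-in for the C# ResolveFallbackTimeout; the 300s default comes from this commit):

```python
from typing import Optional

def resolve_fallback_timeout(configured_seconds: Optional[int],
                             default_seconds: int = 300) -> Optional[int]:
    """Unset -> the (newly bumped) 300s default; 0 or negative ->
    explicitly no client-side cap, deferring to the host's own
    watchdog; any positive value -> used as-is."""
    if configured_seconds is None:
        return default_seconds
    if configured_seconds <= 0:
        return None          # "no cap" sentinel
    return configured_seconds

assert resolve_fallback_timeout(None) == 300   # default after this commit
assert resolve_fallback_timeout(0) is None     # host watchdog owns it
assert resolve_fallback_timeout(45) == 45      # explicit override wins
```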
Existing tests that pin specific timeout values
(e.g. ResponseTimeoutSeconds=1 in the timeout-fallback test) are
unaffected; tests that didn't set the option get 300s by default
which is even further from firing for fast tests. 16 inbox runtime +
24 SSH tool tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /daily Lark slash command and the agent template behind it were named daily_report internally — six different casings (daily_report, DailyReport, dailyReport, daily-report, plus compound identifiers like daily_report_form, BuildDailyReportForm, CreateDailyReportAgentAsync). The user-facing command has always been /daily; the internal _report suffix added no signal and made it harder to talk about the skill in docs and Ornn.

Strategy:
- Sed-driven global rename across .cs / .json / .md / .proto / .yaml / csproj. 281 occurrences across 26 files. Order: kebab → camel → Pascal → snake (no shared substrings, so the order is defensive).
- Renamed compound identifiers (daily_report_form → daily_form, BuildDailyReportForm → BuildDailyForm, etc.). All callers updated by the same pass.
- Three test inputs that intentionally used /daily_report as a fake unknown-command literal (testing "this is not a real command, return unknown usage") now collide with the real /daily command after the rename. Dropped those three [InlineData] entries — the same code path is still covered by the /foobar and / cases that remain.

Persistence note: any scheduled agent records persisted with template="daily_report" before this commit will not match the new template="daily" lookup. The current production /daily flow is broken anyway (it always returns eanzhao's data and hangs past the 120s budget, per this morning's incident review), so a clean cutover is acceptable — re-run /daily and the rebuilt agent will land under the new name.

This commit is naming-only. The next commit (separate) replaces the template body to consume the Ornn-hosted `daily` skill via use_skill, the way sg-office-network is reached via ssh_exec.

Build clean across the affected projects (Authoring.Lark, Scheduled, NyxidChat). 895 ChannelRuntime + 17 Lark Platform tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
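The ordered multi-pass rename in the strategy bullet amounts to a plain sequence of literal replaces; a Python sketch (the mapping table abbreviates the real 281-occurrence sed pass):

```python
# The four casings from the commit, applied in the stated order. With
# no shared substrings between the patterns the order is only
# defensive, but fixing kebab -> camel -> Pascal -> snake guarantees a
# later pass can never half-rewrite the output of an earlier one.
RENAMES = [
    ("daily-report", "daily"),   # kebab
    ("dailyReport",  "daily"),   # camel
    ("DailyReport",  "Daily"),   # Pascal
    ("daily_report", "daily"),   # snake
]

def rename(text: str) -> str:
    for old, new in RENAMES:
        text = text.replace(old, new)
    return text

# Compound identifiers fall out of the same pass:
assert rename("BuildDailyReportForm") == "BuildDailyForm"
assert rename("daily_report_form") == "daily_form"
assert rename('template="daily_report"') == 'template="daily"'
```

The Pascal entry maps to "Daily" rather than "daily" so identifiers like CreateDailyReportAgentAsync keep their casing after the rewrite.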
`/daily` and `/social-media` are no longer in-tree commands. Recipes for new persistent agents now live as Ornn skills — the LLM searches via `ornn_search_skills` (or matches a slash-command name to a skill query) and follows the SKILL.md verbatim. `agent_builder` keeps only the lifecycle surface (`list_agents`, `agent_status`, `run_agent`, `disable_agent`, `enable_agent`, `delete_agent`).

The `social_media` template, the `WorkflowAgentGAgent` actor, the `twitter_publish` workflow module, the `IScopeWorkflowCommandPort` LLM plumbing in `agent_builder`, and all related card-flow surfaces are deleted with no replacement; the missing scope-scoped workflow tools needed to bring social-media back as an Ornn skill are tracked in issue #598.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…continuation-phase-a [codex] Refactor LLM reply continuation into agent runs
No description provided.