fix(agent-core): compaction edge cases, error handling, and truncated output#120
Conversation
🦋 Changeset detectedLatest commit: 87b503f The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
commit: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b112bebb31
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| this.begin({ source: 'auto', instruction: undefined }); | ||
| return this.compacting !== null; |
There was a problem hiding this comment.
Skip auto compaction when the suffix cannot fit
When the retained suffix is already larger than the model window (for example an old exchange followed by one huge pending user prompt), starting auto compaction here cannot make the next LLM request fit because the trailing user message is preserved. The turn will compact the prefix, retry with the same oversized suffix, then compact the summary again until maxCompactionPerTurn is hit, leaving the session history rewritten to summaries and failing with a compaction-limit error instead of the original overflow. The previous beginAutoCompaction guarded against history.slice(compactedCount) exceeding max_context_tokens; that check needs to be kept before calling begin.
Useful? React with 👍 / 👎.
| if (!cannotSplitAfter(m1)) { | ||
| bestN = messages.length - recentMessages; | ||
| } |
There was a problem hiding this comment.
Preserve whole parallel tool exchanges during compaction
When history contains a parallel tool call with multiple tool results, this can choose a split after only the first tool message because it only checks the message immediately before the split. For example, assistant(toolCalls=[a,b]), tool(a), tool(b), user(...) can compact the assistant call and tool(a) while leaving tool(b) in the retained suffix, so the next provider request contains an orphan tool result without its preceding assistant tool call. The old sliceCompleteMessages normalization prevented this; the strategy needs to reject splits inside a tool exchange before applying compaction.
Useful? React with 👍 / 👎.
| const compactedCount = this.strategy.computeCompactCount(this.agent.context.history); | ||
| if (compactedCount === 0) { | ||
| throw new KimiError(ErrorCodes.COMPACTION_UNABLE, 'No prefix that can be compacted in current history.'); | ||
| } |
There was a problem hiding this comment.
Avoid validating historical begin records while restoring
For sessions recorded by earlier builds, a manual /compact on an empty or otherwise non-compactable history could still write full_compaction.begin before the worker cancelled. During replay AgentRecords.restore calls begin() with records.restoring set, but this new validation runs before the restoring check and throws compaction.unable, so those existing sessions can no longer resume. Restoration should replay old records without starting work or recomputing compactability, or a migration should remove/normalize those begin/cancel pairs.
Useful? React with 👍 / 👎.
| const response = await this.agent.generate( | ||
| this.agent.config.provider, | ||
| this.agent.config.systemPrompt, | ||
| [...this.agent.tools.loopTools], | ||
| messages, | ||
| undefined, | ||
| { signal }, |
There was a problem hiding this comment.
Apply completion budgets to compaction requests
Compaction still issues an LLM request, but this path now calls the raw provider directly, bypassing the resolveCompletionBudget/applyCompletionBudget wrapper that ordinary turn requests use and that compaction used before. When operators set KIMI_MODEL_MAX_COMPLETION_TOKENS or rely on loop_control.reservedContextSize for Kimi models, manual/auto compaction ignores that cap, so a large compaction prompt can consume an unbounded completion budget or hit provider truncation despite the configured limit. Build an effective provider for this call the same way the turn LLM path does.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5f6e8b7f31
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74eaf747af
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| let recentSize = 0; | ||
| let bestN: number | undefined; | ||
|
|
||
| for (; recentMessages < messages.length; recentMessages++) { |
There was a problem hiding this comment.
Allow compacting a single completed exchange
When the history contains exactly one completed user/assistant exchange and the user runs manual compaction, this loop never considers N === messages.length because it requires recentMessages < messages.length, and the only candidate split is after the initial user message, which is unsafe. That makes computeCompactCount([user, assistant]) return 0, so /compact/Session.compact() rejects with compaction.unable even though the whole completed exchange can be summarized into a compaction summary with an empty retained suffix. This is a common post-turn compaction case and differs from truly uncompactable histories like only pending user messages.
Useful? React with 👍 / 👎.
| } | ||
| this.agent.fullCompaction.begin({ source: 'auto', instruction: undefined }); | ||
| return true; | ||
| this.begin({ source: 'auto', instruction: undefined }); |
There was a problem hiding this comment.
Keep the overflow suffix check before auto-compacting
When the provider reports context overflow but the retained suffix is already too large by itself (for example, a huge pending user prompt after older history), this now starts auto-compaction unconditionally. The old guard avoided compacting in that case; without it, the worker summarizes older messages, retries with the still-oversized pending prompt, and can repeatedly compact the summary until the per-turn compaction limit is hit, wasting calls and replacing useful history even though compaction cannot make the next request fit.
Useful? React with 👍 / 👎.
| if (compactedCount === 0) { | ||
| throw new KimiError(ErrorCodes.COMPACTION_UNABLE, 'No prefix that can be compacted in current history.'); |
There was a problem hiding this comment.
Do not throw compaction.unable for auto-compaction
When automatic compaction is triggered but the current history has no safe prefix (for example, the first user prompt alone crosses the 85% trigger threshold but still fits the model window), beginAutoCompaction() reaches this throw through begin({ source: 'auto' }). Because beforeStep() does not catch it, the turn fails before the provider is called, so large single prompts or consecutive pending user messages that could have been answered now immediately end with compaction.unable; auto-compaction should return false in this case and let the normal request/overflow path proceed.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 72758c30e2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (canSplitAfter(messages, i)) { | ||
| return i; |
There was a problem hiding this comment.
Return the split count for manual compaction
When /compact finds a safe split at message index i, the count passed to messages.slice(0, N) must be i + 1, not i. With a normal completed exchange like [user, assistant], this returns 1, so the compaction LLM summarizes only the user prompt and applyCompaction preserves the assistant reply outside the summary; longer histories similarly leave the message at the chosen safe split in the retained suffix instead of compacting it.
Useful? React with 👍 / 👎.
| markCompleted() { | ||
| this.agent.records.logRecord({ | ||
| type: 'full_compaction.complete', | ||
| ...result, | ||
| }); |
There was a problem hiding this comment.
Keep complete records compatible with vis renderers
New full_compaction.complete records now contain only type/time, but the visual debugger still renders that record by reading compactedCount, tokensBefore, tokensAfter, and summary.length in apps/vis/web/src/components/wire/WireHeadline.tsx and WireRowDetail.tsx. Opening a newly compacted session in vis will therefore render undefined values or throw on record.summary.length; either keep the result payload on this record or update the vis renderers before emitting empty complete records.
Useful? React with 👍 / 👎.
| if (error instanceof APIContextOverflowError || error instanceof TruncatedError) { | ||
| compactedCount = this.strategy.reduceCompactOnOverflow(messagesToCompact); |
There was a problem hiding this comment.
Stop retrying when overflow cannot reduce the prefix
When the compaction request overflows for a prefix that has no smaller safe split, such as a single very large completed exchange, reduceCompactOnOverflow(messagesToCompact) returns the same messagesToCompact.length. This assignment then retries the exact same oversized compaction prompt until MAX_COMPACTION_RETRY_ATTEMPTS is exhausted, wasting provider calls and reporting a generic compaction failure instead of recognizing that the prefix cannot be reduced; compare the new count to the old one and stop when it does not shrink.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6429171bfd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 87b503f2d2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Summary
Reworks the compaction subsystem to fix several edge cases surfaced while testing on long sessions.
Strategy / algorithm
computeCompactCount: removed a stray innerrecentMessages++that made the loop step by 2 and undercountrecentSize. Behavior now matches the rules documented above the function (rule 6: smallest validNwithin recent limits).0when no prefix can be compacted instead of throwing, with a fallback that allows compacting all messages when only the trailing message is splittable (tool / asst-no-tool-call).maxRecentSteps→maxRecentMessagesand the algorithm now respects all three at-most limits.Worker
project()on the messages passed to the compaction LLM call so hook records, blocked prompts, and empty assistant placeholders no longer leak in.extractCompactionSummaryinside the retry loop so empty summaries (APIEmptyResponseError) actually retry.retryCountnow only counts real retries (not the first non-retryable failure).finishReason === 'truncated'), shrinkcompactedCountviareduceCompactOnOverflowand retry, sharing the existing overflow handling path. Subject toMAX_COMPACTION_RETRY_ATTEMPTS.tokensBefore/tokensAfterare now both estimated from the raw history (no projection, no system prompt) so they are comparable.Lifecycle
COMPACTION_UNABLEerror code.begin()throws it when there is nothing to compact, so both manual (RPC rejects) and auto (turn fails) paths produce a clear signal instead of silently erroring or spinning./compacthandler catches the rejection and surfaces it viashowError.Exports
compaction/index.tsre-exports from./strategy(./configwas deleted) soCompactionStrategyand friends remain on the public surface.Test plan
pnpm -w typecheckpnpm -w test(4464 passed | 25 skipped | 2 todo)computeCompactCountedge cases (0 / 1 message, all-user, last-splittable fallback)compaction.unablerejection on both manual and auto paths (agent-core + node-sdk)/compacton an empty session, verify[compaction.unable]shown in TUI