reuse transcript blobs across turn-end checkpoints by Soph · Pull Request #984 · entireio/cli

Soph · 2026-04-19T10:21:56Z

This should address some of the issues reported with codex hitting hook timeouts, for example #957

Summary

Stop hooks were taking ~30 s on 18 MB image-heavy transcripts (reported against Codex, but the code path is agent-agnostic). Two root causes inside finalizeAllTurnCheckpoints:

For each of N turn-checkpoints, the full redacted transcript was re-chunked and re-zlib-compressed, doubled when v1+v2 dual-write is enabled. Pure-Go zlib on ~18 MB of base64 image data is the bottleneck; go-git v6 does not short-circuit before compression when the loose object already exists on disk.
replaceTranscript / updateCommittedFullTranscript unconditionally rewrote all transcript blobs even when the stored content was identical.

Changes

PrecomputedTranscriptBlobs new type on UpdateCommittedOptions: chunk blob hashes + content-hash blob, computed once. Content-addressed hashes are identical for v1 (full.jsonl) and v2 (raw_transcript) paths, so one precompute serves both stores.
PrecomputeTranscriptBlobs(ctx, repo, transcript, agentType) constructor in checkpoint/committed.go.
finalizeAllTurnCheckpoints now calls PrecomputeTranscriptBlobs once above the per-checkpoint loop and threads the result into every UpdateCommittedOptions. On precompute failure the loop falls back to the old per-checkpoint path (best-effort, logged).
v1 replaceTranscript (committed.go): accepts precomputed blobs; short-circuits when existing content_hash.txt matches the new sha256.
v2 updateCommittedFullTranscript (v2_committed.go): same short-circuit against raw_transcript_hash.txt; writeTranscriptBlobs + new writeContentHashFromPrecompute accept precomputed blobs.

Expected effect on a 30 s, 18 MB, 3-checkpoint finalize

Chunk + zlib of redacted transcript: 2N → 1 (v1 + v2, one pass).
Content-hash blob: 2N → 1, dropping to 0 on no-op re-finalize.
Estimated ~10–15 s removed from the reported 30 s case.
compactTranscriptForV2 is genuinely per-checkpoint (per-checkpoint startLine), so it stays in the loop — not addressed here.

Risk notes

Short-circuit only fires on byte-identical transcript — redaction output is deterministic so this is safe.
Fallback path preserved: if PrecomputeTranscriptBlobs fails, every checkpoint re-chunks and re-blobs as before.
No behavior changes visible from the public API (ReadSessionContent etc.) — verified by TestUpdateCommitted_PreservesMetadata + roundtrip tests.
No change to on-disk tree shape, ref layout, or commit-message trailers.

Potential next steps

In decreasing ratio order, not shipped in this PR:

Parse-on-first-use transcript cache. lifecycle.go handleLifecycleTurnEnd runs ExtractPrompts + ExtractAllModifiedFiles + CalculateTokenUsage + parseTranscriptForCheckpointUUID — each parses the same bytes independently. A sync.Once-guarded shared []TranscriptLine would collapse 3–4 parses into 1 without touching public agent interfaces. Estimated ~3–5 s/turn on image-heavy transcripts. Every agent, every turn — not just stop.
Hoist compact.Compact if we can amortize per-startLine. compactTranscriptForV2 is currently per-checkpoint because of startLine. Worth a follow-up to check whether the expensive part (the JSONL walk) can run once with all chunks shared, and only the startLine slice differs.
In-memory-bytes overloads for ExtractPrompts / parseTranscriptForCheckpointUUID. Both currently re-read the same transcript file from disk. ~0.5–1 s/turn. Subsumed by (1) if it lands.
Audit .entire/metadata/<sid>/full.jsonl writers/readers. lifecycle.go:357 writes ~18 MB to disk every turn, but finalize re-reads the original agent transcript, not this copy. Some resume/explain flows may still depend on it — scope before deleting.
Streaming / lazy JSONL parser with json.RawMessage for image content blocks. Biggest win for 18 MB base64-heavy transcripts but largest diff; diminishing returns if (1) lands first.

Test plan

Run mise run check to confirm fmt/lint/test:ci pass on this branch.
Sanity-check on a local repo: start a session, run multiple turn-end commits, git stop, confirm entire/checkpoints/v1 tree and /full/current tree both contain the finalized transcript and content hash.
If possible, reproduce the customer's 18 MB, multi-image, 3-checkpoint scenario and measure stop-hook wall time before/after.

Want me to push the branch and open the PR with this as the body?

Note

Medium Risk
Touches checkpoint persistence logic for both v1 and v2, adding new short-circuit paths based on stored content hashes; mistakes could leave transcripts stale or mismatched across refs. Changes are localized and covered by new roundtrip/short-circuit tests plus a fallback-to-old-behavior path if precompute fails.

Overview
Speeds up turn-end checkpoint finalization by precomputing transcript chunk blobs once and reusing their content-addressed hashes across all UpdateCommitted calls (and across v1+v2 dual-writes).

Adds a PrecomputedTranscriptBlobs option and PrecomputeTranscriptBlobs helper, updates v1 replaceTranscript and v2 /full/current updates to skip re-chunking/re-compressing when the transcript content is unchanged (via existing content-hash files), and threads the precomputed blobs through finalizeAllTurnCheckpoints. Includes new tests covering precompute reuse and the content-hash short-circuit behavior for both v1 and v2.

^{Reviewed by Cursor Bugbot for commit da55218. Configure here.}

Chunk + zlib-compress the session transcript once per stop hook instead of N times (one per mid-turn checkpoint, doubled on v1+v2 dual-write). Adds a content-hash short-circuit so UpdateCommitted skips all blob work when the stored transcript already matches. Motivated by a 30 s stop-hook report with an 18 MB image-heavy transcript: pure-Go zlib of identical content across 3 checkpoints dominated wall time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: 6fd1208c5c6d

Mirrors the v1 UpdateCommitted tests against V2GitStore so the /full/current transcript fast paths are exercised directly instead of only transitively via broader checkpoint tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: b4cdc67eb804

Copilot

Pull request overview

This PR improves stop-hook performance by avoiding repeated transcript chunking/zlib work across multiple turn-end checkpoint finalizations, and by short-circuiting transcript rewrites when content is unchanged. It fits into the checkpoint “finalize” flow by reducing redundant blob creation and reuse across both v1 (entire/checkpoints/v1) and v2 (refs/entire/*) storage paths.

Changes:

Precompute transcript chunk blob hashes + content-hash blob once per turn and reuse across all UpdateCommitted calls in finalizeAllTurnCheckpoints.
Add v1/v2 short-circuiting based on stored content-hash files to skip chunking/blob creation when transcript bytes are identical.
Add tests for precomputed blob reuse and no-op updates (roundtrip + stability checks).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
cmd/entire/cli/strategy/manual_commit_hooks.go	Precomputes transcript blobs once and threads them into per-checkpoint `UpdateCommittedOptions`.
cmd/entire/cli/checkpoint/checkpoint.go	Extends `UpdateCommittedOptions` with `PrecomputedBlobs` and defines `PrecomputedTranscriptBlobs`.
cmd/entire/cli/checkpoint/committed.go	Adds v1 short-circuiting + uses precomputed chunk/content-hash blobs; introduces `PrecomputeTranscriptBlobs`.
cmd/entire/cli/checkpoint/v2_committed.go	Adds v2 short-circuiting and reuses precomputed chunk/content-hash blobs.
cmd/entire/cli/checkpoint/committed_update_test.go	Adds v1 tests around precompute roundtrip and “short-circuit” scenarios.
cmd/entire/cli/checkpoint/v2_precompute_test.go	Adds v2 tests around precompute roundtrip and “short-circuit” scenarios.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

Comment @cursor review or bugbot run to trigger another review on this PR

^{Reviewed by Cursor Bugbot for commit da55218. Configure here.}

- v2 short-circuit now returns nil instead of splicing the tree and advancing the /full/current ref; prevents a no-op commit per identical re-finalize (matches v1 behavior). - finalize skips PrecomputeTranscriptBlobs entirely when the redacted transcript is empty, avoiding a wasted empty-chunk blob. - Short-circuit tests now count calls through a chunkTranscript hook so they actually prove chunking is skipped; previously they only asserted content-addressed blob hashes are equal, which is trivially true. - Added isUsable() invariant check so a malformed PrecomputedTranscriptBlobs falls through to the fresh path instead of writing a zero-hash entry. - Extracted precomputeTranscriptBlobsForFinalize helper to keep finalizeAllTurnCheckpoints under the maintainability threshold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Entire-Checkpoint: e4aeff0a5613

Soph and others added 2 commits April 19, 2026 11:47

Copilot AI review requested due to automatic review settings April 19, 2026 10:21

Soph requested a review from a team as a code owner April 19, 2026 10:21

Copilot started reviewing on behalf of Soph April 19, 2026 10:22 View session

Copilot AI reviewed Apr 19, 2026

View reviewed changes

cursor bot reviewed Apr 19, 2026

View reviewed changes

Comment thread cmd/entire/cli/checkpoint/v2_committed.go Outdated

pjbgf approved these changes Apr 20, 2026

View reviewed changes

Soph merged commit 243f446 into main Apr 20, 2026
9 checks passed

Soph deleted the soph/transcript-optimization branch April 20, 2026 10:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reuse transcript blobs across turn-end checkpoints#984

reuse transcript blobs across turn-end checkpoints#984
Soph merged 3 commits intomainfrom
soph/transcript-optimization

Soph commented Apr 19, 2026 •

edited by cursor bot

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Conversation

Soph commented Apr 19, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Expected effect on a 30 s, 18 MB, 3-checkpoint finalize

Risk notes

Potential next steps

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Soph commented Apr 19, 2026 •

edited by cursor bot

Loading