[DO NOT MERGE] Persistent sub-agents and tools by snimu · Pull Request #97 · PrimeIntellect-ai/rlm-harness

snimu · 2026-06-11T16:06:16Z

Create persistent sub-agents and tools. For more info see the design doc.

Note

High Risk
Large architectural change to the agent loop, session persistence, kernel lifecycle, and concurrency limits; mistakes could leak kernels, corrupt transcripts, or deadlock rollouts under caps.

Overview
Adds a background sub-agent and programmatic-tool layer so IPython cells can send work and poll handles across later cells, including named multi-turn rlm specialists.

RLMEngine is split into setup() / advance() / aclose() so one conversation can advance repeatedly and rehydrate from disk after kernel loss. messages.jsonl becomes a typed, view-based append-only transcript with load_latest_view() recovery (torn tails, interrupted compaction). Resume paths answer dangling tool calls, persist metrics via RLMMetrics.snapshot/restore, and optionally warn when the REPL was reset.

async_runtime provides workers, handles, ToolState, and a per-kernel registry; api.send wires a stateful _RlmProcessor with nested sub-<name>/ sessions. Skills get ephemeral .send via kernel_bootstrap. agent_limit enforces total and running caps with marker files + flock.

config.py centralizes RLM_* settings; teardown cascade-drains child agents with a sentinel check. System prompts document rlm.send / handle.poll() and skill backgrounding. Broad test coverage for runtime, caps, persistence, resume, and sessions.

Reviewed by Cursor Bugbot for commit d99c70b. Bugbot is set up for automated code reviews on this repo.

Phase 1 (build now): a unified send/poll background primitive for all programmatic callables, named persistent multi-turn rlm sub-agents, and token budgets. Phase 2 (deferred): interruptions overview. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

IPythonREPL._execute_locked captured stream / execute_result / error messages but dropped display_data and update_display_data, so output from display() and rich reprs a normal IPython user sees was invisible to the model. Capture their text/plain like execute_result (headless analog, no image bytes). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Unify on a single send background primitive + a uniform poll object (status / results-FIFO / queued / error); move queueing into Phase 1; make continuation tool-defined via a worker+processor split; universal addressability; marker-file global cap; graceful-dismiss cascade in PR 1. Record M0 verified and display_data fixed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a generic background-worker runtime (rlm/_async_runtime.py: BackgroundWorker, Handle, Registry, ToolState) and wire rlm.send/get/list_agents into the kernel so the model can launch a named sub-agent in one cell and poll/continue it in another.
- RLMEngine split into setup()/advance()/aclose(); run() == setup + one advance + close (behavior-preserving). Named agents keep the engine + kernel alive and advance() per send -> multi-turn conversations.
- Per-name worker drains an inbox sequentially (queueing); poll() returns a ToolState (status / results FIFO / live-editable queued / live error) and never consumes.
- Named child sessions at <parent>/sub-<name>/messages.jsonl (transcript via handle.session_dir).
- Graceful-dismiss cascade: aclose() drains the kernel's sub-agents (finalizing sessions, recursing into grandchildren) before aggregating child metrics and shutting the kernel down.
- Per-send token budget clamped to the RLM_SUB_MAX_TOKENS ceiling.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
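The send/poll contract described in this commit can be illustrated with a minimal sketch. It is not the code in rlm/_async_runtime.py: the `Worker` class, its fields, and the status strings are hypothetical, and `poll()` here returns copies rather than a live-editable queue. It only shows a per-name worker draining an inbox sequentially while `poll()` reports state without consuming results.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional


@dataclass
class ToolState:
    """Snapshot returned by poll(); reading it never consumes results."""
    status: str                                  # "running" | "idle" | "error"
    results: list = field(default_factory=list)  # completed results, FIFO order
    queued: list = field(default_factory=list)   # items not yet processed
    error: Optional[str] = None


class Worker:
    """Hypothetical per-name worker: processes inbox items one at a time."""

    def __init__(self, process: Callable[[Any], Awaitable[Any]]) -> None:
        self._process = process
        self._inbox: list = []
        self._results: list = []
        self._error: Optional[str] = None
        self._task: Optional[asyncio.Task] = None

    def send(self, item: Any) -> "Worker":
        """Queue an item and make sure a drain task is running."""
        self._inbox.append(item)
        if self._task is None or self._task.done():
            self._task = asyncio.get_running_loop().create_task(self._drain())
        return self

    async def _drain(self) -> None:
        # Sequential processing: the next item starts only after the
        # previous one has finished. An error stops the drain.
        while self._inbox and self._error is None:
            item = self._inbox.pop(0)
            try:
                self._results.append(await self._process(item))
            except Exception as exc:
                self._error = repr(exc)

    def poll(self) -> ToolState:
        running = self._task is not None and not self._task.done()
        status = "error" if self._error else ("running" if running else "idle")
        return ToolState(status, list(self._results), list(self._inbox), self._error)
```

Because `poll()` never pops from the results list, polling twice in separate cells yields the same results both times.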

Parks the display_data fix (PR not opened) so the persistent-tools branch stays focused. Reverts the cherry-picked 7fd83d9; the standalone fix/ipython-capture-display-data branch is preserved on the remote. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The recursion section now documents rlm.send / handle.poll / rlm.get / rlm.list_agents / handle.dismiss alongside the existing await rlm(...) path, so the model knows to launch a named sub-agent in one tool call and poll or continue it in another. Only shown when recursion is allowed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Per the uniformity principle, every tool is send -> handle -> poll(), so per-name lookup was rlm-specific surface. The model keeps handles in its own variables (the IPython namespace persists across tool calls) and re-sends a name to continue. The registry stays internal so send(name) can still find an existing agent. Updates the system prompt and design doc to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rlm/_agent_limit.py reserves one marker file per live sub-agent in a per-rollout dir shared by the whole process tree (RLM_LIVE_AGENTS_DIR, derived from the root session and propagated to each kernel via _inject_startup). flock serializes sweep-count-create; a PID-liveness sweep reclaims slots leaked by hard-killed processes. rlm.send acquires a slot for a new agent (continuation reuses it) and raises AgentLimitReached at capacity; the slot is released on teardown/dismiss. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
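A rough sketch of the marker-file scheme, assuming a Unix host (`fcntl` is not available on Windows). The file naming, the `.lock` file, and returning `None` at capacity (instead of raising AgentLimitReached) are illustrative assumptions, not the layout used by rlm/_agent_limit.py.

```python
import fcntl
import os
import tempfile
import uuid
from pathlib import Path
from typing import Optional


def pid_alive(pid: int) -> bool:
    """Best-effort liveness probe: signal 0 checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_slot(markers_dir: Path, limit: int) -> Optional[Path]:
    """Reserve one marker file, or return None when the pool is full."""
    markers_dir.mkdir(parents=True, exist_ok=True)
    with open(markers_dir / ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # serialize sweep, count, create
        try:
            live = 0
            for marker in markers_dir.glob("*.marker"):
                pid = int(marker.name.split("-", 1)[0])
                if pid_alive(pid):
                    live += 1
                else:
                    marker.unlink(missing_ok=True)  # reclaim a leaked slot
            if live >= limit:
                return None
            marker = markers_dir / f"{os.getpid()}-{uuid.uuid4().hex}.marker"
            marker.touch()
            return marker
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


def release_slot(marker: Path) -> None:
    marker.unlink(missing_ok=True)


def demo() -> list:
    """Fill a two-slot pool, hit the cap, then free one slot and retry."""
    with tempfile.TemporaryDirectory() as tmp:
        pool = Path(tmp) / "total"
        a = acquire_slot(pool, limit=2)
        b = acquire_slot(pool, limit=2)
        full = acquire_slot(pool, limit=2)
        release_slot(a)
        again = acquire_slot(pool, limit=2)
        return [a is not None, b is not None, full is None, again is not None]
```

Holding the lock across the sweep, the count, and the marker creation is what keeps two concurrent acquirers from both seeing the last free slot.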

await rlm(...) / gather(...) one-offs now count against RLM_MAX_LIVE_AGENTS: a sub-agent (depth > 0) reserves a slot via acquire_slot_blocking, which waits for a free slot rather than raising (the model awaits the one-off anyway), and releases it when done. RLM_AGENT_WAIT_TIMEOUT (default 300s; 0 = forever) proceeds over-cap to avoid deadlock when slot-holders are themselves waiting for slots. The root rollout (depth 0) is not capped; send still raises. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
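The wait-then-proceed behaviour can be sketched independently of the marker files. The function below is a hypothetical stand-in for acquire_slot_blocking: `try_acquire` and `force_acquire` are assumed callables supplied by the caller, and the polling loop is a simplification.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def acquire_blocking(
    try_acquire: Callable[[], Optional[T]],
    force_acquire: Callable[[], T],
    wait_timeout: float = 300.0,
    poll_interval: float = 0.05,
) -> T:
    """Wait for a free slot; after wait_timeout seconds, proceed over-cap.

    wait_timeout == 0 means wait forever, matching the documented meaning
    of RLM_AGENT_WAIT_TIMEOUT.
    """
    deadline = None if wait_timeout == 0 else time.monotonic() + wait_timeout
    while True:
        slot = try_acquire()
        if slot is not None:
            return slot
        if deadline is not None and time.monotonic() >= deadline:
            # Deliberately exceed the cap: slot-holders may themselves be
            # waiting for slots, so waiting longer could deadlock.
            return force_acquire()
        time.sleep(poll_interval)
```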

attach_background gives each uploaded skill a stateless .send(*a, **kw) -> handle (FnProcessor over the skill's run), wired in _inject_startup so skills get the same background/poll lifecycle as sub-agents: auto-named, no persistence, and not counted against the live-agent cap (skills are cheap coroutines, not kernels). The system prompt advertises it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Named agents were keyed in the registry by the raw name while their session dir used the sanitized name, so two names that sanitize alike (foo/bar and foo-bar) spawned separate workers and engines writing to one sub-* directory and transcript. Sanitize at the send() boundary so the registry key and the dir suffix derive from the same string; colliding names now continue a single agent. Also drop a stale rlm.get(name) reference from the send() docstring (that API was removed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
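A small sketch of the fix, under the assumption that sanitization maps every character outside `[A-Za-z0-9_.-]` to `-` (the real rule is not shown in this PR). The `Registry` here is a hypothetical reduction that only demonstrates deriving the registry key and the directory suffix from the same sanitized string.

```python
import re
from typing import Any, Callable, Dict


def sanitize_name(name: str) -> str:
    """Hypothetical sanitizer: keep [A-Za-z0-9_.-], map everything else to '-'."""
    return re.sub(r"[^A-Za-z0-9_.-]", "-", name)


class Registry:
    """Keys workers by the sanitized name, the same string the dir suffix uses."""

    def __init__(self, make_worker: Callable[[str], Any]) -> None:
        self._workers: Dict[str, Any] = {}
        self._make_worker = make_worker

    def send(self, name: str) -> Any:
        key = sanitize_name(name)  # sanitize once, at the send() boundary
        if key not in self._workers:
            self._workers[key] = self._make_worker(key)
        return self._workers[key]

    @staticmethod
    def session_suffix(name: str) -> str:
        return f"sub-{sanitize_name(name)}"
```

With this shape, `foo/bar` and `foo-bar` resolve to one worker and one `sub-foo-bar` directory rather than two workers sharing a transcript.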

Agents are dropped by simply not sending to them again; there is no explicit per-agent teardown. Removing dismiss also closes a latent bug: a name could be removed from the registry mid-rollout and then re-created, reusing the same sub-<name> dir and concatenating two unrelated rollouts into one transcript (and clobbering meta.json). With no removal path, each sub-<name> dir is created once and reused for every turn of that agent.
- _async_runtime: drop Handle.dismiss / Handle._registry / Registry.remove and the _detach/_DETACHED machinery; reword the error-state messages.
- api: AgentLimitReached message points at reusing an existing agent.
- prompt: drop the dismiss line from the recursion section.
- tests: cleanup via Registry.close_all; cap test exercises teardown.
- docs: rename "graceful-dismiss cascade" -> "graceful teardown cascade" and add an agent-lifecycle note on the cap consequence.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

RLM_MAX_LIVE_AGENTS now bounds *resident* agents (a slot held from creation to teardown), and a new RLM_MAX_RUNNING_AGENTS bounds agents *executing a turn* (a slot held only around advance()). An idle agent that could still take another query holds no running slot, so the running cap is a true parallelism knob. An errored turn is reaped eagerly (kernel shut, total slot freed) while the worker stays pollable for its error.
- _agent_limit: generalize to named pools (TOTAL/RUNNING), each a subdir under RLM_LIVE_AGENTS_DIR with its own flock + PID-liveness sweep.
- api: send reserves a TOTAL slot (raises AgentLimitReached at cap); _RlmProcessor takes a RUNNING slot per advance and reaps on error; one-off run() holds both for its lifetime, blocking on each.
- engine: derive RLM_LIVE_AGENTS_DIR when either cap is set.
- tests: pool-aware cap tests, pool independence, errored-agent reap.
- docs: two-cap section in section 3.7; lifecycle note updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
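The difference between the two caps can be shown with an in-process analogy. This sketch swaps the cross-process marker-file pools for a plain counter and an `asyncio.Semaphore`; the `Caps` class and its methods are hypothetical and exist only to show that the total slot spans an agent's lifetime while the running slot spans a single turn.

```python
import asyncio
from typing import Awaitable, Callable


class AgentLimitReached(RuntimeError):
    """Raised when no resident (total) slot is free."""


class Caps:
    """In-process stand-in for the TOTAL and RUNNING pools."""

    def __init__(self, max_live: int, max_running: int) -> None:
        self._max_live = max_live
        self._live = 0
        self._running = asyncio.Semaphore(max_running)

    def create_agent(self) -> None:
        # Total slot: taken at creation, raises at capacity.
        if self._live >= self._max_live:
            raise AgentLimitReached("reuse an existing agent instead")
        self._live += 1

    def teardown_agent(self) -> None:
        self._live -= 1  # total slot freed only at teardown

    async def advance(self, turn: Callable[[], Awaitable[str]]) -> str:
        # Running slot: held only while one turn executes, so an idle
        # resident agent does not count against parallelism.
        async with self._running:
            return await turn()
```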

_RlmProcessor.process built and set up the engine outside its try, so a failure there (bad engine kwarg, a kernel that won't start, a missing system-prompt file) skipped _reap: the resident RLM_MAX_LIVE_AGENTS slot taken in send() leaked until rollout teardown even though the worker was already in its error state. Move construction + setup inside the try so any failure path reaps the agent and frees its slot eagerly. Also harden RLMEngine.aclose: if setup() started the kernel but failed before completing, the old `if not self._setup_done: return` guard left the kernel subprocess orphaned. aclose now shuts the kernel whenever one was started, skipping finalize when there is no completed run to finalize. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

An omitted name now gets a uuid hex instead of the deterministic adjective-noun-animal scheme. uuids are unique without a dedup loop, so this deletes the three word lists, the sha256 _auto_name generator, Registry's name_seed/_auto_counter and its _auto_name method, and the name_seed threading through attach_background and both call sites. Trade-off: auto-names are no longer reproducible across identical rollouts, but the model holds the handle in a variable and explicit names still work. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add section 3.10: a kernel restart (timeout recovery) wipes the parent kernel's in-memory state including the registry and sub-agent engines. Plan: messages.jsonl becomes an append-only list of views (supersedes the lossy event log), meta.json is written per turn as the resume header, setup() rehydrates from disk when the in-memory engine is gone, and any agent whose kernel restarted gets a kernel-reset warning. Drops the disambiguation idea and the from-disk-rehydration non-goal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

_interrupt_and_recover now reports whether it had to restart the kernel (vs. recovering via interrupt alone). When it restarted, the tool output appends a kernel-reset notice so the model knows its variables, imports, and in-memory state are gone and must be rebuilt — previously the reset was silent and the model would reference vanished state. First slice of the restart-resilience work (docs section 3.10); the resume-side warning rides on the upcoming resume-from-disk path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Rewrite the session transcript from a lossy event log into the replayable structure from docs section 3.10: each line is a typed object (msg / branch_reset / spawn / done). Message lines carry a view index + turn and the full OpenAI shape (tool_call_ids included). A compaction closes the current view (recording the checkpoint prompt and the assistant summary the model produced) and opens the next, so each view is exactly the context the model saw in that branch. Session.load_latest_view() reconstructs the newest view verbatim. Also write a resume header to meta.json every turn (usage, turn_offset, view, branch_start_turn, RLMMetrics.snapshot()) so a hard restart, which never reaches finalize, can restore state later. Adds RLMMetrics.snapshot()/restore(). Nothing outside rlm reads messages.jsonl, so this is a local schema change. The resume wiring (setup() loading the view) lands next. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
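To make the view structure concrete, here is a minimal sketch of reconstructing the newest view from typed lines. The JSON field names (`type`, `view`, `turn`, `message`) are assumptions for illustration; only the line types (msg / branch_reset / spawn / done) come from the commit message.

```python
import json
from typing import Dict, Iterable, List, Tuple


def load_latest_view(lines: Iterable[str]) -> Tuple[int, List[dict]]:
    """Return (view index, messages) for the highest view among msg lines."""
    views: Dict[int, List[dict]] = {}
    for raw in lines:
        if not raw.strip():
            continue
        record = json.loads(raw)
        # Only msg lines contribute to a view; branch_reset / spawn / done
        # lines are bookkeeping and are skipped here.
        if record.get("type") == "msg":
            views.setdefault(record["view"], []).append(record["message"])
    if not views:
        return 0, []
    latest = max(views)
    return latest, views[latest]
```

Since a compaction closes one view and opens the next, the messages returned for the latest view are the context the model saw in that branch.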

setup() now checks for an existing on-disk view: if the session dir has a transcript but the in-memory engine is gone (the registry was wiped by a parent-kernel restart), it loads the latest view as _messages and restores the resume header (usage / turn_offset / view / branch_start / RLMMetrics) from meta.json instead of seeding a fresh conversation, then injects a kernel-reset warning — the conversation is restored but the REPL is brand new. Re-sending a name after a restart is now a true continuation rather than a silent fresh start onto a mixed transcript, which is what lets us keep one-dir-per-agent without disambiguation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Post-review cleanup of the persistent-tools branch, no behavior change:
- drop dead code: BackgroundWorker.error (no callers), Registry.list() (test-only; the design rules out a list surface), and the unreachable ERROR guard in submit() (Registry.send already guards it).
- setup() builds the system prompt only when seeding fresh, so a resume-from-disk no longer constructs a prompt it discards.
- inline single-use helpers: _read_resume_header into _resume, _drain_child_agents into aclose; replace _force_acquire with a shared _make_marker() used by acquire_slot and acquire_slot_blocking.
- drop the always-constant `command` arg from Session.log_spawn.
- trim design-doc narration from docstrings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A `<skill>.send(...)` auto-named a worker and parked it in a per-skill registry until kernel teardown, so repeated background skill calls accumulated workers + parked drain tasks for the whole rollout. General tools now run on their own ephemeral worker: it processes the one call to completion, then the task ends. The worker is owned solely by its Handle and is never registered, so it (and its result) are GC'd once the model drops the handle — no accumulation, no sweep needed. A tool that wants state across calls keeps its own cache. The registry + naming is now the rlm-only continuation mechanism; named rlm agents stay resident (no auto-drain), unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The first ephemeral-worker cut kept a separate _run_once alongside the resident _drain, duplicating the process/try-except/append block. The two lifecycles differ only in what happens when the inbox drains — a resident worker parks, an ephemeral one ends — so collapse them into one _drain with a one-line fork and a self._ephemeral flag set at construction. Net -15 lines, no behavior change. Also fixes BackgroundWorker's docstring, which still referenced the removed `error` attribute (now poll().error). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor · 2026-06-13T07:18:35Z

        self._km.interrupt_kernel()
        if not self._wait_for_idle(timeout=2):
            self.restart_kernel()
+            return True


Restart skips agent drain

Medium Severity

IPython kernel restart (including timeout recovery) does not run _drain_agents before wiping in-kernel state. Persistent background sub-agents lose their registry and workers without aclose, so nested REPL subprocesses and total-cap markers may leak until process exit.

Reviewed by Cursor Bugbot for commit 98039ef.

Rewrite the design doc as a clean, self-contained description of the background/persistent sub-agent + tools system, against main — no build-order, milestone, or iteration framing, no second person. Fix references that had drifted from the code: the engine loop lives in advance() (not _run_loop); the two kernel-reset paths are distinct (timeout restart -> note in the tool result; parent-kernel restart -> resume + an injected warning turn); the spawn transcript line is `spawn`. Rename the file to match its title and update the docstring link in _async_runtime.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

run() skipped the live-agent caps at depth 0 but send() acquired a total slot unconditionally, so a depth-0 caller of the public rlm.send could hit AgentLimitReached even though the root rollout is meant to be uncapped (it can't happen from a kernel, where RLM_DEPTH >= 1, but rlm.send is public). Centralize the rule in _is_subagent() (depth > 0) and use it in run, send, the per-turn running-slot acquire, and _child_session. No change for real (depth >= 1) usage. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace scattered ``os.environ.get(...)`` reads across engine, api, _agent_limit, client, session, and the builtin tools with a single frozen ``Config`` built once by ``config.load_config()`` (cached) and read via ``get_config()``. ``RLMEngine`` holds an overridable ``Config``; leaf and cross-process helpers use ``get_config()``. Environment variables stay the transport across process boundaries: each kernel subprocess loads its own Config from inherited env, and ``_inject_startup`` still writes the per-kernel overrides before rlm is imported there. Provider credentials (``client.resolve_provider``) and the active builtin-tool set (``tools.registry``) intentionally keep their own env handling. Also make the agent-cap parsing tolerant (review finding B6): an empty or malformed ``RLM_MAX_LIVE_AGENTS`` / ``RLM_MAX_RUNNING_AGENTS`` / ``RLM_AGENT_WAIT_TIMEOUT`` now disables the cap / uses the default rather than raising ``ValueError`` on the acquire hot path. Budgets stay strict. Behavior-preserving otherwise. conftest clears the config cache per test; adds test_cap_disabled_on_malformed_limit. 113 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
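A sketch of the pattern, assuming only three of the settings and a hypothetical `env_int` signature; the real `Config` has more fields and its exact parsing rules are not shown here. It illustrates a frozen config built once, cached, and tolerant of empty or malformed cap values.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


def env_int(name: str, default: Optional[int] = None) -> Optional[int]:
    """Tolerant parse: an empty or malformed value yields the default."""
    raw = os.environ.get(name, "").strip()
    try:
        return int(raw)
    except ValueError:
        return default


@dataclass(frozen=True)
class Config:
    max_live_agents: Optional[int]     # None disables the cap
    max_running_agents: Optional[int]  # None disables the cap
    agent_wait_timeout: int


@lru_cache(maxsize=1)
def load_config() -> Config:
    """Build the config once from the (inherited) environment."""
    return Config(
        max_live_agents=env_int("RLM_MAX_LIVE_AGENTS"),
        max_running_agents=env_int("RLM_MAX_RUNNING_AGENTS"),
        agent_wait_timeout=env_int("RLM_AGENT_WAIT_TIMEOUT", 300),
    )


def get_config() -> Config:
    return load_config()
```

Tests that change environment variables need to call `load_config.cache_clear()` first, which is the role the commit assigns to conftest.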

Consolidated architecture / maintainability review and adversarially verified bug findings (B1-B14) for the background / persistent sub-agent feature, plus the resumable-engine and session-persistence rewrite. Includes a verified-clean list and suggested fix ordering. B6 (tolerant agent-cap parsing) is marked fixed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Move the skill / rlm wrapping logic (_CallableModule, _wrap_callable, _log_programmatic_call) out of the ~70-line Python program that _inject_startup exec'd as an f-string and into rlm/_kernel_bootstrap.py, so it is importable, unit-testable, and type-checked instead of an opaque string literal with path-interpolation hazards. The injected cell now only does the plumbing that must run in the kernel's interactive namespace: set the per-kernel env vars (the cross-process config transport), apply nest_asyncio, and merge build_namespace(...) into globals(). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Collapse Registry.send's processor_factory + session_dir_factory + holder-dict dance into a single worker_factory(name). The factory is the one place a resident (total) slot is reserved and, on any creation failure, released, so slot ownership is no longer split between send() and the processor with a fallback except (review macro item #1). The registry calls the factory only for a genuinely new agent, so continuations reuse the live worker. Folds in several lifecycle-edge fixes that fall out of this:
- B2: the one-off run() path acquires its total/running slots inside the try/finally, so a cancellation between the two blocking acquires can no longer leak the already-reserved total slot.
- B8: BackgroundWorker.close() settles on a terminal status (FINISHED, preserving ERROR) instead of leaving a stale RUNNING for a held handle.
- B9: submit() refuses a closed worker instead of silently dropping the item and faking RUNNING.
- B12: _drain_agents prefers asyncio.get_running_loop().
Tests updated for the new worker_factory signature. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

load_latest_view() now drops an unparseable final line (a hard crash can leave the last messages.jsonl line half-written) and resumes from the rest, mirroring ProgrammaticToolCallStats.from_log. A malformed line anywhere other than the end is still treated as real corruption and re-raised. Without this, a torn line made setup() raise, the worker errored, and the agent became permanently un-resumable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aclose() ran the drain cell and kernel shutdown synchronously, stalling this agent's cooperative loop (and its siblings) while a child kernel worked; failures were swallowed by a bare except. Now both the drain cell and shutdown run via asyncio.to_thread, the drain budget is a named constant, and a drain failure is logged rather than silently dropped. Not fully closed: a drain-cell timeout can still trigger a kernel restart that orphans descendants' kernels (review B5) — left for a follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a Status column to the summary table and a re-check section: B2, B4, B6, B8, B9, B12 fixed; B5 mostly fixed; macro items #1, #4, #5, #8 plus the config refactor implemented; #7 (advance decomposition) and the HIGH bugs B1/B3 deferred together. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…r (B1, B3)
Decompose the ~190-line advance() loop (review macro #7) by extracting _request_completion, _note_turn_usage, _parse_tool_calls, and _execute_tool_call, leaving the loop's control flow readable. Two HIGH bugs that live in this loop are fixed in the same pass:
- B1: a turn can stop right after the assistant's tool_calls are recorded but before their results (a token-budget stop on a tool-call turn, or a crash mid-exec that resume reloads). Appending the next user turn then produced an invalid OpenAI sequence (a hard 400). A new _answer_dangling_tool_calls() synthesizes a tool result for each unanswered call at the two points just before a user turn is appended (advance() start and _resume(), before its warning).
- B3 (also fixes B10): the resume header was only written at the top of the loop, so it lagged a turn and a clean-stop resume under-counted usage / metrics and reused turn indices. Now also written at the end of advance() with the final state (turn_offset == turns).
Adds tests for both: a resume whose saved view ends with a dangling tool call, and the end-of-advance header reflecting the completed turn. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
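The B1 repair can be sketched as follows. This is not the body of _answer_dangling_tool_calls(): the function name without the underscore, the placeholder text, and the assumption that only tool messages follow the last assistant message are all illustrative. It shows the idea of synthesizing one tool result per unanswered call so that a user turn can be appended afterwards.

```python
from typing import List


def answer_dangling_tool_calls(
    messages: List[dict],
    note: str = "[no result: the turn was interrupted]",
) -> int:
    """Append a placeholder tool result for each unanswered tool call.

    Looks only at the most recent assistant message, since the dangling
    case described above is always at the tail of the transcript.
    Returns the number of synthesized results.
    """
    for idx in range(len(messages) - 1, -1, -1):
        if messages[idx].get("role") == "assistant":
            break
    else:
        return 0  # no assistant message at all
    calls = messages[idx].get("tool_calls") or []
    answered = {
        m.get("tool_call_id")
        for m in messages[idx + 1:]
        if m.get("role") == "tool"
    }
    missing = [call["id"] for call in calls if call["id"] not in answered]
    for call_id in missing:
        messages.append({"role": "tool", "tool_call_id": call_id, "content": note})
    return len(missing)
```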

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- B7: a non-positive max_tokens is treated as "no explicit budget" (falls back to the RLM_SUB_MAX_TOKENS ceiling / no limit) instead of 0 disabling the budget by truthiness or a negative stopping the sub-agent after one turn. Normalized in both rlm.send (so 0 -> ceiling, per the doc) and RLMEngine (so the direct / run path can't brick).
- B11: resume derives the view index from the loaded transcript (load_latest_view now returns (view, msgs)) rather than from meta.json, so a stale meta after a mid-compaction crash can't re-open a branch_reset-closed branch. Residual: a cosmetic orphaned branch_reset log line is still possible in a sub-millisecond crash window.
- B13: re-sending a name whose worker errored evicts the dead worker and rebuilds fresh (restart / resume), keeping the name reusable and the registry from accumulating dead workers.
- B14: skills' .send now mirrors run's signature + docstring (help() and inspect.signature work like rlm.send); the send docstring no longer calls the uuid auto-name "deterministic"; the design-doc claim is corrected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
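The B7 normalization amounts to a small pure function. The sketch below uses a hypothetical name, `effective_budget`, and treats a missing or non-positive ceiling as "no ceiling"; how the real code represents an unset RLM_SUB_MAX_TOKENS is an assumption.

```python
from typing import Optional


def effective_budget(max_tokens: Optional[int], ceiling: Optional[int]) -> Optional[int]:
    """Resolve a per-send token budget against the configured ceiling.

    Returns None when there is no limit at all.
    """
    cap = ceiling if ceiling is not None and ceiling > 0 else None
    if max_tokens is None or max_tokens <= 0:
        return cap  # no explicit budget: fall back to the ceiling (or no limit)
    if cap is not None:
        return min(max_tokens, cap)  # explicit budget is clamped to the ceiling
    return max_tokens
```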

Only the B5 drain-timeout kernel-restart orphan edge remains from the review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor · 2026-06-13T10:07:23Z

+            if markers_dir is None or fcntl is None:
+                return None
+            markers_dir.mkdir(parents=True, exist_ok=True)
+            return _make_marker(markers_dir)


Over-cap marker skips flock

Low Severity

After RLM_AGENT_WAIT_TIMEOUT, acquire_slot_blocking creates a marker with _make_marker outside the pool’s flock critical section used by acquire_slot, so concurrent acquires can race and the live-count check can be briefly inconsistent.

Reviewed by Cursor Bugbot for commit f2db2c3.

Rename _async_runtime.py -> async_runtime.py, _agent_limit.py -> agent_limit.py, and _kernel_bootstrap.py -> kernel_bootstrap.py (via git mv, so history follows), updating all imports and doc references. These modules are internal to the rlm package; the private-file convention added noise without guarding any external surface. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…bals The rlm package has no external consumers of its internal helpers, so the private-name convention on module-level functions and globals was just noise. Drop the underscore package-wide, keeping it only on class methods and instance attributes, where it still marks a class's internals. Three names couldn't simply lose the prefix: - config._int -> env_int (shadowed the builtin int) - agent_limit._limit -> pool_limit and _markers_dir -> pool_markers_dir (collided with local variables in acquire_slot) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A named background agent spawned via send() now gets its own sub-<name> session under the current session dir regardless of depth, instead of only when depth > 0. Previously a depth-0 send (e.g. a direct rlm.send(...) from the root process) skipped child_session, so the background engine fell back to the root RLM_SESSION_DIR and appended to the same messages.jsonl while handle.session_dir stayed None. The depth gate moves from child_session to run()'s call site: run keeps the root rollout (depth 0) on its own session and only nests for a sub-agent one-off, while send always nests (a named background agent is a distinct child even at the root). Latent on the model's path (send is only reachable inside a kernel, which always runs at depth >= 1), but flagged by external review; now consistent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The REPL turns a cell timeout / error into captured output rather than an exception, so aclose's drain (run via IPythonREPL.execute) returned normally on a 120s timeout and finalize proceeded as if the cascade succeeded — silently leaving descendant kernels/sessions open while the parent's metrics read as a clean shutdown. The drain cell now prints a DRAIN_OK sentinel only after close_all_registries returns; teardown checks the cell output for it. A missing sentinel (timeout, cell error, or a raised exception) is logged and recorded as teardown_drain_complete=false in meta, rather than mistaken for success. Still best-effort: orphaned descendant kernels left by a restarted child kernel aren't retroactively reaped (OS-level cleanup, future work). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
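The sentinel check is easy to sketch. The helper names below are hypothetical, and matching the sentinel as a whole line is an assumption; the DRAIN_OK string and the teardown_drain_complete key come from the commit message.

```python
DRAIN_SENTINEL = "DRAIN_OK"


def drain_completed(cell_output: str) -> bool:
    """True only if the drain cell got far enough to print the sentinel."""
    return any(line.strip() == DRAIN_SENTINEL for line in cell_output.splitlines())


def record_teardown(meta: dict, cell_output: str) -> dict:
    """Return a copy of meta with the drain outcome recorded."""
    updated = dict(meta)
    updated["teardown_drain_complete"] = drain_completed(cell_output)
    return updated
```

A timeout or cell error produces captured output without the sentinel, so the absence of the line is what signals an incomplete cascade.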

_resume injected the IPython "kernel was restarted ... recreate your variables" warning whenever a non-empty transcript was rehydrated, even for a tools-only / chat-only agent that never had an IPython REPL — a vacuous, misleading turn. Gate the warning on the engine having a REPL. For a REPL agent the warning stays and is accurate: every resume path follows a genuine kernel teardown (reap/aclose, or a parent-kernel restart) and setup() starts a fresh, empty REPL, so "recreate your variables" holds — contrary to the review's "kernel unchanged" premise. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The B4 torn-line recovery only dropped a corrupt line when it was the last *physical* line in messages.jsonl, so a crash that left a torn final record followed by a trailing blank line (or stray newline) still raised JSONDecodeError out of load_latest_view -> setup(), blocking resume even though every earlier record was valid. load_latest_view now strips trailing blank lines before the tail check, so the torn record is recognized as the tail and dropped; mid-file corruption (a corrupt line with real content after it) still raises. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
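The tail rule described here can be sketched on raw text. `parse_transcript` is a hypothetical helper, not the real load_latest_view; it shows stripping trailing blank lines first, dropping an unparseable final record, and still raising on corruption anywhere else.

```python
import json
from typing import List


def parse_transcript(text: str) -> List[dict]:
    """Parse JSONL, tolerating a torn final record but not mid-file damage."""
    lines = text.split("\n")
    while lines and not lines[-1].strip():
        lines.pop()  # trailing blank lines must not hide a torn tail
    records = []
    for index, line in enumerate(lines):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            if index == len(lines) - 1:
                break  # torn tail from a hard crash: drop it, keep the rest
            raise      # a corrupt line with real content after it
    return records
```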

Compaction writes branch_reset(V) and then seeds the new view as two lines: the system message, then user(framing+summary). A hard crash between those two writes left the new view with only its system message, and load_latest_view picked it (highest view among msg lines) — so resume dropped the entire pre-compaction conversation and continued from a lone system prompt. load_latest_view now falls back to the previous, complete branch when the latest view is a single system message (views above 0 exist only because compaction created them, and it always seeds [system, user(summary)], so a lone system there is an interrupted seed). The next turn re-compacts. The adjacent window the review flagged (branch_reset written, new view not started yet) was already benign: resume loads the complete branch V and re-compacts. Reconstructing the exact compacted state would need session<->engine coupling for negligible benefit given both windows are sub-millisecond synchronous-write gaps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
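The fallback rule can be expressed over an already-parsed mapping of view index to messages. `pick_resume_view` is a hypothetical name, and the mapping shape is an assumption; the rule itself (a lone system message in a view above 0 marks an interrupted seed) is taken from the commit message.

```python
from typing import Dict, List, Tuple


def pick_resume_view(views: Dict[int, List[dict]]) -> Tuple[int, List[dict]]:
    """Choose the view to resume from, skipping an interrupted compaction seed."""
    if not views:
        return 0, []
    latest = max(views)
    messages = views[latest]
    # Views above 0 are always seeded as [system, user(summary)], so a lone
    # system message there means the crash hit between the two writes.
    interrupted_seed = (
        latest > 0
        and len(messages) == 1
        and messages[0].get("role") == "system"
    )
    lower = [view for view in views if view < latest]
    if interrupted_seed and lower:
        previous = max(lower)
        return previous, views[previous]
    return latest, messages
```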

Rewrite background-subagents-and-tools.md to describe, in present tense, what this branch adds on top of main: the send/poll background API, workers/registry, skill backgrounding, the resumable engine, the view-structured transcript with resume/restart behavior, the two caps, teardown, the token budget, configuration, and the supporting refactors (config centralization, kernel-bootstrap module, advance() decomposition, naming). Drops the design-process framing, motivation pitch, and speculative future-work; reflects current behavior throughout. Remove persistent-tools-review.md: a transient record of the review and its fixes, which now live in the code and commit history. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Each paragraph and list item is now a single line, so copying from the raw markdown no longer injects mid-paragraph line breaks. Rendered output is unchanged; headers, nested lists, the code fence, and blank-line separation are preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 5 total unresolved issues (including 3 from previous reviews).


Reviewed by Cursor Bugbot for commit 6da8466.

RLMEngine.run() called setup() outside its try/finally, so a setup() failure after the IPython kernel had started — e.g. a corrupt mid-file record raised by load_latest_view on resume — skipped the finally and leaked the kernel/REPL. aclose() already handles a partial setup (it shuts the REPL when _setup_done is False but _repl exists); it just wasn't being called on this path. Move setup() inside the try so the finally always runs aclose(). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
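The shape of the fix, reduced to a toy engine. The `Engine` class and its flags are hypothetical stand-ins for RLMEngine; the point is only that placing `setup()` inside the `try` guarantees `aclose()` runs even when setup fails after the kernel has started.

```python
import asyncio


class Engine:
    """Toy engine: setup() starts a 'kernel' and may then fail."""

    def __init__(self, fail_setup: bool = False) -> None:
        self.fail_setup = fail_setup
        self.kernel_running = False
        self.closed = False

    async def setup(self) -> None:
        self.kernel_running = True  # the kernel starts first...
        if self.fail_setup:
            raise RuntimeError("corrupt transcript")  # ...then setup fails

    async def advance(self) -> str:
        return "done"

    async def aclose(self) -> None:
        self.kernel_running = False  # shut the kernel whenever one was started
        self.closed = True

    async def run(self) -> str:
        try:
            await self.setup()  # inside the try, so the finally always runs
            return await self.advance()
        finally:
            await self.aclose()
```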

aclose() only closed the session's messages.jsonl on the finalize path (via session.finalize). Its early return (setup never finished and no REPL — e.g. RLM_TOOLS="") and the setup-failed-after-kernel path both left the handle open, leaking the fd and leaving the session directory unfinished. Now it early-returns only when nothing was opened (no session and no REPL) and closes the session in the finally on every other path. Session.close() is idempotent, so the finalize path (which already closes it) is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

snimu and others added 8 commits June 11, 2026 15:30

cursor Bot reviewed Jun 11, 2026


Comment thread src/rlm/session.py

Comment thread src/rlm/api.py Outdated

Comment thread src/rlm/_async_runtime.py Outdated

snimu and others added 5 commits June 11, 2026 18:22

cursor Bot reviewed Jun 11, 2026


Comment thread src/rlm/api.py

snimu and others added 2 commits June 11, 2026 20:55

cursor Bot reviewed Jun 12, 2026


Comment thread src/rlm/session.py

snimu and others added 4 commits June 12, 2026 14:16

cursor Bot reviewed Jun 12, 2026


Comment thread src/rlm/async_runtime.py

snimu and others added 3 commits June 12, 2026 17:02

cursor Bot reviewed Jun 13, 2026


Comment thread src/rlm/api.py Outdated

Comment thread src/rlm/api.py Outdated

snimu changed the title ~~Persistent sub-agents and tools~~ [DO NOT MERGE] Persistent sub-agents and tools Jun 13, 2026

cursor Bot reviewed Jun 13, 2026


Comment thread src/rlm/engine.py Outdated

snimu and others added 7 commits June 13, 2026 10:59

cursor Bot reviewed Jun 13, 2026


Comment thread src/rlm/engine.py

Comment thread src/rlm/api.py

snimu and others added 4 commits June 13, 2026 11:47

docs: mark B1, B3, B10 fixed and #7 done

62f6334

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs: mark B7, B11, B13, B14 fixed

f2db2c3

Only the B5 drain-timeout kernel-restart orphan edge remains from the review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor Bot reviewed Jun 13, 2026


snimu and others added 3 commits June 13, 2026 12:10

cursor Bot reviewed Jun 14, 2026


Comment thread src/rlm/engine.py Outdated

cursor Bot reviewed Jun 14, 2026


Comment thread src/rlm/session.py

snimu and others added 2 commits June 14, 2026 16:22

cursor Bot reviewed Jun 14, 2026


Comment thread src/rlm/session.py Outdated

snimu and others added 3 commits June 14, 2026 17:01

cursor Bot reviewed Jun 15, 2026


Comment thread src/rlm/engine.py

Comment thread src/rlm/engine.py

snimu and others added 2 commits June 14, 2026 17:49

snimu wants to merge 47 commits into main from sebastian/persistent-tools-2026-06-11
