[DO NOT MERGE] Persistent sub-agents and tools #97

Open
snimu wants to merge 47 commits into main from sebastian/persistent-tools-2026-06-11
@snimu snimu commented Jun 11, 2026

Contributor

Create persistent sub-agents and tools. For more info see the design doc.


Note

High Risk
Large architectural change to the agent loop, session persistence, kernel lifecycle, and concurrency limits; mistakes could leak kernels, corrupt transcripts, or deadlock rollouts under caps.

Overview
Adds a background sub-agent and programmatic-tool layer so IPython cells can send work and poll handles across later cells, including named multi-turn rlm specialists.

RLMEngine is split into setup() / advance() / aclose() so one conversation can advance repeatedly and rehydrate from disk after kernel loss. messages.jsonl becomes a typed, view-based append-only transcript with load_latest_view() recovery (torn tails, interrupted compaction). Resume paths answer dangling tool calls, persist metrics via RLMMetrics.snapshot/restore, and optionally warn when the REPL was reset.

async_runtime provides workers, handles, ToolState, and a per-kernel registry; api.send wires a stateful _RlmProcessor with nested sub-<name>/ sessions. Skills get ephemeral .send via kernel_bootstrap. agent_limit enforces total and running caps with marker files + flock.

config.py centralizes RLM_* settings; teardown cascade-drains child agents with a sentinel check. System prompts document rlm.send / handle.poll() and skill backgrounding. Broad test coverage for runtime, caps, persistence, resume, and sessions.

Reviewed by Cursor Bugbot for commit d99c70b.

snimu and others added 8 commits June 11, 2026 15:30
Phase 1 (build now): a unified send/poll background primitive for all programmatic callables, named persistent multi-turn rlm sub-agents, and token budgets. Phase 2 (deferred): interruptions overview.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
IPythonREPL._execute_locked captured stream / execute_result / error messages but dropped display_data and update_display_data, so output from display() and rich reprs a normal IPython user sees was invisible to the model. Capture their text/plain like execute_result (headless analog, no image bytes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Unify on a single send background primitive + a uniform poll object (status / results-FIFO / queued / error); move queueing into Phase 1; make continuation tool-defined via a worker+processor split; universal addressability; marker-file global cap; graceful-dismiss cascade in PR 1. Record M0 verified and display_data fixed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a generic background-worker runtime (rlm/_async_runtime.py: BackgroundWorker, Handle, Registry, ToolState) and wire rlm.send/get/list_agents into the kernel so the model can launch a named sub-agent in one cell and poll/continue it in another.

- RLMEngine split into setup()/advance()/aclose(); run() == setup + one advance + close (behavior-preserving). Named agents keep the engine + kernel alive and advance() per send -> multi-turn conversations.
- Per-name worker drains an inbox sequentially (queueing); poll() returns a ToolState (status / results FIFO / live-editable queued / live error) and never consumes.
- Named child sessions at <parent>/sub-<name>/messages.jsonl (transcript via handle.session_dir).
- Graceful-dismiss cascade: aclose() drains the kernel's sub-agents (finalizing sessions, recursing into grandchildren) before aggregating child metrics and shutting the kernel down.
- Per-send token budget clamped to the RLM_SUB_MAX_TOKENS ceiling.
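
The per-name worker and non-consuming poll described above can be pictured with a minimal sketch. It is not the code in rlm/_async_runtime.py: `SketchWorker`, its constructor argument, and the lowercase status strings are hypothetical, and only the `ToolState` field names (status, results, queued, error) come from the commit text.

```python
import asyncio
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolState:
    """Snapshot returned by poll(); field names follow the commit text."""
    status: str
    results: list = field(default_factory=list)
    queued: list = field(default_factory=list)
    error: Optional[str] = None


class SketchWorker:
    """Hypothetical per-name worker: one inbox, drained one item at a time."""

    def __init__(self, process):
        self._process = process      # async callable that handles one message
        self._inbox = deque()        # messages waiting for their turn
        self._results = []           # FIFO of finished results
        self._error = None
        self._task = None

    def submit(self, message):
        """Queue a message and start a drain task if none is active."""
        self._inbox.append(message)
        if self._task is None or self._task.done():
            self._task = asyncio.get_running_loop().create_task(self._drain())

    async def _drain(self):
        # Sequential processing: the next message starts only after the
        # previous one has finished, which gives multi-turn ordering.
        while self._inbox:
            message = self._inbox.popleft()
            try:
                self._results.append(await self._process(message))
            except Exception as exc:
                self._error = repr(exc)
                return

    def poll(self):
        """Return copies of the current state; nothing is consumed."""
        if self._error is not None:
            status = "error"
        elif self._task is not None and not self._task.done():
            status = "running"
        else:
            status = "finished"
        return ToolState(status, list(self._results), list(self._inbox), self._error)
```

Because `poll()` hands back copies, polling twice in a row yields the same results list, which matches the "never consumes" wording.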

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Parks the display_data fix (PR not opened) so the persistent-tools branch stays focused. Reverts the cherry-picked 7fd83d9; the standalone fix/ipython-capture-display-data branch is preserved on the remote.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The recursion section now documents rlm.send / handle.poll / rlm.get / rlm.list_agents / handle.dismiss alongside the existing await rlm(...) path, so the model knows to launch a named sub-agent in one tool call and poll or continue it in another. Only shown when recursion is allowed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the uniformity principle, every tool is send -> handle -> poll(), so per-name lookup was rlm-specific surface. The model keeps handles in its own variables (the IPython namespace persists across tool calls) and re-sends a name to continue. The registry stays internal so send(name) can still find an existing agent. Updates the system prompt and design doc to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rlm/_agent_limit.py reserves one marker file per live sub-agent in a per-rollout dir shared by the whole process tree (RLM_LIVE_AGENTS_DIR, derived from the root session and propagated to each kernel via _inject_startup). flock serializes sweep-count-create; a PID-liveness sweep reclaims slots leaked by hard-killed processes. rlm.send acquires a slot for a new agent (continuation reuses it) and raises AgentLimitReached at capacity; the slot is released on teardown/dismiss.
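
A minimal sketch of the marker-file cap, assuming a Unix host: one file per live agent, a flock around sweep-count-create, and a PID-liveness sweep. The marker naming (`<pid>-<uuid>.marker`), the `.lock` file, and the function signatures are assumptions made for illustration; only `AgentLimitReached` and the overall scheme come from the commit message.

```python
import fcntl  # Unix-only; this sketch does not cover Windows
import os
import uuid
from pathlib import Path


class AgentLimitReached(RuntimeError):
    """Raised when the pool already holds `limit` live markers."""


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_slot(markers_dir: Path, limit: int) -> Path:
    """Sweep dead markers, count live ones, and create a new marker."""
    markers_dir.mkdir(parents=True, exist_ok=True)
    with open(markers_dir / ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # released when the file is closed
        live = 0
        for marker in markers_dir.glob("*.marker"):
            owner_pid = int(marker.name.split("-", 1)[0])
            if pid_alive(owner_pid):
                live += 1
            else:
                marker.unlink(missing_ok=True)  # reclaim a leaked slot
        if live >= limit:
            raise AgentLimitReached(f"{live} live agents, limit is {limit}")
        marker = markers_dir / f"{os.getpid()}-{uuid.uuid4().hex}.marker"
        marker.touch()
        return marker


def release_slot(marker: Path) -> None:
    """Free the slot; safe to call more than once."""
    marker.unlink(missing_ok=True)
```

Holding the lock across the whole sweep-count-create sequence is what keeps two processes from both seeing `limit - 1` markers and both creating one.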

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/session.py
Comment thread src/rlm/api.py Outdated
Comment thread src/rlm/_async_runtime.py Outdated
snimu and others added 5 commits June 11, 2026 18:22
await rlm(...) / gather(...) one-offs now count against RLM_MAX_LIVE_AGENTS: a sub-agent (depth > 0) reserves a slot via acquire_slot_blocking, which waits for a free slot rather than raising (the model awaits the one-off anyway), and releases it when done. If no slot frees up within RLM_AGENT_WAIT_TIMEOUT (default 300s; 0 = wait forever), the acquire proceeds over-cap to avoid deadlock when slot-holders are themselves waiting for slots. The root rollout (depth 0) is not capped; send still raises at capacity.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
attach_background gives each uploaded skill a stateless .send(*a, **kw) -> handle (FnProcessor over the skill's run), wired in _inject_startup so skills get the same background/poll lifecycle as sub-agents: auto-named, no persistence, and not counted against the live-agent cap (skills are cheap coroutines, not kernels). The system prompt advertises it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Named agents were keyed in the registry by the raw name while their
session dir used the sanitized name, so two names that sanitize alike
(foo/bar and foo-bar) spawned separate workers and engines writing to
one sub-* directory and transcript. Sanitize at the send() boundary so
the registry key and the dir suffix derive from the same string;
colliding names now continue a single agent.
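
To illustrate why sanitizing once at the boundary removes the collision, here is a small sketch. The sanitization rule (replace anything outside `[A-Za-z0-9_.-]` with `-`) and both function names are assumptions; the commit only states that foo/bar and foo-bar sanitize alike and that the directory is named sub-<name>.

```python
import re


def sanitize_name(name: str) -> str:
    """Collapse each run of unsafe characters into a single '-' (assumed rule)."""
    cleaned = re.sub(r"[^A-Za-z0-9_.-]+", "-", name).strip("-.")
    return cleaned or "agent"


def registry_key_and_dir(raw_name: str):
    """Derive the registry key and the session dir suffix from one string."""
    key = sanitize_name(raw_name)
    return key, f"sub-{key}"
```

With this shape, `registry_key_and_dir("foo/bar")` and `registry_key_and_dir("foo-bar")` return the same pair, so the second send continues the first agent instead of spawning a second worker on the same directory.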

Also drop a stale rlm.get(name) reference from the send() docstring
(that API was removed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Agents are dropped by simply not sending to them again; there is no
explicit per-agent teardown. Removing dismiss also closes a latent bug:
a name could be removed from the registry mid-rollout and then
re-created, reusing the same sub-<name> dir and concatenating two
unrelated rollouts into one transcript (and clobbering meta.json). With
no removal path, each sub-<name> dir is created once and reused for
every turn of that agent.

- _async_runtime: drop Handle.dismiss / Handle._registry / Registry.remove
  and the _detach/_DETACHED machinery; reword the error-state messages.
- api: AgentLimitReached message points at reusing an existing agent.
- prompt: drop the dismiss line from the recursion section.
- tests: cleanup via Registry.close_all; cap test exercises teardown.
- docs: rename "graceful-dismiss cascade" -> "graceful teardown cascade"
  and add an agent-lifecycle note on the cap consequence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
RLM_MAX_LIVE_AGENTS now bounds *resident* agents — a slot held from
creation to teardown — and a new RLM_MAX_RUNNING_AGENTS bounds agents
*executing a turn*, a slot held only around advance(). An idle agent
that could still take another query holds no running slot, so the
running cap is a true parallelism knob. An errored turn is reaped
eagerly (kernel shut, total slot freed) while the worker stays pollable
for its error.

- _agent_limit: generalize to named pools (TOTAL/RUNNING), each a subdir
  under RLM_LIVE_AGENTS_DIR with its own flock + PID-liveness sweep.
- api: send reserves a TOTAL slot (raises AgentLimitReached at cap);
  _RlmProcessor takes a RUNNING slot per advance and reaps on error;
  one-off run() holds both for its lifetime, blocking on each.
- engine: derive RLM_LIVE_AGENTS_DIR when either cap is set.
- tests: pool-aware cap tests, pool independence, errored-agent reap.
- docs: two-cap section in section 3.7; lifecycle note updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/api.py
snimu and others added 2 commits June 11, 2026 20:55
_RlmProcessor.process built and set up the engine outside its try, so a
failure there (bad engine kwarg, a kernel that won't start, a missing
system-prompt file) skipped _reap: the resident RLM_MAX_LIVE_AGENTS slot
taken in send() leaked until rollout teardown even though the worker was
already in its error state. Move construction + setup inside the try so
any failure path reaps the agent and frees its slot eagerly.

Also harden RLMEngine.aclose: if setup() started the kernel but failed
before completing, the old `if not self._setup_done: return` guard left
the kernel subprocess orphaned. aclose now shuts the kernel whenever one
was started, skipping finalize when there is no completed run to finalize.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
An omitted name now gets a uuid hex instead of the deterministic
adjective-noun-animal scheme. uuids are unique without a dedup loop, so
this deletes the three word lists, the sha256 _auto_name generator,
Registry's name_seed/_auto_counter and its _auto_name method, and the
name_seed threading through attach_background and both call sites.

Trade-off: auto-names are no longer reproducible across identical
rollouts, but the model holds the handle in a variable and explicit
names still work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/session.py
snimu and others added 4 commits June 12, 2026 14:16
Add section 3.10: a kernel restart (timeout recovery) wipes the parent
kernel's in-memory state including the registry and sub-agent engines.
Plan: messages.jsonl becomes an append-only list of views (supersedes
the lossy event log), meta.json is written per turn as the resume
header, setup() rehydrates from disk when the in-memory engine is gone,
and any agent whose kernel restarted gets a kernel-reset warning. Drops
the disambiguation idea and the from-disk-rehydration non-goal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_interrupt_and_recover now reports whether it had to restart the kernel
(vs. recovering via interrupt alone). When it restarted, the tool output
appends a kernel-reset notice so the model knows its variables, imports,
and in-memory state are gone and must be rebuilt — previously the reset
was silent and the model would reference vanished state.

First slice of the restart-resilience work (docs section 3.10); the
resume-side warning rides on the upcoming resume-from-disk path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewrite the session transcript from a lossy event log into the
replayable structure from docs section 3.10: each line is a typed
object (msg / branch_reset / spawn / done). Message lines carry a view
index + turn and the full OpenAI shape (tool_call_ids included). A
compaction closes the current view (recording the checkpoint prompt and
the assistant summary the model produced) and opens the next, so each
view is exactly the context the model saw in that branch.
Session.load_latest_view() reconstructs the newest view verbatim.
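
A rough sketch of the view-structured transcript, under stated assumptions: the record types (msg, branch_reset) and the view/turn fields come from the commit text, while the `type` and `message` key names and the helper names `append_record`, `log_msg`, and `latest_view_messages` are hypothetical.

```python
import json
from pathlib import Path


def append_record(path, record):
    """Append one typed JSON object as a single line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def log_msg(path, view, turn, message):
    """Record a chat message together with the view and turn it belongs to."""
    append_record(path, {"type": "msg", "view": view, "turn": turn, "message": message})


def latest_view_messages(path):
    """Rebuild the newest view: the msg records carrying the highest view index."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    msgs = [r for r in records if r["type"] == "msg"]
    if not msgs:
        return []
    latest = max(r["view"] for r in msgs)
    return [r["message"] for r in msgs if r["view"] == latest]
```

Since a compaction opens a new view rather than rewriting old lines, the file stays append-only and each earlier view remains readable as the context the model saw at that point.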

Also write a resume header to meta.json every turn (usage, turn_offset,
view, branch_start_turn, RLMMetrics.snapshot()) so a hard restart, which
never reaches finalize, can restore state later. Adds
RLMMetrics.snapshot()/restore().

Nothing outside rlm reads messages.jsonl, so this is a local schema
change. The resume wiring (setup() loading the view) lands next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
setup() now checks for an existing on-disk view: if the session dir has
a transcript but the in-memory engine is gone (the registry was wiped by
a parent-kernel restart), it loads the latest view as _messages and
restores the resume header (usage / turn_offset / view / branch_start /
RLMMetrics) from meta.json instead of seeding a fresh conversation, then
injects a kernel-reset warning — the conversation is restored but the
REPL is brand new.

Re-sending a name after a restart is now a true continuation rather than
a silent fresh start onto a mixed transcript, which is what lets us keep
one-dir-per-agent without disambiguation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/async_runtime.py
snimu and others added 3 commits June 12, 2026 17:02
Post-review cleanup of the persistent-tools branch, no behavior change:

- drop dead code: BackgroundWorker.error (no callers), Registry.list()
  (test-only; the design rules out a list surface), and the unreachable
  ERROR guard in submit() (Registry.send already guards it).
- setup() builds the system prompt only when seeding fresh, so a
  resume-from-disk no longer constructs a prompt it discards.
- inline single-use helpers: _read_resume_header into _resume,
  _drain_child_agents into aclose; replace _force_acquire with a shared
  _make_marker() used by acquire_slot and acquire_slot_blocking.
- drop the always-constant `command` arg from Session.log_spawn.
- trim design-doc narration from docstrings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A `<skill>.send(...)` auto-named a worker and parked it in a per-skill
registry until kernel teardown, so repeated background skill calls
accumulated workers + parked drain tasks for the whole rollout.

General tools now run on their own ephemeral worker: it processes the one
call to completion, then the task ends. The worker is owned solely by its
Handle and is never registered, so it (and its result) are GC'd once the
model drops the handle — no accumulation, no sweep needed. A tool that
wants state across calls keeps its own cache.

The registry + naming is now the rlm-only continuation mechanism; named
rlm agents stay resident (no auto-drain), unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The first ephemeral-worker cut kept a separate _run_once alongside the
resident _drain, duplicating the process/try-except/append block. The two
lifecycles differ only in what happens when the inbox drains — a resident
worker parks, an ephemeral one ends — so collapse them into one _drain with
a one-line fork and a self._ephemeral flag set at construction.

Net -15 lines, no behavior change. Also fixes BackgroundWorker's docstring,
which still referenced the removed `error` attribute (now poll().error).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/tools/ipython.py
self._km.interrupt_kernel()
if not self._wait_for_idle(timeout=2):
    self.restart_kernel()
    return True

Restart skips agent drain

Medium Severity

IPython kernel restart (including timeout recovery) does not run _drain_agents before wiping in-kernel state. Persistent background sub-agents lose their registry and workers without aclose, so nested REPL subprocesses and total-cap markers may leak until process exit.

Reviewed by Cursor Bugbot for commit 98039ef.

Rewrite the design doc as a clean, self-contained description of the
background/persistent sub-agent + tools system, against main — no
build-order, milestone, or iteration framing, no second person.

Fix references that had drifted from the code: the engine loop lives in
advance() (not _run_loop); the two kernel-reset paths are distinct (timeout
restart -> note in the tool result; parent-kernel restart -> resume + an
injected warning turn); the spawn transcript line is `spawn`. Rename the
file to match its title and update the docstring link in _async_runtime.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/api.py Outdated
Comment thread src/rlm/api.py Outdated
run() skipped the live-agent caps at depth 0 but send() acquired a total
slot unconditionally, so a depth-0 caller of the public rlm.send could hit
AgentLimitReached even though the root rollout is meant to be uncapped (it
can't happen from a kernel, where RLM_DEPTH >= 1, but rlm.send is public).

Centralize the rule in _is_subagent() (depth > 0) and use it in run, send,
the per-turn running-slot acquire, and _child_session. No change for real
(depth >= 1) usage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
snimu changed the title from "Persistent sub-agents and tools" to "[DO NOT MERGE] Persistent sub-agents and tools" on Jun 13, 2026
Comment thread src/rlm/engine.py Outdated
snimu and others added 7 commits June 13, 2026 10:59
Replace scattered ``os.environ.get(...)`` reads across engine, api,
_agent_limit, client, session, and the builtin tools with a single
frozen ``Config`` built once by ``config.load_config()`` (cached) and
read via ``get_config()``. ``RLMEngine`` holds an overridable ``Config``;
leaf and cross-process helpers use ``get_config()``.

Environment variables stay the transport across process boundaries: each
kernel subprocess loads its own Config from inherited env, and
``_inject_startup`` still writes the per-kernel overrides before rlm is
imported there. Provider credentials (``client.resolve_provider``) and
the active builtin-tool set (``tools.registry``) intentionally keep their
own env handling.

Also make the agent-cap parsing tolerant (review finding B6): an empty or
malformed ``RLM_MAX_LIVE_AGENTS`` / ``RLM_MAX_RUNNING_AGENTS`` /
``RLM_AGENT_WAIT_TIMEOUT`` now disables the cap / uses the default rather
than raising ``ValueError`` on the acquire hot path. Budgets stay strict.

Behavior-preserving otherwise. conftest clears the config cache per test;
adds test_cap_disabled_on_malformed_limit. 113 tests pass.
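
As an illustration of the frozen, cached Config with tolerant cap parsing, here is a sketch. The three environment variable names, `load_config`, `get_config`, and the 300s default appear in this PR; the `Config` field names and the exact shape of `env_int` are assumptions.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


def env_int(name: str, default: Optional[int] = None) -> Optional[int]:
    """Parse an integer env var; empty or malformed values yield the default."""
    raw = os.environ.get(name, "").strip()
    try:
        return int(raw)
    except ValueError:
        return default


@dataclass(frozen=True)
class Config:
    max_live_agents: Optional[int]      # None means the cap is disabled
    max_running_agents: Optional[int]   # None means the cap is disabled
    agent_wait_timeout: int             # seconds


@lru_cache(maxsize=1)
def load_config() -> Config:
    """Read the environment once; later calls return the cached instance."""
    return Config(
        max_live_agents=env_int("RLM_MAX_LIVE_AGENTS"),
        max_running_agents=env_int("RLM_MAX_RUNNING_AGENTS"),
        agent_wait_timeout=env_int("RLM_AGENT_WAIT_TIMEOUT", 300),
    )


def get_config() -> Config:
    return load_config()
```

A test fixture that changes the environment would call `load_config.cache_clear()` first, which mirrors the note that conftest clears the config cache per test.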

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Consolidated architecture / maintainability review and adversarially
verified bug findings (B1-B14) for the background / persistent sub-agent
feature, plus the resumable-engine and session-persistence rewrite.
Includes a verified-clean list and suggested fix ordering. B6 (tolerant
agent-cap parsing) is marked fixed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the skill / rlm wrapping logic (_CallableModule, _wrap_callable,
_log_programmatic_call) out of the ~70-line Python program that
_inject_startup exec'd as an f-string and into rlm/_kernel_bootstrap.py,
so it is importable, unit-testable, and type-checked instead of an
opaque string literal with path-interpolation hazards.

The injected cell now only does the plumbing that must run in the
kernel's interactive namespace: set the per-kernel env vars (the
cross-process config transport), apply nest_asyncio, and merge
build_namespace(...) into globals().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collapse Registry.send's processor_factory + session_dir_factory +
holder-dict dance into a single worker_factory(name). The factory is the
one place a resident (total) slot is reserved and — on any creation
failure — released, so slot ownership is no longer split between send()
and the processor with a fallback except (review macro item #1). The
registry calls the factory only for a genuinely new agent, so
continuations reuse the live worker.

Folds in several lifecycle-edge fixes that fall out of this:

- B2: the one-off run() path acquires its total/running slots inside the
  try/finally, so a cancellation between the two blocking acquires can no
  longer leak the already-reserved total slot.
- B8: BackgroundWorker.close() settles on a terminal status (FINISHED,
  preserving ERROR) instead of leaving a stale RUNNING for a held handle.
- B9: submit() refuses a closed worker instead of silently dropping the
  item and faking RUNNING.
- B12: _drain_agents prefers asyncio.get_running_loop().

Tests updated for the new worker_factory signature.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
load_latest_view() now drops an unparseable final line (a hard crash can
leave the last messages.jsonl line half-written) and resumes from the
rest, mirroring ProgrammaticToolCallStats.from_log. A malformed line
anywhere other than the end is still treated as real corruption and
re-raised. Without this, a torn line made setup() raise, the worker
errored, and the agent became permanently un-resumable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
aclose() ran the drain cell and kernel shutdown synchronously, stalling
this agent's cooperative loop (and its siblings) while a child kernel
worked; failures were swallowed by a bare except. Now both the drain
cell and shutdown run via asyncio.to_thread, the drain budget is a named
constant, and a drain failure is logged rather than silently dropped.

Not fully closed: a drain-cell timeout can still trigger a kernel restart
that orphans descendants' kernels (review B5) — left for a follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a Status column to the summary table and a re-check section: B2, B4,
B6, B8, B9, B12 fixed; B5 mostly fixed; macro items #1, #4, #5, #8 plus
the config refactor implemented; #7 (advance decomposition) and the HIGH
bugs B1/B3 deferred together.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/engine.py
Comment thread src/rlm/api.py
snimu and others added 4 commits June 13, 2026 11:47
…r (B1, B3)

Decompose the ~190-line advance() loop (review macro #7) by extracting
_request_completion, _note_turn_usage, _parse_tool_calls, and
_execute_tool_call, leaving the loop's control flow readable. Two HIGH
bugs that live in this loop are fixed in the same pass:

- B1: a turn can stop right after the assistant's tool_calls are recorded
  but before their results — a token-budget stop on a tool-call turn, or a
  crash mid-exec that resume reloads. Appending the next user turn then
  produced an invalid OpenAI sequence (a hard 400). A new
  _answer_dangling_tool_calls() synthesizes a tool result for each
  unanswered call at the two points just before a user turn is appended
  (advance() start and _resume(), before its warning).

- B3 (also fixes B10): the resume header was only written at the top of
  the loop, so it lagged a turn and a clean-stop resume under-counted
  usage / metrics and reused turn indices. Now also written at the end of
  advance() with the final state (turn_offset == turns).
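
The B1 repair can be sketched as follows, assuming messages in the OpenAI chat shape (an assistant message with `tool_calls`, each answered by a `tool` message carrying the matching `tool_call_id`). The function body and the placeholder text are illustrative, not the branch's `_answer_dangling_tool_calls()`; the sketch also assumes it runs before the next user turn is appended.

```python
def answer_dangling_tool_calls(messages, note="[no result: the turn was interrupted]"):
    """Append a synthetic tool result for every unanswered call in the last
    assistant message, so that a following user turn forms a valid sequence."""
    # Locate the most recent assistant message, if any.
    for index in range(len(messages) - 1, -1, -1):
        if messages[index].get("role") == "assistant":
            break
    else:
        return []
    calls = messages[index].get("tool_calls") or []
    # Tool results that already follow that assistant message.
    answered = {
        m.get("tool_call_id")
        for m in messages[index + 1:]
        if m.get("role") == "tool"
    }
    added = [
        {"role": "tool", "tool_call_id": call["id"], "content": note}
        for call in calls
        if call["id"] not in answered
    ]
    messages.extend(added)
    return added
```

Calling it a second time on the same list adds nothing, so it is safe to run at both call sites named above.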

Adds tests for both: a resume whose saved view ends with a dangling tool
call, and the end-of-advance header reflecting the completed turn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- B7: a non-positive max_tokens is treated as "no explicit budget" (falls
  back to the RLM_SUB_MAX_TOKENS ceiling / no limit) instead of 0
  disabling the budget by truthiness or a negative stopping the sub-agent
  after one turn. Normalized in both rlm.send (so 0 -> ceiling, per the
  doc) and RLMEngine (so the direct / run path can't brick).
- B11: resume derives the view index from the loaded transcript
  (load_latest_view now returns (view, msgs)) rather than from meta.json,
  so a stale meta after a mid-compaction crash can't re-open a
  branch_reset-closed branch. Residual: a cosmetic orphaned branch_reset
  log line is still possible in a sub-millisecond crash window.
- B13: re-sending a name whose worker errored evicts the dead worker and
  rebuilds fresh (restart / resume), keeping the name reusable and the
  registry from accumulating dead workers.
- B14: skills' .send now mirrors run's signature + docstring (help() and
  inspect.signature work like rlm.send); the send docstring no longer
  calls the uuid auto-name "deterministic"; the design-doc claim is
  corrected.
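
The B7 normalization, combined with the per-send clamp to the RLM_SUB_MAX_TOKENS ceiling, might look like the sketch below. `effective_budget` is a hypothetical helper; `None` stands for "no limit" here.

```python
from typing import Optional


def effective_budget(requested: Optional[int], ceiling: Optional[int]) -> Optional[int]:
    """Resolve a per-send token budget against the configured ceiling."""
    # A non-positive request means "no explicit budget", not "zero tokens".
    if requested is not None and requested <= 0:
        requested = None
    if requested is None:
        return ceiling           # fall back to the ceiling (or no limit)
    if ceiling is None:
        return requested         # no ceiling configured
    return min(requested, ceiling)
```

For example, `effective_budget(0, 1000)` resolves to the ceiling of 1000, and `effective_budget(5000, 1000)` is clamped to 1000.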

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Only the B5 drain-timeout kernel-restart orphan edge remains from the review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/engine.py
Comment thread src/rlm/api.py
Comment thread src/rlm/_agent_limit.py Outdated
if markers_dir is None or fcntl is None:
    return None
markers_dir.mkdir(parents=True, exist_ok=True)
return _make_marker(markers_dir)

Over-cap marker skips flock

Low Severity

After RLM_AGENT_WAIT_TIMEOUT, acquire_slot_blocking creates a marker with _make_marker outside the pool’s flock critical section used by acquire_slot, so concurrent acquires can race and the live-count check can be briefly inconsistent.

Reviewed by Cursor Bugbot for commit f2db2c3.

snimu and others added 3 commits June 13, 2026 12:10
Rename _async_runtime.py -> async_runtime.py, _agent_limit.py ->
agent_limit.py, and _kernel_bootstrap.py -> kernel_bootstrap.py (via
git mv, so history follows), updating all imports and doc references.
These modules are internal to the rlm package; the private-file
convention added noise without guarding any external surface.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bals

The rlm package has no external consumers of its internal helpers, so the
private-name convention on module-level functions and globals was just
noise. Drop the underscore package-wide, keeping it only on class methods
and instance attributes, where it still marks a class's internals.

Three names couldn't simply lose the prefix:
- config._int -> env_int (shadowed the builtin int)
- agent_limit._limit -> pool_limit and _markers_dir -> pool_markers_dir
  (collided with local variables in acquire_slot)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A named background agent spawned via send() now gets its own sub-<name>
session under the current session dir regardless of depth, instead of
only when depth > 0. Previously a depth-0 send (e.g. a direct
rlm.send(...) from the root process) skipped child_session, so the
background engine fell back to the root RLM_SESSION_DIR and appended to
the same messages.jsonl while handle.session_dir stayed None.

The depth gate moves from child_session to run()'s call site: run keeps
the root rollout (depth 0) on its own session and only nests for a
sub-agent one-off, while send always nests (a named background agent is a
distinct child even at the root).

Latent on the model's path (send is only reachable inside a kernel, which
always runs at depth >= 1), but flagged by external review; now consistent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/engine.py Outdated
The REPL turns a cell timeout / error into captured output rather than an
exception, so aclose's drain (run via IPythonREPL.execute) returned
normally on a 120s timeout and finalize proceeded as if the cascade
succeeded — silently leaving descendant kernels/sessions open while the
parent's metrics read as a clean shutdown.

The drain cell now prints a DRAIN_OK sentinel only after
close_all_registries returns; teardown checks the cell output for it. A
missing sentinel (timeout, cell error, or a raised exception) is logged
and recorded as teardown_drain_complete=false in meta, rather than
mistaken for success.
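
A sketch of the sentinel check, assuming the drain cell's captured output arrives as one string. `DRAIN_OK` and `teardown_drain_complete` come from the commit message; the helper names and the whole-line match are assumptions. The whole-line match is a deliberate choice in this sketch: an IPython traceback can echo the cell source, so a plain substring test could mistake a quoted `print('DRAIN_OK')` line for success.

```python
DRAIN_SENTINEL = "DRAIN_OK"


def drain_completed(cell_output: str) -> bool:
    """True only if some output line is exactly the sentinel."""
    return any(line.strip() == DRAIN_SENTINEL for line in cell_output.splitlines())


def record_teardown(meta: dict, cell_output: str) -> dict:
    """Store the drain outcome in the session metadata."""
    meta["teardown_drain_complete"] = drain_completed(cell_output)
    return meta
```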

Still best-effort: orphaned descendant kernels left by a restarted child
kernel aren't retroactively reaped (OS-level cleanup, future work).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/session.py
snimu and others added 2 commits June 14, 2026 16:22
_resume injected the IPython "kernel was restarted ... recreate your
variables" warning whenever a non-empty transcript was rehydrated, even
for a tools-only / chat-only agent that never had an IPython REPL — a
vacuous, misleading turn.

Gate the warning on the engine having a REPL. For a REPL agent the warning
stays and is accurate: every resume path follows a genuine kernel teardown
(reap/aclose, or a parent-kernel restart) and setup() starts a fresh, empty
REPL, so "recreate your variables" holds — contrary to the review's
"kernel unchanged" premise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The B4 torn-line recovery only dropped a corrupt line when it was the last
*physical* line in messages.jsonl, so a crash that left a torn final
record followed by a trailing blank line (or stray newline) still raised
JSONDecodeError out of load_latest_view -> setup(), blocking resume even
though every earlier record was valid.

load_latest_view now strips trailing blank lines before the tail check, so
the torn record is recognized as the tail and dropped; mid-file corruption
(a corrupt line with real content after it) still raises.
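
The tail rule described here can be sketched in a few lines. `parse_tolerating_torn_tail` is a hypothetical stand-in for the parsing step inside load_latest_view; it takes the file's text and returns the parsed records.

```python
import json


def parse_tolerating_torn_tail(text: str) -> list:
    """Parse JSONL, dropping a half-written final record but raising on
    corruption that has real content after it."""
    lines = text.splitlines()
    # Trailing blank lines must not hide the torn record from the tail check.
    while lines and not lines[-1].strip():
        lines.pop()
    records = []
    for index, line in enumerate(lines):
        if not line.strip():
            continue  # blank line in the middle of the file
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            if index == len(lines) - 1:
                break  # torn tail: resume from the records before it
            raise      # mid-file corruption is a real error
    return records
```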

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread src/rlm/session.py Outdated
snimu and others added 3 commits June 14, 2026 17:01
Compaction writes branch_reset(V) and then seeds the new view as two
lines: the system message, then user(framing+summary). A hard crash
between those two writes left the new view with only its system message,
and load_latest_view picked it (highest view among msg lines) — so resume
dropped the entire pre-compaction conversation and continued from a lone
system prompt.

load_latest_view now falls back to the previous, complete branch when the
latest view is a single system message (views above 0 exist only because
compaction created them, and it always seeds [system, user(summary)], so a
lone system there is an interrupted seed). The next turn re-compacts.
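
The fallback rule can be sketched as below, assuming the transcript has already been grouped into a mapping from view index to that view's messages. `pick_resumable_view` and that mapping are hypothetical; the rule itself (a view above 0 holding only a system message is an interrupted seed) is the one stated above.

```python
from typing import Dict, List, Tuple


def pick_resumable_view(views: Dict[int, List[dict]]) -> Tuple[int, List[dict]]:
    """Choose the view to resume from, skipping an interrupted compaction seed."""
    latest = max(views)
    msgs = views[latest]
    lone_system = len(msgs) == 1 and msgs[0].get("role") == "system"
    # Views above 0 are always seeded as [system, user(summary)], so a lone
    # system message there means the crash hit between the two writes.
    if latest > 0 and lone_system and (latest - 1) in views:
        return latest - 1, views[latest - 1]
    return latest, msgs
```

View 0 is exempt because a fresh conversation can legitimately consist of a single system message before its first user turn.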

The adjacent window the review flagged (branch_reset written, new view not
started yet) was already benign: resume loads the complete branch V and
re-compacts. Reconstructing the exact compacted state would need
session<->engine coupling for negligible benefit given both windows are
sub-millisecond synchronous-write gaps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewrite background-subagents-and-tools.md to describe, in present tense,
what this branch adds on top of main: the send/poll background API,
workers/registry, skill backgrounding, the resumable engine, the
view-structured transcript with resume/restart behavior, the two caps,
teardown, the token budget, configuration, and the supporting refactors
(config centralization, kernel-bootstrap module, advance() decomposition,
naming). Drops the design-process framing, motivation pitch, and
speculative future-work; reflects current behavior throughout.

Remove persistent-tools-review.md: a transient record of the review and
its fixes, which now live in the code and commit history.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Each paragraph and list item is now a single line, so copying from the raw
markdown no longer injects mid-paragraph line breaks. Rendered output is
unchanged; headers, nested lists, the code fence, and blank-line
separation are preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 5 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit 6da8466.

Comment thread src/rlm/engine.py
Comment thread src/rlm/engine.py
snimu and others added 2 commits June 14, 2026 17:49
RLMEngine.run() called setup() outside its try/finally, so a setup()
failure after the IPython kernel had started — e.g. a corrupt mid-file
record raised by load_latest_view on resume — skipped the finally and
leaked the kernel/REPL. aclose() already handles a partial setup (it shuts
the REPL when _setup_done is False but _repl exists); it just wasn't being
called on this path. Move setup() inside the try so the finally always
runs aclose().
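
A toy model of the fix, to show why moving setup() inside the try matters. `SketchEngine` and its attributes are hypothetical stand-ins for RLMEngine; only the setup/advance/aclose split and the try/finally shape of run() follow the PR.

```python
import asyncio


class SketchEngine:
    """Toy engine whose setup() can fail after the kernel has started."""

    def __init__(self, fail_in_setup: bool = False):
        self._fail_in_setup = fail_in_setup
        self.kernel_running = False
        self.setup_done = False

    async def setup(self):
        self.kernel_running = True  # the kernel starts first
        if self._fail_in_setup:
            raise ValueError("corrupt mid-file record")
        self.setup_done = True

    async def advance(self, prompt: str) -> str:
        return f"answer to {prompt!r}"

    async def aclose(self):
        # Shut the kernel whenever one was started, even after a partial setup.
        self.kernel_running = False

    async def run(self, prompt: str) -> str:
        try:
            await self.setup()  # inside the try, so the finally always runs
            return await self.advance(prompt)
        finally:
            await self.aclose()
```

With setup() outside the try, the failing case would leave `kernel_running` set to True after the exception propagates, which is the leak this commit closes.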

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
aclose() only closed the session's messages.jsonl on the finalize path
(via session.finalize). Its early return (setup never finished and no REPL
— e.g. RLM_TOOLS="") and the setup-failed-after-kernel path both left the
handle open, leaking the fd and leaving the session directory unfinished.

Now it early-returns only when nothing was opened (no session and no REPL)
and closes the session in the finally on every other path. Session.close()
is idempotent, so the finalize path (which already closes it) is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>