feat(ai): save partial results on timeout by janhesters · Pull Request #429 · paralleldrive/riteway

janhesters · 2026-03-22T13:55:24Z

Summary

When some runs complete before another times out, the completed runs' responses and judge results are now saved to disk instead of being lost
The timed-out run's partial agent output is also captured, followed by a [RITEWAY TIMEOUT] marker showing the timeout duration, so you can see exactly how far the agent got before being killed
The orchestrator writes partial TAP and responses files before re-throwing, so CI artifacts capture whatever completed
Fixes the scenario where 3 of 4 runs finish but a single timeout causes all data to be discarded

How it works

execute-agent.js: spawnProcess and collectProcessOutput share a single partialOutput buffer for stdout. On timeout, the buffer contents are attached as partialStdout on the TimeoutError, and the orphaned child process is killed.
ai-runner.js: executeRuns tracks completed run results as each run settles. If a run times out, its partial stdout is captured. runAITests builds partialResults from all available data: completed runs' full responses, plus the timed-out run's partial output appended with a [RITEWAY TIMEOUT] Agent was terminated after Xms. Output above is partial. marker. This is attached to the re-thrown error.
ai-command.js: The catch block checks for partialResults on caught errors and calls recordTestOutput before re-throwing. testFilename hoisted out of try so it's accessible in catch. The original error type is preserved so handleAIErrors still routes correctly.
README.md: Documents partial results on timeout behavior.

Test plan

All 229 tests pass (4 new)
Existing tests unaffected
New test: partial results + timeout marker when run 1 completes but run 2 times out
New test: partial stdout + timeout marker captured when all runs time out
New test: recordTestOutput called with partial results before re-throw
New test: recordTestOutput not called when error has no partial results
Self-review: eliminated duplicate stdout listener, verified error-causes routing

ericelliott · 2026-03-22T22:34:08Z

@cursoragent please /aidd-review

cursor · 2026-03-22T22:34:13Z

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

ericelliott · 2026-03-22T22:36:59Z

@cursoragent please /aidd-review

cursor · 2026-03-22T22:37:06Z

Now I'll compile my comprehensive code review following the project's review process.

🔬 Code Review: Partial Results on Timeout

🎯 Restate

This PR implements saving partial results when AI test runs time out, preserving completed run data and capturing partial output from timed-out runs with a clear [RITEWAY TIMEOUT] marker.

📊 Hotspot Analysis

⚠️ All three modified files are in the top hotspots:

source/ai-command.js - Score 15972 (HIGHEST) - churn: 3, complexity: 22
- This is the single highest-risk file in the codebase
- Changes here have the highest blast radius
- Extra scrutiny warranted
source/ai-runner.js - Score 7530 (3rd) - churn: 3, complexity: 10
- High churn indicates frequent changes
- Moderate complexity means changes carry moderate comprehension risk
source/execute-agent.js - Score 3204 (8th) - churn: 2, complexity: 9
- Lower risk but still in top 10 hotspots

Recommendation: These files are prime candidates for refactoring to reduce complexity. Consider extracting cohesive sub-modules, especially from ai-command.js (22 complexity score).

✅ Test Coverage

Status: Excellent

All 229 tests pass (4 new tests added)
New tests comprehensively cover the feature:
- ✓ Partial results when run 1 completes, run 2 times out
- ✓ Partial stdout capture when all runs time out
- ✓ recordTestOutput called with partialResults before re-throw
- ✓ recordTestOutput not called when no partialResults present
Tests use realistic mock scenarios with actual timeouts
Proper test isolation using temporary directories

💡 Code Quality Assessment

Strengths

1. Clean Separation of Concerns

const collectProcessOutput = (proc, partialOutput) =>
  new Promise((resolve, reject) => {
    let stderr = '';

    proc.stdout.on('data', (data) => { partialOutput.stdout += data.toString(); });
    proc.stderr.on('data', (data) => { stderr += data.toString(); });
    proc.on('close', (code) => { resolve({ stdout: partialOutput.stdout, stderr, code }); });
    proc.on('error', reject);
  });

The shared partialOutput buffer design is elegant - it enables both normal execution and timeout scenarios to access the same accumulating stdout buffer.

2. Proper Error Propagation

  let result;
  try {
    result = await withTimeout(
      promise,
      timeout,
      () => ({
        ...TimeoutError,
        message: `Agent process timed out after ${timeout}ms. Command: ${command} ${args.join(' ')}`,
        command,
        args: args.join(' '),
        timeout,
        partialStdout: partialOutput.stdout
      })
    );
  } catch (error) {
    try { proc.kill(); } catch { /* best-effort */ }
    throw error;
  }

Timeout errors include partialStdout in the error object, and the process is properly killed (with best-effort error handling).

3. Robust Aggregation Logic

  try {
    const allRunResults = await runsPromise;
    const aggregated = aggregateResults({ assertions, allRunResults, threshold, runs });
    const responses = allRunResults.map(({ response }) => response);
    return { ...aggregated, responses };
  } catch (error) {
    const timedOutPartialStdout = getTimedOutPartialStdout();
    const hasPartialData = completedRuns.length > 0 || timedOutPartialStdout !== undefined;

    if (hasPartialData) {
      const responses = completedRuns.map(({ response }) => response);

      if (timedOutPartialStdout !== undefined) {
        const timeoutMs = error.cause?.timeout ?? timeout;
        responses.push(
          timedOutPartialStdout +
          `\n\n---\n[RITEWAY TIMEOUT] Agent was terminated after ${timeoutMs}ms. Output above is partial.\n---\n`
        );
      }

      const aggregated = completedRuns.length > 0
        ? aggregateResults({
          assertions,
          allRunResults: completedRuns,
          threshold,
          runs: completedRuns.length
        })
        : { passed: false, assertions: [] };

      throw createError({
        name: error.cause?.name ?? error.name,
        code: error.cause?.code ?? error.code,
        message: error.cause?.message ?? error.message,
        partialResults: { ...aggregated, responses },
        cause: error
      });
    }

    throw error;
  }

Handles multiple edge cases:

Some runs complete, others timeout
All runs timeout
No partial data available
Preserves original error name/code for proper routing

4. Proper Variable Hoisting

  const testFilename = basename(filePath);

  try {

The testFilename is correctly hoisted outside the try block so it's accessible in the catch block for writing partial results.

🪞 Areas for Improvement

1. Variable Naming Clarity

In executeRuns:

  let timedOutPartialStdout;

Issue: This captures only the last timed-out run's stdout, not all of them. Name doesn't convey the "last one wins" semantics.

Suggestion: Rename to lastTimedOutStdout for clarity.

2. Closure Documentation

    getTimedOutPartialStdout: () => timedOutPartialStdout

Issue: The closure pattern isn't immediately obvious from the function signature.

Suggestion: Add a JSDoc comment:

/**
 * Returns the most recently captured partial stdout from any timed-out run.
 * @returns {string|undefined} Partial output or undefined if no timeouts occurred
 */
getTimedOutPartialStdout: () => timedOutPartialStdout

3. Comment Verbosity

 * TODO (follow-up): console.log calls are mixed with orchestration here. A future
 * refactor could return a structured result and let the CLI layer own all output,
 * aligning with the saga pattern's IO-separation principle. Not critical for this PR.

Issue: While the TODO is helpful, it's overly detailed for what's essentially "TODO: separate IO from orchestration logic"

Suggestion: Condense to:

 * TODO: Separate console.log calls from orchestration (saga pattern IO-separation)

4. Magic String Format

The [RITEWAY TIMEOUT] marker is hardcoded in multiple places:

        responses.push(
          timedOutPartialStdout +
          `\n\n---\n[RITEWAY TIMEOUT] Agent was terminated after ${timeoutMs}ms. Output above is partial.\n---\n`
        );

Suggestion: Extract to a constant at the module level:

const TIMEOUT_MARKER_TEMPLATE = (timeoutMs) => 
  `\n\n---\n[RITEWAY TIMEOUT] Agent was terminated after ${timeoutMs}ms. Output above is partial.\n---\n`;

🔭 Architectural Observations

Positive Patterns

Error-Causes Pattern: Consistent use throughout maintains proper error routing
Functional Composition: executeRuns returns both a promise and getters - clean side-effect tracking
Best-Effort Cleanup: Process killing and partial output recording fail gracefully
Clear Separation: Process management → aggregation → output recording pipeline is well-defined

Future Considerations

The high complexity score (22) in ai-command.js suggests the orchestration function runAICommand is doing too much. Consider extracting:

Authentication verification
Result formatting and console output
Partial results handling

Into separate composed functions to reduce complexity and improve testability.

🔒 Security Review

OWASP Top 10 Check: ✅ No violations detected

No SQL injection vectors
No XSS vulnerabilities
No authentication/authorization bypasses
No sensitive data exposure
No CSRF concerns
Process spawning uses validated agent configs
Path traversal already prevented by existing validateFilePath (line 143)

Security Note: The proc.kill() on timeout (line 130 in execute-agent.js) is appropriate for orphaned child processes.

⚖️ Adherence to Standards

JavaScript Best Practices ✅

✓ Functional programming style maintained
✓ Proper use of const/destructuring
✓ Clear function names (verbs)
✓ Avoids mutation except for necessary accumulation buffers
✓ Clean error handling with error-causes library
✓ No unnecessary intermediate variables

TDD Best Practices ✅

✓ Tests answer the 5 questions (unit, behavior, actual, expected, debugging)
✓ Tests are isolated with proper cleanup
✓ Tests are thorough (edge cases covered)
✓ Tests are explicit (factories used, no shared fixtures)

Documentation ✅

README.md updated with clear explanation of partial results behavior
Docblocks present for public API (executeAgent, runAITests, runAICommand)
Inline comments are meaningful, not redundant

💬 Final Recommendations

Must Address Before Merge

None - code is production-ready.

Should Consider (Non-blocking)

Refactor ai-command.js to reduce complexity from 22 → ~15
- Extract authentication, formatting, and partial-results logic
- Would reduce blast radius of future changes
Extract timeout marker constant to eliminate magic strings
Rename timedOutPartialStdout → lastTimedOutStdout
Add JSDoc to closure in executeRuns return value

Overall Assessment

Status: ✅ APPROVE

This is a high-quality implementation that:

Solves a real problem (data loss on timeout)
Has comprehensive test coverage (4 new tests, all pass)
Follows project conventions consistently
Handles edge cases properly
Is well-documented

The changes touch hotspot files but are minimally invasive and well-isolated. The timeout handling logic is elegant and maintainable.

Great work! The only concerns are minor naming/documentation improvements and the ongoing need to reduce complexity in ai-command.js (though that's a broader refactoring concern, not specific to this PR).

ianwhitedeveloper

✅ APPROVE

Solid PR. Solves a real problem (data loss on timeout), has comprehensive test coverage (4 new tests, 229 total pass), and follows the project's error-causes and functional patterns cleanly.

Key Findings

The completedRuns tracking is more general than the PR title suggests — it captures completed runs for any error type (not just timeouts), which also guards against scenarios like the Cursor CLI ENOENT race condition where a stray process failure discards all results.

One race condition under concurrency > 1: completedRuns is read in the catch block immediately when Promise.all rejects, but other runs may still be in-flight. With concurrency: 4, the ENOENT fires during the concurrent judge burst before any run fully completes, so completedRuns is empty and no partial results are captured. Fix: await Promise.allSettled(runPromises) before reading completedRuns. (Tested locally — all 229 tests pass with this fix.)

Unbounded judge concurrency: Within each run, Promise.all spawns all judge calls simultaneously. With 4 runs × N assertions, this can produce 4×N concurrent agent processes — the direct trigger for the Cursor CLI's cli-config.json rename race. Throttling judge calls with limitConcurrency(judgeTasks, concurrency) reduces the blast radius.

Both enhancements are uncommitted on my local branch and available for a follow-up PR if desired.

Minor non-blocking suggestions

CI flakiness risk: Integration tests use 2s timeouts with real process spawning. Monitor CI; increase to 5s if flakes appear.
TODO comment verbosity: The TODO (follow-up) block in ai-command.js could be condensed to a single line: TODO: Separate console.log calls from orchestration (saga pattern IO-separation).
completedRuns ordering: Under concurrency > 1, the responses array order depends on which runs finish first, not run index. Acceptable for diagnostic output but worth noting.

OWASP Top 10 & security check

All clear. No injection vectors (spawn uses array args), no auth changes, no secrets in diff, no SSRF. Path traversal prevented by existing validateFilePath. proc.kill() on timeout is appropriate with best-effort error handling.

Hotspot analysis

All three modified source files are top-10 hotspots (ai-command.js #1 at 15972, ai-runner.js #3 at 7530, execute-agent.js #8 at 3204). Changes are well-scoped and don't meaningfully increase complexity. The long-term recommendation to extract orchestration sub-functions from ai-command.js (complexity 22) still stands as future work.

When some runs complete before another times out, attach the completed runs' data as partialResults on the error. The orchestrator writes these to disk before re-throwing, so CI artifacts capture whatever completed even on timeout failures.

janhesters force-pushed the feat/partial-results-on-timeout branch 3 times, most recently from aaee6ec to 6987af3 Compare March 22, 2026 14:08

janhesters requested review from ericelliott and ianwhitedeveloper March 22, 2026 14:08

janhesters force-pushed the feat/partial-results-on-timeout branch from 6987af3 to b9d710a Compare March 22, 2026 14:10

ianwhitedeveloper previously approved these changes Mar 23, 2026

View reviewed changes

janhesters dismissed ianwhitedeveloper’s stale review via 28754b8 March 23, 2026 16:43

janhesters force-pushed the feat/partial-results-on-timeout branch from b9d710a to 28754b8 Compare March 23, 2026 16:43

janhesters force-pushed the feat/partial-results-on-timeout branch from 28754b8 to 2c3eb63 Compare March 23, 2026 16:46

janhesters requested a review from ianwhitedeveloper March 23, 2026 16:46

ianwhitedeveloper approved these changes Mar 23, 2026

View reviewed changes

janhesters merged commit 3859725 into master Mar 23, 2026
2 checks passed

janhesters deleted the feat/partial-results-on-timeout branch March 23, 2026 17:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ai): save partial results on timeout#429

feat(ai): save partial results on timeout#429
janhesters merged 1 commit intomasterfrom
feat/partial-results-on-timeout

janhesters commented Mar 22, 2026 •

edited

Loading

Uh oh!

ericelliott commented Mar 22, 2026

Uh oh!

cursor bot commented Mar 22, 2026

Uh oh!

ericelliott commented Mar 22, 2026

Uh oh!

cursor bot commented Mar 22, 2026 •

edited

Loading

Uh oh!

ianwhitedeveloper left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

janhesters commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How it works

Test plan

Uh oh!

ericelliott commented Mar 22, 2026

Uh oh!

cursor bot commented Mar 22, 2026

Uh oh!

ericelliott commented Mar 22, 2026

Uh oh!

cursor bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔬 Code Review: Partial Results on Timeout

🎯 Restate

📊 Hotspot Analysis

✅ Test Coverage

💡 Code Quality Assessment

Strengths

🪞 Areas for Improvement

1. Variable Naming Clarity

2. Closure Documentation

3. Comment Verbosity

4. Magic String Format

🔭 Architectural Observations

Positive Patterns

Future Considerations

🔒 Security Review

⚖️ Adherence to Standards

JavaScript Best Practices ✅

TDD Best Practices ✅

Documentation ✅

💬 Final Recommendations

Must Address Before Merge

Should Consider (Non-blocking)

Overall Assessment

Uh oh!

ianwhitedeveloper left a comment

Choose a reason for hiding this comment

✅ APPROVE

Key Findings

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

janhesters commented Mar 22, 2026 •

edited

Loading

cursor bot commented Mar 22, 2026 •

edited

Loading