Skip to content

feat(ai): save partial results on timeout#429

Merged
janhesters merged 1 commit intomasterfrom
feat/partial-results-on-timeout
Mar 23, 2026
Merged

feat(ai): save partial results on timeout#429
janhesters merged 1 commit intomasterfrom
feat/partial-results-on-timeout

Conversation

@janhesters
Copy link
Copy Markdown
Collaborator

@janhesters janhesters commented Mar 22, 2026

Summary

  • When some runs complete before another times out, the completed runs' responses and judge results are now saved to disk instead of being lost
  • The timed-out run's partial agent output is also captured, followed by a [RITEWAY TIMEOUT] marker showing the timeout duration, so you can see exactly how far the agent got before being killed
  • The orchestrator writes partial TAP and responses files before re-throwing, so CI artifacts capture whatever completed
  • Fixes the scenario where 3 of 4 runs finish but a single timeout causes all data to be discarded

How it works

  • execute-agent.js: spawnProcess and collectProcessOutput share a single partialOutput buffer for stdout. On timeout, the buffer contents are attached as partialStdout on the TimeoutError, and the orphaned child process is killed.
  • ai-runner.js: executeRuns tracks completed run results as each run settles. If a run times out, its partial stdout is captured. runAITests builds partialResults from all available data: completed runs' full responses, plus the timed-out run's partial output appended with a [RITEWAY TIMEOUT] Agent was terminated after Xms. Output above is partial. marker. This is attached to the re-thrown error.
  • ai-command.js: The catch block checks for partialResults on caught errors and calls recordTestOutput before re-throwing. testFilename hoisted out of try so it's accessible in catch. The original error type is preserved so handleAIErrors still routes correctly.
  • README.md: Documents partial results on timeout behavior.

Test plan

  • All 229 tests pass (4 new)
  • Existing tests unaffected
  • New test: partial results + timeout marker when run 1 completes but run 2 times out
  • New test: partial stdout + timeout marker captured when all runs time out
  • New test: recordTestOutput called with partial results before re-throw
  • New test: recordTestOutput not called when error has no partial results
  • Self-review: eliminated duplicate stdout listener, verified error-causes routing

@janhesters janhesters force-pushed the feat/partial-results-on-timeout branch 3 times, most recently from aaee6ec to 6987af3 Compare March 22, 2026 14:08
@janhesters janhesters force-pushed the feat/partial-results-on-timeout branch from 6987af3 to b9d710a Compare March 22, 2026 14:10
@ericelliott
Copy link
Copy Markdown
Collaborator

@cursoragent please /aidd-review

@cursor
Copy link
Copy Markdown

cursor bot commented Mar 22, 2026

You need to increase your spend limit or enable usage-based billing to run background agents. Go to Cursor

@ericelliott
Copy link
Copy Markdown
Collaborator

@cursoragent please /aidd-review

@cursor
Copy link
Copy Markdown

cursor bot commented Mar 22, 2026

Now I'll compile my comprehensive code review following the project's review process.

🔬 Code Review: Partial Results on Timeout

🎯 Restate

This PR implements saving partial results when AI test runs time out, preserving completed run data and capturing partial output from timed-out runs with a clear [RITEWAY TIMEOUT] marker.


📊 Hotspot Analysis

⚠️ All three modified files are in the top hotspots:

  1. source/ai-command.js - Score 15972 (HIGHEST) - churn: 3, complexity: 22

    • This is the single highest-risk file in the codebase
    • Changes here have the highest blast radius
    • Extra scrutiny warranted
  2. source/ai-runner.js - Score 7530 (3rd) - churn: 3, complexity: 10

    • High churn indicates frequent changes
    • Moderate complexity means changes carry moderate comprehension risk
  3. source/execute-agent.js - Score 3204 (8th) - churn: 2, complexity: 9

    • Lower risk but still in top 10 hotspots

Recommendation: These files are prime candidates for refactoring to reduce complexity. Consider extracting cohesive sub-modules, especially from ai-command.js (22 complexity score).


✅ Test Coverage

Status: Excellent

  • All 229 tests pass (4 new tests added)
  • New tests comprehensively cover the feature:
    • ✓ Partial results when run 1 completes, run 2 times out
    • ✓ Partial stdout capture when all runs time out
    • recordTestOutput called with partialResults before re-throw
    • recordTestOutput not called when no partialResults present
  • Tests use realistic mock scenarios with actual timeouts
  • Proper test isolation using temporary directories

💡 Code Quality Assessment

Strengths

1. Clean Separation of Concerns

const collectProcessOutput = (proc, partialOutput) =>
  new Promise((resolve, reject) => {
    let stderr = '';

    proc.stdout.on('data', (data) => { partialOutput.stdout += data.toString(); });
    proc.stderr.on('data', (data) => { stderr += data.toString(); });
    proc.on('close', (code) => { resolve({ stdout: partialOutput.stdout, stderr, code }); });
    proc.on('error', reject);
  });

The shared partialOutput buffer design is elegant - it enables both normal execution and timeout scenarios to access the same accumulating stdout buffer.

2. Proper Error Propagation

  let result;
  try {
    result = await withTimeout(
      promise,
      timeout,
      () => ({
        ...TimeoutError,
        message: `Agent process timed out after ${timeout}ms. Command: ${command} ${args.join(' ')}`,
        command,
        args: args.join(' '),
        timeout,
        partialStdout: partialOutput.stdout
      })
    );
  } catch (error) {
    try { proc.kill(); } catch { /* best-effort */ }
    throw error;
  }

Timeout errors include partialStdout in the error object, and the process is properly killed (with best-effort error handling).

3. Robust Aggregation Logic

  try {
    const allRunResults = await runsPromise;
    const aggregated = aggregateResults({ assertions, allRunResults, threshold, runs });
    const responses = allRunResults.map(({ response }) => response);
    return { ...aggregated, responses };
  } catch (error) {
    const timedOutPartialStdout = getTimedOutPartialStdout();
    const hasPartialData = completedRuns.length > 0 || timedOutPartialStdout !== undefined;

    if (hasPartialData) {
      const responses = completedRuns.map(({ response }) => response);

      if (timedOutPartialStdout !== undefined) {
        const timeoutMs = error.cause?.timeout ?? timeout;
        responses.push(
          timedOutPartialStdout +
          `\n\n---\n[RITEWAY TIMEOUT] Agent was terminated after ${timeoutMs}ms. Output above is partial.\n---\n`
        );
      }

      const aggregated = completedRuns.length > 0
        ? aggregateResults({
          assertions,
          allRunResults: completedRuns,
          threshold,
          runs: completedRuns.length
        })
        : { passed: false, assertions: [] };

      throw createError({
        name: error.cause?.name ?? error.name,
        code: error.cause?.code ?? error.code,
        message: error.cause?.message ?? error.message,
        partialResults: { ...aggregated, responses },
        cause: error
      });
    }

    throw error;
  }

Handles multiple edge cases:

  • Some runs complete, others timeout
  • All runs timeout
  • No partial data available
  • Preserves original error name/code for proper routing

4. Proper Variable Hoisting

  const testFilename = basename(filePath);

  try {

The testFilename is correctly hoisted outside the try block so it's accessible in the catch block for writing partial results.


🪞 Areas for Improvement

1. Variable Naming Clarity

In executeRuns:

  let timedOutPartialStdout;

Issue: This captures only the last timed-out run's stdout, not all of them. Name doesn't convey the "last one wins" semantics.

Suggestion: Rename to lastTimedOutStdout for clarity.

2. Closure Documentation

    getTimedOutPartialStdout: () => timedOutPartialStdout

Issue: The closure pattern isn't immediately obvious from the function signature.

Suggestion: Add a JSDoc comment:

/**
 * Returns the most recently captured partial stdout from any timed-out run.
 * @returns {string|undefined} Partial output or undefined if no timeouts occurred
 */
getTimedOutPartialStdout: () => timedOutPartialStdout

3. Comment Verbosity

 * TODO (follow-up): console.log calls are mixed with orchestration here. A future
 * refactor could return a structured result and let the CLI layer own all output,
 * aligning with the saga pattern's IO-separation principle. Not critical for this PR.

Issue: While the TODO is helpful, it's overly detailed for what's essentially "TODO: separate IO from orchestration logic"

Suggestion: Condense to:

 * TODO: Separate console.log calls from orchestration (saga pattern IO-separation)

4. Magic String Format

The [RITEWAY TIMEOUT] marker is hardcoded in multiple places:

        responses.push(
          timedOutPartialStdout +
          `\n\n---\n[RITEWAY TIMEOUT] Agent was terminated after ${timeoutMs}ms. Output above is partial.\n---\n`
        );

Suggestion: Extract to a constant at the module level:

const TIMEOUT_MARKER_TEMPLATE = (timeoutMs) => 
  `\n\n---\n[RITEWAY TIMEOUT] Agent was terminated after ${timeoutMs}ms. Output above is partial.\n---\n`;

🔭 Architectural Observations

Positive Patterns

  1. Error-Causes Pattern: Consistent use throughout maintains proper error routing
  2. Functional Composition: executeRuns returns both a promise and getters - clean side-effect tracking
  3. Best-Effort Cleanup: Process killing and partial output recording fail gracefully
  4. Clear Separation: Process management → aggregation → output recording pipeline is well-defined

Future Considerations

The high complexity score (22) in ai-command.js suggests the orchestration function runAICommand is doing too much. Consider extracting:

  1. Authentication verification
  2. Result formatting and console output
  3. Partial results handling

Into separate composed functions to reduce complexity and improve testability.


🔒 Security Review

OWASP Top 10 Check: ✅ No violations detected

  • No SQL injection vectors
  • No XSS vulnerabilities
  • No authentication/authorization bypasses
  • No sensitive data exposure
  • No CSRF concerns
  • Process spawning uses validated agent configs
  • Path traversal already prevented by existing validateFilePath (line 143)

Security Note: The proc.kill() on timeout (line 130 in execute-agent.js) is appropriate for orphaned child processes.


⚖️ Adherence to Standards

JavaScript Best Practices ✅

  • ✓ Functional programming style maintained
  • ✓ Proper use of const/destructuring
  • ✓ Clear function names (verbs)
  • ✓ Avoids mutation except for necessary accumulation buffers
  • ✓ Clean error handling with error-causes library
  • ✓ No unnecessary intermediate variables

TDD Best Practices ✅

  • ✓ Tests answer the 5 questions (unit, behavior, actual, expected, debugging)
  • ✓ Tests are isolated with proper cleanup
  • ✓ Tests are thorough (edge cases covered)
  • ✓ Tests are explicit (factories used, no shared fixtures)

Documentation ✅

  • README.md updated with clear explanation of partial results behavior
  • Docblocks present for public API (executeAgent, runAITests, runAICommand)
  • Inline comments are meaningful, not redundant

💬 Final Recommendations

Must Address Before Merge

None - code is production-ready.

Should Consider (Non-blocking)

  1. Refactor ai-command.js to reduce complexity from 22 → ~15

    • Extract authentication, formatting, and partial-results logic
    • Would reduce blast radius of future changes
  2. Extract timeout marker constant to eliminate magic strings

  3. Rename timedOutPartialStdoutlastTimedOutStdout

  4. Add JSDoc to closure in executeRuns return value

Overall Assessment

Status: ✅ APPROVE

This is a high-quality implementation that:

  • Solves a real problem (data loss on timeout)
  • Has comprehensive test coverage (4 new tests, all pass)
  • Follows project conventions consistently
  • Handles edge cases properly
  • Is well-documented

The changes touch hotspot files but are minimally invasive and well-isolated. The timeout handling logic is elegant and maintainable.

Great work! The only concerns are minor naming/documentation improvements and the ongoing need to reduce complexity in ai-command.js (though that's a broader refactoring concern, not specific to this PR).

Open in Web Open in Cursor 

Copy link
Copy Markdown
Collaborator

@ianwhitedeveloper ianwhitedeveloper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ APPROVE

Solid PR. Solves a real problem (data loss on timeout), has comprehensive test coverage (4 new tests, 229 total pass), and follows the project's error-causes and functional patterns cleanly.

Key Findings

The completedRuns tracking is more general than the PR title suggests — it captures completed runs for any error type (not just timeouts), which also guards against scenarios like the Cursor CLI ENOENT race condition where a stray process failure discards all results.

One race condition under concurrency > 1: completedRuns is read in the catch block immediately when Promise.all rejects, but other runs may still be in-flight. With concurrency: 4, the ENOENT fires during the concurrent judge burst before any run fully completes, so completedRuns is empty and no partial results are captured. Fix: await Promise.allSettled(runPromises) before reading completedRuns. (Tested locally — all 229 tests pass with this fix.)

Unbounded judge concurrency: Within each run, Promise.all spawns all judge calls simultaneously. With 4 runs × N assertions, this can produce 4×N concurrent agent processes — the direct trigger for the Cursor CLI's cli-config.json rename race. Throttling judge calls with limitConcurrency(judgeTasks, concurrency) reduces the blast radius.

Both enhancements are uncommitted on my local branch and available for a follow-up PR if desired.

Minor non-blocking suggestions
  • CI flakiness risk: Integration tests use 2s timeouts with real process spawning. Monitor CI; increase to 5s if flakes appear.
  • TODO comment verbosity: The TODO (follow-up) block in ai-command.js could be condensed to a single line: TODO: Separate console.log calls from orchestration (saga pattern IO-separation).
  • completedRuns ordering: Under concurrency > 1, the responses array order depends on which runs finish first, not run index. Acceptable for diagnostic output but worth noting.
OWASP Top 10 & security check

All clear. No injection vectors (spawn uses array args), no auth changes, no secrets in diff, no SSRF. Path traversal prevented by existing validateFilePath. proc.kill() on timeout is appropriate with best-effort error handling.

Hotspot analysis

All three modified source files are top-10 hotspots (ai-command.js #1 at 15972, ai-runner.js #3 at 7530, execute-agent.js #8 at 3204). Changes are well-scoped and don't meaningfully increase complexity. The long-term recommendation to extract orchestration sub-functions from ai-command.js (complexity 22) still stands as future work.

When some runs complete before another times out, attach the completed
runs' data as partialResults on the error. The orchestrator writes
these to disk before re-throwing, so CI artifacts capture whatever
completed even on timeout failures.
@janhesters janhesters force-pushed the feat/partial-results-on-timeout branch from 28754b8 to 2c3eb63 Compare March 23, 2026 16:46
@janhesters janhesters merged commit 3859725 into master Mar 23, 2026
2 checks passed
@janhesters janhesters deleted the feat/partial-results-on-timeout branch March 23, 2026 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants