feat(ai): add --save-responses flag #428

Merged
janhesters merged 1 commit into master from feat/save-responses
Mar 18, 2026

@janhesters
Collaborator

Summary

  • Adds --save-responses boolean CLI flag (defaults to false) to riteway ai
  • When enabled, writes a companion .responses.md file alongside the .tap.md output containing raw result agent responses and per-run judge details for each assertion
  • Enables CI artifact-based debugging — upload the responses file as a build artifact to inspect exactly what the agent produced when assertions fail, without adding console noise

Changes

  • constants.js: Added saveResponses: false default
  • ai-runner.js: executeSingleRun now returns { judgments, response } so raw result agent output is preserved; runAITests returns responses array alongside aggregated results
  • ai-command.js: Parses --save-responses flag and threads it through to recordTestOutput
  • test-output.js: Added formatResponses() and companion file writing in recordTestOutput when saveResponses is true
  • Tests: 8 new tests covering flag parsing, response formatting, companion file creation/absence, and E2E raw response capture

Test plan

  • All 220 tests pass (8 new)
  • Manual test with riteway ai --save-responses test.sudo to verify companion file is created
  • Verify companion file is NOT created when flag is omitted

@janhesters janhesters force-pushed the feat/save-responses branch 2 times, most recently from 307640b to eb0d165 on March 17, 2026 19:16
Collaborator

@ianwhitedeveloper ianwhitedeveloper left a comment

Review

Great feature — clean implementation, well-scoped diff, and the README/CI example are genuinely useful. The hotspot analysis flags ai-command.js (#1, score 8100) and test-output.js (#4) as high-churn files, but the changes here are appropriately minimal and don't increase complexity in either. A few things to address:


Required

1. --save-responses is missing from riteway ai --help

bin/riteway.js was not updated. The flag is absent from three places:

a) The AI Test Options block (line 194) — add after --color:

  --save-responses          Save raw agent responses and judge details to a companion .responses.md file

b) The ValidationError handler usage hint (lines 87–93) — the inline synopsis on line 87 should include [--save-responses], and a corresponding console.error line should follow --color, using defaults.saveResponses for the default (consistent with how the other flags reference defaults.*):

console.error(`  --save-responses       Save raw agent responses to a companion .responses.md file (default: ${defaults.saveResponses})`);

c) The Examples section (lines 205–213) — add one entry consistent with the --color example already there:

  riteway ai prompts/test.sudo --save-responses

Anyone using --help to discover flags will not find --save-responses. The README is well documented, but the CLI is the first place users look.


Recommended

2. runAITests @returns docblock is stale

The return shape now always includes responses, but the docblock still only says "Aggregated per-assertion test results". Since this is a public export, document the new key:

* @returns {Promise<{passed: boolean, assertions: Array<Object>, responses: string[]}>}

3. formatResponses — guard against non-string response

lines.push(response.trimEnd() + '\n\n');

formatResponses is a public export. If a future agent adapter ever returns null or undefined, this throws a cryptic TypeError instead of surfacing a meaningful error. A one-line fix closes the gap:

lines.push(String(response ?? '').trimEnd() + '\n\n');
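
To make the difference concrete, here is a minimal sketch comparing the two expressions on a few inputs. `formatUnguarded` and `formatGuarded` are hypothetical names used only for this illustration; they are not the functions in test-output.js.

```javascript
// Hypothetical helpers mirroring the two expressions above.
// They are NOT the real formatResponses implementation.
const formatUnguarded = response => response.trimEnd() + '\n\n';
const formatGuarded = response => String(response ?? '').trimEnd() + '\n\n';

// Strings are handled the same way by both versions.
console.log(JSON.stringify(formatGuarded('hello  \n'))); // "hello\n\n"

// null and undefined become an empty entry instead of throwing.
console.log(JSON.stringify(formatGuarded(null))); // "\n\n"
console.log(JSON.stringify(formatGuarded(undefined))); // "\n\n"

// The unguarded version throws a TypeError on null.
try {
  formatUnguarded(null);
} catch (error) {
  console.log(error instanceof TypeError); // true
}
```

Note that the guard trades the TypeError for an empty entry in the output. If a missing response should be treated as a failure, throwing an explicit error with a descriptive message would be the alternative.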

Minor

4. --save-responses not reflected in the configuration log

When the flag is active, the terminal output gives no indication:

console.log(`Configuration: ${runs} runs, ${threshold}% threshold, ...`);

A user debugging a CI failure who forgot they enabled the flag won't see it. Suggest appending something like , responses: ${saveResponses ? 'saving' : 'off'} (or however it fits the existing format).
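
One possible shape for this, as a sketch only: `formatConfiguration` is a hypothetical helper, and the fields elided by `...` in the log line above are left out here rather than guessed.

```javascript
// Hypothetical helper for illustration; the real log line is built inline in the CLI
// and includes additional fields that are omitted here.
const formatConfiguration = ({ runs, threshold, saveResponses = false }) =>
  `Configuration: ${runs} runs, ${threshold}% threshold, ` +
  `responses: ${saveResponses ? 'saving' : 'off'}`;

console.log(formatConfiguration({ runs: 4, threshold: 75, saveResponses: true }));
// Configuration: 4 runs, 75% threshold, responses: saving

console.log(formatConfiguration({ runs: 4, threshold: 75 }));
// Configuration: 4 runs, 75% threshold, responses: off
```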


5. Brittle path derivation in one of the new tests

In test-output.test.js, the companion file path is derived as:

const responsesPath = outputPath.replace('.tap.md', '.responses.md');

The sibling tests in the same file correctly use readdirSync(testDir).filter(f => f.includes('.responses.md')) to locate the file by pattern. This one test is coupled to the internal extension convention — if that changes, it would silently test the wrong path. Worth aligning to the same pattern.
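
For reference, a standalone sketch of the pattern-based lookup. The temp directory and file names are made up for the illustration, and it uses CommonJS `require` so it runs as a plain script; in the actual test file the same functions would come from the existing imports.

```javascript
const { mkdtempSync, writeFileSync, readdirSync, rmSync } = require('node:fs');
const { tmpdir } = require('node:os');
const { join } = require('node:path');

// Stand-in for the test's temporary output directory.
const testDir = mkdtempSync(join(tmpdir(), 'riteway-responses-'));

// Simulate the two output files described in this PR (file names are hypothetical).
writeFileSync(join(testDir, 'example.tap.md'), '# TAP output\n');
writeFileSync(join(testDir, 'example.responses.md'), '# Responses\n');

// Locate the companion file by pattern instead of rewriting the .tap.md path.
const responseFiles = readdirSync(testDir).filter(f => f.includes('.responses.md'));
console.log(responseFiles); // [ 'example.responses.md' ]

// Clean up the temporary directory.
rmSync(testDir, { recursive: true, force: true });
```

A further reason to prefer this: `String.prototype.replace` with a string pattern only replaces the first match, so a directory name that happens to contain `.tap.md` would produce the wrong path.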


Optional

6. No E2E test for the full CLI path

The flag's path from parseAIArgs(['--save-responses', ...]) → runAICommand → file-on-disk is well-covered at each unit layer, but e2e.test.js wasn't updated. Manual verification is noted in the test plan, but an automated E2E test would permanently close this gap.


Not a blocker, but worth noting

test-output.test.js is now at ~1,050 LoC with 16% gzip density — a pre-existing smell, not introduced by this PR. As a follow-up, splitting formatResponses tests into format-responses.test.js would reduce hotspot churn on a file that's already in the top 5. Good candidate for a dedicated cleanup PR.

@janhesters janhesters force-pushed the feat/save-responses branch from eb0d165 to 0cf467e on March 17, 2026 21:51
When --save-responses is passed, riteway ai writes a companion
.responses.md file alongside the .tap.md output containing the raw
result agent responses and per-run judge details for each assertion.
This enables CI artifact-based debugging without adding console noise.
Collaborator

@ianwhitedeveloper ianwhitedeveloper left a comment

I also verified that --save-responses successfully outputs the expected files locally in the aidd repo

lgtm 🙌

@janhesters janhesters merged commit db0ce9f into master Mar 18, 2026
2 checks passed
@ianwhitedeveloper ianwhitedeveloper deleted the feat/save-responses branch March 18, 2026 15:39