Conversation
Force-pushed 307640b to eb0d165
ianwhitedeveloper
left a comment
Review
Great feature — clean implementation, well-scoped diff, and the README/CI example are genuinely useful. The hotspot analysis flags ai-command.js (#1, score 8100) and test-output.js (#4) as high-churn files, but the changes here are appropriately minimal and don't increase complexity in either. A few things to address:
Required
1. --save-responses is missing from riteway ai --help
bin/riteway.js was not updated. The flag is absent from three places:
a) The AI Test Options block (line 194) — add after --color:
--save-responses Save raw agent responses and judge details to a companion .responses.md file
b) The ValidationError handler usage hint (lines 87–93) — the inline synopsis on line 87 should include [--save-responses], and a corresponding console.error line should follow --color, using defaults.saveResponses for the default (consistent with how the other flags reference defaults.*):
console.error(`  --save-responses   Save raw agent responses to a companion .responses.md file (default: ${defaults.saveResponses})`);
c) The Examples section (lines 205–213) — add one entry consistent with the --color example already there:
riteway ai prompts/test.sudo --save-responses
Anyone using --help to discover flags will not find --save-responses. The README is well documented, but the CLI is the first place users look.
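Taken together, the three additions might look like the sketch below. This is illustrative only — the `defaults` object and column alignment are assumptions standing in for whatever bin/riteway.js actually uses:

```javascript
// Sketch of the three additions; `defaults` is a stand-in for the real
// defaults object imported by bin/riteway.js.
const defaults = { saveResponses: false };

// a) AI Test Options block entry, after --color:
const helpLine =
  '  --save-responses   Save raw agent responses and judge details to a companion .responses.md file';

// b) ValidationError usage hint, consistent with the other defaults.* references:
const usageLine =
  `  --save-responses   Save raw agent responses to a companion .responses.md file (default: ${defaults.saveResponses})`;

// c) Examples section entry:
const exampleLine = 'riteway ai prompts/test.sudo --save-responses';

console.log([helpLine, usageLine, exampleLine].join('\n'));
```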
Recommended
2. runAITests @returns docblock is stale
The return shape now always includes responses, but the docblock still only says "Aggregated per-assertion test results". Since this is a public export, document the new key:
* @returns {Promise<{passed: boolean, assertions: Array<Object>, responses: string[]}>}
3. formatResponses — guard against non-string response
lines.push(response.trimEnd() + '\n\n');
formatResponses is a public export. If a future agent adapter ever returns null or undefined, this throws a cryptic TypeError instead of surfacing a meaningful error. A one-line fix closes the gap:
lines.push(String(response ?? '').trimEnd() + '\n\n');
Minor
4. --save-responses not reflected in the configuration log
When the flag is active, the terminal output gives no indication:
console.log(`Configuration: ${runs} runs, ${threshold}% threshold, ...`);
A user debugging a CI failure who forgot they enabled the flag won't see it. Suggest appending something like , responses: ${saveResponses ? 'saving' : 'off'} (or however it fits the existing format).
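A hedged sketch of what the amended line might produce — runs, threshold, and saveResponses are assumed to already be in scope in ai-command.js:

```javascript
// Stand-in values for variables assumed to exist in ai-command.js.
const runs = 5;
const threshold = 80;
const saveResponses = true;

// Suggested addition appended to the existing configuration line:
const configLine =
  `Configuration: ${runs} runs, ${threshold}% threshold, ` +
  `responses: ${saveResponses ? 'saving' : 'off'}`;
console.log(configLine);
```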
5. Brittle path derivation in one of the new tests
In test-output.test.js, the companion file path is derived as:
const responsesPath = outputPath.replace('.tap.md', '.responses.md');
The sibling tests in the same file correctly use readdirSync(testDir).filter(f => f.includes('.responses.md')) to locate the file by pattern. This one test is coupled to the internal extension convention — if that changes, it would silently test the wrong path. Worth aligning to the same pattern.
Optional
6. No E2E test for the full CLI path
The flag's path from parseAIArgs(['--save-responses', ...]) → runAICommand → file-on-disk is well-covered at each unit layer, but e2e.test.js wasn't updated. Manual verification is noted in the test plan, but an automated E2E test would permanently close this gap.
Not a blocker, but worth noting
test-output.test.js is now at ~1,050 LoC with 16% gzip density — a pre-existing smell, not introduced by this PR. As a follow-up, splitting formatResponses tests into format-responses.test.js would reduce hotspot churn on a file that's already in the top 5. Good candidate for a dedicated cleanup PR.
Force-pushed eb0d165 to 0cf467e
When --save-responses is passed, riteway ai writes a companion .responses.md file alongside the .tap.md output containing the raw result agent responses and per-run judge details for each assertion. This enables CI artifact-based debugging without adding console noise.
Force-pushed 0cf467e to 2e2c827
ianwhitedeveloper
left a comment
I also verified that --save-responses successfully outputs the expected files locally in the aidd repo
lgtm 🙌
Summary
- Adds a --save-responses boolean CLI flag (defaults to false) to riteway ai
- Writes a .responses.md file alongside the .tap.md output containing raw result agent responses and per-run judge details for each assertion
Changes
- constants.js: Added saveResponses: false default
- ai-runner.js: executeSingleRun now returns { judgments, response } so raw result agent output is preserved; runAITests returns responses array alongside aggregated results
- ai-command.js: Parses --save-responses flag and threads it through to recordTestOutput
- test-output.js: Added formatResponses() and companion file writing in recordTestOutput when saveResponses is true
Test plan
- riteway ai --save-responses test.sudo to verify companion file is created