Conversation
Force-pushed 307640b to eb0d165
ianwhitedeveloper
left a comment
Review
Great feature — clean implementation, well-scoped diff, and the README/CI example are genuinely useful. The hotspot analysis flags ai-command.js (#1, score 8100) and test-output.js (#4) as high-churn files, but the changes here are appropriately minimal and don't increase complexity in either. A few things to address:
Required
1. --save-responses is missing from riteway ai --help
bin/riteway.js was not updated. The flag is absent from three places:
a) The AI Test Options block (line 194) — add after --color:
--save-responses Save raw agent responses and judge details to a companion .responses.md file
b) The ValidationError handler usage hint (lines 87–93) — the inline synopsis on line 87 should include [--save-responses], and a corresponding console.error line should follow --color, using defaults.saveResponses for the default (consistent with how the other flags reference defaults.*):
console.error(`  --save-responses   Save raw agent responses to a companion .responses.md file (default: ${defaults.saveResponses})`);
c) The Examples section (lines 205–213) — add one entry consistent with the --color example already there:
riteway ai prompts/test.sudo --save-responses
Anyone using --help to discover flags will not find --save-responses. The README is well documented, but the CLI is the first place users look.
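Taken together, the three additions might look like the sketch below. This is illustrative only — the `defaults` object and column alignment are assumptions standing in for whatever bin/riteway.js actually uses:

```javascript
// Sketch of the three additions; `defaults` is a stand-in for the real
// defaults object imported by bin/riteway.js.
const defaults = { saveResponses: false };

// a) AI Test Options block entry, after --color:
const helpLine =
  '  --save-responses   Save raw agent responses and judge details to a companion .responses.md file';

// b) ValidationError usage hint, consistent with the other defaults.* references:
const usageLine =
  `  --save-responses   Save raw agent responses to a companion .responses.md file (default: ${defaults.saveResponses})`;

// c) Examples section entry:
const exampleLine = 'riteway ai prompts/test.sudo --save-responses';

console.log([helpLine, usageLine, exampleLine].join('\n'));
```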
Recommended
2. runAITests @returns docblock is stale
The return shape now always includes responses, but the docblock still only says "Aggregated per-assertion test results". Since this is a public export, document the new key:
* @returns {Promise<{passed: boolean, assertions: Array<Object>, responses: string[]}>}
3. formatResponses — guard against non-string response
lines.push(response.trimEnd() + '\n\n');
formatResponses is a public export. If a future agent adapter ever returns null or undefined, this throws a cryptic TypeError instead of surfacing a meaningful error. A one-line fix closes the gap:
lines.push(String(response ?? '').trimEnd() + '\n\n');
Minor
4. --save-responses not reflected in the configuration log
When the flag is active, the terminal output gives no indication:
console.log(`Configuration: ${runs} runs, ${threshold}% threshold, ...`);
A user debugging a CI failure who forgot they enabled the flag won't see it. Suggest appending something like , responses: ${saveResponses ? 'saving' : 'off'} (or however it fits the existing format).
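A hedged sketch of what the amended line might produce — runs, threshold, and saveResponses are assumed to already be in scope in ai-command.js:

```javascript
// Stand-in values for variables assumed to exist in ai-command.js.
const runs = 5;
const threshold = 80;
const saveResponses = true;

// Suggested addition appended to the existing configuration line:
const configLine =
  `Configuration: ${runs} runs, ${threshold}% threshold, ` +
  `responses: ${saveResponses ? 'saving' : 'off'}`;
console.log(configLine);
```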
5. Brittle path derivation in one of the new tests
In test-output.test.js, the companion file path is derived as:
const responsesPath = outputPath.replace('.tap.md', '.responses.md');
The sibling tests in the same file correctly use readdirSync(testDir).filter(f => f.includes('.responses.md')) to locate the file by pattern. This one test is coupled to the internal extension convention — if that changes, it would silently test the wrong path. Worth aligning to the same pattern.
Optional
6. No E2E test for the full CLI path
The flag's path from parseAIArgs(['--save-responses', ...]) → runAICommand → file-on-disk is well-covered at each unit layer, but e2e.test.js wasn't updated. Manual verification is noted in the test plan, but an automated E2E test would permanently close this gap.
Not a blocker, but worth noting
test-output.test.js is now at ~1,050 LoC with 16% gzip density — a pre-existing smell, not introduced by this PR. As a follow-up, splitting formatResponses tests into format-responses.test.js would reduce hotspot churn on a file that's already in the top 5. Good candidate for a dedicated cleanup PR.
Force-pushed eb0d165 to 0cf467e
When --save-responses is passed, riteway ai writes a companion .responses.md file alongside the .tap.md output containing the raw result agent responses and per-run judge details for each assertion. This enables CI artifact-based debugging without adding console noise.
Force-pushed 0cf467e to 2e2c827
ianwhitedeveloper
left a comment
I also verified that --save-responses successfully outputs the expected files locally in the aidd repo
lgtm 🙌
Summary
- Adds a --save-responses boolean CLI flag (defaults to false) to riteway ai
- Writes a .responses.md file alongside the .tap.md output containing raw result agent responses and per-run judge details for each assertion
Changes
- constants.js: Added saveResponses: false default
- ai-runner.js: executeSingleRun now returns { judgments, response } so raw result agent output is preserved; runAITests returns responses array alongside aggregated results
- ai-command.js: Parses --save-responses flag and threads it through to recordTestOutput
- test-output.js: Added formatResponses() and companion file writing in recordTestOutput when saveResponses is true
Test plan
- riteway ai --save-responses test.sudo to verify companion file is created