feat(lvms): add /lvms:run-integration-tests skill for autonomous LVMS testing on TNF#208
Conversation
…ng on TNF Adds /lvms:run-integration-tests, a re-entrant skill that automates the full LVMS QE integration test pipeline on TNF clusters. Designed for RC/EC builds where the redhat-operators OLM catalog does not include the lvms-operator package. The skill uses a 3-phase state machine, detecting cluster state on each invocation: - Phase 1 (fresh): deploy from source, build binary, start nohup test run - Phase 2 (running): show progress, print monitoring command, exit - Phase 3 (done): parse OTE JSON results, generate Markdown report, post to JIRA Verified end-to-end on OCP 4.22.0 TNF cluster: 38/40 MNO tests passed across two runs. The 2 failures are cleanup race conditions in new upstream tests, not LVMS bugs. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lucaconsalvi The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
WalkthroughThe PR bumps LVMS plugin metadata and adds README and skill documentation for ChangesLVMS integration-test documentation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 10 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (10 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 markdownlint-cli2 (0.22.1)plugins/lvms/skills/run-integration-tests/SKILL.mdmarkdownlint-cli2 v0.22.1 (markdownlint v0.40.0) ... [truncated 1098 characters] ... node:internal/modules/esm/resolve:271:11) Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (1)
plugins/lvms/skills/run-integration-tests/SKILL.md (1)
341-353: 🎯 Functional Correctness | 🔵 Trivial | ⚖️ Poor tradeoffError handling in a separate section instead of inline with each step.
Per repo learnings, failure policies and edge-case rules must be co-located with the specific step they govern — a separate
## Error Handlingtable is not reliably re-consulted by the LLM mid-execution.Move each row's action into a CRITICAL or Error Handling block immediately after the relevant step (e.g., the
make deployfailure note directly after Step 1c'smake deploycommand, LVMCluster timeout note directly after the wait loop, etc.).Based on learnings from
ggiguash(PR#192): "co-locate failure policies and edge-case rules inline with the specific step that needs them... Do not move these rules to a separate top-level 'Edge Cases' (or similar) section."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/lvms/skills/run-integration-tests/SKILL.md` around lines 341 - 353, The `run-integration-tests` skill keeps failure handling in a separate `## Error Handling` table, but the review asks to co-locate each rule with the step it governs. Move each action inline into the relevant step’s CRITICAL/Error Handling block in the main procedure (for example, the `make deploy` guidance near Step 1c, the LVMCluster timeout note near the wait loop, and the SSH/nohup and JIRA notes near their respective steps), and remove the standalone table so the rules stay attached to the matching step.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/lvms/.claude-plugin/plugin.json`:
- Line 4: The plugin version was bumped in plugin.json but the corresponding
marketplace version was not updated. When you change the version in the
.claude-plugin metadata, make the same version bump in marketplace.json as well,
using the plugin.json and marketplace.json entries to keep both files in sync so
update detection works correctly.
In `@plugins/lvms/README.md`:
- Around line 57-63: The README Requirements section is missing the
prerequisites needed to use run-integration-tests. Update the Requirements
content in plugins/lvms/README.md to include the same prerequisite set described
in SKILL.md so users see them before invoking /lvms:run-integration-tests: Go
1.24 or newer, access to a TNF cluster with extra disks, and SSH access to the
hypervisor. Keep the wording aligned with the existing README structure and the
run-integration-tests command examples.
In `@plugins/lvms/skills/run-integration-tests/SKILL.md`:
- Line 44: The default SSH host prompt in the run-integration-tests skill is
using a real hardcoded IP, which should be replaced with a generic placeholder.
Update the prompt text in SKILL.md for the Hypervisor SSH host default so it no
longer references `ec2-user@52.29.221.136`, and use a neutral example value
instead. Keep the change localized to the prompt string so the surrounding
workflow remains unchanged.
- Line 6: The run-integration-tests skill is missing the MCP Jira comment tool
from its allowed-tools list, so the JIRA posting step in the workflow will not
be permitted. Update the skill’s allowed-tools declaration to include the MCP
Atlassian Jira commenting tool referenced by the workflow instruction that calls
mcp__mcp-atlassian__jira_add_comment, keeping the existing Bash, Read, and
AskUserQuestion entries intact.
- Around line 155-159: The CSI driver verification step currently says not to
proceed but does not explicitly stop on failure; update the
run-integration-tests skill text around the CSI check to add a clear failure
path and message. In the section that runs the topolvm CSI driver check, make it
explicit that if topolvm.io is absent the process must stop, report that the CSI
driver is not registered, and point to the vg-manager logs for troubleshooting;
ensure the wording matches the SKILL-GUIDELINES requirement for an explicit stop
condition.
- Around line 226-235: The Phase 2 log check is using a hardcoded /root path
that does not match the ec2-user home used earlier, so the reader in the
integration-test flow will never find the log. Update the log-read step in
run-integration-tests SKILL.md to use the same home-relative path as Phase 1
(the SSH_HOST/ec2-user setup and the lvms-mno.log reference) so the Python
snippet reads the existing file consistently.
---
Nitpick comments:
In `@plugins/lvms/skills/run-integration-tests/SKILL.md`:
- Around line 341-353: The `run-integration-tests` skill keeps failure handling
in a separate `## Error Handling` table, but the review asks to co-locate each
rule with the step it governs. Move each action inline into the relevant step’s
CRITICAL/Error Handling block in the main procedure (for example, the `make
deploy` guidance near Step 1c, the LVMCluster timeout note near the wait loop,
and the SSH/nohup and JIRA notes near their respective steps), and remove the
standalone table so the rules stay attached to the matching step.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 5000a9dc-5292-4baa-b4c3-6be578fc1773
📒 Files selected for processing (3)
plugins/lvms/.claude-plugin/plugin.jsonplugins/lvms/README.mdplugins/lvms/skills/run-integration-tests/SKILL.md
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Add mcp__mcp-atlassian__jira_add_comment to allowed-tools - Replace hardcoded IP with generic placeholder in SSH host prompt - Fix /root/ path to ~/ in Phase 2 and Phase 3 log reads - Add explicit STOP condition when topolvm.io CSI driver is missing - Add Go 1.24+ and TNF cluster requirements to README Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/lvms/skills/run-integration-tests/SKILL.md (1)
346-358: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick winInline the failure handling with the relevant steps.
Per SKILL-GUIDELINES, failure policies and edge cases should live next to the step they govern. This top-level
Error Handlingtable creates a second source of truth and will drift from the phase instructions.Based on learnings, co-locate failure policies and edge-case rules inline with the specific step that needs them, and do not move them to a separate top-level
Error Handlingsection.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/lvms/skills/run-integration-tests/SKILL.md` around lines 346 - 358, The top-level Error Handling table in the integration-test skill should be removed because it duplicates step-level policy and will drift from the workflow. Inline each failure policy next to the relevant step in the existing phase sections of SKILL.md, using the same symbols and step names already present there (for example, the SSH setup/run steps, make deploy, make integration-build, LVMCluster readiness checks, vg-manager log checks, and JIRA reporting), and keep the retry/remediation guidance attached to the step it governs rather than as a separate section.Sources: Path instructions, Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@plugins/lvms/skills/run-integration-tests/SKILL.md`:
- Around line 346-358: The top-level Error Handling table in the
integration-test skill should be removed because it duplicates step-level policy
and will drift from the workflow. Inline each failure policy next to the
relevant step in the existing phase sections of SKILL.md, using the same symbols
and step names already present there (for example, the SSH setup/run steps, make
deploy, make integration-build, LVMCluster readiness checks, vg-manager log
checks, and JIRA reporting), and keep the retry/remediation guidance attached to
the step it governs rather than as a separate section.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d697c8f8-6829-4571-8d99-2944000cee68
📒 Files selected for processing (2)
plugins/lvms/README.mdplugins/lvms/skills/run-integration-tests/SKILL.md
✅ Files skipped from review due to trivial changes (1)
- plugins/lvms/README.md
Add language specifiers to bare fenced code blocks (MD040) and blank lines around fences (MD031). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/lvms/skills/run-integration-tests/SKILL.md (1)
232-241: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick winTighten the Phase 2 fallback.
except:turns every failure into "log not yet written", which hides permission/encoding issues and makes the progress state misleading.Proposed fix
- except: print('log not yet written') + except FileNotFoundError: + print('log not yet written') + except Exception as e: + print(f'log read failed: {e}')🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/lvms/skills/run-integration-tests/SKILL.md` around lines 232 - 241, The Phase 2 log check in the integration-test script uses a blanket except that turns all failures into “log not yet written,” masking real errors. Update the fallback in the ssh/python3 snippet to catch only the expected missing-file case and let permission, decode, or other read failures surface with a clearer message. Keep the fix localized to the log parsing block that reads ~/lvms-mno.log and counts result fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@plugins/lvms/skills/run-integration-tests/SKILL.md`:
- Around line 232-241: The Phase 2 log check in the integration-test script uses
a blanket except that turns all failures into “log not yet written,” masking
real errors. Update the fallback in the ssh/python3 snippet to catch only the
expected missing-file case and let permission, decode, or other read failures
surface with a clearer message. Keep the fix localized to the log parsing block
that reads ~/lvms-mno.log and counts result fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b1485f83-d9d5-4848-ab6c-703eadc2ad74
📒 Files selected for processing (1)
plugins/lvms/skills/run-integration-tests/SKILL.md
Summary
/lvms:run-integration-tests, a re-entrant Claude Code skill thatautomates the full LVMS QE integration test pipeline on TNF clusters
redhat-operatorsOLM catalog doesnot include the
lvms-operatorpackage (use/lvms:setup-prereqforreleased builds)
How it works
The skill uses a 3-phase state machine — invoke it once to start, invoke
it again when tests complete to collect results:
make deploy), appliesLVMCluster CR, builds test binary, starts MNO/SNO suite via
nohupoptionally posts to JIRA via MCP
Test plan
conditions in new upstream tests, not LVMS bugs)
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
/lvms:run-integration-tests).Documentation
Chores
1.1.0to1.2.0in the plugin manifest and marketplace metadata.