Merged
Conversation
Diagnoses two regressions found in user feedback comparing MAP to Codex/Superpowers on a long-running tool integration task: - /map-plan and task-decomposer have no durability/persistence checks for async/long-running operations — Codex caught in-memory state for a 5-minute call, MAP did not. - The decomposer ignored an explicit user instruction to ask clarifying questions, because heuristic skip conditions in /map-plan Step 1 silently override explicit user intent. Plan ranks fixes P0..P3 with file paths, anchors, and acceptance criteria. Reads Claude Code docs to harvest underused features (UserPromptSubmit hook, model/effort frontmatter, skills auto-trigger).
P0.1 — Override (always wins) at top of Step 1: explicit clarification language (English+Russian) forces Step 2 interview regardless of any heuristic skip signal. Lifts the 'silently ignored ask if unclear' failure structurally. P0.2 — Step 2b skip rewritten: drop length/subtask-count gates, keep only 'no cross-cutting concerns'. Cross-cutting now explicitly includes durability, async >30s, polling, webhooks, queueing. Always-run list extended with regex on async/durability terms. P0.3 — 8th interview dimension 'Durability & State Lifecycle': for any operation >5s, where does state live, does it survive restart, what's the resume identifier. In-process memory is not a valid default answer. P0.5 — New learned rule in architecture-patterns.md: 'Long-Running Operations Need Durable State by Default'. Codifies the durability default with WRONG/CORRECT examples.
…gers New non-blocking hook fires on every user prompt and emits hookSpecificOutput.additionalContext when: - Explicit clarification-invitation language is present (bilingual: 'ask if unclear', 'do not assume', 'если что-то непонятно', 'спрашивай', etc.) - Async/long-running language is present (any kind word: async, webhook, polling, асинхронн, в фоне, etc., OR a significant duration: minutes, hours, or >=30 seconds; bilingual) Both paths are non-blocking (exit 0). Per Claude Code docs, only exit code 2 blocks — informational hooks must use additionalContext, not exit codes. Acts as a structural backstop to the in-prompt P0.1 rule: even if a future prompt edit weakens the in-prompt override, the hook will keep nudging the planner toward Step 2 + Devil's Advocate + Durability Audit. Includes 28 unit tests covering original failure case (English + Russian), positive triggers, negative tests (no false positives on small numbers, unrelated prompts), and robustness (malformed JSON, empty prompt, alternate field names).
task-decomposer.md gets three additions:
* P0.4 — 'Durability Audit' sub-checklist that fires when any
subtask description matches async/long-running language; checks
every state element's storage, durability across restarts, and
resume identifier semantics.
* P1.2 — 'Mid-Decomposition Clarification (AskUserQuestion)' — a
third path between 'full plan' and 'refuse with questions'.
Authorizes ONE targeted AskUserQuestion call for architecturally
load-bearing questions only (durability, fan-out, sync vs async),
explicitly disallowed for naming/styling. Notes foreground vs
background subagent semantics.
* P2.1 + P2.2 — frontmatter: model: opus, effort: high,
permissionMode: plan. Decomposition is the load-bearing decision
in MAP; the user feedback that triggered this work observed that
Codex on medium beat Claude on default-sonnet on a planning task.
Other agents:
* final-verifier: model: opus, effort: high (last gate before merge)
* monitor: effort: high (kept on sonnet — adversarial review
quality scales with effort more than model strength)
* evaluator: effort: high (weighted scoring needs deliberation
budget)
* research-agent: model: haiku (read-mostly, frees Opus/Sonnet
budget for load-bearing decision agents)
P2.3 — Audit of existing hook exit codes: every existing hook in
.claude/hooks/ already uses sys.exit(0) and delegates blocking to
stdout JSON (permissionDecision: deny). No fix needed; documented
the rule of the road in a new .claude/hooks/README.md so future
hook authors don't accidentally use exit 1 to block (silently
fails per Claude Code docs).
P3.1 — Migrate 10 slash commands to Skills format. Per Claude Code docs, /commands/<name>.md and /skills/<name>/SKILL.md are functionally merged: both register the same /<name> slash command. Skills win precedence when both exist. The migration buys progressive disclosure: skill descriptions stay in context, body loads on demand. Each new SKILL.md has skill-compliant frontmatter (name, description ≤250 chars with Use when/Do NOT use phrases, disable-model-invocation, argument-hint), the original command body verbatim, plus required Examples and Troubleshooting sections. Originals at .claude/commands/map-*.md are preserved as fallback — removing them is deferred to a separate PR after we observe the skills version in use. Tests: tests/test_skills.py runs 18 strict checks per skill; all green. tests/test_command_templates.py also green. P3.2 — Verified the Claude Code docs claim that subagents cannot spawn other subagents. actor.md contains a contradictory pair of sections (one tells actor to call research-agent via Task(); the other says research is pre-run by orchestrator). The contradiction is documented in the improvement plan; cleanup deferred to a separate PR to keep this batch focused. chore: sync re-mirrored .map/scripts/*.py into src/mapify_cli/templates/map/scripts/. This was pre-existing template drift on main (the templates were already behind .map/scripts/ before this branch) — running scripts/sync-templates.sh as part of this work brought them back in line. NOT a behavioral change here.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the MAP framework templates to address planning failures around (1) user-invited clarification being ignored and (2) long-running/async work defaulting to unsafe in-process state, by adding prompt-trigger detection and expanding workflow documentation/skills.
Changes:
- Add a
UserPromptSubmithook (detect-clarification-triggers.py) plus pytest coverage to inject clarification/durability guidance into Claude’s context. - Expand
/map-planandtask-decomposerguidance to force deep interview on explicit “ask if unclear” language and to treat durability/async as cross-cutting. - Add several new skills (
/map-check,/map-task,/map-tdd,/map-review,/map-resume,/map-debug,/map-fast) and update skill trigger rules and agent frontmatter (model/effort/permissions).
Reviewed changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/hooks/test_detect_clarification_triggers.py | New end-to-end tests for the UserPromptSubmit trigger hook behavior. |
| src/mapify_cli/templates/skills/skill-rules.json | Adds trigger rules for new MAP skills/commands. |
| src/mapify_cli/templates/skills/map-tdd/SKILL.md | Adds TDD workflow skill documentation. |
| src/mapify_cli/templates/skills/map-task/SKILL.md | Adds single-subtask execution skill documentation. |
| src/mapify_cli/templates/skills/map-review/SKILL.md | Adds structured interactive review workflow documentation. |
| src/mapify_cli/templates/skills/map-resume/SKILL.md | Adds workflow recovery skill documentation. |
| src/mapify_cli/templates/skills/map-fast/SKILL.md | Adds minimal workflow skill documentation. |
| src/mapify_cli/templates/skills/map-debug/SKILL.md | Adds debugging workflow skill documentation. |
| src/mapify_cli/templates/skills/map-check/SKILL.md | Adds quality gates + verification workflow skill documentation. |
| src/mapify_cli/templates/settings.json | Wires the new UserPromptSubmit hook into template settings. |
| src/mapify_cli/templates/map/scripts/map_utils.py | Removes unused constraint parsing helpers; keeps branch-name utility. |
| src/mapify_cli/templates/map/scripts/map_step_runner.py | Refactors formatting and changes plan artifact readiness tracking. |
| src/mapify_cli/templates/map/scripts/map_orchestrator.py | Refactors resume logic and state schema surface related to planning/bootstrap. |
| src/mapify_cli/templates/map/scripts/diagnostics.py | Minor formatting-only change. |
| src/mapify_cli/templates/hooks/detect-clarification-triggers.py | New hook implementation for clarification + durability trigger detection. |
| src/mapify_cli/templates/hooks/README.md | Adds hook conventions/exit-code documentation. |
| src/mapify_cli/templates/commands/map-plan.md | Updates planning heuristics: clarification override + durability interview + stricter Devil’s Advocate gating. |
| src/mapify_cli/templates/agents/task-decomposer.md | Updates decomposer config + adds durability audit and AskUserQuestion guidance. |
| src/mapify_cli/templates/agents/research-agent.md | Pins model/version metadata updates for research agent. |
| src/mapify_cli/templates/agents/monitor.md | Adds higher effort metadata for monitor. |
| src/mapify_cli/templates/agents/final-verifier.md | Pins verifier to opus + high effort metadata. |
| src/mapify_cli/templates/agents/evaluator.md | Adds higher effort metadata for evaluator. |
| docs/improvement-plan-2026-04-28.md | Documents the motivating incident and proposed framework improvements. |
| .claude/skills/skill-rules.json | Mirrors skill rule additions into .claude/. |
| .claude/skills/map-tdd/SKILL.md | Mirrors map-tdd skill into .claude/. |
| .claude/skills/map-task/SKILL.md | Mirrors map-task skill into .claude/. |
| .claude/skills/map-review/SKILL.md | Mirrors map-review skill into .claude/. |
| .claude/skills/map-resume/SKILL.md | Mirrors map-resume skill into .claude/. |
| .claude/skills/map-fast/SKILL.md | Mirrors map-fast skill into .claude/. |
| .claude/skills/map-debug/SKILL.md | Mirrors map-debug skill into .claude/. |
| .claude/skills/map-check/SKILL.md | Mirrors map-check skill into .claude/. |
| .claude/settings.json | Enables UserPromptSubmit hook in project settings. |
| .claude/rules/learned/architecture-patterns.md | Adds learned rule about durable state for long-running ops. |
| .claude/hooks/detect-clarification-triggers.py | Adds the hook to .claude/hooks/. |
| .claude/hooks/README.md | Adds hook conventions README to .claude/hooks/. |
| .claude/commands/map-plan.md | Mirrors /map-plan heuristic updates into .claude/. |
| .claude/agents/task-decomposer.md | Mirrors decomposer updates into .claude/. |
| .claude/agents/research-agent.md | Mirrors research-agent metadata updates into .claude/. |
| .claude/agents/monitor.md | Mirrors monitor metadata updates into .claude/. |
| .claude/agents/final-verifier.md | Mirrors final-verifier metadata updates into .claude/. |
| .claude/agents/evaluator.md | Mirrors evaluator metadata updates into .claude/. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
1010
to
1015
| if step_state_path.exists(): | ||
| plan_artifacts.append(_artifact_ref(step_state_path, "step-state")) | ||
|
|
||
| if task_plan_path.exists() and blueprint_path.exists() and handoff_path.exists(): | ||
| if task_plan_path.exists() and blueprint_path.exists() and step_state_path.exists(): | ||
| plan_status = "ready" | ||
| elif plan_artifacts: |
Comment on lines
+1533
to
+1537
| """Resume workflow from an existing /map-plan output, skipping init phases. | ||
|
|
||
| Detects task_plan_<branch>.md and step_state.json created by /map-plan. | ||
| Extracts subtask IDs from the plan, marks init phases as completed, and | ||
| starts execution from INIT_STATE (batch mode auto-set). |
Comment on lines
+1567
to
+1576
| # Extract AAG contracts from step_state.json or blueprint.json if present | ||
| aag_contracts: dict[str, str] = {} | ||
| step_state_file = plan_dir / "step_state.json" | ||
| blueprint_file = plan_dir / "blueprint.json" | ||
| for source_file in [step_state_file, blueprint_file]: | ||
| if source_file.exists() and not aag_contracts: | ||
| try: | ||
| src_data = json.loads(source_file.read_text(encoding="utf-8")) | ||
| aag_contracts = src_data.get("aag_contracts", {}) | ||
| except (json.JSONDecodeError, KeyError): |
Comment on lines
+62
to
+66
| # Two groups for scoring: | ||
| # "kind" — operation kind / verb that is itself ambiguous about state | ||
| # "duration" — explicit duration markers ("5 min", "30 seconds") | ||
| # A single duration alone does not fire; we want at least one "kind" | ||
| # match OR multiple duration markers. |
…igration The skills migration (commit 6baff28) shipped both .claude/commands/map-*.md AND .claude/skills/map-*/SKILL.md as a 'safety net' — but per Claude Code docs, skills win precedence when both exist. Keeping the commands as a fallback creates two sources of truth: edit a skill, the command file silently keeps the old version. Removes: - 10 .claude/commands/map-*.md (canonical source) - 10 src/mapify_cli/templates/commands/map-*.md (mirror) - tests/test_command_templates.py (replaced by the new TestCommandTemplateSynchronization class in test_template_sync.py which asserts the inverse: NO map-*.md may reappear in commands/) - scripts/commit-improvements-2026-04-28.sh (leftover one-time helper) Updates: - src/mapify_cli/delivery/file_copier.py: create_commands_dir README now lists all /map-* entries as skill-backed and explains that the commands/ dir exists for user-custom commands only. - scripts/sync-templates.sh: handle empty .claude/commands/ via shopt nullglob guard so the script doesn't fail when there are no .md files to copy. - tests/test_template_sync.py: TestCommandTemplateSynchronization rewritten — asserts (a) no map-*.md remains in either commands directory and (b) any non-map (user-custom) commands stay byte- identical between .claude/commands/ and templates/commands/. User-custom commands placed in .claude/commands/ continue to work unchanged. Only the framework's own /map-* entries moved to skills.
…ration - Replace should_resume_from_plan test calls with resume_from_plan (function was removed in prior commit) - Remove test_write_plan_handoff (write_plan_handoff deleted) - Update record_plan_artifacts test to require step_state.json for "ready" status instead of plan_handoff.json - Fix map-efficient SKILL.md to call resume_from_plan directly instead of the removed should_resume_from_plan CLI command - Update CLI check command to report agents only (commands count is 0 after skills migration) - Update 3 CLI tests for commands→skills migration expectations
Review findings from /map-review (Monitor+Predictor+Evaluator): - Remove write_plan_handoff call from map-plan SKILL.md (all 4 copies) and all plan_handoff.json references — function was deleted but skill still invoked it, breaking /map-plan Step 7 for every user - Fix doctor/check commands count mismatch after skills migration: drop commands count comparison, show skills info instead - Delete stale test_plan_handoff_has_required_fields integration test - Fix misleading comment in detect-clarification-triggers.py about duration-only triggering behavior - Remove unused count_command_templates import from test_mapify_cli.py
After skills migration, create_command_files no longer needs to generate inline map-*.md command files. The fallback triggered in CI where the empty templates/commands/ dir isn't packaged in the wheel, causing test_create_command_files to fail (expected 0 commands, got 3). Now delegates entirely to create_commands_dir (README only).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.