Feature/planning improvements 2026 04 28 by azalio · Pull Request #103 · azalio/map-framework

azalio · 2026-04-28T12:51:15Z

No description provided.

Diagnoses two regressions found in user feedback comparing MAP to Codex/Superpowers on a long-running tool integration task: - /map-plan and task-decomposer have no durability/persistence checks for async/long-running operations — Codex caught in-memory state for a 5-minute call, MAP did not. - The decomposer ignored an explicit user instruction to ask clarifying questions, because heuristic skip conditions in /map-plan Step 1 silently override explicit user intent. Plan ranks fixes P0..P3 with file paths, anchors, and acceptance criteria. Reads Claude Code docs to harvest underused features (UserPromptSubmit hook, model/effort frontmatter, skills auto-trigger).

P0.1 — Override (always wins) at top of Step 1: explicit clarification language (English+Russian) forces Step 2 interview regardless of any heuristic skip signal. Lifts the 'silently ignored ask if unclear' failure structurally. P0.2 — Step 2b skip rewritten: drop length/subtask-count gates, keep only 'no cross-cutting concerns'. Cross-cutting now explicitly includes durability, async >30s, polling, webhooks, queueing. Always-run list extended with regex on async/durability terms. P0.3 — 8th interview dimension 'Durability & State Lifecycle': for any operation >5s, where does state live, does it survive restart, what's the resume identifier. In-process memory is not a valid default answer. P0.5 — New learned rule in architecture-patterns.md: 'Long-Running Operations Need Durable State by Default'. Codifies the durability default with WRONG/CORRECT examples.

…gers New non-blocking hook fires on every user prompt and emits hookSpecificOutput.additionalContext when: - Explicit clarification-invitation language is present (bilingual: 'ask if unclear', 'do not assume', 'если что-то непонятно', 'спрашивай', etc.) - Async/long-running language is present (any kind word: async, webhook, polling, асинхронн, в фоне, etc., OR a significant duration: minutes, hours, or >=30 seconds; bilingual) Both paths are non-blocking (exit 0). Per Claude Code docs, only exit code 2 blocks — informational hooks must use additionalContext, not exit codes. Acts as a structural backstop to the in-prompt P0.1 rule: even if a future prompt edit weakens the in-prompt override, the hook will keep nudging the planner toward Step 2 + Devil's Advocate + Durability Audit. Includes 28 unit tests covering original failure case (English + Russian), positive triggers, negative tests (no false positives on small numbers, unrelated prompts), and robustness (malformed JSON, empty prompt, alternate field names).

task-decomposer.md gets three additions: * P0.4 — 'Durability Audit' sub-checklist that fires when any subtask description matches async/long-running language; checks every state element's storage, durability across restarts, and resume identifier semantics. * P1.2 — 'Mid-Decomposition Clarification (AskUserQuestion)' — a third path between 'full plan' and 'refuse with questions'. Authorizes ONE targeted AskUserQuestion call for architecturally load-bearing questions only (durability, fan-out, sync vs async), explicitly disallowed for naming/styling. Notes foreground vs background subagent semantics. * P2.1 + P2.2 — frontmatter: model: opus, effort: high, permissionMode: plan. Decomposition is the load-bearing decision in MAP; the user feedback that triggered this work observed that Codex on medium beat Claude on default-sonnet on a planning task. Other agents: * final-verifier: model: opus, effort: high (last gate before merge) * monitor: effort: high (kept on sonnet — adversarial review quality scales with effort more than model strength) * evaluator: effort: high (weighted scoring needs deliberation budget) * research-agent: model: haiku (read-mostly, frees Opus/Sonnet budget for load-bearing decision agents) P2.3 — Audit of existing hook exit codes: every existing hook in .claude/hooks/ already uses sys.exit(0) and delegates blocking to stdout JSON (permissionDecision: deny). No fix needed; documented the rule of the road in a new .claude/hooks/README.md so future hook authors don't accidentally use exit 1 to block (silently fails per Claude Code docs).

P3.1 — Migrate 10 slash commands to Skills format. Per Claude Code docs, /commands/<name>.md and /skills/<name>/SKILL.md are functionally merged: both register the same /<name> slash command. Skills win precedence when both exist. The migration buys progressive disclosure: skill descriptions stay in context, body loads on demand. Each new SKILL.md has skill-compliant frontmatter (name, description ≤250 chars with Use when/Do NOT use phrases, disable-model-invocation, argument-hint), the original command body verbatim, plus required Examples and Troubleshooting sections. Originals at .claude/commands/map-*.md are preserved as fallback — removing them is deferred to a separate PR after we observe the skills version in use. Tests: tests/test_skills.py runs 18 strict checks per skill; all green. tests/test_command_templates.py also green. P3.2 — Verified the Claude Code docs claim that subagents cannot spawn other subagents. actor.md contains a contradictory pair of sections (one tells actor to call research-agent via Task(); the other says research is pre-run by orchestrator). The contradiction is documented in the improvement plan; cleanup deferred to a separate PR to keep this batch focused. chore: sync re-mirrored .map/scripts/*.py into src/mapify_cli/templates/map/scripts/. This was pre-existing template drift on main (the templates were already behind .map/scripts/ before this branch) — running scripts/sync-templates.sh as part of this work brought them back in line. NOT a behavioral change here.

Copilot

Pull request overview

This PR updates the MAP framework templates to address planning failures around (1) user-invited clarification being ignored and (2) long-running/async work defaulting to unsafe in-process state, by adding prompt-trigger detection and expanding workflow documentation/skills.

Changes:

Add a UserPromptSubmit hook (detect-clarification-triggers.py) plus pytest coverage to inject clarification/durability guidance into Claude’s context.
Expand /map-plan and task-decomposer guidance to force deep interview on explicit “ask if unclear” language and to treat durability/async as cross-cutting.
Add several new skills (/map-check, /map-task, /map-tdd, /map-review, /map-resume, /map-debug, /map-fast) and update skill trigger rules and agent frontmatter (model/effort/permissions).

Reviewed changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/hooks/test_detect_clarification_triggers.py	New end-to-end tests for the `UserPromptSubmit` trigger hook behavior.
src/mapify_cli/templates/skills/skill-rules.json	Adds trigger rules for new MAP skills/commands.
src/mapify_cli/templates/skills/map-tdd/SKILL.md	Adds TDD workflow skill documentation.
src/mapify_cli/templates/skills/map-task/SKILL.md	Adds single-subtask execution skill documentation.
src/mapify_cli/templates/skills/map-review/SKILL.md	Adds structured interactive review workflow documentation.
src/mapify_cli/templates/skills/map-resume/SKILL.md	Adds workflow recovery skill documentation.
src/mapify_cli/templates/skills/map-fast/SKILL.md	Adds minimal workflow skill documentation.
src/mapify_cli/templates/skills/map-debug/SKILL.md	Adds debugging workflow skill documentation.
src/mapify_cli/templates/skills/map-check/SKILL.md	Adds quality gates + verification workflow skill documentation.
src/mapify_cli/templates/settings.json	Wires the new `UserPromptSubmit` hook into template settings.
src/mapify_cli/templates/map/scripts/map_utils.py	Removes unused constraint parsing helpers; keeps branch-name utility.
src/mapify_cli/templates/map/scripts/map_step_runner.py	Refactors formatting and changes plan artifact readiness tracking.
src/mapify_cli/templates/map/scripts/map_orchestrator.py	Refactors resume logic and state schema surface related to planning/bootstrap.
src/mapify_cli/templates/map/scripts/diagnostics.py	Minor formatting-only change.
src/mapify_cli/templates/hooks/detect-clarification-triggers.py	New hook implementation for clarification + durability trigger detection.
src/mapify_cli/templates/hooks/README.md	Adds hook conventions/exit-code documentation.
src/mapify_cli/templates/commands/map-plan.md	Updates planning heuristics: clarification override + durability interview + stricter Devil’s Advocate gating.
src/mapify_cli/templates/agents/task-decomposer.md	Updates decomposer config + adds durability audit and AskUserQuestion guidance.
src/mapify_cli/templates/agents/research-agent.md	Pins model/version metadata updates for research agent.
src/mapify_cli/templates/agents/monitor.md	Adds higher effort metadata for monitor.
src/mapify_cli/templates/agents/final-verifier.md	Pins verifier to opus + high effort metadata.
src/mapify_cli/templates/agents/evaluator.md	Adds higher effort metadata for evaluator.
docs/improvement-plan-2026-04-28.md	Documents the motivating incident and proposed framework improvements.
.claude/skills/skill-rules.json	Mirrors skill rule additions into `.claude/`.
.claude/skills/map-tdd/SKILL.md	Mirrors map-tdd skill into `.claude/`.
.claude/skills/map-task/SKILL.md	Mirrors map-task skill into `.claude/`.
.claude/skills/map-review/SKILL.md	Mirrors map-review skill into `.claude/`.
.claude/skills/map-resume/SKILL.md	Mirrors map-resume skill into `.claude/`.
.claude/skills/map-fast/SKILL.md	Mirrors map-fast skill into `.claude/`.
.claude/skills/map-debug/SKILL.md	Mirrors map-debug skill into `.claude/`.
.claude/skills/map-check/SKILL.md	Mirrors map-check skill into `.claude/`.
.claude/settings.json	Enables `UserPromptSubmit` hook in project settings.
.claude/rules/learned/architecture-patterns.md	Adds learned rule about durable state for long-running ops.
.claude/hooks/detect-clarification-triggers.py	Adds the hook to `.claude/hooks/`.
.claude/hooks/README.md	Adds hook conventions README to `.claude/hooks/`.
.claude/commands/map-plan.md	Mirrors `/map-plan` heuristic updates into `.claude/`.
.claude/agents/task-decomposer.md	Mirrors decomposer updates into `.claude/`.
.claude/agents/research-agent.md	Mirrors research-agent metadata updates into `.claude/`.
.claude/agents/monitor.md	Mirrors monitor metadata updates into `.claude/`.
.claude/agents/final-verifier.md	Mirrors final-verifier metadata updates into `.claude/`.
.claude/agents/evaluator.md	Mirrors evaluator metadata updates into `.claude/`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

    if step_state_path.exists():
        plan_artifacts.append(_artifact_ref(step_state_path, "step-state"))

-    if task_plan_path.exists() and blueprint_path.exists() and handoff_path.exists():
+    if task_plan_path.exists() and blueprint_path.exists() and step_state_path.exists():
        plan_status = "ready"
    elif plan_artifacts:


+    """Resume workflow from an existing /map-plan output, skipping init phases.
+
+    Detects task_plan_<branch>.md and step_state.json created by /map-plan.
+    Extracts subtask IDs from the plan, marks init phases as completed, and
+    starts execution from INIT_STATE (batch mode auto-set).


+    # Extract AAG contracts from step_state.json or blueprint.json if present
+    aag_contracts: dict[str, str] = {}
+    step_state_file = plan_dir / "step_state.json"
+    blueprint_file = plan_dir / "blueprint.json"
+    for source_file in [step_state_file, blueprint_file]:
+        if source_file.exists() and not aag_contracts:
+            try:
+                src_data = json.loads(source_file.read_text(encoding="utf-8"))
+                aag_contracts = src_data.get("aag_contracts", {})
+            except (json.JSONDecodeError, KeyError):


+# Two groups for scoring:
+#   "kind"     — operation kind / verb that is itself ambiguous about state
+#   "duration" — explicit duration markers ("5 min", "30 seconds")
+# A single duration alone does not fire; we want at least one "kind"
+# match OR multiple duration markers.


…igration The skills migration (commit 6baff28) shipped both .claude/commands/map-*.md AND .claude/skills/map-*/SKILL.md as a 'safety net' — but per Claude Code docs, skills win precedence when both exist. Keeping the commands as a fallback creates two sources of truth: edit a skill, the command file silently keeps the old version. Removes: - 10 .claude/commands/map-*.md (canonical source) - 10 src/mapify_cli/templates/commands/map-*.md (mirror) - tests/test_command_templates.py (replaced by the new TestCommandTemplateSynchronization class in test_template_sync.py which asserts the inverse: NO map-*.md may reappear in commands/) - scripts/commit-improvements-2026-04-28.sh (leftover one-time helper) Updates: - src/mapify_cli/delivery/file_copier.py: create_commands_dir README now lists all /map-* entries as skill-backed and explains that the commands/ dir exists for user-custom commands only. - scripts/sync-templates.sh: handle empty .claude/commands/ via shopt nullglob guard so the script doesn't fail when there are no .md files to copy. - tests/test_template_sync.py: TestCommandTemplateSynchronization rewritten — asserts (a) no map-*.md remains in either commands directory and (b) any non-map (user-custom) commands stay byte- identical between .claude/commands/ and templates/commands/. User-custom commands placed in .claude/commands/ continue to work unchanged. Only the framework's own /map-* entries moved to skills.

…ration - Replace should_resume_from_plan test calls with resume_from_plan (function was removed in prior commit) - Remove test_write_plan_handoff (write_plan_handoff deleted) - Update record_plan_artifacts test to require step_state.json for "ready" status instead of plan_handoff.json - Fix map-efficient SKILL.md to call resume_from_plan directly instead of the removed should_resume_from_plan CLI command - Update CLI check command to report agents only (commands count is 0 after skills migration) - Update 3 CLI tests for commands→skills migration expectations

Review findings from /map-review (Monitor+Predictor+Evaluator): - Remove write_plan_handoff call from map-plan SKILL.md (all 4 copies) and all plan_handoff.json references — function was deleted but skill still invoked it, breaking /map-plan Step 7 for every user - Fix doctor/check commands count mismatch after skills migration: drop commands count comparison, show skills info instead - Delete stale test_plan_handoff_has_required_fields integration test - Fix misleading comment in detect-clarification-triggers.py about duration-only triggering behavior - Remove unused count_command_templates import from test_mapify_cli.py

After skills migration, create_command_files no longer needs to generate inline map-*.md command files. The fallback triggered in CI where the empty templates/commands/ dir isn't packaged in the wheel, causing test_create_command_files to fail (expected 0 commands, got 3). Now delegates entirely to create_commands_dir (README only).

azalio added 5 commits April 28, 2026 15:42

Copilot AI review requested due to automatic review settings April 28, 2026 12:51

Copilot started reviewing on behalf of azalio April 28, 2026 12:51 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

azalio added 4 commits April 28, 2026 16:03

azalio merged commit 9e2e869 into main Apr 29, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/planning improvements 2026 04 28#103

Feature/planning improvements 2026 04 28#103
azalio merged 9 commits intomainfrom
feature/planning-improvements-2026-04-28

azalio commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

azalio commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants