Skip to content

Feature/planning improvements 2026 04 28#103

Merged
azalio merged 9 commits intomainfrom
feature/planning-improvements-2026-04-28
Apr 29, 2026
Merged

Feature/planning improvements 2026 04 28#103
azalio merged 9 commits intomainfrom
feature/planning-improvements-2026-04-28

Conversation

@azalio
Copy link
Copy Markdown
Owner

@azalio azalio commented Apr 28, 2026

No description provided.

azalio added 5 commits April 28, 2026 15:42
Diagnoses two regressions found in user feedback comparing MAP to
Codex/Superpowers on a long-running tool integration task:

- /map-plan and task-decomposer have no durability/persistence checks
  for async/long-running operations — Codex caught in-memory state for
  a 5-minute call, MAP did not.
- The decomposer ignored an explicit user instruction to ask
  clarifying questions, because heuristic skip conditions in
  /map-plan Step 1 silently override explicit user intent.

Plan ranks fixes P0..P3 with file paths, anchors, and acceptance
criteria. Reads Claude Code docs to harvest underused features
(UserPromptSubmit hook, model/effort frontmatter, skills auto-trigger).
P0.1 — Override (always wins) at top of Step 1: explicit clarification
  language (English+Russian) forces Step 2 interview regardless of any
  heuristic skip signal. Lifts the 'silently ignored ask if unclear'
  failure structurally.

P0.2 — Step 2b skip rewritten: drop length/subtask-count gates, keep
  only 'no cross-cutting concerns'. Cross-cutting now explicitly
  includes durability, async >30s, polling, webhooks, queueing.
  Always-run list extended with regex on async/durability terms.

P0.3 — 8th interview dimension 'Durability & State Lifecycle': for any
  operation >5s, where does state live, does it survive restart,
  what's the resume identifier. In-process memory is not a valid
  default answer.

P0.5 — New learned rule in architecture-patterns.md:
  'Long-Running Operations Need Durable State by Default'.
  Codifies the durability default with WRONG/CORRECT examples.
…gers

New non-blocking hook fires on every user prompt and emits
hookSpecificOutput.additionalContext when:

- Explicit clarification-invitation language is present (bilingual:
  'ask if unclear', 'do not assume', 'если что-то непонятно',
  'спрашивай', etc.)
- Async/long-running language is present (any kind word: async, webhook,
  polling, асинхронн, в фоне, etc., OR a significant duration: minutes,
  hours, or >=30 seconds; bilingual)

Both paths are non-blocking (exit 0). Per Claude Code docs, only exit
code 2 blocks — informational hooks must use additionalContext, not
exit codes.

Acts as a structural backstop to the in-prompt P0.1 rule: even if a
future prompt edit weakens the in-prompt override, the hook will keep
nudging the planner toward Step 2 + Devil's Advocate + Durability Audit.

Includes 28 unit tests covering original failure case (English +
Russian), positive triggers, negative tests (no false positives on
small numbers, unrelated prompts), and robustness (malformed JSON,
empty prompt, alternate field names).
task-decomposer.md gets three additions:
  * P0.4 — 'Durability Audit' sub-checklist that fires when any
    subtask description matches async/long-running language; checks
    every state element's storage, durability across restarts, and
    resume identifier semantics.
  * P1.2 — 'Mid-Decomposition Clarification (AskUserQuestion)' — a
    third path between 'full plan' and 'refuse with questions'.
    Authorizes ONE targeted AskUserQuestion call for architecturally
    load-bearing questions only (durability, fan-out, sync vs async),
    explicitly disallowed for naming/styling. Notes foreground vs
    background subagent semantics.
  * P2.1 + P2.2 — frontmatter: model: opus, effort: high,
    permissionMode: plan. Decomposition is the load-bearing decision
    in MAP; the user feedback that triggered this work observed that
    Codex on medium beat Claude on default-sonnet on a planning task.

Other agents:
  * final-verifier: model: opus, effort: high (last gate before merge)
  * monitor:    effort: high (kept on sonnet — adversarial review
                quality scales with effort more than model strength)
  * evaluator:  effort: high (weighted scoring needs deliberation
                budget)
  * research-agent: model: haiku (read-mostly, frees Opus/Sonnet
                budget for load-bearing decision agents)

P2.3 — Audit of existing hook exit codes: every existing hook in
  .claude/hooks/ already uses sys.exit(0) and delegates blocking to
  stdout JSON (permissionDecision: deny). No fix needed; documented
  the rule of the road in a new .claude/hooks/README.md so future
  hook authors don't accidentally use exit 1 to block (silently
  fails per Claude Code docs).
P3.1 — Migrate 10 slash commands to Skills format. Per Claude Code
docs, /commands/<name>.md and /skills/<name>/SKILL.md are functionally
merged: both register the same /<name> slash command. Skills win
precedence when both exist. The migration buys progressive disclosure:
skill descriptions stay in context, body loads on demand.

Each new SKILL.md has skill-compliant frontmatter (name, description
≤250 chars with Use when/Do NOT use phrases, disable-model-invocation,
argument-hint), the original command body verbatim, plus required
Examples and Troubleshooting sections.

Originals at .claude/commands/map-*.md are preserved as fallback —
removing them is deferred to a separate PR after we observe the
skills version in use.

Tests: tests/test_skills.py runs 18 strict checks per skill; all
green. tests/test_command_templates.py also green.

P3.2 — Verified the Claude Code docs claim that subagents cannot
spawn other subagents. actor.md contains a contradictory pair of
sections (one tells actor to call research-agent via Task(); the other
says research is pre-run by orchestrator). The contradiction is
documented in the improvement plan; cleanup deferred to a separate
PR to keep this batch focused.

chore: sync re-mirrored .map/scripts/*.py into
src/mapify_cli/templates/map/scripts/. This was pre-existing template
drift on main (the templates were already behind .map/scripts/ before
this branch) — running scripts/sync-templates.sh as part of this work
brought them back in line. NOT a behavioral change here.
Copilot AI review requested due to automatic review settings April 28, 2026 12:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the MAP framework templates to address planning failures around (1) user-invited clarification being ignored and (2) long-running/async work defaulting to unsafe in-process state, by adding prompt-trigger detection and expanding workflow documentation/skills.

Changes:

  • Add a UserPromptSubmit hook (detect-clarification-triggers.py) plus pytest coverage to inject clarification/durability guidance into Claude’s context.
  • Expand /map-plan and task-decomposer guidance to force deep interview on explicit “ask if unclear” language and to treat durability/async as cross-cutting.
  • Add several new skills (/map-check, /map-task, /map-tdd, /map-review, /map-resume, /map-debug, /map-fast) and update skill trigger rules and agent frontmatter (model/effort/permissions).

Reviewed changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/hooks/test_detect_clarification_triggers.py New end-to-end tests for the UserPromptSubmit trigger hook behavior.
src/mapify_cli/templates/skills/skill-rules.json Adds trigger rules for new MAP skills/commands.
src/mapify_cli/templates/skills/map-tdd/SKILL.md Adds TDD workflow skill documentation.
src/mapify_cli/templates/skills/map-task/SKILL.md Adds single-subtask execution skill documentation.
src/mapify_cli/templates/skills/map-review/SKILL.md Adds structured interactive review workflow documentation.
src/mapify_cli/templates/skills/map-resume/SKILL.md Adds workflow recovery skill documentation.
src/mapify_cli/templates/skills/map-fast/SKILL.md Adds minimal workflow skill documentation.
src/mapify_cli/templates/skills/map-debug/SKILL.md Adds debugging workflow skill documentation.
src/mapify_cli/templates/skills/map-check/SKILL.md Adds quality gates + verification workflow skill documentation.
src/mapify_cli/templates/settings.json Wires the new UserPromptSubmit hook into template settings.
src/mapify_cli/templates/map/scripts/map_utils.py Removes unused constraint parsing helpers; keeps branch-name utility.
src/mapify_cli/templates/map/scripts/map_step_runner.py Refactors formatting and changes plan artifact readiness tracking.
src/mapify_cli/templates/map/scripts/map_orchestrator.py Refactors resume logic and state schema surface related to planning/bootstrap.
src/mapify_cli/templates/map/scripts/diagnostics.py Minor formatting-only change.
src/mapify_cli/templates/hooks/detect-clarification-triggers.py New hook implementation for clarification + durability trigger detection.
src/mapify_cli/templates/hooks/README.md Adds hook conventions/exit-code documentation.
src/mapify_cli/templates/commands/map-plan.md Updates planning heuristics: clarification override + durability interview + stricter Devil’s Advocate gating.
src/mapify_cli/templates/agents/task-decomposer.md Updates decomposer config + adds durability audit and AskUserQuestion guidance.
src/mapify_cli/templates/agents/research-agent.md Pins model/version metadata updates for research agent.
src/mapify_cli/templates/agents/monitor.md Adds higher effort metadata for monitor.
src/mapify_cli/templates/agents/final-verifier.md Pins verifier to opus + high effort metadata.
src/mapify_cli/templates/agents/evaluator.md Adds higher effort metadata for evaluator.
docs/improvement-plan-2026-04-28.md Documents the motivating incident and proposed framework improvements.
.claude/skills/skill-rules.json Mirrors skill rule additions into .claude/.
.claude/skills/map-tdd/SKILL.md Mirrors map-tdd skill into .claude/.
.claude/skills/map-task/SKILL.md Mirrors map-task skill into .claude/.
.claude/skills/map-review/SKILL.md Mirrors map-review skill into .claude/.
.claude/skills/map-resume/SKILL.md Mirrors map-resume skill into .claude/.
.claude/skills/map-fast/SKILL.md Mirrors map-fast skill into .claude/.
.claude/skills/map-debug/SKILL.md Mirrors map-debug skill into .claude/.
.claude/skills/map-check/SKILL.md Mirrors map-check skill into .claude/.
.claude/settings.json Enables UserPromptSubmit hook in project settings.
.claude/rules/learned/architecture-patterns.md Adds learned rule about durable state for long-running ops.
.claude/hooks/detect-clarification-triggers.py Adds the hook to .claude/hooks/.
.claude/hooks/README.md Adds hook conventions README to .claude/hooks/.
.claude/commands/map-plan.md Mirrors /map-plan heuristic updates into .claude/.
.claude/agents/task-decomposer.md Mirrors decomposer updates into .claude/.
.claude/agents/research-agent.md Mirrors research-agent metadata updates into .claude/.
.claude/agents/monitor.md Mirrors monitor metadata updates into .claude/.
.claude/agents/final-verifier.md Mirrors final-verifier metadata updates into .claude/.
.claude/agents/evaluator.md Mirrors evaluator metadata updates into .claude/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 1010 to 1015
if step_state_path.exists():
plan_artifacts.append(_artifact_ref(step_state_path, "step-state"))

if task_plan_path.exists() and blueprint_path.exists() and handoff_path.exists():
if task_plan_path.exists() and blueprint_path.exists() and step_state_path.exists():
plan_status = "ready"
elif plan_artifacts:
Comment on lines +1533 to +1537
"""Resume workflow from an existing /map-plan output, skipping init phases.

Detects task_plan_<branch>.md and step_state.json created by /map-plan.
Extracts subtask IDs from the plan, marks init phases as completed, and
starts execution from INIT_STATE (batch mode auto-set).
Comment on lines +1567 to +1576
# Extract AAG contracts from step_state.json or blueprint.json if present
aag_contracts: dict[str, str] = {}
step_state_file = plan_dir / "step_state.json"
blueprint_file = plan_dir / "blueprint.json"
for source_file in [step_state_file, blueprint_file]:
if source_file.exists() and not aag_contracts:
try:
src_data = json.loads(source_file.read_text(encoding="utf-8"))
aag_contracts = src_data.get("aag_contracts", {})
except (json.JSONDecodeError, KeyError):
Comment on lines +62 to +66
# Two groups for scoring:
# "kind" — operation kind / verb that is itself ambiguous about state
# "duration" — explicit duration markers ("5 min", "30 seconds")
# A single duration alone does not fire; we want at least one "kind"
# match OR multiple duration markers.
azalio added 4 commits April 28, 2026 16:03
…igration

The skills migration (commit 6baff28) shipped both .claude/commands/map-*.md
AND .claude/skills/map-*/SKILL.md as a 'safety net' — but per Claude Code
docs, skills win precedence when both exist. Keeping the commands as a
fallback creates two sources of truth: edit a skill, the command file
silently keeps the old version.

Removes:
- 10 .claude/commands/map-*.md (canonical source)
- 10 src/mapify_cli/templates/commands/map-*.md (mirror)
- tests/test_command_templates.py (replaced by the new
  TestCommandTemplateSynchronization class in test_template_sync.py
  which asserts the inverse: NO map-*.md may reappear in commands/)
- scripts/commit-improvements-2026-04-28.sh (leftover one-time helper)

Updates:
- src/mapify_cli/delivery/file_copier.py: create_commands_dir README
  now lists all /map-* entries as skill-backed and explains that the
  commands/ dir exists for user-custom commands only.
- scripts/sync-templates.sh: handle empty .claude/commands/ via
  shopt nullglob guard so the script doesn't fail when there are
  no .md files to copy.
- tests/test_template_sync.py: TestCommandTemplateSynchronization
  rewritten — asserts (a) no map-*.md remains in either commands
  directory and (b) any non-map (user-custom) commands stay byte-
  identical between .claude/commands/ and templates/commands/.

User-custom commands placed in .claude/commands/ continue to work
unchanged. Only the framework's own /map-* entries moved to skills.
…ration

- Replace should_resume_from_plan test calls with resume_from_plan
  (function was removed in prior commit)
- Remove test_write_plan_handoff (write_plan_handoff deleted)
- Update record_plan_artifacts test to require step_state.json for
  "ready" status instead of plan_handoff.json
- Fix map-efficient SKILL.md to call resume_from_plan directly instead
  of the removed should_resume_from_plan CLI command
- Update CLI check command to report agents only (commands count is 0
  after skills migration)
- Update 3 CLI tests for commands→skills migration expectations
Review findings from /map-review (Monitor+Predictor+Evaluator):

- Remove write_plan_handoff call from map-plan SKILL.md (all 4 copies)
  and all plan_handoff.json references — function was deleted but skill
  still invoked it, breaking /map-plan Step 7 for every user
- Fix doctor/check commands count mismatch after skills migration:
  drop commands count comparison, show skills info instead
- Delete stale test_plan_handoff_has_required_fields integration test
- Fix misleading comment in detect-clarification-triggers.py about
  duration-only triggering behavior
- Remove unused count_command_templates import from test_mapify_cli.py
After skills migration, create_command_files no longer needs to generate
inline map-*.md command files. The fallback triggered in CI where the
empty templates/commands/ dir isn't packaged in the wheel, causing
test_create_command_files to fail (expected 0 commands, got 3).

Now delegates entirely to create_commands_dir (README only).
@azalio azalio merged commit 9e2e869 into main Apr 29, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants