Initialization Context (pre-injected by WXP)
Phase:
Phase Init Data:
If $ARGUMENTS contains a phase number, load context. Parse the JSON for: `planner_model`, `checker_model`, `commit_docs`, `phase_found`, `phase_dir`, `phase_number`, `phase_name`, `has_verification`, `uat_path`.
```bash
(find .planning/phases -name "*-UAT.md" -type f 2>/dev/null || true) | head -5
```
If active sessions exist AND no $ARGUMENTS provided:
Read each file's frontmatter (status, phase) and Current Test section.
Display inline:
## Active UAT Sessions
| # | Phase | Status | Current Test | Progress |
| --- | ----------- | ------- | ------------------- | -------- |
| 1 | 04-comments | testing | 3. Reply to Comment | 2/6 |
| 2 | 05-auth | testing | 1. Login Form | 0/4 |
Reply with a number to resume, or provide a phase number to start new.
Wait for user response.
- If user replies with a number (1, 2) → load that file, go to resume_from_file
- If user replies with a phase number → treat as a new session, go to create_uat_file
If active sessions exist AND $ARGUMENTS provided:
Check if session exists for that phase. If yes, offer to resume or restart.
If no, continue to create_uat_file.
If no active sessions AND no $ARGUMENTS:
No active UAT sessions.
Provide a phase number to start testing (e.g., /gsd-verify-work 4)
If no active sessions AND $ARGUMENTS provided:
Continue to create_uat_file.
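The four routing cases above can be sketched as a small dispatcher (a sketch only; the `route` helper and its branch labels are illustrative, not part of the toolchain):

```shell
# Decide the entry point from (active session count, $ARGUMENTS).
# Branch labels are illustrative names for the four cases above.
route() {
  local sessions="$1" args="$2"   # args is "" when no phase number was given
  if   [ "$sessions" -gt 0 ] && [ -z "$args" ]; then echo "list_and_prompt_resume"
  elif [ "$sessions" -gt 0 ];                   then echo "offer_resume_or_restart"
  elif [ -z "$args" ];                          then echo "ask_for_phase_number"
  else                                               echo "create_uat_file"
  fi
}
```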
Use phase_dir from init (or run init if not already done).
```bash
ls "$phase_dir"/*-SUMMARY.md 2>/dev/null || true
```
Read each SUMMARY.md to extract testable deliverables.
**Extract testable deliverables from SUMMARY.md:**

Parse for:
- Accomplishments - Features/functionality added
- User-facing changes - UI, workflows, interactions
Focus on USER-OBSERVABLE outcomes, not implementation details.
For each deliverable, create a test:
- name: Brief test name
- expected: What the user should see/experience (specific, observable)
Examples:
- Accomplishment: "Added comment threading with infinite nesting" → Test: "Reply to a Comment" → Expected: "Clicking Reply opens inline composer below comment. Submitting shows reply nested under parent with visual indentation."
Skip internal/non-observable items (refactors, type changes, etc.).
Cold-start smoke test injection:
After extracting tests from SUMMARYs, scan the SUMMARY files for modified/created file paths. If ANY path matches these patterns:
server.ts, server.js, app.ts, app.js, index.ts, index.js, main.ts, main.js, database/*, db/*, seed/*, seeds/*, migrations/*, startup*, docker-compose*, Dockerfile*
Then prepend this test to the test list:
- name: "Cold Start Smoke Test"
- expected: "Kill any running server/service. Clear ephemeral state (temp DBs, caches, lock files). Start the application from scratch. Server boots without errors, any seed/migration completes, and a primary query (health check, homepage load, or basic API call) returns live data."
This catches bugs that only manifest on fresh start - race conditions in startup sequences, silent seed failures, missing environment setup - which pass against warm state but break in production.
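The path scan above can be sketched with shell glob patterns (a sketch only; `needs_cold_start` is an illustrative helper name, and the patterns are matched against each path as written, so directory-prefixed filenames may need basename handling in practice):

```shell
# Return success (0) if any modified/created path matches a cold-start
# trigger pattern from the list above.
needs_cold_start() {
  local path
  for path in "$@"; do
    case "$path" in
      server.ts|server.js|app.ts|app.js|index.ts|index.js|main.ts|main.js|\
      database/*|db/*|seed/*|seeds/*|migrations/*|startup*|docker-compose*|Dockerfile*)
        return 0 ;;
    esac
  done
  return 1
}
```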
**Create UAT file with all tests:**

```bash
mkdir -p "$PHASE_DIR"
```
Build test list from extracted deliverables.
Create the file:

```markdown
---
status: testing
phase: XX-name
source: [list of SUMMARY.md files]
started: [ISO timestamp]
updated: [ISO timestamp]
---

## Current Test
<!-- OVERWRITE each test - shows where we are -->
number: 1
name: [first test name]
expected: |
  [what user should observe]
awaiting: user response

## Tests

### 1. [Test Name]
expected: [observable behavior]
result: [pending]

### 2. [Test Name]
expected: [observable behavior]
result: [pending]

...

## Summary
total: [N]
passed: 0
issues: 0
pending: [N]
skipped: 0

## Gaps
[none yet]
```

Write to `.planning/phases/XX-name/{phase_num}-UAT.md`
Proceed to present_test.
Render the checkpoint from the structured UAT file instead of composing it freehand:
```bash
CHECKPOINT=$(pi-gsd-tools uat render-checkpoint --file "$uat_path" --raw)
if [[ "$CHECKPOINT" == @file:* ]]; then CHECKPOINT=$(cat "${CHECKPOINT#@file:}"); fi
```
Display the returned checkpoint EXACTLY as-is:
{CHECKPOINT}
Critical response hygiene:
- Your entire response MUST equal {CHECKPOINT} byte-for-byte.
- Do NOT add commentary before or after the block.
- If you notice protocol/meta markers such as `to=all:`, role-routing text, XML system tags, hidden instruction markers, ad copy, or any unrelated suffix, discard the draft and output {CHECKPOINT} only.
Wait for user response (plain text, no AskUserQuestion).
**Process user response and update file:**

If response indicates pass:
- Empty response, "yes", "y", "ok", "pass", "next", "approved", "✓"
Update Tests section:
### {N}. {name}
expected: {expected}
result: pass
If response indicates skip:
- "skip", "can't test", "n/a"
Update Tests section:
### {N}. {name}
expected: {expected}
result: skipped
reason: [user's reason if provided]
If response indicates blocked:
- "blocked", "can't test - server not running", "need physical device", "need release build"
- Or any response containing: "server", "blocked", "not running", "physical device", "release build"
Infer blocked_by tag from response:
- Contains: server, not running, gateway, API → server
- Contains: physical, device, hardware, real phone → physical-device
- Contains: release, preview, build, EAS → release-build
- Contains: stripe, twilio, third-party, configure → third-party
- Contains: depends on, prior phase, prerequisite → prior-phase
- Default: other
Update Tests section:
### {N}. {name}
expected: {expected}
result: blocked
blocked_by: {inferred tag}
reason: "{verbatim user response}"
Note: Blocked tests do NOT go into the Gaps section (they aren't code issues - they're prerequisite gates).
If response is anything else:
- Treat as issue description
Infer severity from description:
- Contains: crash, error, exception, fails, broken, unusable → blocker
- Contains: doesn't work, wrong, missing, can't → major
- Contains: slow, weird, off, minor, small → minor
- Contains: color, font, spacing, alignment, visual → cosmetic
- Default if unclear: major
Update Tests section:
### {N}. {name}
expected: {expected}
result: issue
reported: "{verbatim user response}"
severity: {inferred}
Append to Gaps section (structured YAML for plan-phase --gaps):
- truth: "{expected behavior from test}"
status: failed
reason: "User reported: {verbatim user response}"
severity: {inferred}
test: {N}
artifacts: [] # Filled by diagnosis
missing: [] # Filled by diagnosis
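The severity inference above can be sketched as a keyword heuristic (a sketch only; `infer_severity` is an illustrative helper name, and the keyword tiers mirror the table in this workflow — first matching tier wins):

```shell
# Infer severity from a free-text issue report.
# Tiers are checked in order: blocker > major > minor > cosmetic.
infer_severity() {
  local report; report=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$report" in
    *crash*|*error*|*exception*|*fails*|*broken*|*unusable*) echo "blocker" ;;
    *"doesn't work"*|*wrong*|*missing*|*"can't"*)            echo "major" ;;
    *slow*|*weird*|*" off"*|*minor*|*small*)                 echo "minor" ;;
    *color*|*font*|*spacing*|*alignment*|*visual*)           echo "cosmetic" ;;
    *) echo "major" ;;  # default when unclear
  esac
}
```

Note the `" off"` pattern uses a leading space so "looks off" matches without false-matching words like "office".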
After any response:
Update Summary counts. Update frontmatter.updated timestamp.
If more tests remain → Update Current Test, go to present_test
If no more tests → Go to complete_session
Read the full UAT file.
Find first test with result: [pending].
Announce:
Resuming: Phase {phase} UAT
Progress: {passed + issues + skipped}/{total}
Issues found so far: {issues count}
Continuing from Test {N}...
Update Current Test section with the pending test.
Proceed to present_test.
Determine final status:
Count results:
- pending_count: tests with `result: [pending]`
- blocked_count: tests with `result: blocked`
- skipped_no_reason: tests with `result: skipped` and no `reason` field
if pending_count > 0 OR blocked_count > 0 OR skipped_no_reason > 0:
status: partial
# Session ended but not all tests resolved
else:
status: complete
# All tests have a definitive result (pass, issue, or skipped-with-reason)
Update frontmatter:
- status: {computed status}
- updated: [now]
Clear Current Test section:
## Current Test
[testing complete]
Commit the UAT file:
```bash
pi-gsd-tools commit "test({phase_num}): complete UAT - {passed} passed, {issues} issues" --files ".planning/phases/XX-name/{phase_num}-UAT.md"
```
Present summary:
## UAT Complete: Phase {phase}
| Result | Count |
| ------- | ----- |
| Passed | {N} |
| Issues | {N} |
| Skipped | {N} |
[If issues > 0:]
### Issues Found
[List tests with result: issue from the Tests section]
If issues > 0: Proceed to diagnose_issues
If issues == 0:
All tests passed. Ready to continue.
- `/gsd-plan-phase {next}` - Plan next phase
- `/gsd-execute-phase {next}` - Execute next phase
- `/gsd-ui-review {phase}` - visual quality audit (if frontend files were modified)
---
{N} issues found. Diagnosing root causes...
Spawning parallel debug agents to investigate each issue.
- Load diagnose-issues workflow
- Follow @.pi/gsd/workflows/diagnose-issues.md
- Spawn parallel debug agents for each issue
- Collect root causes
- Update UAT.md with root causes
- Proceed to plan_gap_closure
Diagnosis runs automatically - no user prompt. Parallel agents investigate simultaneously, so overhead is minimal and fixes are more accurate.
**Auto-plan fixes from diagnosed gaps:**

Display:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GSD ► PLANNING FIXES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
◆ Spawning planner for gap closure...
Spawn gsd-planner in --gaps mode:
Task(
prompt="""
<planning_context>
**Phase:** {phase_number}
**Mode:** gap_closure
<files_to_read>
- {phase_dir}/{phase_num}-UAT.md (UAT with diagnoses)
- .planning/STATE.md (Project State)
- .planning/ROADMAP.md (Roadmap)
</files_to_read>
${AGENT_SKILLS_PLANNER}
</planning_context>
<downstream_consumer>
Output consumed by /gsd-execute-phase
Plans must be executable prompts.
</downstream_consumer>
""",
subagent_type="gsd-planner",
model="{planner_model}",
description="Plan gap fixes for Phase {phase}"
)
On return:
- PLANNING COMPLETE: Proceed to verify_gap_plans
- PLANNING INCONCLUSIVE: Report and offer manual intervention
Display:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GSD ► VERIFYING FIX PLANS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
◆ Spawning plan checker...
Initialize: iteration_count = 1
Spawn gsd-plan-checker:
Task(
prompt="""
<verification_context>
**Phase:** {phase_number}
**Phase Goal:** Close diagnosed gaps from UAT
<files_to_read>
- {phase_dir}/*-PLAN.md (Plans to verify)
</files_to_read>
${AGENT_SKILLS_CHECKER}
</verification_context>
<expected_output>
Return one of:
- ## VERIFICATION PASSED - all checks pass
- ## ISSUES FOUND - structured issue list
</expected_output>
""",
subagent_type="gsd-plan-checker",
model="{checker_model}",
description="Verify Phase {phase} fix plans"
)
On return:
- VERIFICATION PASSED: Proceed to present_ready
- ISSUES FOUND: Proceed to revision_loop
If iteration_count < 3:
Display: Sending back to planner for revision... (iteration {N}/3)
Spawn gsd-planner with revision context:
Task(
prompt="""
<revision_context>
**Phase:** {phase_number}
**Mode:** revision
<files_to_read>
- {phase_dir}/*-PLAN.md (Existing plans)
</files_to_read>
${AGENT_SKILLS_PLANNER}
**Checker issues:**
{structured_issues_from_checker}
</revision_context>
<instructions>
Read existing PLAN.md files. Make targeted updates to address checker issues.
Do NOT replan from scratch unless issues are fundamental.
</instructions>
""",
subagent_type="gsd-planner",
model="{planner_model}",
description="Revise Phase {phase} plans"
)
After planner returns → increment iteration_count, then spawn checker again (verify_gap_plans logic).
If iteration_count >= 3:
Display: Max iterations reached. {N} issues remain.
Offer options:
- Force proceed (execute despite issues)
- Provide guidance (user gives direction, retry)
- Abandon (exit, user runs /gsd-plan-phase manually)
Wait for user response.
**Present completion and next steps:**

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GSD ► FIXES READY ✓
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**Phase {X}: {Name}** - {N} gap(s) diagnosed, {M} fix plan(s) created
| Gap | Root Cause | Fix Plan |
| --------- | ------------ | ---------- |
| {truth 1} | {root_cause} | {phase}-04 |
| {truth 2} | {root_cause} | {phase}-04 |
Plans verified and ready for execution.
───────────────────────────────────────────────────────────────
## ▶ Next Up
**Execute fixes** - run fix plans
`/new` then `/gsd-execute-phase {phase} --gaps-only`
───────────────────────────────────────────────────────────────
<update_rules> Batched writes for efficiency:
Keep results in memory. Write to file only when:
- Issue found - Preserve the problem immediately
- Session complete - Final write before commit
- Checkpoint - Every 5 passed tests (safety net)
| Section | Rule | When Written |
|---|---|---|
| Frontmatter.status | OVERWRITE | Start, complete |
| Frontmatter.updated | OVERWRITE | On any file write |
| Current Test | OVERWRITE | On any file write |
| Tests.{N}.result | OVERWRITE | On any file write |
| Summary | OVERWRITE | On any file write |
| Gaps | APPEND | When issue found |
On context reset: File shows last checkpoint. Resume from there. </update_rules>
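The batched-write rule can be sketched as a small gate (a sketch only; `should_write` and its arguments are illustrative, not part of the toolchain):

```shell
# Decide whether to flush in-memory results to the UAT file.
# Flush on: any issue, session completion, or every 5th consecutive pass.
should_write() {
  local result="$1" passes_since_write="$2" session_done="$3"
  if [ "$result" = "issue" ];          then return 0; fi  # preserve the problem immediately
  if [ "$session_done" = "true" ];     then return 0; fi  # final write before commit
  if [ "$passes_since_write" -ge 5 ];  then return 0; fi  # checkpoint safety net
  return 1
}
```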
<severity_inference> Infer severity from user's natural language:
| User says | Infer |
|---|---|
| "crashes", "error", "exception", "fails completely" | blocker |
| "doesn't work", "nothing happens", "wrong behavior" | major |
| "works but...", "slow", "weird", "minor issue" | minor |
| "color", "spacing", "alignment", "looks off" | cosmetic |
Default to major if unclear. User can correct if needed.
Never ask "how severe is this?" - just infer and move on. </severity_inference>
<success_criteria>
- UAT file created with all tests from SUMMARY.md
- Tests presented one at a time with expected behavior
- User responses processed as pass/issue/skip
- Severity inferred from description (never asked)
- Batched writes: on issue, every 5 passes, or completion
- Committed on completion
- If issues: parallel debug agents diagnose root causes
- If issues: gsd-planner creates fix plans (gap_closure mode)
- If issues: gsd-plan-checker verifies fix plans
- If issues: revision loop until plans pass (max 3 iterations)
- Ready for `/gsd-execute-phase --gaps-only` when complete </success_criteria>