AGENTS/.pi/gsd/workflows/verify-work.md
2026-04-24 20:00:33 +02:00

Initialization Context (pre-injected by WXP)

Phase:

Phase Init Data:

If $ARGUMENTS contains a phase number, load context:

Parse JSON for: planner_model, checker_model, commit_docs, phase_found, phase_dir, phase_number, phase_name, has_verification, uat_path.
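Assuming the init payload is a flat JSON object carrying the keys listed above (the exact shape is an assumption inferred from that key list, not from the WXP spec), extraction might look like:

```shell
# Illustrative only: pull init fields out of the pre-injected JSON with jq.
# The sample payload below is hypothetical.
INIT_JSON='{"planner_model":"opus","checker_model":"sonnet","commit_docs":true,"phase_found":true,"phase_dir":".planning/phases/04-comments","phase_number":"04","phase_name":"comments","has_verification":false,"uat_path":""}'

phase_dir=$(printf '%s' "$INIT_JSON" | jq -r '.phase_dir')
phase_number=$(printf '%s' "$INIT_JSON" | jq -r '.phase_number')
planner_model=$(printf '%s' "$INIT_JSON" | jq -r '.planner_model')
```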

**First: Check for active UAT sessions**
(find .planning/phases -name "*-UAT.md" -type f 2>/dev/null || true) | head -5

If active sessions exist AND no $ARGUMENTS provided:

Read each file's frontmatter (status, phase) and Current Test section.
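One way to read those fields - a minimal sketch, assuming the frontmatter layout this workflow writes (a `---`-fenced block of `key: value` lines):

```shell
# Sketch: extract a single field from a UAT file's YAML frontmatter.
# $1 = file, $2 = field name (e.g. status, phase)
read_frontmatter_field() {
  awk -v key="$2" '
    /^---$/ { fence++; next }                              # count frontmatter fences
    fence == 1 && $1 == key":" {                           # match "key:" inside the block
      sub(/^[^:]*:[[:space:]]*/, ""); print; exit          # strip the key, print the value
    }
  ' "$1"
}
```

Usage: `read_frontmatter_field 04-UAT.md status` prints `testing` for an in-progress session.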

Display inline:

## Active UAT Sessions

| #   | Phase       | Status  | Current Test        | Progress |
| --- | ----------- | ------- | ------------------- | -------- |
| 1   | 04-comments | testing | 3. Reply to Comment | 2/6      |
| 2   | 05-auth     | testing | 1. Login Form       | 0/4      |

Reply with a number to resume, or provide a phase number to start new.

Wait for user response.

  • If user replies with number (1, 2) → Load that file, go to resume_from_file
  • If user replies with phase number → Treat as new session, go to create_uat_file

If active sessions exist AND $ARGUMENTS provided:

Check if session exists for that phase. If yes, offer to resume or restart. If no, continue to create_uat_file.

If no active sessions AND no $ARGUMENTS:

No active UAT sessions.

Provide a phase number to start testing (e.g., /gsd-verify-work 4)

If no active sessions AND $ARGUMENTS provided:

Continue to create_uat_file.

**Find what to test:**

Use phase_dir from init (or run init if not already done).

ls "$phase_dir"/*-SUMMARY.md 2>/dev/null || true

Read each SUMMARY.md to extract testable deliverables.

**Extract testable deliverables from SUMMARY.md:**

Parse for:

  1. Accomplishments - Features/functionality added
  2. User-facing changes - UI, workflows, interactions

Focus on USER-OBSERVABLE outcomes, not implementation details.

For each deliverable, create a test:

  • name: Brief test name
  • expected: What the user should see/experience (specific, observable)

Examples:

  • Accomplishment: "Added comment threading with infinite nesting" → Test: "Reply to a Comment" → Expected: "Clicking Reply opens inline composer below comment. Submitting shows reply nested under parent with visual indentation."

Skip internal/non-observable items (refactors, type changes, etc.).

Cold-start smoke test injection:

After extracting tests from SUMMARYs, scan the SUMMARY files for modified/created file paths. If ANY path matches these patterns:

server.ts, server.js, app.ts, app.js, index.ts, index.js, main.ts, main.js, database/*, db/*, seed/*, seeds/*, migrations/*, startup*, docker-compose*, Dockerfile*

Then prepend this test to the test list:

  • name: "Cold Start Smoke Test"
  • expected: "Kill any running server/service. Clear ephemeral state (temp DBs, caches, lock files). Start the application from scratch. Server boots without errors, any seed/migration completes, and a primary query (health check, homepage load, or basic API call) returns live data."

This catches bugs that only manifest on a fresh start - race conditions in startup sequences, silent seed failures, missing environment setup - bugs that pass against warm state but break in production.
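The trigger check above can be sketched as a shell predicate, assuming the modified/created paths have already been collected from the SUMMARYs:

```shell
# Sketch: return success (0) if any path matches the cold-start trigger
# patterns above; directory patterns first, then basename patterns.
needs_cold_start() {
  for path in "$@"; do
    case "$path" in
      */database/*|database/*|*/db/*|db/*|*/seed/*|seed/*|*/seeds/*|seeds/*|*/migrations/*|migrations/*)
        return 0 ;;
    esac
    case "${path##*/}" in
      server.ts|server.js|app.ts|app.js|index.ts|index.js|main.ts|main.js|startup*|docker-compose*|Dockerfile*)
        return 0 ;;
    esac
  done
  return 1
}
```

Usage: `if needs_cold_start src/server.ts src/ui/Button.tsx; then prepend the Cold Start Smoke Test; fi`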

**Create UAT file with all tests:**

mkdir -p "$phase_dir"

Build test list from extracted deliverables.

Create file:

---
status: testing
phase: XX-name
source: [list of SUMMARY.md files]
started: [ISO timestamp]
updated: [ISO timestamp]
---

## Current Test
<!-- OVERWRITE each test - shows where we are -->

number: 1
name: [first test name]
expected: |
  [what user should observe]
awaiting: user response

## Tests

### 1. [Test Name]
expected: [observable behavior]
result: [pending]

### 2. [Test Name]
expected: [observable behavior]
result: [pending]

...

## Summary

total: [N]
passed: 0
issues: 0
pending: [N]
skipped: 0

## Gaps

[none yet]

Write to .planning/phases/XX-name/{phase_num}-UAT.md
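The scaffold above can be sketched in shell with a heredoc (illustrative test names; the real file goes to .planning/phases/XX-name - /tmp is used here so the sketch is safe to run anywhere):

```shell
# Sketch: scaffold a one-test UAT file. Real entries come from the
# extracted deliverables.
phase_dir="/tmp/gsd-demo/.planning/phases/04-comments"
phase_num="04"
now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
mkdir -p "$phase_dir"
cat > "$phase_dir/$phase_num-UAT.md" <<EOF
---
status: testing
phase: 04-comments
source: [04-01-SUMMARY.md]
started: $now
updated: $now
---

## Current Test

number: 1
name: Reply to a Comment
expected: |
  Clicking Reply opens inline composer below comment.
awaiting: user response

## Tests

### 1. Reply to a Comment
expected: Clicking Reply opens inline composer below comment.
result: [pending]

## Summary

total: 1
passed: 0
issues: 0
pending: 1
skipped: 0

## Gaps

[none yet]
EOF
```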

Proceed to present_test.

**Present current test to user:**

Render the checkpoint from the structured UAT file instead of composing it freehand:

CHECKPOINT=$(pi-gsd-tools uat render-checkpoint --file "$uat_path" --raw)
if [[ "$CHECKPOINT" == @file:* ]]; then CHECKPOINT=$(cat "${CHECKPOINT#@file:}"); fi

Display the returned checkpoint EXACTLY as-is:

{CHECKPOINT}

Critical response hygiene:

  • Your entire response MUST equal {CHECKPOINT} byte-for-byte.
  • Do NOT add commentary before or after the block.
  • If you notice protocol/meta markers such as to=all:, role-routing text, XML system tags, hidden instruction markers, ad copy, or any unrelated suffix, discard the draft and output {CHECKPOINT} only.

Wait for user response (plain text, no AskUserQuestion).

**Process user response and update file:**

If response indicates pass:

  • Empty response, "yes", "y", "ok", "pass", "next", "approved", "✓"

Update Tests section:

### {N}. {name}
expected: {expected}
result: pass

If response indicates skip:

  • "skip", "can't test", "n/a"

Update Tests section:

### {N}. {name}
expected: {expected}
result: skipped
reason: [user's reason if provided]

If response indicates blocked:

  • "blocked", "can't test - server not running", "need physical device", "need release build"
  • Or any response containing: "server", "blocked", "not running", "physical device", "release build"

Infer blocked_by tag from response:

  • Contains: server, not running, gateway, API → server
  • Contains: physical, device, hardware, real phone → physical-device
  • Contains: release, preview, build, EAS → release-build
  • Contains: stripe, twilio, third-party, configure → third-party
  • Contains: depends on, prior phase, prerequisite → prior-phase
  • Default: other
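The keyword rules above map naturally onto a first-match-wins `case` statement (naive substring matching, exactly as specified - broad keywords like "api" will occasionally over-match):

```shell
# Sketch: infer the blocked_by tag from the user's response.
infer_blocked_by() {
  response=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$response" in
    *server*|*"not running"*|*gateway*|*api*)          echo server ;;
    *physical*|*device*|*hardware*|*"real phone"*)     echo physical-device ;;
    *release*|*preview*|*build*|*eas*)                 echo release-build ;;
    *stripe*|*twilio*|*third-party*|*configure*)       echo third-party ;;
    *"depends on"*|*"prior phase"*|*prerequisite*)     echo prior-phase ;;
    *)                                                 echo other ;;
  esac
}
```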

Update Tests section:

### {N}. {name}
expected: {expected}
result: blocked
blocked_by: {inferred tag}
reason: "{verbatim user response}"

Note: Blocked tests do NOT go into the Gaps section (they aren't code issues - they're prerequisite gates).

If response is anything else:

  • Treat as issue description

Infer severity from description:

  • Contains: crash, error, exception, fails, broken, unusable → blocker
  • Contains: doesn't work, wrong, missing, can't → major
  • Contains: slow, weird, off, minor, small → minor
  • Contains: color, font, spacing, alignment, visual → cosmetic
  • Default if unclear: major
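As a sketch, the same first-match-wins pattern covers severity inference (here "off" is matched as `" off"` to reduce accidental substring hits - an implementation choice, not part of the rule table):

```shell
# Sketch: infer severity from the issue description; defaults to major.
infer_severity() {
  desc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$desc" in
    *crash*|*error*|*exception*|*fails*|*broken*|*unusable*) echo blocker ;;
    *"doesn't work"*|*wrong*|*missing*|*"can't"*)            echo major ;;
    *slow*|*weird*|*" off"*|*minor*|*small*)                 echo minor ;;
    *color*|*font*|*spacing*|*alignment*|*visual*)           echo cosmetic ;;
    *)                                                       echo major ;;
  esac
}
```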

Update Tests section:

### {N}. {name}
expected: {expected}
result: issue
reported: "{verbatim user response}"
severity: {inferred}

Append to Gaps section (structured YAML for plan-phase --gaps):

- truth: "{expected behavior from test}"
  status: failed
  reason: "User reported: {verbatim user response}"
  severity: {inferred}
  test: {N}
  artifacts: []  # Filled by diagnosis
  missing: []    # Filled by diagnosis

After any response:

Update Summary counts. Update frontmatter.updated timestamp.

If more tests remain → Update Current Test, go to present_test
If no more tests → Go to complete_session

**Resume testing from UAT file:**

Read the full UAT file.

Find first test with result: [pending].
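Locating that test can be sketched against the `### N. Name` / `result: [pending]` layout this workflow writes:

```shell
# Sketch: print the number of the first test still marked pending.
# $1 = UAT file path
first_pending_test() {
  awk '
    /^### [0-9]+\./ { n = $2; sub(/\./, "", n) }   # remember current test number
    /^result: \[pending\]/ { print n; exit }       # first pending wins
  ' "$1"
}
```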

Announce:

Resuming: Phase {phase} UAT
Progress: {passed + issues + skipped}/{total}
Issues found so far: {issues count}

Continuing from Test {N}...

Update Current Test section with the pending test. Proceed to present_test.

**Complete testing and commit:**

Determine final status:

Count results:

  • pending_count: tests with result: [pending]
  • blocked_count: tests with result: blocked
  • skipped_no_reason: tests with result: skipped and no reason field

if pending_count > 0 OR blocked_count > 0 OR skipped_no_reason > 0:
  status: partial
  # Session ended but not all tests resolved
else:
  status: complete
  # All tests have a definitive result (pass, issue, or skipped-with-reason)
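The counting step can be sketched with `grep -c` over the `result:` lines (the skipped-without-reason check is omitted here for brevity; `grep -c` exits non-zero on zero matches, hence the `|| true` guards):

```shell
# Sketch: derive the final session status from result counts.
# $1 = UAT file path
uat_status() {
  pending=$(grep -c '^result: \[pending\]' "$1" || true)
  blocked=$(grep -c '^result: blocked' "$1" || true)
  if [ "$pending" -gt 0 ] || [ "$blocked" -gt 0 ]; then
    echo partial
  else
    echo complete
  fi
}
```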

Update frontmatter:

  • status: {computed status}
  • updated: [now]

Clear Current Test section:

## Current Test

[testing complete]

Commit the UAT file:

pi-gsd-tools commit "test({phase_num}): complete UAT - {passed} passed, {issues} issues" --files ".planning/phases/XX-name/{phase_num}-UAT.md"

Present summary:

## UAT Complete: Phase {phase}

| Result  | Count |
| ------- | ----- |
| Passed  | {N}   |
| Issues  | {N}   |
| Skipped | {N}   |

[If issues > 0:]
### Issues Found

[List tests with result: issue, with severity and reported description]

If issues > 0: Proceed to diagnose_issues

If issues == 0:

All tests passed. Ready to continue.

- `/gsd-plan-phase {next}` - Plan next phase
- `/gsd-execute-phase {next}` - Execute next phase
- `/gsd-ui-review {phase}` - Visual quality audit (if frontend files were modified)

**Diagnose root causes before planning fixes:**
---

{N} issues found. Diagnosing root causes...

Spawning parallel debug agents to investigate each issue.
  • Load diagnose-issues workflow
  • Follow @.pi/gsd/workflows/diagnose-issues.md
  • Spawn parallel debug agents for each issue
  • Collect root causes
  • Update UAT.md with root causes
  • Proceed to plan_gap_closure

Diagnosis runs automatically - no user prompt. Parallel agents investigate simultaneously, so overhead is minimal and fixes are more accurate.

**Auto-plan fixes from diagnosed gaps:**

Display:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► PLANNING FIXES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

◆ Spawning planner for gap closure...

Spawn gsd-planner in --gaps mode:

Task(
  prompt="""
<planning_context>

**Phase:** {phase_number}
**Mode:** gap_closure

<files_to_read>
- {phase_dir}/{phase_num}-UAT.md (UAT with diagnoses)
- .planning/STATE.md (Project State)
- .planning/ROADMAP.md (Roadmap)
</files_to_read>

${AGENT_SKILLS_PLANNER}

</planning_context>

<downstream_consumer>
Output consumed by /gsd-execute-phase
Plans must be executable prompts.
</downstream_consumer>
""",
  subagent_type="gsd-planner",
  model="{planner_model}",
  description="Plan gap fixes for Phase {phase}"
)

On return:

  • PLANNING COMPLETE: Proceed to verify_gap_plans
  • PLANNING INCONCLUSIVE: Report and offer manual intervention

**Verify fix plans with checker:**

Display:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► VERIFYING FIX PLANS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

◆ Spawning plan checker...

Initialize: iteration_count = 1

Spawn gsd-plan-checker:

Task(
  prompt="""
<verification_context>

**Phase:** {phase_number}
**Phase Goal:** Close diagnosed gaps from UAT

<files_to_read>
- {phase_dir}/*-PLAN.md (Plans to verify)
</files_to_read>

${AGENT_SKILLS_CHECKER}

</verification_context>

<expected_output>
Return one of:
- ## VERIFICATION PASSED - all checks pass
- ## ISSUES FOUND - structured issue list
</expected_output>
""",
  subagent_type="gsd-plan-checker",
  model="{checker_model}",
  description="Verify Phase {phase} fix plans"
)

On return:

  • VERIFICATION PASSED: Proceed to present_ready
  • ISSUES FOUND: Proceed to revision_loop

**Iterate planner ↔ checker until plans pass (max 3):**

If iteration_count < 3:

Display: Sending back to planner for revision... (iteration {N}/3)

Spawn gsd-planner with revision context:

Task(
  prompt="""
<revision_context>

**Phase:** {phase_number}
**Mode:** revision

<files_to_read>
- {phase_dir}/*-PLAN.md (Existing plans)
</files_to_read>

${AGENT_SKILLS_PLANNER}

**Checker issues:**
{structured_issues_from_checker}

</revision_context>

<instructions>
Read existing PLAN.md files. Make targeted updates to address checker issues.
Do NOT replan from scratch unless issues are fundamental.
</instructions>
""",
  subagent_type="gsd-planner",
  model="{planner_model}",
  description="Revise Phase {phase} plans"
)

After planner returns → spawn checker again (verify_gap_plans logic).
Increment iteration_count.

If iteration_count >= 3:

Display: Max iterations reached. {N} issues remain.

Offer options:

  1. Force proceed (execute despite issues)
  2. Provide guidance (user gives direction, retry)
  3. Abandon (exit, user runs /gsd-plan-phase manually)

Wait for user response.

**Present completion and next steps:**

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► FIXES READY ✓
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

**Phase {X}: {Name}** - {N} gap(s) diagnosed, {M} fix plan(s) created

| Gap       | Root Cause   | Fix Plan   |
| --------- | ------------ | ---------- |
| {truth 1} | {root_cause} | {phase}-04 |
| {truth 2} | {root_cause} | {phase}-04 |

Plans verified and ready for execution.

───────────────────────────────────────────────────────────────

## ▶ Next Up

**Execute fixes** - run fix plans

`/new` then `/gsd-execute-phase {phase} --gaps-only`

───────────────────────────────────────────────────────────────

<update_rules>

Batched writes for efficiency:

Keep results in memory. Write to file only when:

  1. Issue found - Preserve the problem immediately
  2. Session complete - Final write before commit
  3. Checkpoint - Every 5 passed tests (safety net)

| Section             | Rule      | When Written      |
| ------------------- | --------- | ----------------- |
| Frontmatter.status  | OVERWRITE | Start, complete   |
| Frontmatter.updated | OVERWRITE | On any file write |
| Current Test        | OVERWRITE | On any file write |
| Tests.{N}.result    | OVERWRITE | On any file write |
| Summary             | OVERWRITE | On any file write |
| Gaps                | APPEND    | When issue found  |

On context reset: File shows last checkpoint. Resume from there.

</update_rules>

<severity_inference>

Infer severity from user's natural language:

| User says                                           | Infer    |
| --------------------------------------------------- | -------- |
| "crashes", "error", "exception", "fails completely" | blocker  |
| "doesn't work", "nothing happens", "wrong behavior" | major    |
| "works but...", "slow", "weird", "minor issue"      | minor    |
| "color", "spacing", "alignment", "looks off"        | cosmetic |

Default to major if unclear. User can correct if needed.

Never ask "how severe is this?" - just infer and move on.

</severity_inference>

<success_criteria>

  • UAT file created with all tests from SUMMARY.md
  • Tests presented one at a time with expected behavior
  • User responses processed as pass/issue/skip
  • Severity inferred from description (never asked)
  • Batched writes: on issue, every 5 passes, or completion
  • Committed on completion
  • If issues: parallel debug agents diagnose root causes
  • If issues: gsd-planner creates fix plans (gap_closure mode)
  • If issues: gsd-plan-checker verifies fix plans
  • If issues: revision loop until plans pass (max 3 iterations)
  • Ready for /gsd-execute-phase --gaps-only when complete

</success_criteria>