Files
AGENTS/.pi/gsd/workflows/verify-work.md
2026-04-24 20:00:33 +02:00

694 lines
18 KiB
Markdown

<gsd-version v="1.12.4" />
<gsd-arguments>
<settings>
<keep-extra-args />
</settings>
<arg name="phase" type="number" optional />
<arg name="plan" type="flag" flag="--plan" optional />
</gsd-arguments>
<gsd-execute>
<if>
<condition>
<equals>
<left name="phase" />
<right type="string" value="" />
</equals>
</condition>
<else>
<shell command="pi-gsd-tools">
<args>
<arg string="init" />
<arg string="verify-work" />
<arg name="phase" wrap='"' />
</args>
<outs>
<out type="string" name="init" />
</outs>
</shell>
<if>
<condition>
<starts-with>
<left name="init" />
<right type="string" value="@file:" />
</starts-with>
</condition>
<then>
<string-op op="split">
<args>
<arg name="init" />
<arg type="string" value="@file:" />
</args>
<outs>
<out type="string" name="init-file" />
</outs>
</string-op>
<shell command="cat">
<args>
<arg name="init-file" wrap='"' />
</args>
<outs>
<out type="string" name="init" />
</outs>
</shell>
</then>
</if>
<shell command="pi-gsd-tools">
<args>
<arg string="agent-skills" />
<arg string="gsd-planner" />
</args>
<outs>
<suppress-errors />
<out type="string" name="agent-skills-planner" />
</outs>
</shell>
<shell command="pi-gsd-tools">
<args>
<arg string="agent-skills" />
<arg string="gsd-plan-checker" />
</args>
<outs>
<suppress-errors />
<out type="string" name="agent-skills-checker" />
</outs>
</shell>
</else>
</if>
</gsd-execute>
## Initialization Context (pre-injected by WXP)
**Phase:** <gsd-paste name="phase" />
**Phase Init Data:**
<gsd-paste name="init" />
<process>
<step name="initialize" priority="first">
If $ARGUMENTS contains a phase number, load context:
<!-- Context pre-injected above via WXP - variables available via <gsd-paste name="..."> -->
Parse JSON for: `planner_model`, `checker_model`, `commit_docs`, `phase_found`, `phase_dir`, `phase_number`, `phase_name`, `has_verification`, `uat_path`.
</step>
<step name="check_active_session">
**First: Check for active UAT sessions**
```bash
(find .planning/phases -name "*-UAT.md" -type f 2>/dev/null || true) | head -5
```
**If active sessions exist AND no $ARGUMENTS provided:**
Read each file's frontmatter (status, phase) and Current Test section.
Display inline:
```
## Active UAT Sessions
| # | Phase | Status | Current Test | Progress |
| --- | ----------- | ------- | ------------------- | -------- |
| 1 | 04-comments | testing | 3. Reply to Comment | 2/6 |
| 2 | 05-auth | testing | 1. Login Form | 0/4 |
Reply with a number to resume, or provide a phase number to start new.
```
Wait for user response.
- If user replies with number (1, 2) → Load that file, go to `resume_from_file`
- If user replies with phase number → Treat as new session, go to `create_uat_file`
**If active sessions exist AND $ARGUMENTS provided:**
Check if session exists for that phase. If yes, offer to resume or restart.
If no, continue to `create_uat_file`.
**If no active sessions AND no $ARGUMENTS:**
```
No active UAT sessions.
Provide a phase number to start testing (e.g., /gsd-verify-work 4)
```
**If no active sessions AND $ARGUMENTS provided:**
Continue to `create_uat_file`.
</step>
<step name="find_summaries">
**Find what to test:**
Use `phase_dir` from init (or run init if not already done).
```bash
ls "$phase_dir"/*-SUMMARY.md 2>/dev/null || true
```
Read each SUMMARY.md to extract testable deliverables.
</step>
<step name="extract_tests">
**Extract testable deliverables from SUMMARY.md:**
Parse for:
1. **Accomplishments** - Features/functionality added
2. **User-facing changes** - UI, workflows, interactions
Focus on USER-OBSERVABLE outcomes, not implementation details.
For each deliverable, create a test:
- name: Brief test name
- expected: What the user should see/experience (specific, observable)
Examples:
- Accomplishment: "Added comment threading with infinite nesting"
→ Test: "Reply to a Comment"
→ Expected: "Clicking Reply opens inline composer below comment. Submitting shows reply nested under parent with visual indentation."
Skip internal/non-observable items (refactors, type changes, etc.).
**Cold-start smoke test injection:**
After extracting tests from SUMMARYs, scan the SUMMARY files for modified/created file paths. If ANY path matches these patterns:
`server.ts`, `server.js`, `app.ts`, `app.js`, `index.ts`, `index.js`, `main.ts`, `main.js`, `database/*`, `db/*`, `seed/*`, `seeds/*`, `migrations/*`, `startup*`, `docker-compose*`, `Dockerfile*`
Then **prepend** this test to the test list:
- name: "Cold Start Smoke Test"
- expected: "Kill any running server/service. Clear ephemeral state (temp DBs, caches, lock files). Start the application from scratch. Server boots without errors, any seed/migration completes, and a primary query (health check, homepage load, or basic API call) returns live data."
This catches bugs that only manifest on fresh start - race conditions in startup sequences, silent seed failures, missing environment setup - which pass against warm state but break in production.
</step>
<step name="create_uat_file">
**Create UAT file with all tests:**
```bash
mkdir -p "$PHASE_DIR"
```
Build test list from extracted deliverables.
Create file:
```markdown
---
status: testing
phase: XX-name
source: [list of SUMMARY.md files]
started: [ISO timestamp]
updated: [ISO timestamp]
---
## Current Test
<!-- OVERWRITE each test - shows where we are -->
number: 1
name: [first test name]
expected: |
[what user should observe]
awaiting: user response
## Tests
### 1. [Test Name]
expected: [observable behavior]
result: [pending]
### 2. [Test Name]
expected: [observable behavior]
result: [pending]
...
## Summary
total: [N]
passed: 0
issues: 0
pending: [N]
skipped: 0
## Gaps
[none yet]
```
Write to `.planning/phases/XX-name/{phase_num}-UAT.md`
Proceed to `present_test`.
</step>
<step name="present_test">
**Present current test to user:**
Render the checkpoint from the structured UAT file instead of composing it freehand:
```bash
CHECKPOINT=$(pi-gsd-tools uat render-checkpoint --file "$uat_path" --raw)
if [[ "$CHECKPOINT" == @file:* ]]; then CHECKPOINT=$(cat "${CHECKPOINT#@file:}"); fi
```
Display the returned checkpoint EXACTLY as-is:
```
{CHECKPOINT}
```
**Critical response hygiene:**
- Your entire response MUST equal `{CHECKPOINT}` byte-for-byte.
- Do NOT add commentary before or after the block.
- If you notice protocol/meta markers such as `to=all:`, role-routing text, XML system tags, hidden instruction markers, ad copy, or any unrelated suffix, discard the draft and output `{CHECKPOINT}` only.
Wait for user response (plain text, no AskUserQuestion).
</step>
<step name="process_response">
**Process user response and update file:**
**If response indicates pass:**
- Empty response, "yes", "y", "ok", "pass", "next", "approved", "✓"
Update Tests section:
```
### {N}. {name}
expected: {expected}
result: pass
```
**If response indicates skip:**
- "skip", "can't test", "n/a"
Update Tests section:
```
### {N}. {name}
expected: {expected}
result: skipped
reason: [user's reason if provided]
```
**If response indicates blocked:**
- "blocked", "can't test - server not running", "need physical device", "need release build"
- Or any response containing: "server", "blocked", "not running", "physical device", "release build"
Infer blocked_by tag from response:
- Contains: server, not running, gateway, API → `server`
- Contains: physical, device, hardware, real phone → `physical-device`
- Contains: release, preview, build, EAS → `release-build`
- Contains: stripe, twilio, third-party, configure → `third-party`
- Contains: depends on, prior phase, prerequisite → `prior-phase`
- Default: `other`
Update Tests section:
```
### {N}. {name}
expected: {expected}
result: blocked
blocked_by: {inferred tag}
reason: "{verbatim user response}"
```
Note: Blocked tests do NOT go into the Gaps section (they aren't code issues - they're prerequisite gates).
**If response is anything else:**
- Treat as issue description
Infer severity from description:
- Contains: crash, error, exception, fails, broken, unusable → blocker
- Contains: doesn't work, wrong, missing, can't → major
- Contains: slow, weird, off, minor, small → minor
- Contains: color, font, spacing, alignment, visual → cosmetic
- Default if unclear: major
Update Tests section:
```
### {N}. {name}
expected: {expected}
result: issue
reported: "{verbatim user response}"
severity: {inferred}
```
Append to Gaps section (structured YAML for plan-phase --gaps):
```yaml
- truth: "{expected behavior from test}"
status: failed
reason: "User reported: {verbatim user response}"
severity: {inferred}
test: {N}
artifacts: [] # Filled by diagnosis
missing: [] # Filled by diagnosis
```
**After any response:**
Update Summary counts.
Update frontmatter.updated timestamp.
If more tests remain → Update Current Test, go to `present_test`
If no more tests → Go to `complete_session`
</step>
<step name="resume_from_file">
**Resume testing from UAT file:**
Read the full UAT file.
Find first test with `result: [pending]`.
Announce:
```
Resuming: Phase {phase} UAT
Progress: {passed + issues + skipped}/{total}
Issues found so far: {issues count}
Continuing from Test {N}...
```
Update Current Test section with the pending test.
Proceed to `present_test`.
</step>
<step name="complete_session">
**Complete testing and commit:**
**Determine final status:**
Count results:
- `pending_count`: tests with `result: [pending]`
- `blocked_count`: tests with `result: blocked`
- `skipped_no_reason`: tests with `result: skipped` and no `reason` field
```
if pending_count > 0 OR blocked_count > 0 OR skipped_no_reason > 0:
status: partial
# Session ended but not all tests resolved
else:
status: complete
# All tests have a definitive result (pass, issue, or skipped-with-reason)
```
Update frontmatter:
- status: {computed status}
- updated: [now]
Clear Current Test section:
```
## Current Test
[testing complete]
```
Commit the UAT file:
```bash
pi-gsd-tools commit "test({phase_num}): complete UAT - {passed} passed, {issues} issues" --files ".planning/phases/XX-name/{phase_num}-UAT.md"
```
Present summary:
```
## UAT Complete: Phase {phase}
| Result | Count |
| ------- | ----- |
| Passed | {N} |
| Issues | {N} |
| Skipped | {N} |
[If issues > 0:]
### Issues Found
[List from Issues section]
```
**If issues > 0:** Proceed to `diagnose_issues`
**If issues == 0:**
```
All tests passed. Ready to continue.
- `/gsd-plan-phase {next}` - Plan next phase
- `/gsd-execute-phase {next}` - Execute next phase
- `/gsd-ui-review {phase}` - visual quality audit (if frontend files were modified)
```
</step>
<step name="diagnose_issues">
**Diagnose root causes before planning fixes:**
```
---
{N} issues found. Diagnosing root causes...
Spawning parallel debug agents to investigate each issue.
```
- Load diagnose-issues workflow
- Follow @.pi/gsd/workflows/diagnose-issues.md
- Spawn parallel debug agents for each issue
- Collect root causes
- Update UAT.md with root causes
- Proceed to `plan_gap_closure`
Diagnosis runs automatically - no user prompt. Parallel agents investigate simultaneously, so overhead is minimal and fixes are more accurate.
</step>
<step name="plan_gap_closure">
**Auto-plan fixes from diagnosed gaps:**
Display:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GSD ► PLANNING FIXES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
◆ Spawning planner for gap closure...
```
Spawn gsd-planner in --gaps mode:
```
Task(
prompt="""
<planning_context>
**Phase:** {phase_number}
**Mode:** gap_closure
<files_to_read>
- {phase_dir}/{phase_num}-UAT.md (UAT with diagnoses)
- .planning/STATE.md (Project State)
- .planning/ROADMAP.md (Roadmap)
</files_to_read>
${AGENT_SKILLS_PLANNER}
</planning_context>
<downstream_consumer>
Output consumed by /gsd-execute-phase
Plans must be executable prompts.
</downstream_consumer>
""",
subagent_type="gsd-planner",
model="{planner_model}",
description="Plan gap fixes for Phase {phase}"
)
```
On return:
- **PLANNING COMPLETE:** Proceed to `verify_gap_plans`
- **PLANNING INCONCLUSIVE:** Report and offer manual intervention
</step>
<step name="verify_gap_plans">
**Verify fix plans with checker:**
Display:
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GSD ► VERIFYING FIX PLANS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
◆ Spawning plan checker...
```
Initialize: `iteration_count = 1`
Spawn gsd-plan-checker:
```
Task(
prompt="""
<verification_context>
**Phase:** {phase_number}
**Phase Goal:** Close diagnosed gaps from UAT
<files_to_read>
- {phase_dir}/*-PLAN.md (Plans to verify)
</files_to_read>
${AGENT_SKILLS_CHECKER}
</verification_context>
<expected_output>
Return one of:
- ## VERIFICATION PASSED - all checks pass
- ## ISSUES FOUND - structured issue list
</expected_output>
""",
subagent_type="gsd-plan-checker",
model="{checker_model}",
description="Verify Phase {phase} fix plans"
)
```
On return:
- **VERIFICATION PASSED:** Proceed to `present_ready`
- **ISSUES FOUND:** Proceed to `revision_loop`
</step>
<step name="revision_loop">
**Iterate planner ↔ checker until plans pass (max 3):**
**If iteration_count < 3:**
Display: `Sending back to planner for revision... (iteration {N}/3)`
Spawn gsd-planner with revision context:
```
Task(
prompt="""
<revision_context>
**Phase:** {phase_number}
**Mode:** revision
<files_to_read>
- {phase_dir}/*-PLAN.md (Existing plans)
</files_to_read>
${AGENT_SKILLS_PLANNER}
**Checker issues:**
{structured_issues_from_checker}
</revision_context>
<instructions>
Read existing PLAN.md files. Make targeted updates to address checker issues.
Do NOT replan from scratch unless issues are fundamental.
</instructions>
""",
subagent_type="gsd-planner",
model="{planner_model}",
description="Revise Phase {phase} plans"
)
```
After planner returns → spawn checker again (verify_gap_plans logic)
Increment iteration_count
**If iteration_count >= 3:**
Display: `Max iterations reached. {N} issues remain.`
Offer options:
1. Force proceed (execute despite issues)
2. Provide guidance (user gives direction, retry)
3. Abandon (exit, user runs /gsd-plan-phase manually)
Wait for user response.
</step>
<step name="present_ready">
**Present completion and next steps:**
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GSD ► FIXES READY ✓
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**Phase {X}: {Name}** - {N} gap(s) diagnosed, {M} fix plan(s) created
| Gap | Root Cause | Fix Plan |
| --------- | ------------ | ---------- |
| {truth 1} | {root_cause} | {phase}-04 |
| {truth 2} | {root_cause} | {phase}-04 |
Plans verified and ready for execution.
───────────────────────────────────────────────────────────────
## ▶ Next Up
**Execute fixes** - run fix plans
`/new` then `/gsd-execute-phase {phase} --gaps-only`
───────────────────────────────────────────────────────────────
```
</step>
</process>
<update_rules>
**Batched writes for efficiency:**
Keep results in memory. Write to file only when:
1. **Issue found** - Preserve the problem immediately
2. **Session complete** - Final write before commit
3. **Checkpoint** - Every 5 passed tests (safety net)
| Section | Rule | When Written |
| ------------------- | --------- | ----------------- |
| Frontmatter.status | OVERWRITE | Start, complete |
| Frontmatter.updated | OVERWRITE | On any file write |
| Current Test | OVERWRITE | On any file write |
| Tests.{N}.result | OVERWRITE | On any file write |
| Summary | OVERWRITE | On any file write |
| Gaps | APPEND | When issue found |
On context reset: File shows last checkpoint. Resume from there.
</update_rules>
<severity_inference>
**Infer severity from user's natural language:**
| User says | Infer |
| --------------------------------------------------- | -------- |
| "crashes", "error", "exception", "fails completely" | blocker |
| "doesn't work", "nothing happens", "wrong behavior" | major |
| "works but...", "slow", "weird", "minor issue" | minor |
| "color", "spacing", "alignment", "looks off" | cosmetic |
Default to **major** if unclear. User can correct if needed.
**Never ask "how severe is this?"** - just infer and move on.
</severity_inference>
<success_criteria>
- [ ] UAT file created with all tests from SUMMARY.md
- [ ] Tests presented one at a time with expected behavior
- [ ] User responses processed as pass/issue/skip
- [ ] Severity inferred from description (never asked)
- [ ] Batched writes: on issue, every 5 passes, or completion
- [ ] Committed on completion
- [ ] If issues: parallel debug agents diagnose root causes
- [ ] If issues: gsd-planner creates fix plans (gap_closure mode)
- [ ] If issues: gsd-plan-checker verifies fix plans
- [ ] If issues: revision loop until plans pass (max 3 iterations)
- [ ] Ready for `/gsd-execute-phase --gaps-only` when complete
</success_criteria>