AGENTS/.pi/gsd/workflows/verify-work.md
2026-04-24 20:00:33 +02:00

Initialization Context (pre-injected by WXP)

Phase:

Phase Init Data:

If $ARGUMENTS contains a phase number, load context:

Parse JSON for: planner_model, checker_model, commit_docs, phase_found, phase_dir, phase_number, phase_name, has_verification, uat_path.
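Assuming the init payload is a flat JSON object carrying the keys listed above (the exact shape is an assumption inferred from that key list, not from the WXP spec), extraction might look like:

```shell
# Illustrative only: pull init fields out of the pre-injected JSON with jq.
# The sample payload below is hypothetical.
INIT_JSON='{"planner_model":"opus","checker_model":"sonnet","commit_docs":true,"phase_found":true,"phase_dir":".planning/phases/04-comments","phase_number":"04","phase_name":"comments","has_verification":false,"uat_path":""}'

phase_dir=$(printf '%s' "$INIT_JSON" | jq -r '.phase_dir')
phase_number=$(printf '%s' "$INIT_JSON" | jq -r '.phase_number')
planner_model=$(printf '%s' "$INIT_JSON" | jq -r '.planner_model')
```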

**First: Check for active UAT sessions**
(find .planning/phases -name "*-UAT.md" -type f 2>/dev/null || true) | head -5

If active sessions exist AND no $ARGUMENTS provided:

Read each file's frontmatter (status, phase) and Current Test section.
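One way to read those fields - a minimal sketch, assuming the frontmatter layout this workflow writes (a `---`-fenced block of `key: value` lines):

```shell
# Sketch: extract a single field from a UAT file's YAML frontmatter.
# $1 = file, $2 = field name (e.g. status, phase)
read_frontmatter_field() {
  awk -v key="$2" '
    /^---$/ { fence++; next }                              # count frontmatter fences
    fence == 1 && $1 == key":" {                           # match "key:" inside the block
      sub(/^[^:]*:[[:space:]]*/, ""); print; exit          # strip the key, print the value
    }
  ' "$1"
}
```

Usage: `read_frontmatter_field 04-UAT.md status` prints `testing` for an in-progress session.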

Display inline:

## Active UAT Sessions

| #   | Phase       | Status  | Current Test        | Progress |
| --- | ----------- | ------- | ------------------- | -------- |
| 1   | 04-comments | testing | 3. Reply to Comment | 2/6      |
| 2   | 05-auth     | testing | 1. Login Form       | 0/4      |

Reply with a number to resume, or provide a phase number to start new.

Wait for user response.

  • If user replies with number (1, 2) → Load that file, go to resume_from_file
  • If user replies with phase number → Treat as new session, go to create_uat_file

If active sessions exist AND $ARGUMENTS provided:

Check if session exists for that phase. If yes, offer to resume or restart. If no, continue to create_uat_file.

If no active sessions AND no $ARGUMENTS:

No active UAT sessions.

Provide a phase number to start testing (e.g., /gsd-verify-work 4)

If no active sessions AND $ARGUMENTS provided:

Continue to create_uat_file.

**Find what to test:**

Use phase_dir from init (or run init if not already done).

ls "$phase_dir"/*-SUMMARY.md 2>/dev/null || true

Read each SUMMARY.md to extract testable deliverables.

**Extract testable deliverables from SUMMARY.md:**

Parse for:

  1. Accomplishments - Features/functionality added
  2. User-facing changes - UI, workflows, interactions

Focus on USER-OBSERVABLE outcomes, not implementation details.

For each deliverable, create a test:

  • name: Brief test name
  • expected: What the user should see/experience (specific, observable)

Examples:

  • Accomplishment: "Added comment threading with infinite nesting" → Test: "Reply to a Comment" → Expected: "Clicking Reply opens inline composer below comment. Submitting shows reply nested under parent with visual indentation."

Skip internal/non-observable items (refactors, type changes, etc.).

Cold-start smoke test injection:

After extracting tests from SUMMARYs, scan the SUMMARY files for modified/created file paths. If ANY path matches these patterns:

server.ts, server.js, app.ts, app.js, index.ts, index.js, main.ts, main.js, database/*, db/*, seed/*, seeds/*, migrations/*, startup*, docker-compose*, Dockerfile*

Then prepend this test to the test list:

  • name: "Cold Start Smoke Test"
  • expected: "Kill any running server/service. Clear ephemeral state (temp DBs, caches, lock files). Start the application from scratch. Server boots without errors, any seed/migration completes, and a primary query (health check, homepage load, or basic API call) returns live data."

This catches bugs that only manifest on a fresh start - race conditions in startup sequences, silent seed failures, missing environment setup - bugs that pass against warm state but break in production.
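The trigger check above can be sketched as a shell predicate, assuming the modified/created paths have already been collected from the SUMMARYs:

```shell
# Sketch: return success (0) if any path matches the cold-start trigger
# patterns above; directory patterns first, then basename patterns.
needs_cold_start() {
  for path in "$@"; do
    case "$path" in
      */database/*|database/*|*/db/*|db/*|*/seed/*|seed/*|*/seeds/*|seeds/*|*/migrations/*|migrations/*)
        return 0 ;;
    esac
    case "${path##*/}" in
      server.ts|server.js|app.ts|app.js|index.ts|index.js|main.ts|main.js|startup*|docker-compose*|Dockerfile*)
        return 0 ;;
    esac
  done
  return 1
}
```

Usage: `if needs_cold_start src/server.ts src/ui/Button.tsx; then prepend the Cold Start Smoke Test; fi`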

**Create UAT file with all tests:**

mkdir -p "$phase_dir"

Build test list from extracted deliverables.

Create file:

---
status: testing
phase: XX-name
source: [list of SUMMARY.md files]
started: [ISO timestamp]
updated: [ISO timestamp]
---

## Current Test
<!-- OVERWRITE each test - shows where we are -->

number: 1
name: [first test name]
expected: |
  [what user should observe]
awaiting: user response

## Tests

### 1. [Test Name]
expected: [observable behavior]
result: [pending]

### 2. [Test Name]
expected: [observable behavior]
result: [pending]

...

## Summary

total: [N]
passed: 0
issues: 0
pending: [N]
skipped: 0

## Gaps

[none yet]

Write to .planning/phases/XX-name/{phase_num}-UAT.md
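The scaffold above can be sketched in shell with a heredoc (illustrative test names; the real file goes to .planning/phases/XX-name - /tmp is used here so the sketch is safe to run anywhere):

```shell
# Sketch: scaffold a one-test UAT file. Real entries come from the
# extracted deliverables.
phase_dir="/tmp/gsd-demo/.planning/phases/04-comments"
phase_num="04"
now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
mkdir -p "$phase_dir"
cat > "$phase_dir/$phase_num-UAT.md" <<EOF
---
status: testing
phase: 04-comments
source: [04-01-SUMMARY.md]
started: $now
updated: $now
---

## Current Test

number: 1
name: Reply to a Comment
expected: |
  Clicking Reply opens inline composer below comment.
awaiting: user response

## Tests

### 1. Reply to a Comment
expected: Clicking Reply opens inline composer below comment.
result: [pending]

## Summary

total: 1
passed: 0
issues: 0
pending: 1
skipped: 0

## Gaps

[none yet]
EOF
```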

Proceed to present_test.

**Present current test to user:**

Render the checkpoint from the structured UAT file instead of composing it freehand:

CHECKPOINT=$(pi-gsd-tools uat render-checkpoint --file "$uat_path" --raw)
if [[ "$CHECKPOINT" == @file:* ]]; then CHECKPOINT=$(cat "${CHECKPOINT#@file:}"); fi

Display the returned checkpoint EXACTLY as-is:

{CHECKPOINT}

Critical response hygiene:

  • Your entire response MUST equal {CHECKPOINT} byte-for-byte.
  • Do NOT add commentary before or after the block.
  • If you notice protocol/meta markers such as to=all:, role-routing text, XML system tags, hidden instruction markers, ad copy, or any unrelated suffix, discard the draft and output {CHECKPOINT} only.

Wait for user response (plain text, no AskUserQuestion).

**Process user response and update file:**

If response indicates pass:

  • Empty response, "yes", "y", "ok", "pass", "next", "approved", "✓"

Update Tests section:

### {N}. {name}
expected: {expected}
result: pass

If response indicates skip:

  • "skip", "can't test", "n/a"

Update Tests section:

### {N}. {name}
expected: {expected}
result: skipped
reason: [user's reason if provided]

If response indicates blocked:

  • "blocked", "can't test - server not running", "need physical device", "need release build"
  • Or any response containing: "server", "blocked", "not running", "physical device", "release build"

Infer blocked_by tag from response:

  • Contains: server, not running, gateway, API → server
  • Contains: physical, device, hardware, real phone → physical-device
  • Contains: release, preview, build, EAS → release-build
  • Contains: stripe, twilio, third-party, configure → third-party
  • Contains: depends on, prior phase, prerequisite → prior-phase
  • Default: other
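The keyword rules above map naturally onto a first-match-wins `case` statement (naive substring matching, exactly as specified - broad keywords like "api" will occasionally over-match):

```shell
# Sketch: infer the blocked_by tag from the user's response.
infer_blocked_by() {
  response=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$response" in
    *server*|*"not running"*|*gateway*|*api*)          echo server ;;
    *physical*|*device*|*hardware*|*"real phone"*)     echo physical-device ;;
    *release*|*preview*|*build*|*eas*)                 echo release-build ;;
    *stripe*|*twilio*|*third-party*|*configure*)       echo third-party ;;
    *"depends on"*|*"prior phase"*|*prerequisite*)     echo prior-phase ;;
    *)                                                 echo other ;;
  esac
}
```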

Update Tests section:

### {N}. {name}
expected: {expected}
result: blocked
blocked_by: {inferred tag}
reason: "{verbatim user response}"

Note: Blocked tests do NOT go into the Gaps section (they aren't code issues - they're prerequisite gates).

If response is anything else:

  • Treat as issue description

Infer severity from description:

  • Contains: crash, error, exception, fails, broken, unusable → blocker
  • Contains: doesn't work, wrong, missing, can't → major
  • Contains: slow, weird, off, minor, small → minor
  • Contains: color, font, spacing, alignment, visual → cosmetic
  • Default if unclear: major
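As a sketch, the same first-match-wins pattern covers severity inference (here "off" is matched as `" off"` to reduce accidental substring hits - an implementation choice, not part of the rule table):

```shell
# Sketch: infer severity from the issue description; defaults to major.
infer_severity() {
  desc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$desc" in
    *crash*|*error*|*exception*|*fails*|*broken*|*unusable*) echo blocker ;;
    *"doesn't work"*|*wrong*|*missing*|*"can't"*)            echo major ;;
    *slow*|*weird*|*" off"*|*minor*|*small*)                 echo minor ;;
    *color*|*font*|*spacing*|*alignment*|*visual*)           echo cosmetic ;;
    *)                                                       echo major ;;
  esac
}
```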

Update Tests section:

### {N}. {name}
expected: {expected}
result: issue
reported: "{verbatim user response}"
severity: {inferred}

Append to Gaps section (structured YAML for plan-phase --gaps):

- truth: "{expected behavior from test}"
  status: failed
  reason: "User reported: {verbatim user response}"
  severity: {inferred}
  test: {N}
  artifacts: []  # Filled by diagnosis
  missing: []    # Filled by diagnosis

After any response:

Update Summary counts. Update frontmatter.updated timestamp.

If more tests remain → Update Current Test, go to present_test
If no more tests → Go to complete_session

**Resume testing from UAT file:**

Read the full UAT file.

Find first test with result: [pending].
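Locating that test can be sketched against the `### N. Name` / `result: [pending]` layout this workflow writes:

```shell
# Sketch: print the number of the first test still marked pending.
# $1 = UAT file path
first_pending_test() {
  awk '
    /^### [0-9]+\./ { n = $2; sub(/\./, "", n) }   # remember current test number
    /^result: \[pending\]/ { print n; exit }       # first pending wins
  ' "$1"
}
```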

Announce:

Resuming: Phase {phase} UAT
Progress: {passed + issues + skipped}/{total}
Issues found so far: {issues count}

Continuing from Test {N}...

Update Current Test section with the pending test. Proceed to present_test.

**Complete testing and commit:**

Determine final status:

Count results:

  • pending_count: tests with result: [pending]
  • blocked_count: tests with result: blocked
  • skipped_no_reason: tests with result: skipped and no reason field

if pending_count > 0 OR blocked_count > 0 OR skipped_no_reason > 0:
  status: partial
  # Session ended but not all tests resolved
else:
  status: complete
  # All tests have a definitive result (pass, issue, or skipped-with-reason)
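The counting step can be sketched with `grep -c` over the `result:` lines (the skipped-without-reason check is omitted here for brevity; `grep -c` exits non-zero on zero matches, hence the `|| true` guards):

```shell
# Sketch: derive the final session status from result counts.
# $1 = UAT file path
uat_status() {
  pending=$(grep -c '^result: \[pending\]' "$1" || true)
  blocked=$(grep -c '^result: blocked' "$1" || true)
  if [ "$pending" -gt 0 ] || [ "$blocked" -gt 0 ]; then
    echo partial
  else
    echo complete
  fi
}
```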

Update frontmatter:

  • status: {computed status}
  • updated: [now]

Clear Current Test section:

## Current Test

[testing complete]

Commit the UAT file:

pi-gsd-tools commit "test({phase_num}): complete UAT - {passed} passed, {issues} issues" --files ".planning/phases/XX-name/{phase_num}-UAT.md"

Present summary:

## UAT Complete: Phase {phase}

| Result  | Count |
| ------- | ----- |
| Passed  | {N}   |
| Issues  | {N}   |
| Skipped | {N}   |

[If issues > 0:]
### Issues Found

[List tests with result: issue, with severity and reported description]

If issues > 0: Proceed to diagnose_issues

If issues == 0:

All tests passed. Ready to continue.

- `/gsd-plan-phase {next}` - Plan next phase
- `/gsd-execute-phase {next}` - Execute next phase
- `/gsd-ui-review {phase}` - Visual quality audit (if frontend files were modified)

**Diagnose root causes before planning fixes:**
---

{N} issues found. Diagnosing root causes...

Spawning parallel debug agents to investigate each issue.
  • Load diagnose-issues workflow
  • Follow @.pi/gsd/workflows/diagnose-issues.md
  • Spawn parallel debug agents for each issue
  • Collect root causes
  • Update UAT.md with root causes
  • Proceed to plan_gap_closure

Diagnosis runs automatically - no user prompt. Parallel agents investigate simultaneously, so overhead is minimal and fixes are more accurate.

**Auto-plan fixes from diagnosed gaps:**

Display:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► PLANNING FIXES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

◆ Spawning planner for gap closure...

Spawn gsd-planner in --gaps mode:

Task(
  prompt="""
<planning_context>

**Phase:** {phase_number}
**Mode:** gap_closure

<files_to_read>
- {phase_dir}/{phase_num}-UAT.md (UAT with diagnoses)
- .planning/STATE.md (Project State)
- .planning/ROADMAP.md (Roadmap)
</files_to_read>

${AGENT_SKILLS_PLANNER}

</planning_context>

<downstream_consumer>
Output consumed by /gsd-execute-phase
Plans must be executable prompts.
</downstream_consumer>
""",
  subagent_type="gsd-planner",
  model="{planner_model}",
  description="Plan gap fixes for Phase {phase}"
)

On return:

  • PLANNING COMPLETE: Proceed to verify_gap_plans
  • PLANNING INCONCLUSIVE: Report and offer manual intervention

**Verify fix plans with checker:**

Display:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► VERIFYING FIX PLANS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

◆ Spawning plan checker...

Initialize: iteration_count = 1

Spawn gsd-plan-checker:

Task(
  prompt="""
<verification_context>

**Phase:** {phase_number}
**Phase Goal:** Close diagnosed gaps from UAT

<files_to_read>
- {phase_dir}/*-PLAN.md (Plans to verify)
</files_to_read>

${AGENT_SKILLS_CHECKER}

</verification_context>

<expected_output>
Return one of:
- ## VERIFICATION PASSED - all checks pass
- ## ISSUES FOUND - structured issue list
</expected_output>
""",
  subagent_type="gsd-plan-checker",
  model="{checker_model}",
  description="Verify Phase {phase} fix plans"
)

On return:

  • VERIFICATION PASSED: Proceed to present_ready
  • ISSUES FOUND: Proceed to revision_loop

**Iterate planner ↔ checker until plans pass (max 3):**

If iteration_count < 3:

Display: Sending back to planner for revision... (iteration {N}/3)

Spawn gsd-planner with revision context:

Task(
  prompt="""
<revision_context>

**Phase:** {phase_number}
**Mode:** revision

<files_to_read>
- {phase_dir}/*-PLAN.md (Existing plans)
</files_to_read>

${AGENT_SKILLS_PLANNER}

**Checker issues:**
{structured_issues_from_checker}

</revision_context>

<instructions>
Read existing PLAN.md files. Make targeted updates to address checker issues.
Do NOT replan from scratch unless issues are fundamental.
</instructions>
""",
  subagent_type="gsd-planner",
  model="{planner_model}",
  description="Revise Phase {phase} plans"
)

After planner returns → spawn checker again (verify_gap_plans logic).
Increment iteration_count.

If iteration_count >= 3:

Display: Max iterations reached. {N} issues remain.

Offer options:

  1. Force proceed (execute despite issues)
  2. Provide guidance (user gives direction, retry)
  3. Abandon (exit, user runs /gsd-plan-phase manually)

Wait for user response.

**Present completion and next steps:**

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 GSD ► FIXES READY ✓
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

**Phase {X}: {Name}** - {N} gap(s) diagnosed, {M} fix plan(s) created

| Gap       | Root Cause   | Fix Plan   |
| --------- | ------------ | ---------- |
| {truth 1} | {root_cause} | {phase}-04 |
| {truth 2} | {root_cause} | {phase}-04 |

Plans verified and ready for execution.

───────────────────────────────────────────────────────────────

## ▶ Next Up

**Execute fixes** - run fix plans

`/new` then `/gsd-execute-phase {phase} --gaps-only`

───────────────────────────────────────────────────────────────

<update_rules>

Batched writes for efficiency:

Keep results in memory. Write to file only when:

  1. Issue found - Preserve the problem immediately
  2. Session complete - Final write before commit
  3. Checkpoint - Every 5 passed tests (safety net)

| Section             | Rule      | When Written      |
| ------------------- | --------- | ----------------- |
| Frontmatter.status  | OVERWRITE | Start, complete   |
| Frontmatter.updated | OVERWRITE | On any file write |
| Current Test        | OVERWRITE | On any file write |
| Tests.{N}.result    | OVERWRITE | On any file write |
| Summary             | OVERWRITE | On any file write |
| Gaps                | APPEND    | When issue found  |

On context reset: File shows last checkpoint. Resume from there.

</update_rules>

<severity_inference>

Infer severity from user's natural language:

| User says                                           | Infer    |
| --------------------------------------------------- | -------- |
| "crashes", "error", "exception", "fails completely" | blocker  |
| "doesn't work", "nothing happens", "wrong behavior" | major    |
| "works but...", "slow", "weird", "minor issue"      | minor    |
| "color", "spacing", "alignment", "looks off"        | cosmetic |

Default to major if unclear. User can correct if needed.

Never ask "how severe is this?" - just infer and move on.

</severity_inference>

<success_criteria>

  • UAT file created with all tests from SUMMARY.md
  • Tests presented one at a time with expected behavior
  • User responses processed as pass/issue/skip
  • Severity inferred from description (never asked)
  • Batched writes: on issue, every 5 passes, or completion
  • Committed on completion
  • If issues: parallel debug agents diagnose root causes
  • If issues: gsd-planner creates fix plans (gap_closure mode)
  • If issues: gsd-plan-checker verifies fix plans
  • If issues: revision loop until plans pass (max 3 iterations)
  • Ready for /gsd-execute-phase --gaps-only when complete

</success_criteria>