Context (pre-injected)
Phase:
Phase Data:
Roadmap:
Verify phase goal achievement through goal-backward analysis. Check that the codebase delivers what the phase promised, not just that tasks were completed. Executed by a verification subagent spawned from execute-phase.md.
<core_principle> Task completion ≠ Goal achievement
A task "create chat component" can be marked complete when the component is a placeholder. The task was done - but the goal "working chat interface" was not achieved.
Goal-backward verification:
- What must be TRUE for the goal to be achieved?
- What must EXIST for those truths to hold?
- What must be WIRED for those artifacts to function?
Then verify each level against the actual codebase. </core_principle>
<required_reading> @.pi/gsd/references/verification-patterns.md @.pi/gsd/templates/verification-report.md </required_reading>
Load phase operation context. Extract from the init JSON: phase_dir, phase_number, phase_name, has_plans, plan_count.
Then load phase details and list plans/summaries:

```bash
pi-gsd-tools roadmap get-phase "${phase_number}"
grep -E "^\| ${phase_number}" .planning/REQUIREMENTS.md 2>/dev/null || true
ls "$phase_dir"/*-SUMMARY.md "$phase_dir"/*-PLAN.md 2>/dev/null || true
```
Extract phase goal from ROADMAP.md (the outcome to verify, not tasks) and requirements from REQUIREMENTS.md if it exists.
**Option A: Must-haves in PLAN frontmatter**

Use gsd-tools to extract must_haves from each PLAN:

```bash
for plan in "$PHASE_DIR"/*-PLAN.md; do
  MUST_HAVES=$(pi-gsd-tools frontmatter get "$plan" --field must_haves)
  echo "=== $plan ===" && echo "$MUST_HAVES"
done
```
Returns JSON: { truths: [...], artifacts: [...], key_links: [...] }
Aggregate all must_haves across plans for phase-level verification.
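A minimal way to do that aggregation, assuming `jq` is available and each plan's must_haves follows the JSON shape above (`aggregate_must_haves` is a hypothetical helper, not part of gsd-tools):

```bash
# Hypothetical helper: merge per-plan must_haves JSON objects (read from
# stdin, one object per plan) into a single phase-level object.
# Assumes jq is installed and each object may omit any of the three keys.
aggregate_must_haves() {
  jq -s '{
    truths:    (map(.truths    // []) | add),
    artifacts: (map(.artifacts // []) | add),
    key_links: (map(.key_links // []) | add)
  }'
}

# Usage sketch: pipe each plan's extracted must_haves into the helper.
# for plan in "$PHASE_DIR"/*-PLAN.md; do
#   pi-gsd-tools frontmatter get "$plan" --field must_haves
# done | aggregate_must_haves
```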
**Option B: Success Criteria from ROADMAP.md**
If no must_haves in frontmatter (`MUST_HAVES` returns an error or empty output), check for Success Criteria:

```bash
PHASE_DATA=$(pi-gsd-tools roadmap get-phase "${phase_number}" --raw)
```
Parse the success_criteria array from the JSON output. If non-empty:
- Use each Success Criterion directly as a truth (they are already written as observable, testable behaviors)
- Derive artifacts (concrete file paths for each truth)
- Derive key links (critical wiring where stubs hide)
- Document the must-haves before proceeding
Success Criteria from ROADMAP.md are the contract - they override PLAN-level must_haves when both exist.
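One way to pull the criteria out of the raw JSON, assuming `jq` is available and `--raw` emits a top-level `success_criteria` array (the shape is assumed from the description above; `list_success_criteria` is a hypothetical helper):

```bash
# Assumed shape: get-phase --raw returns JSON with a success_criteria array.
# Prints one criterion per line; prints nothing if the array is absent or empty.
list_success_criteria() {
  jq -r '.success_criteria[]?'
}

# Usage sketch:
# PHASE_DATA=$(pi-gsd-tools roadmap get-phase "${phase_number}" --raw)
# echo "$PHASE_DATA" | list_success_criteria
```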
**Option C: Derive from phase goal (fallback)**
If no must_haves in frontmatter AND no Success Criteria in ROADMAP:
- State the goal from ROADMAP.md
- Derive truths (3-7 observable behaviors, each testable)
- Derive artifacts (concrete file paths for each truth)
- Derive key links (critical wiring where stubs hide)
- Document derived must-haves before proceeding
Truth status: ✓ VERIFIED (all supporting artifacts pass) | ✗ FAILED (artifact missing/stub/unwired) | ? UNCERTAIN (needs human)
For each truth: identify supporting artifacts → check artifact status → check wiring → determine truth status.
Example: Truth "User can see existing messages" depends on Chat.tsx (renders), /api/chat GET (provides), Message model (schema). If Chat.tsx is a stub or API returns hardcoded [] → FAILED. If all exist, are substantive, and connected → VERIFIED.
Use gsd-tools for artifact verification against must_haves in each PLAN:

```bash
for plan in "$PHASE_DIR"/*-PLAN.md; do
  ARTIFACT_RESULT=$(pi-gsd-tools verify artifacts "$plan")
  echo "=== $plan ===" && echo "$ARTIFACT_RESULT"
done
```
Parse JSON result: { all_passed, passed, total, artifacts: [{path, exists, issues, passed}] }
Artifact status from result:
- `exists=false` → MISSING
- `issues` not empty → STUB (check issues for "Only N lines" or "Missing pattern")
- `passed=true` → VERIFIED (Levels 1-2 pass)
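That mapping can be scripted, assuming `jq` and the result shape shown above (`classify_artifacts` is a hypothetical helper):

```bash
# Hypothetical helper: turn a verify-artifacts JSON result (stdin) into
# one "path: STATUS" line per artifact, using the mapping above.
classify_artifacts() {
  jq -r '.artifacts[] |
    if .exists == false          then "\(.path): MISSING"
    elif (.issues | length) > 0  then "\(.path): STUB (\(.issues | join("; ")))"
    elif .passed == true         then "\(.path): VERIFIED"
    else "\(.path): CHECK MANUALLY"
    end'
}
```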
Level 3 - Wired (manual check for artifacts that pass Levels 1-2):

```bash
grep -r "import.*$artifact_name" src/ --include="*.ts" --include="*.tsx"   # IMPORTED
grep -r "$artifact_name" src/ --include="*.ts" --include="*.tsx" | grep -v "import"   # USED
```

WIRED = imported AND used. ORPHANED = exists but not imported/used.
| Exists | Substantive | Wired | Status |
|---|---|---|---|
| ✓ | ✓ | ✓ | ✓ VERIFIED |
| ✓ | ✓ | ✗ | ⚠️ ORPHANED |
| ✓ | ✗ | - | ✗ STUB |
| ✗ | - | - | ✗ MISSING |
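The two Level 3 greps can be combined into a small helper. This is a sketch assuming the `src/` TypeScript layout used above; it inherits the greps' limitation that the defining file itself matches the usage search:

```bash
# Sketch: classify Level 3 wiring for an artifact name.
# ORPHANED covers both "never imported" and "imported but never used".
check_wired() {
  local name="$1"
  if grep -rq --include="*.ts" --include="*.tsx" "import.*${name}" src/ &&
     grep -r --include="*.ts" --include="*.tsx" "${name}" src/ | grep -v "import" | grep -q .; then
    echo "WIRED"
  else
    echo "ORPHANED"
  fi
}
```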
Export-level spot check (WARNING severity):
For artifacts that pass Level 3, spot-check individual exports:
- Extract key exported symbols (functions, constants, classes - skip types/interfaces)
- For each, grep for usage outside the defining file
- Flag exports with zero external call sites as "exported but unused"
This catches dead stores like setPlan() that exist in a wired file but are never actually called. Report as WARNING - it may indicate incomplete cross-plan wiring or leftover code from plan revisions.
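A rough version of this spot check, assuming TS-style `export function|const|class` declarations (`spot_check_exports` is a hypothetical helper; types and interfaces are skipped by construction):

```bash
# Sketch: warn on exported symbols with no call sites outside their file.
spot_check_exports() {
  local f sym
  for f in $(grep -rl --include="*.ts" --include="*.tsx" -E "export (function|const|class) " src/ 2>/dev/null); do
    for sym in $(grep -oE "export (function|const|class) [A-Za-z_][A-Za-z0-9_]*" "$f" | awk '{print $3}'); do
      # Any match outside the defining file counts as a call site.
      if ! grep -rn --include="*.ts" --include="*.tsx" "$sym" src/ | grep -v "^$f:" | grep -q .; then
        echo "WARNING: $sym exported from $f but has no external call sites"
      fi
    done
  done
  return 0
}
```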
Use gsd-tools to verify key links in each PLAN:

```bash
for plan in "$PHASE_DIR"/*-PLAN.md; do
  LINKS_RESULT=$(pi-gsd-tools verify key-links "$plan")
  echo "=== $plan ===" && echo "$LINKS_RESULT"
done
```
Parse JSON result: { all_verified, verified, total, links: [{from, to, via, verified, detail}] }
Link status from result:
- `verified=true` → WIRED
- `verified=false` with "not found" → NOT_WIRED
- `verified=false` with "Pattern not found" → PARTIAL
Fallback patterns (if key_links not in must_haves):
| Pattern | Check | Status |
|---|---|---|
| Component → API | fetch/axios call to API path, response used (await/.then/setState) | WIRED / PARTIAL (call but unused response) / NOT_WIRED |
| API → Database | Prisma/DB query on model, result returned via res.json() | WIRED / PARTIAL (query but not returned) / NOT_WIRED |
| Form → Handler | onSubmit with real implementation (fetch/axios/mutate/dispatch), not console.log/empty | WIRED / STUB (log-only/empty) / NOT_WIRED |
| State → Render | useState variable appears in JSX ({stateVar} or {stateVar.property}) | WIRED / NOT_WIRED |
Record status and evidence for each key link.
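As a concrete instance of the Component → API row, a hedged sketch (crude by design: it checks that the fetch call and the path appear in the same file, not that they are the same expression):

```bash
# Sketch: fallback Component -> API check. WIRED if the component fetches the
# path and consumes the response; PARTIAL if it calls but ignores the result.
check_fetch_link() {
  local component="$1" api_path="$2"
  if grep -q "fetch(" "$component" && grep -q "$api_path" "$component"; then
    if grep -qE "await |\.then\(|setState|set[A-Z]" "$component"; then
      echo "WIRED"
    else
      echo "PARTIAL"
    fi
  else
    echo "NOT_WIRED"
  fi
}
```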
If REQUIREMENTS.md exists:

```bash
grep -E "Phase ${PHASE_NUM}" .planning/REQUIREMENTS.md 2>/dev/null || true
```

For each requirement: parse description → identify supporting truths/artifacts → status: ✓ SATISFIED / ✗ BLOCKED / ? NEEDS HUMAN.
Extract files modified in this phase from SUMMARY.md, then scan each:

| Pattern | Search | Severity |
|---|---|---|
| TODO/FIXME/XXX/HACK | `grep -n -E "TODO\|FIXME\|XXX\|HACK"` | ⚠️ Warning |
| Placeholder content | `grep -n -iE "placeholder\|coming soon\|will be here"` | 🛑 Blocker |
| Empty returns | `grep -n -E "return null\|return \{\}\|return \[\]\|=> \{\}"` | ⚠️ Warning |
| Log-only functions | Functions containing only console.log | ⚠️ Warning |
Categorize: 🛑 Blocker (prevents goal) | ⚠️ Warning (incomplete) | ℹ️ Info (notable).
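The table above can be driven by a small scanner; a sketch (log-only functions still need manual review, since line-oriented greps can't see function bodies):

```bash
# Sketch: grep-based anti-pattern scan for one file, tagged by severity.
scan_anti_patterns() {
  local f="$1"
  grep -nE "TODO|FIXME|XXX|HACK" "$f" | sed "s|^|WARNING $f:|"
  grep -niE "placeholder|coming soon|will be here" "$f" | sed "s|^|BLOCKER $f:|"
  grep -nE "return null|return \{\}|return \[\]|=> \{\}" "$f" | sed "s|^|WARNING $f:|"
  return 0  # a clean file is not an error
}

# Usage sketch: feed it the files listed in SUMMARY.md.
# for f in $modified_files; do scan_anti_patterns "$f"; done
```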
**Always needs human:** Visual appearance, user flow completion, real-time behavior (WebSocket/SSE), external service integration, performance feel, error message clarity.

**Needs human if uncertain:** Complex wiring grep can't trace, dynamic state-dependent behavior, edge cases.
Format each as: Test Name → What to do → Expected result → Why can't verify programmatically.
- **passed:** All truths VERIFIED, all artifacts pass Levels 1-3, all key links WIRED, no blocker anti-patterns.
- **gaps_found:** Any truth FAILED, artifact MISSING/STUB, key link NOT_WIRED, or blocker found.
- **human_needed:** All automated checks pass but human verification items remain.
Score: verified_truths / total_truths
- Cluster related gaps: API stub + component unwired → "Wire frontend to backend". Multiple missing → "Complete core implementation". Wiring only → "Connect existing components".
- Generate a plan per cluster: Objective, 2-3 tasks (files/action/verify each), plus a re-verify step. Keep focused: single concern per plan.
- Order by dependency: fix missing → fix stubs → fix wiring → verify.
Fill template sections: frontmatter (phase/timestamp/status/score), goal achievement, artifact table, wiring table, requirements coverage, anti-patterns, human verification, gaps summary, fix plans (if gaps_found), metadata.
See .pi/gsd/templates/verification-report.md for complete template.
Return status (`passed` | `gaps_found` | `human_needed`), score (N/M must-haves), and report path. If gaps_found: list gaps + recommended fix plan names. If human_needed: list items requiring human testing.
Orchestrator routes: passed → update_roadmap | gaps_found → create/execute fixes, re-verify | human_needed → present to user.
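That routing can be sketched as a case dispatch (hypothetical function and step names; the real orchestrator lives in execute-phase.md):

```bash
# Sketch: map verification status to the orchestrator's next step.
route_verification() {
  case "$1" in
    passed)       echo "update_roadmap" ;;
    gaps_found)   echo "create_fix_plans_and_reverify" ;;
    human_needed) echo "present_to_user" ;;
    *)            echo "unknown_status" ;;
  esac
}
```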
<success_criteria>
- Must-haves established (from frontmatter or derived)
- All truths verified with status and evidence
- All artifacts checked at all three levels
- All key links verified
- Requirements coverage assessed (if applicable)
- Anti-patterns scanned and categorized
- Human verification items identified
- Overall status determined
- Fix plans generated (if gaps_found)
- VERIFICATION.md created with complete report
- Results returned to orchestrator </success_criteria>