Context (pre-injected)

Phase:

Phase Data:

Roadmap:

Verify phase goal achievement through goal-backward analysis. Check that the codebase delivers what the phase promised, not just that tasks completed.

Executed by a verification subagent spawned from execute-phase.md.

<core_principle> Task completion ≠ Goal achievement

A task "create chat component" can be marked complete when the component is a placeholder. The task was done - but the goal "working chat interface" was not achieved.

Goal-backward verification:

  1. What must be TRUE for the goal to be achieved?
  2. What must EXIST for those truths to hold?
  3. What must be WIRED for those artifacts to function?

Then verify each level against the actual codebase. </core_principle>

<required_reading> @.pi/gsd/references/verification-patterns.md @.pi/gsd/templates/verification-report.md </required_reading>

Load phase operation context:

Extract from init JSON: phase_dir, phase_number, phase_name, has_plans, plan_count.
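For reference, the init JSON might look roughly like this (field names from the list above; values are illustrative):

```json
{
  "phase_dir": ".planning/phases/03-chat-interface",
  "phase_number": "3",
  "phase_name": "Chat Interface",
  "has_plans": true,
  "plan_count": 2
}
```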

Then load phase details and list plans/summaries:

```bash
pi-gsd-tools roadmap get-phase "${phase_number}"
grep -E "^\| ${phase_number}" .planning/REQUIREMENTS.md 2>/dev/null || true
ls "$phase_dir"/*-SUMMARY.md "$phase_dir"/*-PLAN.md 2>/dev/null || true
```

Extract phase goal from ROADMAP.md (the outcome to verify, not tasks) and requirements from REQUIREMENTS.md if it exists.

**Option A: Must-haves in PLAN frontmatter**

Use gsd-tools to extract must_haves from each PLAN:

```bash
for plan in "$phase_dir"/*-PLAN.md; do
  MUST_HAVES=$(pi-gsd-tools frontmatter get "$plan" --field must_haves)
  echo "=== $plan ===" && echo "$MUST_HAVES"
done
```

Returns JSON: { truths: [...], artifacts: [...], key_links: [...] }

Aggregate all must_haves across plans for phase-level verification.
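A minimal aggregation sketch, assuming `jq` is available and each `must_haves` value is the JSON object shown above:

```bash
# Merge must_haves from every PLAN into one phase-level object (sketch; assumes jq).
ALL_MUST_HAVES=$(for plan in "$phase_dir"/*-PLAN.md; do
  pi-gsd-tools frontmatter get "$plan" --field must_haves 2>/dev/null
done | jq -s '{
  truths:    (map(.truths // [])    | add),
  artifacts: (map(.artifacts // []) | add),
  key_links: (map(.key_links // []) | add)
}')
echo "$ALL_MUST_HAVES"
```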

**Option B: Use Success Criteria from ROADMAP.md**

If no must_haves in frontmatter (MUST_HAVES returns error or empty), check for Success Criteria:

```bash
PHASE_DATA=$(pi-gsd-tools roadmap get-phase "${phase_number}" --raw)
```

Parse the success_criteria array from the JSON output. If non-empty:

  1. Use each Success Criterion directly as a truth (they are already written as observable, testable behaviors)
  2. Derive artifacts (concrete file paths for each truth)
  3. Derive key links (critical wiring where stubs hide)
  4. Document the must-haves before proceeding

Success Criteria from ROADMAP.md are the contract - they override PLAN-level must_haves when both exist.
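A hedged extraction sketch, assuming `jq` and that `success_criteria` is an array of strings:

```bash
# List each Success Criterion as a candidate truth (sketch; assumes jq).
echo "$PHASE_DATA" | jq -r '.success_criteria // [] | .[]'
```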

**Option C: Derive from phase goal (fallback)**

If no must_haves in frontmatter AND no Success Criteria in ROADMAP:

  1. State the goal from ROADMAP.md
  2. Derive truths (3-7 observable behaviors, each testable)
  3. Derive artifacts (concrete file paths for each truth)
  4. Derive key links (critical wiring where stubs hide)
  5. Document derived must-haves before proceeding

For each observable truth, determine if the codebase enables it.

Status: ✓ VERIFIED (all supporting artifacts pass) | ✗ FAILED (artifact missing/stub/unwired) | ? UNCERTAIN (needs human)

For each truth: identify supporting artifacts → check artifact status → check wiring → determine truth status.

Example: Truth "User can see existing messages" depends on Chat.tsx (renders), /api/chat GET (provides), Message model (schema). If Chat.tsx is a stub or API returns hardcoded [] → FAILED. If all exist, are substantive, and connected → VERIFIED.

Use gsd-tools for artifact verification against must_haves in each PLAN:
```bash
for plan in "$phase_dir"/*-PLAN.md; do
  ARTIFACT_RESULT=$(pi-gsd-tools verify artifacts "$plan")
  echo "=== $plan ===" && echo "$ARTIFACT_RESULT"
done
```

Parse JSON result: { all_passed, passed, total, artifacts: [{path, exists, issues, passed}] }

Artifact status from result:

  • exists=false → MISSING
  • issues not empty → STUB (check issues for "Only N lines" or "Missing pattern")
  • passed=true → VERIFIED (Levels 1-2 pass)
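A triage sketch for each plan's `$ARTIFACT_RESULT`, applying the mapping above (assumes `jq`):

```bash
# Classify each artifact from the verify-artifacts JSON (sketch; assumes jq).
echo "$ARTIFACT_RESULT" | jq -r '.artifacts[] |
  if .exists == false         then "MISSING  \(.path)"
  elif (.issues | length) > 0 then "STUB     \(.path): \(.issues | join("; "))"
  elif .passed                then "VERIFIED \(.path)"
  else                             "UNKNOWN  \(.path)"
  end'
```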

Level 3 - Wired (manual check for artifacts that pass Levels 1-2):

grep -r "import.*$artifact_name" src/ --include="*.ts" --include="*.tsx"  # IMPORTED
grep -r "$artifact_name" src/ --include="*.ts" --include="*.tsx" | grep -v "import"  # USED

WIRED = imported AND used. ORPHANED = exists but not imported/used.
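A small helper wrapping both greps into one status (a sketch; assumes artifact names are distinctive enough for grep):

```bash
# Decide WIRED vs ORPHANED for one artifact name (sketch).
check_wiring() {
  local name="$1"
  if grep -rq "import.*$name" src/ --include="*.ts" --include="*.tsx" \
     && grep -r "$name" src/ --include="*.ts" --include="*.tsx" | grep -vq "import"; then
    echo "WIRED"
  else
    echo "ORPHANED"
  fi
}
```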

| Exists | Substantive | Wired | Status |
|--------|-------------|-------|--------|
| ✓ | ✓ | ✓ | ✓ VERIFIED |
| ✓ | ✓ | ✗ | ⚠️ ORPHANED |
| ✓ | ✗ | - | ✗ STUB |
| ✗ | - | - | ✗ MISSING |

Export-level spot check (WARNING severity):

For artifacts that pass Level 3, spot-check individual exports:

  • Extract key exported symbols (functions, constants, classes - skip types/interfaces)
  • For each, grep for usage outside the defining file
  • Flag exports with zero external call sites as "exported but unused"

This catches dead stores like setPlan() that exist in a wired file but are never actually called. Report as WARNING - may indicate incomplete cross-plan wiring or leftover code from plan revisions.
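One way to sketch the spot check, assuming ES-module `export` syntax (the file path here is hypothetical):

```bash
# Flag exports with zero call sites outside their defining file (sketch).
file="src/hooks/usePlan.ts"   # hypothetical example path
grep -oE "export (const|function|class) [A-Za-z0-9_]+" "$file" | awk '{print $3}' \
  | while read -r sym; do
      hits=$(grep -rn "\b$sym\b" src/ --include="*.ts" --include="*.tsx" | grep -cv "^$file:")
      [ "$hits" -eq 0 ] && echo "WARNING: $sym is exported but never used elsewhere"
    done
```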

Use gsd-tools for key link verification against must_haves in each PLAN:
```bash
for plan in "$phase_dir"/*-PLAN.md; do
  LINKS_RESULT=$(pi-gsd-tools verify key-links "$plan")
  echo "=== $plan ===" && echo "$LINKS_RESULT"
done
```

Parse JSON result: { all_verified, verified, total, links: [{from, to, via, verified, detail}] }

Link status from result:

  • verified=true → WIRED
  • verified=false with detail "Pattern not found" → PARTIAL
  • verified=false with any other "not found" detail → NOT_WIRED

(Match the more specific "Pattern not found" before the generic "not found".)
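A matching report sketch for each plan's `$LINKS_RESULT` (assumes `jq` and a string `detail` field):

```bash
# Report each key link's wiring status (sketch; assumes jq).
echo "$LINKS_RESULT" | jq -r '.links[] |
  if .verified                               then "WIRED     \(.from) -> \(.to)"
  elif (.detail | test("Pattern not found")) then "PARTIAL   \(.from) -> \(.to)"
  else                                            "NOT_WIRED \(.from) -> \(.to): \(.detail)"
  end'
```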

Fallback patterns (if key_links not in must_haves):

| Pattern | Check | Status |
|---------|-------|--------|
| Component → API | fetch/axios call to API path, response used (await/.then/setState) | WIRED / PARTIAL (call but unused response) / NOT_WIRED |
| API → Database | Prisma/DB query on model, result returned via res.json() | WIRED / PARTIAL (query but not returned) / NOT_WIRED |
| Form → Handler | onSubmit with real implementation (fetch/axios/mutate/dispatch), not console.log/empty | WIRED / STUB (log-only/empty) / NOT_WIRED |
| State → Render | useState variable appears in JSX ({stateVar} or {stateVar.property}) | WIRED / NOT_WIRED |

Record status and evidence for each key link.
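For example, the Component → API row might be checked with greps like these (file path and identifiers are illustrative, borrowed from the Chat example above):

```bash
# Component -> API: does the component call the route, and use the response? (sketch)
grep -n "fetch(.*/api/chat" src/components/Chat.tsx                           # call exists
grep -nE "await fetch|\.then\(|set[A-Z][A-Za-z]*\(" src/components/Chat.tsx   # response used
```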

If REQUIREMENTS.md exists:

```bash
grep -E "Phase ${phase_number}" .planning/REQUIREMENTS.md 2>/dev/null || true
```

For each requirement: parse description → identify supporting truths/artifacts → status: ✓ SATISFIED / ✗ BLOCKED / ? NEEDS HUMAN.

Extract files modified in this phase from SUMMARY.md, scan each:
| Pattern | Search | Severity |
|---------|--------|----------|
| TODO/FIXME/XXX/HACK | `grep -n -E "TODO\|FIXME\|XXX\|HACK"` | ⚠️ Warning |
| Placeholder content | `grep -n -iE "placeholder\|coming soon\|will be here"` | 🛑 Blocker |
| Empty returns | `grep -n -E "return null\|return \{\}\|return \[\]\|=> \{\}"` | ⚠️ Warning |
| Log-only functions | Functions containing only console.log | ⚠️ Warning |

Categorize: 🛑 Blocker (prevents goal) | ⚠️ Warning (incomplete) | Info (notable).
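A scan sketch, assuming SUMMARY.md mentions modified files by path (the extraction grep is illustrative):

```bash
# Run the anti-pattern greps over each file touched in this phase (sketch).
for f in $(grep -ohE '[A-Za-z0-9_./-]+\.(ts|tsx|js|jsx)' "$phase_dir"/*-SUMMARY.md | sort -u); do
  [ -f "$f" ] || continue
  grep -n -iE "placeholder|coming soon|will be here" "$f"       | sed "s|^|BLOCKER $f:|"
  grep -n -E "TODO|FIXME|XXX|HACK" "$f"                         | sed "s|^|WARNING $f:|"
  grep -n -E "return null|return \{\}|return \[\]|=> \{\}" "$f" | sed "s|^|WARNING $f:|"
done
```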

**Always needs human:** Visual appearance, user flow completion, real-time behavior (WebSocket/SSE), external service integration, performance feel, error message clarity.

Needs human if uncertain: complex wiring that grep can't trace, dynamic state-dependent behavior, edge cases.

Format each as: Test Name → What to do → Expected result → Why can't verify programmatically.

**passed:** All truths VERIFIED, all artifacts pass levels 1-3, all key links WIRED, no blocker anti-patterns.

**gaps_found:** Any truth FAILED, artifact MISSING/STUB, key link NOT_WIRED, or blocker found.

**human_needed:** All automated checks pass but human verification items remain.

Score: verified_truths / total_truths

If gaps_found:
  1. Cluster related gaps: API stub + component unwired → "Wire frontend to backend". Multiple missing → "Complete core implementation". Wiring only → "Connect existing components".

  2. Generate plan per cluster: Objective, 2-3 tasks (files/action/verify each), re-verify step. Keep focused: single concern per plan.

  3. Order by dependency: Fix missing → fix stubs → fix wiring → verify.

```bash
REPORT_PATH="$phase_dir/${phase_number}-VERIFICATION.md"
```

Fill template sections: frontmatter (phase/timestamp/status/score), goal achievement, artifact table, wiring table, requirements coverage, anti-patterns, human verification, gaps summary, fix plans (if gaps_found), metadata.

See .pi/gsd/templates/verification-report.md for complete template.
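A minimal sketch of seeding the report before filling sections (STATUS, VERIFIED, and TOTAL are illustrative names for values computed in the scoring step; the template governs the real layout):

```bash
# Write the report frontmatter; template sections are filled in afterwards (sketch).
cat > "$REPORT_PATH" <<EOF
---
phase: ${phase_number}
timestamp: $(date -Iseconds)
status: ${STATUS}
score: ${VERIFIED}/${TOTAL}
---
EOF
```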

Return status (`passed` | `gaps_found` | `human_needed`), score (N/M must-haves), report path.

If gaps_found: list gaps + recommended fix plan names. If human_needed: list items requiring human testing.
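If the return is structured, it might look like this (shape is illustrative; only the status values, score, and report path are mandated above):

```json
{
  "status": "gaps_found",
  "score": "5/7",
  "report": ".planning/phases/03-chat-interface/3-VERIFICATION.md",
  "gaps": ["Chat.tsx renders a placeholder", "/api/chat GET returns hardcoded []"],
  "fix_plans": ["wire-frontend-to-backend"]
}
```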

Orchestrator routes: passed → update_roadmap | gaps_found → create/execute fixes, re-verify | human_needed → present to user.

<success_criteria>

  • Must-haves established (from frontmatter or derived)
  • All truths verified with status and evidence
  • All artifacts checked at all three levels
  • All key links verified
  • Requirements coverage assessed (if applicable)
  • Anti-patterns scanned and categorized
  • Human verification items identified
  • Overall status determined
  • Fix plans generated (if gaps_found)
  • VERIFICATION.md created with complete report
  • Results returned to orchestrator </success_criteria>