feat: basecamp-project skill

This commit is contained in:
m3tm3re
2026-04-24 20:00:33 +02:00
parent 0ad41acb03
commit 6e0e847299
211 changed files with 46029 additions and 2592 deletions

View File

@@ -0,0 +1,778 @@
<overview>
Plans execute autonomously. Checkpoints formalize interaction points where human verification or decisions are needed.
**Core principle:** the agent automates everything with CLI/API. Checkpoints are for verification and decisions, not manual work.
**Golden rules:**
1. **If the agent can run it, the agent runs it** - Never ask user to execute CLI commands, start servers, or run builds
2. **the agent sets up the verification environment** - Start dev servers, seed databases, configure env vars
3. **User only does what requires human judgment** - Visual checks, UX evaluation, "does this feel right?"
4. **Secrets come from user, automation comes from the agent** - Ask for API keys, then the agent uses them via CLI
5. **Auto-mode bypasses verification/decision checkpoints** - When `workflow._auto_chain_active` or `workflow.auto_advance` is true in config: human-verify auto-approves, decision auto-selects first option, human-action still stops (auth gates cannot be automated)
</overview>
<checkpoint_types>
<type name="human-verify">
## checkpoint:human-verify (Most Common - 90%)
**When:** the agent completed automated work, human confirms it works correctly.
**Use for:**
- Visual UI checks (layout, styling, responsiveness)
- Interactive flows (click through wizard, test user flows)
- Functional verification (feature works as expected)
- Audio/video playback quality
- Animation smoothness
- Accessibility testing
**Structure:**
```xml
<task type="checkpoint:human-verify" gate="blocking">
<what-built>[What the agent automated and deployed/built]</what-built>
<how-to-verify>
[Exact steps to test - URLs, commands, expected behavior]
</how-to-verify>
<resume-signal>[How to continue - "approved", "yes", or describe issues]</resume-signal>
</task>
```
**Example: UI Component (shows key pattern: the agent starts server BEFORE checkpoint)**
```xml
<task type="auto">
<name>Build responsive dashboard layout</name>
<files>src/components/Dashboard.tsx, src/app/dashboard/page.tsx</files>
<action>Create dashboard with sidebar, header, and content area. Use Tailwind responsive classes for mobile.</action>
<verify>npm run build succeeds, no TypeScript errors</verify>
<done>Dashboard component builds without errors</done>
</task>
<task type="auto">
<name>Start dev server for verification</name>
<action>Run `npm run dev` in background, wait for "ready" message, capture port</action>
<verify>fetch http://localhost:3000 returns 200</verify>
<done>Dev server running at http://localhost:3000</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Responsive dashboard layout - dev server running at http://localhost:3000</what-built>
<how-to-verify>
Visit http://localhost:3000/dashboard and verify:
1. Desktop (>1024px): Sidebar left, content right, header top
2. Tablet (768px): Sidebar collapses to hamburger menu
3. Mobile (375px): Single column layout, bottom nav appears
4. No layout shift or horizontal scroll at any size
</how-to-verify>
<resume-signal>Type "approved" or describe layout issues</resume-signal>
</task>
```
**Example: Xcode Build**
```xml
<task type="auto">
<name>Build macOS app with Xcode</name>
<files>App.xcodeproj, Sources/</files>
<action>Run `xcodebuild -project App.xcodeproj -scheme App build`. Check for compilation errors in output.</action>
<verify>Build output contains "BUILD SUCCEEDED", no errors</verify>
<done>App builds successfully</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Built macOS app at DerivedData/Build/Products/Debug/App.app</what-built>
<how-to-verify>
Open App.app and test:
- App launches without crashes
- Menu bar icon appears
- Preferences window opens correctly
- No visual glitches or layout issues
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
</type>
<type name="decision">
## checkpoint:decision (9%)
**When:** Human must make choice that affects implementation direction.
**Use for:**
- Technology selection (which auth provider, which database)
- Architecture decisions (monorepo vs separate repos)
- Design choices (color scheme, layout approach)
- Feature prioritization (which variant to build)
- Data model decisions (schema structure)
**Structure:**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>[What's being decided]</decision>
<context>[Why this decision matters]</context>
<options>
<option id="option-a">
<name>[Option name]</name>
<pros>[Benefits]</pros>
<cons>[Tradeoffs]</cons>
</option>
<option id="option-b">
<name>[Option name]</name>
<pros>[Benefits]</pros>
<cons>[Tradeoffs]</cons>
</option>
</options>
<resume-signal>[How to indicate choice]</resume-signal>
</task>
```
**Example: Auth Provider Selection**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>Select authentication provider</decision>
<context>
Need user authentication for the app. Three solid options with different tradeoffs.
</context>
<options>
<option id="supabase">
<name>Supabase Auth</name>
<pros>Built-in with Supabase DB we're using, generous free tier, row-level security integration</pros>
<cons>Less customizable UI, tied to Supabase ecosystem</cons>
</option>
<option id="clerk">
<name>Clerk</name>
<pros>Beautiful pre-built UI, best developer experience, excellent docs</pros>
<cons>Paid after 10k MAU, vendor lock-in</cons>
</option>
<option id="nextauth">
<name>NextAuth.js</name>
<pros>Free, self-hosted, maximum control, widely adopted</pros>
<cons>More setup work, you manage security updates, UI is DIY</cons>
</option>
</options>
<resume-signal>Select: supabase, clerk, or nextauth</resume-signal>
</task>
```
**Example: Database Selection**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>Select database for user data</decision>
<context>
App needs persistent storage for users, sessions, and user-generated content.
Expected scale: 10k users, 1M records first year.
</context>
<options>
<option id="supabase">
<name>Supabase (Postgres)</name>
<pros>Full SQL, generous free tier, built-in auth, real-time subscriptions</pros>
<cons>Vendor lock-in for real-time features, less flexible than raw Postgres</cons>
</option>
<option id="planetscale">
<name>PlanetScale (MySQL)</name>
<pros>Serverless scaling, branching workflow, excellent DX</pros>
<cons>MySQL not Postgres, no foreign keys in free tier</cons>
</option>
<option id="convex">
<name>Convex</name>
<pros>Real-time by default, TypeScript-native, automatic caching</pros>
<cons>Newer platform, different mental model, less SQL flexibility</cons>
</option>
</options>
<resume-signal>Select: supabase, planetscale, or convex</resume-signal>
</task>
```
</type>
<type name="human-action">
## checkpoint:human-action (1% - Rare)
**When:** Action has NO CLI/API and requires human-only interaction, OR the agent hit an authentication gate during automation.
**Use ONLY for:**
- **Authentication gates** - the agent tried CLI/API but needs credentials (this is NOT a failure)
- Email verification links (clicking email)
- SMS 2FA codes (phone verification)
- Manual account approvals (platform requires human review)
- Credit card 3D Secure flows (web-based payment authorization)
- OAuth app approvals (web-based approval)
**Do NOT use for pre-planned manual work:**
- Deploying (use CLI - auth gate if needed)
- Creating webhooks/databases (use API/CLI - auth gate if needed)
- Running builds/tests (use Bash tool)
- Creating files (use Write tool)
**Structure:**
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>[What human must do - the agent already did everything automatable]</action>
<instructions>
[What the agent already automated]
[The ONE thing requiring human action]
</instructions>
<verification>[What the agent can check afterward]</verification>
<resume-signal>[How to continue]</resume-signal>
</task>
```
**Example: Email Verification**
```xml
<task type="auto">
<name>Create SendGrid account via API</name>
<action>Use SendGrid API to create subuser account with provided email. Request verification email.</action>
<verify>API returns 201, account created</verify>
<done>Account created, verification email sent</done>
</task>
<task type="checkpoint:human-action" gate="blocking">
<action>Complete email verification for SendGrid account</action>
<instructions>
I created the account and requested verification email.
Check your inbox for SendGrid verification link and click it.
</instructions>
<verification>SendGrid API key works: curl test succeeds</verification>
<resume-signal>Type "done" when email verified</resume-signal>
</task>
```
**Example: Authentication Gate (Dynamic Checkpoint)**
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<files>.vercel/, vercel.json</files>
<action>Run `vercel --yes` to deploy</action>
<verify>vercel ls shows deployment, fetch returns 200</verify>
</task>
<!-- If vercel returns "Error: Not authenticated", the agent creates checkpoint on the fly -->
<task type="checkpoint:human-action" gate="blocking">
<action>Authenticate Vercel CLI so I can continue deployment</action>
<instructions>
I tried to deploy but got authentication error.
Run: vercel login
This will open your browser - complete the authentication flow.
</instructions>
<verification>vercel whoami returns your account email</verification>
<resume-signal>Type "done" when authenticated</resume-signal>
</task>
<!-- After authentication, the agent retries the deployment -->
<task type="auto">
<name>Retry Vercel deployment</name>
<action>Run `vercel --yes` (now authenticated)</action>
<verify>vercel ls shows deployment, fetch returns 200</verify>
</task>
```
**Key distinction:** Auth gates are created dynamically when the agent encounters auth errors. NOT pre-planned - the agent automates first, asks for credentials only when blocked.
</type>
</checkpoint_types>
<execution_protocol>
When the agent encounters `type="checkpoint:*"`:
1. **Stop immediately** - do not proceed to next task
2. **Display checkpoint clearly** using the format below
3. **Wait for user response** - do not hallucinate completion
4. **Verify if possible** - check files, run tests, whatever is specified
5. **Resume execution** - continue to next task only after confirmation
**For checkpoint:human-verify:**
```
╔═══════════════════════════════════════════════════════╗
║ CHECKPOINT: Verification Required ║
╚═══════════════════════════════════════════════════════╝
Progress: 5/8 tasks complete
Task: Responsive dashboard layout
Built: Responsive dashboard at /dashboard
How to verify:
1. Visit: http://localhost:3000/dashboard
2. Desktop (>1024px): Sidebar visible, content fills remaining space
3. Tablet (768px): Sidebar collapses to icons
4. Mobile (375px): Sidebar hidden, hamburger menu appears
────────────────────────────────────────────────────────
→ YOUR ACTION: Type "approved" or describe issues
────────────────────────────────────────────────────────
```
**For checkpoint:decision:**
```
╔═══════════════════════════════════════════════════════╗
║ CHECKPOINT: Decision Required ║
╚═══════════════════════════════════════════════════════╝
Progress: 2/6 tasks complete
Task: Select authentication provider
Decision: Which auth provider should we use?
Context: Need user authentication. Three options with different tradeoffs.
Options:
1. supabase - Built-in with our DB, free tier
Pros: Row-level security integration, generous free tier
Cons: Less customizable UI, ecosystem lock-in
2. clerk - Best DX, paid after 10k users
Pros: Beautiful pre-built UI, excellent documentation
Cons: Vendor lock-in, pricing at scale
3. nextauth - Self-hosted, maximum control
Pros: Free, no vendor lock-in, widely adopted
Cons: More setup work, DIY security updates
────────────────────────────────────────────────────────
→ YOUR ACTION: Select supabase, clerk, or nextauth
────────────────────────────────────────────────────────
```
**For checkpoint:human-action:**
```
╔═══════════════════════════════════════════════════════╗
║ CHECKPOINT: Action Required ║
╚═══════════════════════════════════════════════════════╝
Progress: 3/8 tasks complete
Task: Deploy to Vercel
Attempted: vercel --yes
Error: Not authenticated. Please run 'vercel login'
What you need to do:
1. Run: vercel login
2. Complete browser authentication when it opens
3. Return here when done
I'll verify: vercel whoami returns your account
────────────────────────────────────────────────────────
→ YOUR ACTION: Type "done" when authenticated
────────────────────────────────────────────────────────
```
</execution_protocol>
<authentication_gates>
**Auth gate = the agent tried CLI/API, got auth error.** Not a failure - a gate requiring human input to unblock.
**Pattern:** the agent tries automation → auth error → creates checkpoint:human-action → user authenticates → the agent retries → continues
**Gate protocol:**
1. Recognize it's not a failure - missing auth is expected
2. Stop current task - don't retry repeatedly
3. Create checkpoint:human-action dynamically
4. Provide exact authentication steps
5. Verify authentication works
6. Retry the original task
7. Continue normally
**Key distinction:**
- Pre-planned checkpoint: "I need you to do X" (wrong - the agent should automate)
- Auth gate: "I tried to automate X but need credentials" (correct - unblocks automation)
</authentication_gates>
<automation_reference>
**The rule:** If it has CLI/API, the agent does it. Never ask human to perform automatable work.
## Service CLI Reference
| Service | CLI/API | Key Commands | Auth Gate |
| ----------- | -------------- | ----------------------------------------- | -------------------- |
| Vercel | `vercel` | `--yes`, `env add`, `--prod`, `ls` | `vercel login` |
| Railway | `railway` | `init`, `up`, `variables set` | `railway login` |
| Fly | `fly` | `launch`, `deploy`, `secrets set` | `fly auth login` |
| Stripe | `stripe` + API | `listen`, `trigger`, API calls | API key in .env |
| Supabase | `supabase` | `init`, `link`, `db push`, `gen types` | `supabase login` |
| Upstash | `upstash` | `redis create`, `redis get` | `upstash auth login` |
| PlanetScale | `pscale` | `database create`, `branch create` | `pscale auth login` |
| GitHub | `gh` | `repo create`, `pr create`, `secret set` | `gh auth login` |
| Node | `npm`/`pnpm` | `install`, `run build`, `test`, `run dev` | N/A |
| Xcode | `xcodebuild` | `-project`, `-scheme`, `build`, `test` | N/A |
| Convex | `npx convex` | `dev`, `deploy`, `env set`, `env get` | `npx convex login` |
## Environment Variable Automation
**Env files:** Use Write/Edit tools. Never ask human to create .env manually.
**Dashboard env vars via CLI:**
| Platform | CLI Command | Example |
| -------- | ----------------------- | ------------------------------------------ |
| Convex | `npx convex env set` | `npx convex env set OPENAI_API_KEY sk-...` |
| Vercel | `vercel env add` | `vercel env add STRIPE_KEY production` |
| Railway | `railway variables set` | `railway variables set API_KEY=value` |
| Fly | `fly secrets set` | `fly secrets set DATABASE_URL=...` |
| Supabase | `supabase secrets set` | `supabase secrets set MY_SECRET=value` |
**Secret collection pattern:**
```xml
<!-- WRONG: Asking user to add env vars in dashboard -->
<task type="checkpoint:human-action">
<action>Add OPENAI_API_KEY to Convex dashboard</action>
<instructions>Go to dashboard.convex.dev → Settings → Environment Variables → Add</instructions>
</task>
<!-- RIGHT: the agent asks for value, then adds via CLI -->
<task type="checkpoint:human-action">
<action>Provide your OpenAI API key</action>
<instructions>
I need your OpenAI API key for Convex backend.
Get it from: https://platform.openai.com/api-keys
Paste the key (starts with sk-)
</instructions>
<verification>I'll add it via `npx convex env set` and verify</verification>
<resume-signal>Paste your API key</resume-signal>
</task>
<task type="auto">
<name>Configure OpenAI key in Convex</name>
<action>Run `npx convex env set OPENAI_API_KEY {user-provided-key}`</action>
<verify>`npx convex env get OPENAI_API_KEY` returns the key (masked)</verify>
</task>
```
## Dev Server Automation
| Framework | Start Command | Ready Signal | Default URL |
| --------- | ---------------------------- | ------------------------------ | --------------------- |
| Next.js | `npm run dev` | "Ready in" or "started server" | http://localhost:3000 |
| Vite | `npm run dev` | "ready in" | http://localhost:5173 |
| Convex | `npx convex dev` | "Convex functions ready" | N/A (backend only) |
| Express | `npm start` | "listening on port" | http://localhost:3000 |
| Django | `python manage.py runserver` | "Starting development server" | http://localhost:8000 |
**Server lifecycle:**
```bash
# Run in background, capture PID
npm run dev &
DEV_SERVER_PID=$!
# Wait for ready (max 30s) - uses fetch() for cross-platform compatibility
timeout 30 bash -c 'until node -e "fetch(\"http://localhost:3000\").then(r=>{process.exit(r.ok?0:1)}).catch(()=>process.exit(1))" 2>/dev/null; do sleep 1; done'
```
**Port conflicts:** Kill stale process (`lsof -ti:3000 | xargs kill`) or use alternate port (`--port 3001`).
**Server stays running** through checkpoints. Only kill when plan complete, switching to production, or port needed for different service.
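A minimal sketch of the port-conflict path (port 3000 is assumed from the Next.js default; adjust for your framework):
```bash
# Option 1: free the default port if a stale dev server is holding it, then restart
if lsof -ti:3000 >/dev/null 2>&1; then
  lsof -ti:3000 | xargs kill
fi
npm run dev &
DEV_SERVER_PID=$!
# Option 2: leave the stale process alone and start on an alternate port
# npm run dev -- --port 3001 &
```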
## CLI Installation Handling
| CLI | Auto-install? | Command |
| ------------- | ------------- | ----------------------------------------------------- |
| npm/pnpm/yarn | No - ask user | User chooses package manager |
| vercel | Yes | `npm i -g vercel` |
| gh (GitHub) | Yes | `brew install gh` (macOS) or `apt install gh` (Linux) |
| stripe | Yes | `npm i -g stripe` |
| supabase | Yes | `npm i -g supabase` |
| convex | No - use npx | `npx convex` (no install needed) |
| fly | Yes | `brew install flyctl` or curl installer |
| railway | Yes | `npm i -g @railway/cli` |
**Protocol:** Try command → "command not found" → auto-installable? → yes: install silently, retry → no: checkpoint asking user to install.
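A minimal sketch of that protocol, using the Vercel CLI as the illustrative case:
```bash
# Try the command; if missing and auto-installable, install silently and retry.
if ! command -v vercel >/dev/null 2>&1; then
  if ! npm i -g vercel >/dev/null 2>&1; then
    echo "Cannot auto-install vercel - create a checkpoint asking the user to install it"
    exit 1
  fi
fi
vercel --yes
```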
## Pre-Checkpoint Automation Failures
| Failure | Response |
| ------------------ | ----------------------------------------------------------- |
| Server won't start | Check error, fix issue, retry (don't proceed to checkpoint) |
| Port in use | Kill stale process or use alternate port |
| Missing dependency | Run `npm install`, retry |
| Build error | Fix the error first (bug, not checkpoint issue) |
| Auth error | Create auth gate checkpoint |
| Network timeout | Retry with backoff, then checkpoint if persistent |
**Never present a checkpoint with broken verification environment.** If the local server isn't responding, don't ask user to "visit localhost:3000".
> **Cross-platform note:** Use `node -e "fetch('http://localhost:3000').then(r=>console.log(r.status))"` instead of `curl` for health checks. `curl` is broken on Windows MSYS/Git Bash due to SSL/path mangling issues.
```xml
<!-- WRONG: Checkpoint with broken environment -->
<task type="checkpoint:human-verify">
<what-built>Dashboard (server failed to start)</what-built>
<how-to-verify>Visit http://localhost:3000...</how-to-verify>
</task>
<!-- RIGHT: Fix first, then checkpoint -->
<task type="auto">
<name>Fix server startup issue</name>
<action>Investigate error, fix root cause, restart server</action>
<verify>fetch http://localhost:3000 returns 200</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Dashboard - server running at http://localhost:3000</what-built>
<how-to-verify>Visit http://localhost:3000/dashboard...</how-to-verify>
</task>
```
## Automatable Quick Reference
| Action | Automatable? | The agent does it? |
| -------------------------------- | -------------------------- | ------------------ |
| Deploy to Vercel | Yes (`vercel`) | YES |
| Create Stripe webhook | Yes (API) | YES |
| Write .env file | Yes (Write tool) | YES |
| Create Upstash DB | Yes (`upstash`) | YES |
| Run tests | Yes (`npm test`) | YES |
| Start dev server | Yes (`npm run dev`) | YES |
| Add env vars to Convex | Yes (`npx convex env set`) | YES |
| Add env vars to Vercel | Yes (`vercel env add`) | YES |
| Seed database | Yes (CLI/API) | YES |
| Click email verification link | No | NO |
| Enter credit card with 3DS | No | NO |
| Complete OAuth in browser | No | NO |
| Visually verify UI looks correct | No | NO |
| Test interactive user flows | No | NO |
</automation_reference>
<writing_guidelines>
**DO:**
- Automate everything with CLI/API before checkpoint
- Be specific: "Visit https://myapp.vercel.app" not "check deployment"
- Number verification steps
- State expected outcomes: "You should see X"
- Provide context: why this checkpoint exists
**DON'T:**
- Ask human to do work the agent can automate ❌
- Assume knowledge: "Configure the usual settings" ❌
- Skip steps: "Set up database" (too vague) ❌
- Mix multiple verifications in one checkpoint ❌
**Placement:**
- **After automation completes** - not before the agent does the work
- **After UI buildout** - before declaring phase complete
- **Before dependent work** - decisions before implementation
- **At integration points** - after configuring external services
**Bad placement:** Before automation ❌ | Too frequent ❌ | Too late (dependent tasks already needed the result) ❌
</writing_guidelines>
<examples>
### Example 1: Database Setup (No Checkpoint Needed)
```xml
<task type="auto">
<name>Create Upstash Redis database</name>
<files>.env</files>
<action>
1. Run `upstash redis create myapp-cache --region us-east-1`
2. Capture connection URL from output
3. Write to .env: UPSTASH_REDIS_URL={url}
4. Verify connection with test command
</action>
<verify>
- upstash redis list shows database
- .env contains UPSTASH_REDIS_URL
- Test connection succeeds
</verify>
<done>Redis database created and configured</done>
</task>
<!-- NO CHECKPOINT NEEDED - the agent automated everything and verified programmatically -->
```
### Example 2: Full Auth Flow (Single checkpoint at end)
```xml
<task type="auto">
<name>Create user schema</name>
<files>src/db/schema.ts</files>
<action>Define User, Session, Account tables with Drizzle ORM</action>
<verify>npm run db:generate succeeds</verify>
</task>
<task type="auto">
<name>Create auth API routes</name>
<files>src/app/api/auth/[...nextauth]/route.ts</files>
<action>Set up NextAuth with GitHub provider, JWT strategy</action>
<verify>TypeScript compiles, no errors</verify>
</task>
<task type="auto">
<name>Create login UI</name>
<files>src/app/login/page.tsx, src/components/LoginButton.tsx</files>
<action>Create login page with GitHub OAuth button</action>
<verify>npm run build succeeds</verify>
</task>
<task type="auto">
<name>Start dev server for auth testing</name>
<action>Run `npm run dev` in background, wait for ready signal</action>
<verify>fetch http://localhost:3000 returns 200</verify>
<done>Dev server running at http://localhost:3000</done>
</task>
<!-- ONE checkpoint at end verifies the complete flow -->
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Complete authentication flow - dev server running at http://localhost:3000</what-built>
<how-to-verify>
1. Visit: http://localhost:3000/login
2. Click "Sign in with GitHub"
3. Complete GitHub OAuth flow
4. Verify: Redirected to /dashboard, user name displayed
5. Refresh page: Session persists
6. Click logout: Session cleared
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
</examples>
<anti_patterns>
### ❌ BAD: Asking user to start dev server
```xml
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Dashboard component</what-built>
<how-to-verify>
1. Run: npm run dev
2. Visit: http://localhost:3000/dashboard
3. Check layout is correct
</how-to-verify>
</task>
```
**Why bad:** the agent can run `npm run dev`. User should only visit URLs, not execute commands.
### ✅ GOOD: The agent starts the server, user visits
```xml
<task type="auto">
<name>Start dev server</name>
<action>Run `npm run dev` in background</action>
<verify>fetch http://localhost:3000 returns 200</verify>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Dashboard at http://localhost:3000/dashboard (server running)</what-built>
<how-to-verify>
Visit http://localhost:3000/dashboard and verify:
1. Layout matches design
2. No console errors
</how-to-verify>
</task>
```
### ❌ BAD: Asking human to deploy / ✅ GOOD: The agent automates
```xml
<!-- BAD: Asking user to deploy via dashboard -->
<task type="checkpoint:human-action" gate="blocking">
<action>Deploy to Vercel</action>
<instructions>Visit vercel.com/new → Import repo → Click Deploy → Copy URL</instructions>
</task>
<!-- GOOD: the agent deploys, user verifies -->
<task type="auto">
<name>Deploy to Vercel</name>
<action>Run `vercel --yes`. Capture URL.</action>
<verify>vercel ls shows deployment, fetch returns 200</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Deployed to {url}</what-built>
<how-to-verify>Visit {url}, check homepage loads</how-to-verify>
<resume-signal>Type "approved"</resume-signal>
</task>
```
### ❌ BAD: Too many checkpoints / ✅ GOOD: Single checkpoint
```xml
<!-- BAD: Checkpoint after every task -->
<task type="auto">Create schema</task>
<task type="checkpoint:human-verify">Check schema</task>
<task type="auto">Create API route</task>
<task type="checkpoint:human-verify">Check API</task>
<task type="auto">Create UI form</task>
<task type="checkpoint:human-verify">Check form</task>
<!-- GOOD: One checkpoint at end -->
<task type="auto">Create schema</task>
<task type="auto">Create API route</task>
<task type="auto">Create UI form</task>
<task type="checkpoint:human-verify">
<what-built>Complete auth flow (schema + API + UI)</what-built>
<how-to-verify>Test full flow: register, login, access protected page</how-to-verify>
<resume-signal>Type "approved"</resume-signal>
</task>
```
### ❌ BAD: Vague verification / ✅ GOOD: Specific steps
```xml
<!-- BAD -->
<task type="checkpoint:human-verify">
<what-built>Dashboard</what-built>
<how-to-verify>Check it works</how-to-verify>
</task>
<!-- GOOD -->
<task type="checkpoint:human-verify">
<what-built>Responsive dashboard - server running at http://localhost:3000</what-built>
<how-to-verify>
Visit http://localhost:3000/dashboard and verify:
1. Desktop (>1024px): Sidebar visible, content area fills remaining space
2. Tablet (768px): Sidebar collapses to icons
3. Mobile (375px): Sidebar hidden, hamburger menu in header
4. No horizontal scroll at any size
</how-to-verify>
<resume-signal>Type "approved" or describe layout issues</resume-signal>
</task>
```
### ❌ BAD: Asking user to run CLI commands
```xml
<task type="checkpoint:human-action">
<action>Run database migrations</action>
<instructions>Run: npx prisma migrate deploy && npx prisma db seed</instructions>
</task>
```
**Why bad:** the agent can run these commands. User should never execute CLI commands.
### ❌ BAD: Asking user to copy values between services
```xml
<task type="checkpoint:human-action">
<action>Configure webhook URL in Stripe</action>
<instructions>Copy deployment URL → Stripe Dashboard → Webhooks → Add endpoint → Copy secret → Add to .env</instructions>
</task>
```
**Why bad:** Stripe has an API. The agent should create the webhook via the API and write the secret to .env directly.
</anti_patterns>
<summary>
Checkpoints formalize human-in-the-loop points for verification and decisions, not manual work.
**The golden rule:** If the agent CAN automate it, the agent MUST automate it.
**Checkpoint priority:**
1. **checkpoint:human-verify** (90%) - the agent automated everything, human confirms visual/functional correctness
2. **checkpoint:decision** (9%) - Human makes architectural/technology choices
3. **checkpoint:human-action** (1%) - Truly unavoidable manual steps with no API/CLI
**When NOT to use checkpoints:**
- Things the agent can verify programmatically (tests, builds)
- File operations (the agent can read files)
- Code correctness (tests and static analysis)
- Anything automatable via CLI/API
</summary>

View File

@@ -0,0 +1,249 @@
# Continuation Format
Standard format for presenting next steps after completing a command or workflow.
## Core Structure
```
---
## ▶ Next Up
**{identifier}: {name}** - {one-line description}
`{command to copy-paste}`
<sub>`/new` first → fresh context window</sub>
---
**Also available:**
- `{alternative option 1}` - description
- `{alternative option 2}` - description
---
```
## Format Rules
1. **Always show what it is** - name + description, never just a command path
2. **Pull context from source** - ROADMAP.md for phases, PLAN.md `<objective>` for plans
3. **Command in inline code** - backticks, easy to copy-paste, renders as clickable link
4. **`/new` explanation** - always include, keeps it concise but explains why
5. **"Also available" not "Other options"** - sounds more app-like
6. **Visual separators** - `---` above and below to make it stand out
## Variants
### Execute Next Plan
```
---
## ▶ Next Up
**02-03: Refresh Token Rotation** - Add /api/auth/refresh with sliding expiry
`/gsd-execute-phase 2`
<sub>`/new` first → fresh context window</sub>
---
**Also available:**
- Review plan before executing
- `/gsd-list-phase-assumptions 2` - check assumptions
---
```
### Execute Final Plan in Phase
Add note that this is the last plan and what comes after:
```
---
## ▶ Next Up
**02-03: Refresh Token Rotation** - Add /api/auth/refresh with sliding expiry
<sub>Final plan in Phase 2</sub>
`/gsd-execute-phase 2`
<sub>`/new` first → fresh context window</sub>
---
**After this completes:**
- Phase 2 → Phase 3 transition
- Next: **Phase 3: Core Features** - User dashboard and settings
---
```
### Plan a Phase
```
---
## ▶ Next Up
**Phase 2: Authentication** - JWT login flow with refresh tokens
`/gsd-plan-phase 2`
<sub>`/new` first → fresh context window</sub>
---
**Also available:**
- `/gsd-discuss-phase 2` - gather context first
- `/gsd-research-phase 2` - investigate unknowns
- Review roadmap
---
```
### Phase Complete, Ready for Next
Show completion status before next action:
```
---
## ✓ Phase 2 Complete
3/3 plans executed
## ▶ Next Up
**Phase 3: Core Features** - User dashboard, settings, and data export
`/gsd-plan-phase 3`
<sub>`/new` first → fresh context window</sub>
---
**Also available:**
- `/gsd-discuss-phase 3` - gather context first
- `/gsd-research-phase 3` - investigate unknowns
- Review what Phase 2 built
---
```
### Multiple Equal Options
When there's no clear primary action:
```
---
## ▶ Next Up
**Phase 3: Core Features** - User dashboard, settings, and data export
**To plan directly:** `/gsd-plan-phase 3`
**To discuss context first:** `/gsd-discuss-phase 3`
**To research unknowns:** `/gsd-research-phase 3`
<sub>`/new` first → fresh context window</sub>
---
```
### Milestone Complete
```
---
## 🎉 Milestone v1.0 Complete
All 4 phases shipped
## ▶ Next Up
**Start v1.1** - questioning → research → requirements → roadmap
`/gsd-new-milestone`
<sub>`/new` first → fresh context window</sub>
---
```
## Pulling Context
### For phases (from ROADMAP.md):
```markdown
### Phase 2: Authentication
**Goal**: JWT login flow with refresh tokens
```
Extract: `**Phase 2: Authentication** - JWT login flow with refresh tokens`
### For plans (from ROADMAP.md):
```markdown
Plans:
- [ ] 02-03: Add refresh token rotation
```
Or from PLAN.md `<objective>`:
```xml
<objective>
Add refresh token rotation with sliding expiry window.
Purpose: Extend session lifetime without compromising security.
</objective>
```
Extract: `**02-03: Refresh Token Rotation** - Add /api/auth/refresh with sliding expiry`
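A rough extraction sketch (assumes the ROADMAP.md shape shown above - a `### Phase N: Name` heading followed by a `**Goal**:` line):
```bash
# Pull the heading and goal for a phase and format them for the Next Up block.
PHASE_TITLE=$(grep -m1 '^### Phase 2:' .planning/ROADMAP.md | sed 's/^### //')
GOAL=$(grep -A2 '^### Phase 2:' .planning/ROADMAP.md | grep -m1 '^\*\*Goal\*\*:' | sed 's/^\*\*Goal\*\*: //')
echo "**${PHASE_TITLE}** - ${GOAL}"
# → **Phase 2: Authentication** - JWT login flow with refresh tokens
```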
## Anti-Patterns
### Don't: Command-only (no context)
```
## To Continue
Run `/new`, then paste:
/gsd-execute-phase 2
```
User has no idea what 02-03 is about.
### Don't: Missing /new explanation
```
`/gsd-plan-phase 3`
Run /new first.
```
Doesn't explain why. User might skip it.
### Don't: "Other options" language
```
Other options:
- Review roadmap
```
Sounds like an afterthought. Use "Also available:" instead.
### Don't: Fenced code blocks for commands
````
```
/gsd-plan-phase 3
```
````
Fenced blocks inside templates create nesting ambiguity. Use inline backticks instead.

View File

@@ -0,0 +1,64 @@
# Decimal Phase Calculation
Calculate the next decimal phase number for urgent insertions.
## Using gsd-tools
```bash
# Get next decimal phase after phase 6
node ".pi/gsd/bin/gsd-tools.cjs" phase next-decimal 6
```
Output:
```json
{
"found": true,
"base_phase": "06",
"next": "06.1",
"existing": []
}
```
With existing decimals:
```json
{
"found": true,
"base_phase": "06",
"next": "06.3",
"existing": ["06.1", "06.2"]
}
```
## Extract Values
```bash
DECIMAL_PHASE=$(node ".pi/gsd/bin/gsd-tools.cjs" phase next-decimal "${AFTER_PHASE}" --pick next)
BASE_PHASE=$(node ".pi/gsd/bin/gsd-tools.cjs" phase next-decimal "${AFTER_PHASE}" --pick base_phase)
```
Or with --raw flag:
```bash
DECIMAL_PHASE=$(node ".pi/gsd/bin/gsd-tools.cjs" phase next-decimal "${AFTER_PHASE}" --raw)
# Returns just: 06.1
```
## Examples
| Existing Phases | Next Phase |
|-----------------|------------|
| 06 only | 06.1 |
| 06, 06.1 | 06.2 |
| 06, 06.1, 06.2 | 06.3 |
| 06, 06.1, 06.3 (gap) | 06.4 |
## Directory Naming
Decimal phase directories use the full decimal number:
```bash
SLUG=$(node ".pi/gsd/bin/gsd-tools.cjs" generate-slug "$DESCRIPTION" --raw)
PHASE_DIR=".planning/phases/${DECIMAL_PHASE}-${SLUG}"
mkdir -p "$PHASE_DIR"
```
Example: `.planning/phases/06.1-fix-critical-auth-bug/`

View File

@@ -0,0 +1,295 @@
<overview>
Git integration for GSD framework.
</overview>
<core_principle>
**Commit outcomes, not process.**
The git log should read like a changelog of what shipped, not a diary of planning activity.
</core_principle>
<commit_points>
| Event | Commit? | Why |
| ----------------------- | ------- | ------------------------------------------- |
| BRIEF + ROADMAP created | YES | Project initialization |
| PLAN.md created | NO | Intermediate - commit with plan completion |
| RESEARCH.md created | NO | Intermediate |
| DISCOVERY.md created | NO | Intermediate |
| **Task completed** | YES | Atomic unit of work (1 commit per task) |
| **Plan completed** | YES | Metadata commit (SUMMARY + STATE + ROADMAP) |
| Handoff created | YES | WIP state preserved |
</commit_points>
<git_check>
```bash
[ -d .git ] && echo "GIT_EXISTS" || echo "NO_GIT"
```
If NO_GIT: Run `git init` silently. GSD projects always get their own repo.
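A minimal combined sketch:
```bash
# Initialize a repo only when one doesn't already exist; stay quiet either way.
if [ ! -d .git ]; then
  git init --quiet
fi
```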
</git_check>
<commit_formats>
<format name="initialization">
## Project Initialization (brief + roadmap together)
```
docs: initialize [project-name] ([N] phases)
[One-liner from PROJECT.md]
Phases:
1. [phase-name]: [goal]
2. [phase-name]: [goal]
3. [phase-name]: [goal]
```
What to commit:
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "docs: initialize [project-name] ([N] phases)" --files .planning/
```
</format>
<format name="task-completion">
## Task Completion (During Plan Execution)
Each task gets its own commit immediately after completion.
> **Parallel agents:** When running as a parallel executor (spawned by execute-phase),
> use `--no-verify` on all commits to avoid pre-commit hook lock contention.
> The orchestrator validates hooks once after all agents complete.
```
{type}({phase}-{plan}): {task-name}
- [Key change 1]
- [Key change 2]
- [Key change 3]
```
**Commit types:**
- `feat` - New feature/functionality
- `fix` - Bug fix
- `test` - Test-only (TDD RED phase)
- `refactor` - Code cleanup (TDD REFACTOR phase)
- `perf` - Performance improvement
- `chore` - Dependencies, config, tooling
**Examples:**
```bash
# Standard task
git add src/api/auth.ts src/types/user.ts
git commit -m "feat(08-02): create user registration endpoint
- POST /auth/register validates email and password
- Checks for duplicate users
- Returns JWT token on success
"
# TDD task - RED phase
git add src/__tests__/jwt.test.ts
git commit -m "test(07-02): add failing test for JWT generation
- Tests token contains user ID claim
- Tests token expires in 1 hour
- Tests signature verification
"
# TDD task - GREEN phase
git add src/utils/jwt.ts
git commit -m "feat(07-02): implement JWT generation
- Uses jose library for signing
- Includes user ID and expiry claims
- Signs with HS256 algorithm
"
```
</format>
<format name="plan-completion">
## Plan Completion (After All Tasks Done)
After all tasks committed, one final metadata commit captures plan completion.
```
docs({phase}-{plan}): complete [plan-name] plan
Tasks completed: [N]/[N]
- [Task 1 name]
- [Task 2 name]
- [Task 3 name]
SUMMARY: .planning/phases/XX-name/{phase}-{plan}-SUMMARY.md
```
What to commit:
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "docs({phase}-{plan}): complete [plan-name] plan" --files .planning/phases/XX-name/{phase}-{plan}-PLAN.md .planning/phases/XX-name/{phase}-{plan}-SUMMARY.md .planning/STATE.md .planning/ROADMAP.md
```
**Note:** Code files NOT included - already committed per-task.
</format>
<format name="handoff">
## Handoff (WIP)
```
wip: [phase-name] paused at task [X]/[Y]
Current: [task name]
[If blocked:] Blocked: [reason]
```
What to commit:
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "wip: [phase-name] paused at task [X]/[Y]" --files .planning/
```
</format>
</commit_formats>
<example_log>
**Old approach (per-plan commits):**
```
a7f2d1 feat(checkout): Stripe payments with webhook verification
3e9c4b feat(products): catalog with search, filters, and pagination
8a1b2c feat(auth): JWT with refresh rotation using jose
5c3d7e feat(foundation): Next.js 15 + Prisma + Tailwind scaffold
2f4a8d docs: initialize ecommerce-app (5 phases)
```
**New approach (per-task commits):**
```
# Phase 04 - Checkout
1a2b3c docs(04-01): complete checkout flow plan
4d5e6f feat(04-01): add webhook signature verification
7g8h9i feat(04-01): implement payment session creation
0j1k2l feat(04-01): create checkout page component
# Phase 03 - Products
3m4n5o docs(03-02): complete product listing plan
6p7q8r feat(03-02): add pagination controls
9s0t1u feat(03-02): implement search and filters
2v3w4x feat(03-01): create product catalog schema
# Phase 02 - Auth
5y6z7a docs(02-02): complete token refresh plan
8b9c0d feat(02-02): implement refresh token rotation
1e2f3g test(02-02): add failing test for token refresh
4h5i6j docs(02-01): complete JWT setup plan
7k8l9m feat(02-01): add JWT generation and validation
0n1o2p chore(02-01): install jose library
# Phase 01 - Foundation
3q4r5s docs(01-01): complete scaffold plan
6t7u8v feat(01-01): configure Tailwind and globals
9w0x1y feat(01-01): set up Prisma with database
2z3a4b feat(01-01): create Next.js 15 project
# Initialization
5c6d7e docs: initialize ecommerce-app (5 phases)
```
Each plan produces 2-4 commits (tasks + metadata). Clear, granular, bisectable.
</example_log>
<anti_patterns>
**Still don't commit (intermediate artifacts):**
- PLAN.md creation (commit with plan completion)
- RESEARCH.md (intermediate)
- DISCOVERY.md (intermediate)
- Minor planning tweaks
- "Fixed typo in roadmap"
**Do commit (outcomes):**
- Each task completion (feat/fix/test/refactor)
- Plan completion metadata (docs)
- Project initialization (docs)
**Key principle:** Commit working code and shipped outcomes, not planning process.
</anti_patterns>
<commit_strategy_rationale>
## Why Per-Task Commits?
**Context engineering for AI:**
- Git history becomes the primary context source for future agent sessions
- `git log --grep="{phase}-{plan}"` shows all work for a plan
- `git diff <hash>^..<hash>` shows exact changes per task
- Less reliance on parsing SUMMARY.md = more context for actual work
**Failure recovery:**
- Task 1 committed ✅, Task 2 failed ❌
- The agent in the next session sees task 1 complete and can retry task 2
- Can `git reset --hard` to the last successful task
**Debugging:**
- `git bisect` finds exact failing task, not just failing plan
- `git blame` traces line to specific task context
- Each commit is independently revertable
**Observability:**
- A solo developer + agent workflow benefits from granular attribution
- Atomic commits are git best practice
- "Commit noise" is irrelevant when the consumer is the agent, not a human
</commit_strategy_rationale>
<sub_repos_support>
## Multi-Repo Workspace Support (sub_repos)
For workspaces with separate git repos (e.g., `backend/`, `frontend/`, `shared/`), GSD routes commits to each repo independently.
### Configuration
In `.planning/config.json`, list sub-repo directories under `planning.sub_repos`:
```json
{
"planning": {
"commit_docs": false,
"sub_repos": ["backend", "frontend", "shared"]
}
}
```
Set `commit_docs: false` so planning docs stay local and are not committed to any sub-repo.
### How It Works
1. **Auto-detection:** During `/gsd-new-project`, directories with their own `.git` folder are detected and offered for selection as sub-repos. On subsequent runs, `loadConfig` auto-syncs the `sub_repos` list with the filesystem - adding newly created repos and removing deleted ones. This means `config.json` may be rewritten automatically when repos change on disk.
2. **File grouping:** Code files are grouped by their sub-repo prefix (e.g., `backend/src/api/users.ts` belongs to the `backend/` repo).
3. **Independent commits:** Each sub-repo receives its own atomic commit via `gsd-tools.cjs commit-to-subrepo`. File paths are made relative to the sub-repo root before staging.
4. **Planning stays local:** The `.planning/` directory is not committed; it serves as the cross-repo coordination layer.
### Commit Routing
Instead of the standard `commit` command, use `commit-to-subrepo` when `sub_repos` is configured:
```bash
node .pi/gsd/bin/gsd-tools.cjs commit-to-subrepo "feat(02-01): add user API" \
--files backend/src/api/users.ts backend/src/types/user.ts frontend/src/components/UserForm.tsx
```
This stages `src/api/users.ts` and `src/types/user.ts` in the `backend/` repo, and `src/components/UserForm.tsx` in the `frontend/` repo, then commits each independently with the same message.
Files that don't match any configured sub-repo are reported as unmatched.
</sub_repos_support>

View File

@@ -0,0 +1,38 @@
# Git Planning Commit
Commit planning artifacts using the gsd-tools CLI, which automatically checks `commit_docs` config and gitignore status.
## Commit via CLI
Always use `gsd-tools.cjs commit` for `.planning/` files - it handles `commit_docs` and gitignore checks automatically:
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "docs({scope}): {description}" --files .planning/STATE.md .planning/ROADMAP.md
```
The CLI will return `skipped` (with reason) if `commit_docs` is `false` or `.planning/` is gitignored. No manual conditional checks needed.
## Amend previous commit
To fold `.planning/` file changes into the previous commit:
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "" --files .planning/codebase/*.md --amend
```
## Commit Message Patterns
| Command | Scope | Example |
| ------------- | --------- | ----------------------------------------------- |
| plan-phase | phase | `docs(phase-03): create authentication plans` |
| execute-phase | phase | `docs(phase-03): complete authentication phase` |
| new-milestone | milestone | `docs: start milestone v1.1` |
| remove-phase | chore | `chore: remove phase 17 (dashboard)` |
| insert-phase | phase | `docs: insert phase 16.1 (critical fix)` |
| add-phase | phase | `docs: add phase 07 (settings page)` |
## When to Skip
- `commit_docs: false` in config
- `.planning/` is gitignored
- No changes to commit (check with `git status --porcelain .planning/`)
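For the last case, a minimal pre-commit check (commit message is illustrative):
```bash
# Skip the commit when nothing under .planning/ has changed.
if [ -z "$(git status --porcelain .planning/)" ]; then
  echo "No planning changes - skipping commit"
else
  node ".pi/gsd/bin/gsd-tools.cjs" commit "docs(phase-03): update planning state" --files .planning/STATE.md .planning/ROADMAP.md
fi
```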

View File

@@ -0,0 +1,36 @@
# Model Profile Resolution
Resolve model profile once at the start of orchestration, then use it for all Task spawns.
## Resolution Pattern
```bash
MODEL_PROFILE=$(grep -o '"model_profile"[[:space:]]*:[[:space:]]*"[^"]*"' .planning/config.json 2>/dev/null | grep -o '"[^"]*"$' | tr -d '"')
MODEL_PROFILE=${MODEL_PROFILE:-balanced}
```
Default: `balanced` if not set or config missing.
## Lookup Table
@.pi/gsd/references/model-profiles.md
Look up the agent in the table for the resolved profile. Pass the model parameter to Task calls:
```
Task(
prompt="...",
subagent_type="gsd-planner",
model="{resolved_model}" # "inherit", "sonnet", or "haiku"
)
```
**Note:** Opus-tier agents resolve to `"inherit"` (not `"opus"`). This causes the agent to use the parent session's model, avoiding conflicts with organization policies that may block specific opus versions.
If `model_profile` is `"inherit"`, all agents resolve to `"inherit"` (useful for OpenCode `/model`).
## Usage
1. Resolve once at orchestration start
2. Store the profile value
3. Look up each agent's model from the table when spawning
4. Pass model parameter to each Task call (values: `"inherit"`, `"sonnet"`, `"haiku"`)

View File

@@ -0,0 +1,146 @@
<!-- AUTO-GENERATED - do not edit by hand.
Source of truth: get-shit-done/bin/lib/model-profiles.cjs
Regenerate with: node get-shit-done/bin/gsd-tools.cjs generate-model-profiles-md --harness agent
-->
# Model Profiles
Model profiles control which Claude model each GSD agent uses. This allows balancing quality vs token spend, or inheriting the currently selected session model.
## Profile Definitions
| Agent | `quality` | `balanced` | `budget` | `inherit` |
|-------|-----------|------------|----------|-----------|
| gsd-planner | opus | opus | sonnet | inherit |
| gsd-roadmapper | opus | sonnet | sonnet | inherit |
| gsd-executor | opus | sonnet | sonnet | inherit |
| gsd-phase-researcher | opus | sonnet | haiku | inherit |
| gsd-project-researcher | opus | sonnet | haiku | inherit |
| gsd-research-synthesizer | sonnet | sonnet | haiku | inherit |
| gsd-debugger | opus | sonnet | sonnet | inherit |
| gsd-codebase-mapper | sonnet | haiku | haiku | inherit |
| gsd-verifier | sonnet | sonnet | haiku | inherit |
| gsd-plan-checker | sonnet | sonnet | haiku | inherit |
| gsd-integration-checker | sonnet | sonnet | haiku | inherit |
| gsd-nyquist-auditor | sonnet | sonnet | haiku | inherit |
| gsd-ui-researcher | opus | sonnet | haiku | inherit |
| gsd-ui-checker | sonnet | sonnet | haiku | inherit |
| gsd-ui-auditor | sonnet | sonnet | haiku | inherit |
## Profile Philosophy
**quality** - Maximum reasoning power
- Opus for all decision-making agents
- Sonnet for read-only verification
- Use when: quota available, critical architecture work
**balanced** (default) - Smart allocation
- Opus only for planning (where architecture decisions happen)
- Sonnet for execution and research (follows explicit instructions)
- Sonnet for verification (needs reasoning, not just pattern matching)
- Use when: normal development, good balance of quality and cost
**budget** - Minimal Opus usage
- Sonnet for anything that writes code
- Haiku for research and verification
- Use when: conserving quota, high-volume work, less critical phases
**inherit** - Follow the current session model
- All agents resolve to `inherit`
- Best when you switch models interactively (for example OpenCode `/model`)
- **Required when using non-Anthropic providers** (OpenRouter, local models, etc.) - otherwise GSD may call Anthropic models directly, incurring unexpected costs
- Use when: you want GSD to follow your currently selected runtime model
## Using Non-Claude Runtimes (Codex, OpenCode, Gemini CLI)
When installed for a non-Claude runtime, the GSD installer sets `resolve_model_ids: "omit"` in `~/.gsd/defaults.json`. This returns an empty model parameter for all agents, so each agent uses the runtime's default model. No manual setup is needed.
To assign different models to different agents, add `model_overrides` with model IDs your runtime recognizes:
```json
{
"resolve_model_ids": "omit",
"model_overrides": {
"gsd-planner": "o3",
"gsd-executor": "o4-mini",
"gsd-debugger": "o3",
"gsd-codebase-mapper": "o4-mini"
}
}
```
The same tiering logic applies: stronger models for planning and debugging, cheaper models for execution and mapping.
## Using Claude Code with Non-Anthropic Providers (OpenRouter, Local)
If you're using Claude Code with OpenRouter, a local model, or any non-Anthropic provider, set the `inherit` profile to prevent GSD from calling Anthropic models for subagents:
```bash
# Via settings command
/gsd-settings
# → Select "Inherit" for model profile
# Or manually in .planning/config.json
{
"model_profile": "inherit"
}
```
Without `inherit`, GSD's default `balanced` profile spawns specific Anthropic models (`opus`, `sonnet`, `haiku`) for each agent type, which can result in additional API costs through your non-Anthropic provider.
## Resolution Logic
Orchestrators resolve model before spawning:
```
1. Read .planning/config.json
2. Check model_overrides for agent-specific override
3. If no override, look up agent in profile table
4. Pass model parameter to Task call
```
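A rough bash sketch of that order (assumes `jq` is available; `resolve_from_profile` is a hypothetical stand-in for the profile-table lookup):
```bash
AGENT="gsd-planner"
OVERRIDE=$(jq -r --arg a "$AGENT" '.model_overrides[$a] // empty' .planning/config.json 2>/dev/null)
PROFILE=$(jq -r '.model_profile // "balanced"' .planning/config.json 2>/dev/null)
PROFILE=${PROFILE:-balanced}
if [ -n "$OVERRIDE" ]; then
  MODEL="$OVERRIDE"                                   # per-agent override wins
else
  MODEL=$(resolve_from_profile "$AGENT" "$PROFILE")   # look up the profile table above
fi
# MODEL is then passed as the model parameter of the Task call
```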
## Per-Agent Overrides
Override specific agents without changing the entire profile:
```json
{
"model_profile": "balanced",
"model_overrides": {
"gsd-executor": "opus",
"gsd-planner": "haiku"
}
}
```
Overrides take precedence over the profile. Valid values: `opus`, `sonnet`, `haiku`, `inherit`, or any fully-qualified model ID (e.g., `"o3"`, `"openai/o3"`, `"google/gemini-2.5-pro"`).
## Switching Profiles
Runtime: `/gsd-set-profile <profile>`
Per-project default: Set in `.planning/config.json`:
```json
{
"model_profile": "balanced"
}
```
## Design Rationale
**Why Opus for gsd-planner?**
Planning involves architecture decisions, goal decomposition, and task design. This is where model quality has the highest impact.
**Why Sonnet for gsd-executor?**
Executors follow explicit PLAN.md instructions. The plan already contains the reasoning; execution is implementation.
**Why Sonnet (not Haiku) for verifiers in balanced?**
Verification requires goal-backward reasoning - checking if code *delivers* what the phase promised, not just pattern matching. Sonnet handles this well; Haiku may miss subtle gaps.
**Why Haiku for gsd-codebase-mapper?**
Read-only exploration and pattern extraction. No reasoning required, just structured output from file contents.
**Why `inherit` instead of passing `opus` directly?**
Claude Code's `"opus"` alias maps to a specific model version. Organizations may block older opus versions while allowing newer ones. GSD returns `"inherit"` for opus-tier agents, causing them to use whatever opus version the user has configured in their session. This avoids version conflicts and silent fallbacks to Sonnet.
**Why `inherit` profile?**
Some runtimes (including OpenCode) let users switch models at runtime (`/model`). The `inherit` profile keeps all GSD subagents aligned to that live selection.

View File

@@ -0,0 +1,61 @@
# Phase Argument Parsing
Parse and normalize phase arguments for commands that operate on phases.
## Extraction
From `$ARGUMENTS`:
- Extract phase number (first numeric argument)
- Extract flags (prefixed with `--`)
- Remaining text is description (for insert/add commands)
## Using gsd-tools
The `find-phase` command handles normalization and validation in one step:
```bash
PHASE_INFO=$(node ".pi/gsd/bin/gsd-tools.cjs" find-phase "${PHASE}")
```
Returns JSON with:
- `found`: true/false
- `directory`: Full path to phase directory
- `phase_number`: Normalized number (e.g., "06", "06.1")
- `phase_name`: Name portion (e.g., "foundation")
- `plans`: Array of PLAN.md files
- `summaries`: Array of SUMMARY.md files
## Manual Normalization (Legacy)
Zero-pad integer phases to 2 digits. Preserve decimal suffixes.
```bash
# Normalize phase number
if [[ "$PHASE" =~ ^[0-9]+$ ]]; then
# Integer: 8 → 08
PHASE=$(printf "%02d" "$PHASE")
elif [[ "$PHASE" =~ ^([0-9]+)\.([0-9]+)$ ]]; then
# Decimal: 2.1 → 02.1
PHASE=$(printf "%02d.%s" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}")
fi
```
## Validation
Use `roadmap get-phase` to validate phase exists:
```bash
PHASE_CHECK=$(node ".pi/gsd/bin/gsd-tools.cjs" roadmap get-phase "${PHASE}" --pick found)
if [ "$PHASE_CHECK" = "false" ]; then
echo "ERROR: Phase ${PHASE} not found in roadmap"
exit 1
fi
```
## Directory Lookup
Use `find-phase` for directory lookup:
```bash
PHASE_DIR=$(node ".pi/gsd/bin/gsd-tools.cjs" find-phase "${PHASE}" --raw)
```

View File

@@ -0,0 +1,202 @@
<planning_config>
Configuration options for `.planning/` directory behavior.
<config_schema>
```json
"planning": {
"commit_docs": true,
"search_gitignored": false
},
"git": {
"branching_strategy": "none",
"phase_branch_template": "gsd/phase-{phase}-{slug}",
"milestone_branch_template": "gsd/{milestone}-{slug}",
"quick_branch_template": null
}
```
| Option | Default | Description |
| ------------------------------- | ---------------------------- | ------------------------------------------------------------- |
| `commit_docs` | `true` | Whether to commit planning artifacts to git |
| `search_gitignored` | `false` | Add `--no-ignore` to broad rg searches |
| `git.branching_strategy` | `"none"` | Git branching approach: `"none"`, `"phase"`, or `"milestone"` |
| `git.phase_branch_template` | `"gsd/phase-{phase}-{slug}"` | Branch template for phase strategy |
| `git.milestone_branch_template` | `"gsd/{milestone}-{slug}"` | Branch template for milestone strategy |
| `git.quick_branch_template` | `null` | Optional branch template for quick-task runs |
</config_schema>
<commit_docs_behavior>
**When `commit_docs: true` (default):**
- Planning files committed normally
- SUMMARY.md, STATE.md, ROADMAP.md tracked in git
- Full history of planning decisions preserved
**When `commit_docs: false`:**
- Skip all `git add`/`git commit` for `.planning/` files
- User must add `.planning/` to `.gitignore`
- Useful for: OSS contributions, client projects, keeping planning private
**Using gsd-tools.cjs (preferred):**
```bash
# Commit with automatic commit_docs + gitignore checks:
node ".pi/gsd/bin/gsd-tools.cjs" commit "docs: update state" --files .planning/STATE.md
# Load config via state load (returns JSON):
INIT=$(node ".pi/gsd/bin/gsd-tools.cjs" state load)
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
# commit_docs is available in the JSON output
# Or use init commands which include commit_docs:
INIT=$(node ".pi/gsd/bin/gsd-tools.cjs" init execute-phase "1")
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
# commit_docs is included in all init command outputs
```
**Auto-detection:** If `.planning/` is gitignored, `commit_docs` is automatically `false` regardless of config.json. This prevents git errors when users have `.planning/` in `.gitignore`.
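A minimal sketch of what that check amounts to (the CLI does this internally; callers don't need to run it):
```bash
# commit_docs is forced to false whenever git ignores the .planning/ directory.
if git check-ignore -q .planning 2>/dev/null; then
  COMMIT_DOCS=false
fi
```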
**Commit via CLI (handles checks automatically):**
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "docs: update state" --files .planning/STATE.md
```
The CLI checks `commit_docs` config and gitignore status internally - no manual conditionals needed.
</commit_docs_behavior>
<search_behavior>
**When `search_gitignored: false` (default):**
- Standard rg behavior (respects .gitignore)
- Direct path searches work: `rg "pattern" .planning/` finds files
- Broad searches skip gitignored: `rg "pattern"` skips `.planning/`
**When `search_gitignored: true`:**
- Add `--no-ignore` to broad rg searches that should include `.planning/`
- Only needed when searching entire repo and expecting `.planning/` matches
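For example, a broad repo-wide search that should still surface `.planning/` matches:
```bash
# Only when search_gitignored is true - include gitignored paths in a broad search
rg --no-ignore "refresh token" .
```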
**Note:** Most GSD operations use direct file reads or explicit paths, which work regardless of gitignore status.
</search_behavior>
<setup_uncommitted_mode>
To use uncommitted mode:
1. **Set config:**
```json
"planning": {
"commit_docs": false,
"search_gitignored": true
}
```
2. **Add to .gitignore:**
```
.planning/
```
3. **Existing tracked files:** If `.planning/` was previously tracked:
```bash
git rm -r --cached .planning/
git commit -m "chore: stop tracking planning docs"
```
4. **Branch merges:** When using `branching_strategy: phase` or `milestone`, the `complete-milestone` workflow automatically strips `.planning/` files from staging before merge commits when `commit_docs: false`.
</setup_uncommitted_mode>
<branching_strategy_behavior>
**Branching Strategies:**
| Strategy | When branch created | Branch scope | Merge point |
| ----------- | ------------------------------------- | ---------------- | ----------------------- |
| `none` | Never | N/A | N/A |
| `phase` | At `execute-phase` start | Single phase | User merges after phase |
| `milestone` | At first `execute-phase` of milestone | Entire milestone | At `complete-milestone` |
**When `git.branching_strategy: "none"` (default):**
- All work commits to current branch
- Standard GSD behavior
**When `git.branching_strategy: "phase"`:**
- `execute-phase` creates/switches to a branch before execution
- Branch name from `phase_branch_template` (e.g., `gsd/phase-03-authentication`)
- All plan commits go to that branch
- User merges branches manually after phase completion
- `complete-milestone` offers to merge all phase branches
**When `git.branching_strategy: "milestone"`:**
- First `execute-phase` of milestone creates the milestone branch
- Branch name from `milestone_branch_template` (e.g., `gsd/v1.0-mvp`)
- All phases in milestone commit to same branch
- `complete-milestone` offers to merge milestone branch to main
**Template variables:**
| Variable | Available in | Description |
| ------------- | ------------------------- | ------------------------------------- |
| `{phase}` | phase_branch_template | Zero-padded phase number (e.g., "03") |
| `{slug}` | Both | Lowercase, hyphenated name |
| `{milestone}` | milestone_branch_template | Milestone version (e.g., "v1.0") |
**Checking the config:**
Use `init execute-phase` which returns all config as JSON:
```bash
INIT=$(node ".pi/gsd/bin/gsd-tools.cjs" init execute-phase "1")
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
# JSON output includes: branching_strategy, phase_branch_template, milestone_branch_template
```
Or use `state load` for the config values:
```bash
INIT=$(node ".pi/gsd/bin/gsd-tools.cjs" state load)
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
# Parse branching_strategy, phase_branch_template, milestone_branch_template from JSON
```
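One way to extract those values from the JSON before branch creation (a sketch that assumes `jq` is available; the field names follow the comments above):
```bash
BRANCHING_STRATEGY=$(echo "$INIT" | jq -r '.branching_strategy // "none"')
PHASE_BRANCH_TEMPLATE=$(echo "$INIT" | jq -r '.phase_branch_template')
MILESTONE_BRANCH_TEMPLATE=$(echo "$INIT" | jq -r '.milestone_branch_template')
```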
**Branch creation:**
```bash
# For phase strategy
if [ "$BRANCHING_STRATEGY" = "phase" ]; then
PHASE_SLUG=$(echo "$PHASE_NAME" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//')
BRANCH_NAME=$(echo "$PHASE_BRANCH_TEMPLATE" | sed "s/{phase}/$PADDED_PHASE/g" | sed "s/{slug}/$PHASE_SLUG/g")
git checkout -b "$BRANCH_NAME" 2>/dev/null || git checkout "$BRANCH_NAME"
fi
# For milestone strategy
if [ "$BRANCHING_STRATEGY" = "milestone" ]; then
MILESTONE_SLUG=$(echo "$MILESTONE_NAME" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//')
BRANCH_NAME=$(echo "$MILESTONE_BRANCH_TEMPLATE" | sed "s/{milestone}/$MILESTONE_VERSION/g" | sed "s/{slug}/$MILESTONE_SLUG/g")
git checkout -b "$BRANCH_NAME" 2>/dev/null || git checkout "$BRANCH_NAME"
fi
```
**Merge options at complete-milestone:**
| Option | Git command | Result |
| -------------------------- | -------------------- | -------------------------------- |
| Squash merge (recommended) | `git merge --squash` | Single clean commit per branch |
| Merge with history | `git merge --no-ff` | Preserves all individual commits |
| Delete without merging | `git branch -D` | Discard branch work |
| Keep branches | (none) | Manual handling later |
Squash merge is recommended - keeps main branch history clean while preserving the full development history in the branch (until deleted).
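As a sketch, squash-merging a finished phase branch looks like this (branch name and commit message are illustrative):
```bash
git checkout main
git merge --squash gsd/phase-03-authentication
git commit -m "feat(03): authentication phase"

# Optional: remove the branch once its work is on main
git branch -D gsd/phase-03-authentication
```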
**Use cases:**
| Strategy | Best for |
| ----------- | ------------------------------------------------------------ |
| `none` | Solo development, simple projects |
| `phase` | Code review per phase, granular rollback, team collaboration |
| `milestone` | Release branches, staging environments, PR per version |
</branching_strategy_behavior>
</planning_config>

View File

@@ -0,0 +1,162 @@
<questioning_guide>
Project initialization is dream extraction, not requirements gathering. You're helping the user discover and articulate what they want to build. This isn't a contract negotiation - it's collaborative thinking.
<philosophy>
**You are a thinking partner, not an interviewer.**
The user often has a fuzzy idea. Your job is to help them sharpen it. Ask questions that make them think "oh, I hadn't considered that" or "yes, that's exactly what I mean."
Don't interrogate. Collaborate. Don't follow a script. Follow the thread.
</philosophy>
<the_goal>
By the end of questioning, you need enough clarity to write a PROJECT.md that downstream phases can act on:
- **Research** needs: what domain to research, what the user already knows, what unknowns exist
- **Requirements** needs: clear enough vision to scope v1 features
- **Roadmap** needs: clear enough vision to decompose into phases, what "done" looks like
- **plan-phase** needs: specific requirements to break into tasks, context for implementation choices
- **execute-phase** needs: success criteria to verify against, the "why" behind requirements
A vague PROJECT.md forces every downstream phase to guess. The cost compounds.
</the_goal>
<how_to_question>
**Start open.** Let them dump their mental model. Don't interrupt with structure.
**Follow energy.** Whatever they emphasized, dig into that. What excited them? What problem sparked this?
**Challenge vagueness.** Never accept fuzzy answers. "Good" means what? "Users" means who? "Simple" means how?
**Make the abstract concrete.** "Walk me through using this." "What does that actually look like?"
**Clarify ambiguity.** "When you say Z, do you mean A or B?" "You mentioned X - tell me more."
**Know when to stop.** When you understand what they want, why they want it, who it's for, and what done looks like - offer to proceed.
</how_to_question>
<question_types>
Use these as inspiration, not a checklist. Pick what's relevant to the thread.
**Motivation - why this exists:**
- "What prompted this?"
- "What are you doing today that this replaces?"
- "What would you do if this existed?"
**Concreteness - what it actually is:**
- "Walk me through using this"
- "You said X - what does that actually look like?"
- "Give me an example"
**Clarification - what they mean:**
- "When you say Z, do you mean A or B?"
- "You mentioned X - tell me more about that"
**Success - how you'll know it's working:**
- "How will you know this is working?"
- "What does done look like?"
</question_types>
<using_askuserquestion>
Use AskUserQuestion to help users think by presenting concrete options to react to.
**Good options:**
- Interpretations of what they might mean
- Specific examples to confirm or deny
- Concrete choices that reveal priorities
**Bad options:**
- Generic categories ("Technical", "Business", "Other")
- Leading options that presume an answer
- Too many options (2-4 is ideal)
- Headers longer than 12 characters (hard limit - validation will reject them)
**Example - vague answer:**
User says "it should be fast"
- header: "Fast"
- question: "Fast how?"
- options: ["Sub-second response", "Handles large datasets", "Quick to build", "Let me explain"]
**Example - following a thread:**
User mentions "frustrated with current tools"
- header: "Frustration"
- question: "What specifically frustrates you?"
- options: ["Too many clicks", "Missing features", "Unreliable", "Let me explain"]
**Tip for users - modifying an option:**
Users who want a slightly modified version of an option can select "Other" and reference the option by number: `#1 but for finger joints only` or `#2 with pagination disabled`. This avoids retyping the full option text.
</using_askuserquestion>
<freeform_rule>
**When the user wants to explain freely, STOP using AskUserQuestion.**
If a user selects "Other" and their response signals they want to describe something in their own words (e.g., "let me describe it", "I'll explain", "something else", or any open-ended reply that isn't choosing/modifying an existing option), you MUST:
1. **Ask your follow-up as plain text** - NOT via AskUserQuestion
2. **Wait for them to type at the normal prompt**
3. **Resume AskUserQuestion** only after processing their freeform response
The same applies if YOU include a freeform-indicating option (like "Let me explain" or "Describe in detail") and the user selects it.
**Wrong:** User says "let me describe it" → AskUserQuestion("What feature?", ["Feature A", "Feature B", "Describe in detail"])
**Right:** User says "let me describe it" → "Go ahead - what are you thinking?"
</freeform_rule>
<context_checklist>
Use this as a **background checklist**, not a conversation structure. Check these mentally as you go. If gaps remain, weave questions naturally.
- [ ] What they're building (concrete enough to explain to a stranger)
- [ ] Why it needs to exist (the problem or desire driving it)
- [ ] Who it's for (even if just themselves)
- [ ] What "done" looks like (observable outcomes)
Four things. If they volunteer more, capture it.
</context_checklist>
<decision_gate>
When you could write a clear PROJECT.md, offer to proceed:
- header: "Ready?"
- question: "I think I understand what you're after. Ready to create PROJECT.md?"
- options:
- "Create PROJECT.md" - Let's move forward
- "Keep exploring" - I want to share more / ask me more
If "Keep exploring" - ask what they want to add or identify gaps and probe naturally.
Loop until "Create PROJECT.md" selected.
</decision_gate>
<anti_patterns>
- **Checklist walking** - Going through domains regardless of what they said
- **Canned questions** - "What's your core value?" "What's out of scope?" regardless of context
- **Corporate speak** - "What are your success criteria?" "Who are your stakeholders?"
- **Interrogation** - Firing questions without building on answers
- **Rushing** - Minimizing questions to get to "the work"
- **Shallow acceptance** - Taking vague answers without probing
- **Premature constraints** - Asking about tech stack before understanding the idea
- **User skills** - NEVER ask about the user's technical experience. The agent builds.
</anti_patterns>
</questioning_guide>

263
.pi/gsd/references/tdd.md Normal file
View File

@@ -0,0 +1,263 @@
<overview>
TDD is about design quality, not coverage metrics. The red-green-refactor cycle forces you to think about behavior before implementation, producing cleaner interfaces and more testable code.
**Principle:** If you can describe the behavior as `expect(fn(input)).toBe(output)` before writing `fn`, TDD improves the result.
**Key insight:** TDD work is fundamentally heavier than standard tasks: it requires 2-3 execution cycles (RED → GREEN → REFACTOR), each with file reads, test runs, and potential debugging. TDD features get dedicated plans to ensure full context is available throughout the cycle.
</overview>
<when_to_use_tdd>
## When TDD Improves Quality
**TDD candidates (create a TDD plan):**
- Business logic with defined inputs/outputs
- API endpoints with request/response contracts
- Data transformations, parsing, formatting
- Validation rules and constraints
- Algorithms with testable behavior
- State machines and workflows
- Utility functions with clear specifications
**Skip TDD (use standard plan with `type="auto"` tasks):**
- UI layout, styling, visual components
- Configuration changes
- Glue code connecting existing components
- One-off scripts and migrations
- Simple CRUD with no business logic
- Exploratory prototyping
**Heuristic:** Can you write `expect(fn(input)).toBe(output)` before writing `fn`?
→ Yes: Create a TDD plan
→ No: Use standard plan, add tests after if needed
</when_to_use_tdd>
<tdd_plan_structure>
## TDD Plan Structure
Each TDD plan implements **one feature** through the full RED-GREEN-REFACTOR cycle.
```markdown
---
phase: XX-name
plan: NN
type: tdd
---
<objective>
[What feature and why]
Purpose: [Design benefit of TDD for this feature]
Output: [Working, tested feature]
</objective>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@relevant/source/files.ts
</context>
<feature>
<name>[Feature name]</name>
<files>[source file, test file]</files>
<behavior>
[Expected behavior in testable terms]
Cases: input → expected output
</behavior>
<implementation>[How to implement once tests pass]</implementation>
</feature>
<verification>
[Test command that proves feature works]
</verification>
<success_criteria>
- Failing test written and committed
- Implementation passes test
- Refactor complete (if needed)
- All 2-3 commits present
</success_criteria>
<output>
After completion, create SUMMARY.md with:
- RED: What test was written, why it failed
- GREEN: What implementation made it pass
- REFACTOR: What cleanup was done (if any)
- Commits: List of commits produced
</output>
```
**One feature per TDD plan.** If features are trivial enough to batch, they're trivial enough to skip TDD; use a standard plan and add tests after.
</tdd_plan_structure>
<execution_flow>
## Red-Green-Refactor Cycle
**RED - Write failing test:**
1. Create test file following project conventions
2. Write test describing expected behavior (from `<behavior>` element)
3. Run test - it MUST fail
4. If test passes: feature exists or test is wrong. Investigate.
5. Commit: `test({phase}-{plan}): add failing test for [feature]`
**GREEN - Implement to pass:**
1. Write minimal code to make test pass
2. No cleverness, no optimization - just make it work
3. Run test - it MUST pass
4. Commit: `feat({phase}-{plan}): implement [feature]`
**REFACTOR (if needed):**
1. Clean up implementation if obvious improvements exist
2. Run tests - MUST still pass
3. Only commit if changes made: `refactor({phase}-{plan}): clean up [feature]`
**Result:** Each TDD plan produces 2-3 atomic commits.
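A condensed sketch of the cycle for a Node project (file names, test command, and plan numbers are placeholders):
```bash
# RED: write the test, confirm it fails, commit
npm test && echo "Test passed unexpectedly - feature may already exist, investigate"
git add src/email.test.ts
git commit -m "test(08-02): add failing test for email validation"

# GREEN: implement minimally, confirm it passes, commit
npm test || echo "Still failing - keep iterating before committing"
git add src/email.ts
git commit -m "feat(08-02): implement email validation"

# REFACTOR (only if needed): clean up, confirm tests still pass, commit
npm test && git commit -am "refactor(08-02): extract regex to constant"
```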
</execution_flow>
<test_quality>
## Good Tests vs Bad Tests
**Test behavior, not implementation:**
- Good: "returns formatted date string"
- Bad: "calls formatDate helper with correct params"
- Tests should survive refactors
**One concept per test:**
- Good: Separate tests for valid input, empty input, malformed input
- Bad: Single test checking all edge cases with multiple assertions
**Descriptive names:**
- Good: "should reject empty email", "returns null for invalid ID"
- Bad: "test1", "handles error", "works correctly"
**No implementation details:**
- Good: Test public API, observable behavior
- Bad: Mock internals, test private methods, assert on internal state
</test_quality>
<framework_setup>
## Test Framework Setup (If None Exists)
When executing a TDD plan but no test framework is configured, set it up as part of the RED phase:
**1. Detect project type:**
```bash
# JavaScript/TypeScript
if [ -f package.json ]; then echo "node"; fi
# Python
if [ -f requirements.txt ] || [ -f pyproject.toml ]; then echo "python"; fi
# Go
if [ -f go.mod ]; then echo "go"; fi
# Rust
if [ -f Cargo.toml ]; then echo "rust"; fi
```
**2. Install minimal framework:**
| Project | Framework | Install |
| -------------- | ---------- | ----------------------------------------- |
| Node.js | Jest | `npm install -D jest @types/jest ts-jest` |
| Node.js (Vite) | Vitest | `npm install -D vitest` |
| Python | pytest | `pip install pytest` |
| Go | testing | Built-in |
| Rust | cargo test | Built-in |
**3. Create config if needed:**
- Jest: `jest.config.js` with ts-jest preset
- Vitest: `vitest.config.ts` with test globals
- pytest: `pytest.ini` or `pyproject.toml` section
**4. Verify setup:**
```bash
# Run empty test suite - should pass with 0 tests
npm test # Node
pytest # Python
go test ./... # Go
cargo test # Rust
```
**5. Create first test file:**
Follow project conventions for test location:
- `*.test.ts` / `*.spec.ts` next to source
- `__tests__/` directory
- `tests/` directory at root
Framework setup is a one-time cost included in the first TDD plan's RED phase.
</framework_setup>
<error_handling>
## Error Handling
**Test doesn't fail in RED phase:**
- Feature may already exist - investigate
- Test may be wrong (not testing what you think)
- Fix before proceeding
**Test doesn't pass in GREEN phase:**
- Debug implementation
- Don't skip to refactor
- Keep iterating until green
**Tests fail in REFACTOR phase:**
- Undo refactor
- Commit was premature
- Refactor in smaller steps
**Unrelated tests break:**
- Stop and investigate
- May indicate coupling issue
- Fix before proceeding
</error_handling>
<commit_pattern>
## Commit Pattern for TDD Plans
TDD plans produce 2-3 atomic commits (one per phase):
```
test(08-02): add failing test for email validation
- Tests valid email formats accepted
- Tests invalid formats rejected
- Tests empty input handling
feat(08-02): implement email validation
- Regex pattern matches RFC 5322
- Returns boolean for validity
- Handles edge cases (empty, null)
refactor(08-02): extract regex to constant (optional)
- Moved pattern to EMAIL_REGEX constant
- No behavior changes
- Tests still pass
```
**Comparison with standard plans:**
- Standard plans: 1 commit per task, 2-4 commits per plan
- TDD plans: 2-3 commits for single feature
Both follow the same format: `{type}({phase}-{plan}): {description}`
**Benefits:**
- Each commit independently revertable
- Git bisect works at commit level
- Clear history showing TDD discipline
- Consistent with overall commit strategy
</commit_pattern>
<context_budget>
## Context Budget
TDD plans target **~40% context usage** (lower than standard plans' ~50%).
Why lower:
- RED phase: write test, run test, potentially debug why it didn't fail
- GREEN phase: implement, run test, potentially iterate on failures
- REFACTOR phase: modify code, run tests, verify no regressions
Each phase involves reading files, running commands, analyzing output. The back-and-forth is inherently heavier than linear task execution.
Single feature focus ensures full quality throughout the cycle.
</context_budget>

View File

@@ -0,0 +1,165 @@
<ui_patterns>
Visual patterns for user-facing GSD output. Orchestrators @-reference this file.
<core>
<!-- Loaded by ALL commands via ui-brand-core.md -->
## Stage Banners
Use for major workflow transitions.
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GSD ► {STAGE NAME}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
**Stage names (uppercase):**
- `QUESTIONING`
- `RESEARCHING`
- `DEFINING REQUIREMENTS`
- `CREATING ROADMAP`
- `PLANNING PHASE {N}`
- `EXECUTING WAVE {N}`
- `VERIFYING`
- `PHASE {N} COMPLETE ✓`
- `MILESTONE COMPLETE 🎉`
---
## Status Symbols
```
✓ Complete / Passed / Verified
✗ Failed / Missing / Blocked
◆ In Progress
○ Pending
⚡ Auto-approved
⚠ Warning
🎉 Milestone complete (only in banner)
```
---
## Progress Display
**Phase/milestone level:**
```
Progress: ████████░░ 80%
```
**Task level:**
```
Tasks: 2/4 complete
```
**Plan level:**
```
Plans: 3/5 complete
```
---
## Next Up Block
Always at end of major completions.
```
───────────────────────────────────────────────────────────────
## ▶ Next Up
**{Identifier}: {Name}** - {one-line description}
`{copy-paste command}`
<sub>`/new` first → fresh context window</sub>
───────────────────────────────────────────────────────────────
**Also available:**
- `/gsd-alternative-1` - description
- `/gsd-alternative-2` - description
───────────────────────────────────────────────────────────────
```
---
## Tables
```
| Phase | Status | Plans | Progress |
| ----- | ------ | ----- | -------- |
| 1 | ✓ | 3/3 | 100% |
| 2 | ◆ | 1/4 | 25% |
| 3 | ○ | 0/2 | 0% |
```
---
## Anti-Patterns
- Varying box/banner widths
- Mixing banner styles (`===`, `---`, `***`)
- Skipping `GSD ►` prefix in banners
- Random emoji (`🚀`, `✨`, `💫`)
- Missing Next Up block after completions
</core>
<!-- Execution-only: loaded by gsd-execute-phase, gsd-ui-phase, gsd-ui-review -->
## Checkpoint Boxes
User action required. 62-character width.
```
╔══════════════════════════════════════════════════════════════╗
║ CHECKPOINT: {Type} ║
╚══════════════════════════════════════════════════════════════╝
{Content}
──────────────────────────────────────────────────────────────
→ {ACTION PROMPT}
──────────────────────────────────────────────────────────────
```
**Types:**
- `CHECKPOINT: Verification Required` - `→ Type "approved" or describe issues`
- `CHECKPOINT: Decision Required` - `→ Select: option-a / option-b`
- `CHECKPOINT: Action Required` - `→ Type "done" when complete`
---
## Spawning Indicators
```
◆ Spawning researcher...
◆ Spawning 4 researchers in parallel...
→ Stack research
→ Features research
→ Architecture research
→ Pitfalls research
✓ Researcher complete: STACK.md written
```
---
## Error Box
```
╔══════════════════════════════════════════════════════════════╗
║ ERROR ║
╚══════════════════════════════════════════════════════════════╝
{Error description}
**To fix:** {Resolution steps}
```
</ui_patterns>

View File

@@ -0,0 +1,681 @@
# User Profiling: Detection Heuristics Reference
This reference document defines detection heuristics for behavioral profiling across 8 dimensions. The gsd-user-profiler agent applies these rules when analyzing extracted session messages. Do not invent dimensions or scoring rules beyond what is defined here.
## How to Use This Document
1. The gsd-user-profiler agent reads this document before analyzing any messages
2. For each dimension, the agent scans messages for the signal patterns defined below
3. The agent applies the detection heuristics to classify the developer's pattern
4. Confidence is scored using the thresholds defined per dimension
5. Evidence quotes are curated using the rules in the Evidence Curation section
6. Output must conform to the JSON schema in the Output Schema section
---
## Dimensions
### 1. Communication Style
`dimension_id: communication_style`
**What we're measuring:** How the developer phrases requests, instructions, and feedback -- the structural pattern of their messages to the agent.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `terse-direct` | Short, imperative messages with minimal context. Gets to the point immediately. |
| `conversational` | Medium-length messages mixing instructions with questions and thinking-aloud. Natural, informal tone. |
| `detailed-structured` | Long messages with explicit structure -- headers, numbered lists, problem statements, pre-analysis. |
| `mixed` | No dominant pattern; style shifts based on task type or project context. |
**Signal patterns:**
1. **Message length distribution** -- Average word count across messages. Terse < 50 words, conversational 50-200 words, detailed > 200 words.
2. **Imperative-to-interrogative ratio** -- Ratio of commands ("fix this", "add X") to questions ("what do you think?", "should we?"). High imperative ratio suggests terse-direct.
3. **Structural formatting** -- Presence of markdown headers, numbered lists, code blocks, or bullet points within messages. Frequent formatting suggests detailed-structured.
4. **Context preambles** -- Whether the developer provides background/context before making a request. Preambles suggest conversational or detailed-structured.
5. **Sentence completeness** -- Whether messages use full sentences or fragments/shorthand. Fragments suggest terse-direct.
6. **Follow-up pattern** -- Whether the developer provides additional context in subsequent messages (multi-message requests suggest conversational).
**Detection heuristics:**
1. If average message length < 50 words AND predominantly imperative mood AND minimal formatting --> `terse-direct`
2. If average message length 50-200 words AND mix of imperative and interrogative AND occasional formatting --> `conversational`
3. If average message length > 200 words AND frequent structural formatting AND context preambles present --> `detailed-structured`
4. If message length variance is high (std dev > 60% of mean) AND no single pattern dominates (< 60% of messages match one style) --> `mixed`
5. If pattern varies systematically by project type (e.g., terse in CLI projects, detailed in frontend) --> `mixed` with context-dependent note
**Confidence scoring:**
- **HIGH:** 10+ messages showing consistent pattern (> 70% match), same pattern observed across 2+ projects
- **MEDIUM:** 5-9 messages showing pattern, OR pattern consistent within 1 project only
- **LOW:** < 5 messages with relevant signals, OR mixed signals (contradictory patterns observed in similar contexts)
- **UNSCORED:** 0 messages with relevant signals for this dimension
**Example quotes:**
- **terse-direct:** "fix the auth bug" / "add pagination to the list endpoint" / "this test is failing, make it pass"
- **conversational:** "I'm thinking we should probably handle the error case here. What do you think about returning a 422 instead of a 500? The client needs to know it was a validation issue."
- **detailed-structured:** "## Context\nThe auth flow currently uses session cookies but we need to migrate to JWT.\n\n## Requirements\n1. Access tokens (15min expiry)\n2. Refresh tokens (7-day)\n3. httpOnly cookies\n\n## What I've tried\nI looked at jose and jsonwebtoken..."
**Context-dependent patterns:**
When communication style varies systematically by project or task type, report the split rather than forcing a single rating. Example: "context-dependent: terse-direct for bug fixes and CLI tooling, detailed-structured for architecture and frontend work." Phase 3 orchestration resolves context-dependent splits by presenting the split to the user.
---
### 2. Decision Speed
`dimension_id: decision_speed`
**What we're measuring:** How quickly the developer makes choices when the agent presents options, alternatives, or trade-offs.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `fast-intuitive` | Decides immediately based on experience or gut feeling. Minimal deliberation. |
| `deliberate-informed` | Requests comparison or summary before deciding. Wants to understand trade-offs. |
| `research-first` | Delays decision to research independently. May leave and return with findings. |
| `delegator` | Defers to the agent's recommendation. Trusts the suggestion. |
**Signal patterns:**
1. **Response latency to options** -- How many messages between the agent presenting options and developer choosing. Immediate (same message or next) suggests fast-intuitive.
2. **Comparison requests** -- Presence of "compare these", "what are the trade-offs?", "pros and cons?" suggests deliberate-informed.
3. **External research indicators** -- Messages like "I looked into X and...", "according to the docs...", "I read that..." suggest research-first.
4. **Delegation language** -- "just pick one", "whatever you recommend", "your call", "go with the best option" suggests delegator.
5. **Decision reversal frequency** -- How often the developer changes a decision after making it. Frequent reversals may indicate fast-intuitive with low confidence.
**Detection heuristics:**
1. If developer selects options within 1-2 messages of presentation AND uses decisive language ("use X", "go with A") AND rarely asks for comparisons --> `fast-intuitive`
2. If developer requests trade-off analysis or comparison tables AND decides after receiving comparison AND asks clarifying questions --> `deliberate-informed`
3. If developer defers decisions with "let me look into this" AND returns with external information AND cites documentation or articles --> `research-first`
4. If developer uses delegation language (> 3 instances) AND rarely overrides the agent's choices AND says "sounds good" or "your call" --> `delegator`
5. If no clear pattern OR evidence is split across multiple styles --> classify as the dominant style with a context-dependent note
**Confidence scoring:**
- **HIGH:** 10+ decision points observed showing consistent pattern, same pattern across 2+ projects
- **MEDIUM:** 5-9 decision points, OR consistent within 1 project only
- **LOW:** < 5 decision points observed, OR mixed decision-making styles
- **UNSCORED:** 0 messages containing decision-relevant signals
**Example quotes:**
- **fast-intuitive:** "Use Tailwind. Next question." / "Option B, let's move on"
- **deliberate-informed:** "Can you compare Prisma vs Drizzle for this use case? I want to understand the migration story and type safety differences before I pick."
- **research-first:** "Hold off on the DB choice -- I want to read the Drizzle docs and check their GitHub issues first. I'll come back with a decision."
- **delegator:** "You know more about this than me. Whatever you recommend, go with it."
**Context-dependent patterns:**
Decision speed often varies by stakes. A developer may be fast-intuitive for styling choices but research-first for database or auth decisions. When this pattern is clear, report the split: "context-dependent: fast-intuitive for low-stakes (styling, naming), deliberate-informed for high-stakes (architecture, security)."
---
### 3. Explanation Depth
`dimension_id: explanation_depth`
**What we're measuring:** How much explanation the developer wants alongside code -- their preference for understanding vs. speed.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `code-only` | Wants working code with minimal or no explanation. Reads and understands code directly. |
| `concise` | Wants brief explanation of approach with code. Key decisions noted, not exhaustive. |
| `detailed` | Wants thorough walkthrough of the approach, reasoning, and code. Appreciates structure. |
| `educational` | Wants deep conceptual explanation. Treats interactions as learning opportunities. |
**Signal patterns:**
1. **Explicit depth requests** -- "just show me the code", "explain why", "teach me about X", "skip the explanation"
2. **Reaction to explanations** -- Does the developer skip past explanations? Ask for more detail? Say "too much"?
3. **Follow-up question depth** -- Surface-level follow-ups ("does it work?") vs. conceptual ("why this pattern over X?")
4. **Code comprehension signals** -- Does the developer reference implementation details in their messages? This suggests they read and understand code directly.
5. **"I know this" signals** -- Messages like "I'm familiar with X", "skip the basics", "I know how hooks work" indicate lower explanation preference.
**Detection heuristics:**
1. If developer says "just the code" or "skip the explanation" AND rarely asks follow-up conceptual questions AND references code details directly --> `code-only`
2. If developer accepts brief explanations without asking for more AND asks focused follow-ups about specific decisions --> `concise`
3. If developer asks "why" questions AND requests walkthroughs AND appreciates structured explanations --> `detailed`
4. If developer asks conceptual questions beyond the immediate task AND uses learning language ("I want to understand", "teach me") --> `educational`
**Confidence scoring:**
- **HIGH:** 10+ messages showing consistent preference, same preference across 2+ projects
- **MEDIUM:** 5-9 messages, OR consistent within 1 project only
- **LOW:** < 5 relevant messages, OR preferences shift between interactions
- **UNSCORED:** 0 messages with relevant signals
**Example quotes:**
- **code-only:** "Just give me the implementation. I'll read through it." / "Skip the explanation, show the code."
- **concise:** "Quick summary of the approach, then the code please." / "Why did you use a Map here instead of an object?"
- **detailed:** "Walk me through this step by step. I want to understand the auth flow before we implement it."
- **educational:** "Can you explain how JWT refresh token rotation works conceptually? I want to understand the security model, not just implement it."
**Context-dependent patterns:**
Explanation depth often correlates with domain familiarity. A developer may want code-only for well-known tech but educational for new domains. Report splits when observed: "context-dependent: code-only for React/TypeScript, detailed for database optimization."
---
### 4. Debugging Approach
`dimension_id: debugging_approach`
**What we're measuring:** How the developer approaches problems, errors, and unexpected behavior when working with the agent.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `fix-first` | Pastes error, wants it fixed. Minimal diagnosis interest. Results-oriented. |
| `diagnostic` | Shares error with context, wants to understand the cause before fixing. |
| `hypothesis-driven` | Investigates independently first, brings specific theories to the agent for validation. |
| `collaborative` | Wants to work through the problem step-by-step with the agent as a partner. |
**Signal patterns:**
1. **Error presentation style** -- Raw error paste only (fix-first) vs. error + "I think it might be..." (hypothesis-driven) vs. "Can you help me understand why..." (diagnostic)
2. **Pre-investigation indicators** -- Does the developer share what they already tried? Do they mention reading logs, checking state, or isolating the issue?
3. **Root cause interest** -- After a fix, does the developer ask "why did that happen?" or just move on?
4. **Step-by-step language** -- "Let's check X first", "what should we look at next?", "walk me through the debugging"
5. **Fix acceptance pattern** -- Does the developer immediately apply fixes or question them first?
**Detection heuristics:**
1. If developer pastes errors without context AND accepts fixes without root cause questions AND moves on immediately --> `fix-first`
2. If developer provides error context AND asks "why is this happening?" AND wants explanation with the fix --> `diagnostic`
3. If developer shares their own analysis AND proposes theories ("I think the issue is X because...") AND asks the agent to confirm or refute --> `hypothesis-driven`
4. If developer uses collaborative language ("let's", "what should we check?") AND prefers incremental diagnosis AND walks through problems together --> `collaborative`
**Confidence scoring:**
- **HIGH:** 10+ debugging interactions showing consistent approach, same approach across 2+ projects
- **MEDIUM:** 5-9 debugging interactions, OR consistent within 1 project only
- **LOW:** < 5 debugging interactions, OR approach varies significantly
- **UNSCORED:** 0 messages with debugging-relevant signals
**Example quotes:**
- **fix-first:** "Getting this error: TypeError: Cannot read properties of undefined. Fix it."
- **diagnostic:** "The API returns 500 when I send a POST to /users. Here's the request body and the server log. What's causing this?"
- **hypothesis-driven:** "I think the race condition is in the useEffect cleanup. I checked and the subscription isn't being cancelled on unmount. Can you confirm?"
- **collaborative:** "Let's debug this together. The test passes locally but fails in CI. What should we check first?"
**Context-dependent patterns:**
Debugging approach may vary by urgency. A developer might be fix-first under deadline pressure but hypothesis-driven during regular development. Note temporal patterns if detected.
---
### 5. UX Philosophy
`dimension_id: ux_philosophy`
**What we're measuring:** How the developer prioritizes user experience, design, and visual quality relative to functionality.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `function-first` | Get it working, polish later. Minimal UX concern during implementation. |
| `pragmatic` | Basic usability from the start. Nothing ugly or broken, but no design obsession. |
| `design-conscious` | Design and UX are treated as important as functionality. Attention to visual detail. |
| `backend-focused` | Primarily builds backend/CLI. Minimal frontend exposure or interest. |
**Signal patterns:**
1. **Design-related requests** -- Mentions of styling, layout, responsiveness, animations, color schemes, spacing
2. **Polish timing** -- Does the developer ask for visual polish during implementation or defer it?
3. **UI feedback specificity** -- Vague ("make it look better") vs. specific ("increase the padding to 16px, change the font weight to 600")
4. **Frontend vs. backend distribution** -- Ratio of frontend-focused requests to backend-focused requests
5. **Accessibility mentions** -- References to a11y, screen readers, keyboard navigation, ARIA labels
**Detection heuristics:**
1. If developer rarely mentions UI/UX AND focuses on logic, APIs, data AND defers styling ("we'll make it pretty later") --> `function-first`
2. If developer includes basic UX requirements AND mentions usability but not pixel-perfection AND balances form with function --> `pragmatic`
3. If developer provides specific design requirements AND mentions polish, animations, spacing AND treats UI bugs as seriously as logic bugs --> `design-conscious`
4. If developer works primarily on CLI tools, APIs, or backend systems AND rarely or never works on frontend AND messages focus on data, performance, infrastructure --> `backend-focused`
**Confidence scoring:**
- **HIGH:** 10+ messages with UX-relevant signals, same pattern across 2+ projects
- **MEDIUM:** 5-9 messages, OR consistent within 1 project only
- **LOW:** < 5 relevant messages, OR philosophy varies by project type
- **UNSCORED:** 0 messages with UX-relevant signals
**Example quotes:**
- **function-first:** "Just get the form working. We'll style it later." / "I don't care how it looks, I need the data flowing."
- **pragmatic:** "Make sure the loading state is visible and the error messages are clear. Standard styling is fine."
- **design-conscious:** "The button needs more breathing room -- add 12px vertical padding and make the hover state transition 200ms. Also check the contrast ratio."
- **backend-focused:** "I'm building a CLI tool. No UI needed." / "Add the REST endpoint, I'll handle the frontend separately."
**Context-dependent patterns:**
UX philosophy is inherently project-dependent. A developer building a CLI tool is necessarily backend-focused for that project. When possible, distinguish between project-driven and preference-driven patterns. If the developer only has backend projects, note that the rating reflects available data: "backend-focused (note: all analyzed projects are backend/CLI -- may not reflect frontend preferences)."
---
### 6. Vendor Philosophy
`dimension_id: vendor_philosophy`
**What we're measuring:** How the developer approaches choosing and evaluating libraries, frameworks, and external services.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `pragmatic-fast` | Uses what works, what the agent suggests, or what's fastest. Minimal evaluation. |
| `conservative` | Prefers well-known, battle-tested, widely-adopted options. Risk-averse. |
| `thorough-evaluator` | Researches alternatives, reads docs, compares features and trade-offs before committing. |
| `opinionated` | Has strong, pre-existing preferences for specific tools. Knows what they like. |
**Signal patterns:**
1. **Library selection language** -- "just use whatever", "is X the standard?", "I want to compare A vs B", "we're using X, period"
2. **Evaluation depth** -- Does the developer accept the first suggestion or ask for alternatives?
3. **Stated preferences** -- Explicit mentions of preferred tools, past experience, or tool philosophy
4. **Rejection patterns** -- Does the developer reject the agent's suggestions? On what basis (popularity, personal experience, docs quality)?
5. **Dependency attitude** -- "minimize dependencies", "no external deps", "add whatever we need" -- reveals philosophy about external code
**Detection heuristics:**
1. If developer accepts library suggestions without pushback AND uses phrases like "sounds good" or "go with that" AND rarely asks about alternatives --> `pragmatic-fast`
2. If developer asks about popularity, maintenance, community AND prefers "industry standard" or "battle-tested" AND avoids new/experimental --> `conservative`
3. If developer requests comparisons AND reads docs before deciding AND asks about edge cases, license, bundle size --> `thorough-evaluator`
4. If developer names specific libraries unprompted AND overrides the agent's suggestions AND expresses strong preferences --> `opinionated`
**Confidence scoring:**
- **HIGH:** 10+ vendor/library decisions observed, same pattern across 2+ projects
- **MEDIUM:** 5-9 decisions, OR consistent within 1 project only
- **LOW:** < 5 vendor decisions observed, OR pattern varies
- **UNSCORED:** 0 messages with vendor-selection signals
**Example quotes:**
- **pragmatic-fast:** "Use whatever ORM you recommend. I just need it working." / "Sure, Tailwind is fine."
- **conservative:** "Is Prisma the most widely used ORM for this? I want something with a large community." / "Let's stick with what most teams use."
- **thorough-evaluator:** "Before we pick a state management library, can you compare Zustand vs Jotai vs Redux Toolkit? I want to understand bundle size, API surface, and TypeScript support."
- **opinionated:** "We're using Drizzle, not Prisma. I've used both and Drizzle's SQL-like API is better for complex queries."
**Context-dependent patterns:**
Vendor philosophy may shift based on project importance or domain. Personal projects may use pragmatic-fast while professional projects use thorough-evaluator. Report the split if detected.
---
### 7. Frustration Triggers
`dimension_id: frustration_triggers`
**What we're measuring:** What causes visible frustration, correction, or negative emotional signals in the developer's messages to the agent.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `scope-creep` | Frustrated when the agent does things that were not asked for. Wants bounded execution. |
| `instruction-adherence` | Frustrated when the agent doesn't follow instructions precisely. Values exactness. |
| `verbosity` | Frustrated when the agent over-explains or is too wordy. Wants conciseness. |
| `regression` | Frustrated when the agent breaks working code while fixing something else. Values stability. |
**Signal patterns:**
1. **Correction language** -- "I didn't ask for that", "don't do X", "I said Y not Z", "why did you change this?"
2. **Repetition patterns** -- Repeating the same instruction with emphasis suggests instruction-adherence frustration
3. **Emotional tone shifts** -- Shift from neutral to terse, use of capitals, exclamation marks, explicit frustration words
4. **"Don't" statements** -- "don't add extra features", "don't explain so much", "don't touch that file" -- what they prohibit reveals what frustrates them
5. **Frustration recovery** -- How quickly the developer returns to neutral tone after a frustration event
**Detection heuristics:**
1. If developer corrects the agent for doing unrequested work AND uses language like "I only asked for X", "stop adding things", "stick to what I asked" --> `scope-creep`
2. If developer repeats instructions AND corrects specific deviations from stated requirements AND emphasizes precision ("I specifically said...") --> `instruction-adherence`
3. If developer asks the agent to be shorter AND skips explanations AND expresses annoyance at length ("too much", "just the answer") --> `verbosity`
4. If developer expresses frustration at broken functionality AND checks for regressions AND says "you broke X while fixing Y" --> `regression`
**Confidence scoring:**
- **HIGH:** 10+ frustration events showing consistent trigger pattern, same trigger across 2+ projects
- **MEDIUM:** 5-9 frustration events, OR consistent within 1 project only
- **LOW:** < 5 frustration events observed (note: low frustration count is POSITIVE -- it means the developer is generally satisfied, not that data is insufficient)
- **UNSCORED:** 0 messages with frustration signals (note: "no frustration detected" is a valid finding)
**Example quotes:**
- **scope-creep:** "I asked you to fix the login bug, not refactor the entire auth module. Revert everything except the bug fix."
- **instruction-adherence:** "I said to use a Map, not an object. I was specific about this. Please redo it with a Map."
- **verbosity:** "Way too much explanation. Just show me the code change, nothing else."
- **regression:** "The search was working fine before. Now after your 'fix' to the filter, search results are empty. Don't touch things I didn't ask you to change."
**Context-dependent patterns:**
Frustration triggers tend to be consistent across projects (personality-driven, not project-driven). However, their intensity may vary with project stakes. If multiple frustration triggers are observed, report the primary (most frequent) and note secondaries.
---
### 8. Learning Style
`dimension_id: learning_style`
**What we're measuring:** How the developer prefers to understand new concepts, tools, or patterns they encounter.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `self-directed` | Reads code directly, figures things out independently. Asks the agent specific questions. |
| `guided` | Asks the agent to explain relevant parts. Prefers guided understanding. |
| `documentation-first` | Reads official docs and tutorials before diving in. References documentation. |
| `example-driven` | Wants working examples to modify and learn from. Pattern-matching learner. |
**Signal patterns:**
1. **Learning initiation** -- Does the developer start by reading code, asking for explanation, requesting docs, or asking for examples?
2. **Reference to external sources** -- Mentions of documentation, tutorials, Stack Overflow, blog posts suggest documentation-first
3. **Example requests** -- "show me an example", "can you give me a sample?", "let me see how this looks in practice"
4. **Code-reading indicators** -- "I looked at the implementation", "I see that X calls Y", "from reading the code..."
5. **Explanation requests vs. code requests** -- Ratio of "explain X" to "show me X" messages
**Detection heuristics:**
1. If developer references reading code directly AND asks specific targeted questions AND demonstrates independent investigation --> `self-directed`
2. If developer asks the agent to explain concepts AND requests walkthroughs AND prefers agent-mediated understanding --> `guided`
3. If developer cites documentation AND asks for doc links AND mentions reading tutorials or official guides --> `documentation-first`
4. If developer requests examples AND modifies provided examples AND learns by pattern matching --> `example-driven`
**Confidence scoring:**
- **HIGH:** 10+ learning interactions showing consistent preference, same preference across 2+ projects
- **MEDIUM:** 5-9 learning interactions, OR consistent within 1 project only
- **LOW:** < 5 learning interactions, OR preference varies by topic familiarity
- **UNSCORED:** 0 messages with learning-relevant signals
**Example quotes:**
- **self-directed:** "I read through the middleware code. The issue is that the token check happens after the rate limiter. Should those be swapped?"
- **guided:** "Can you walk me through how the auth flow works in this codebase? Start from the login request."
- **documentation-first:** "I read the Prisma docs on relations. Can you help me apply the many-to-many pattern from their guide to our schema?"
- **example-driven:** "Show me a working example of a protected API route with JWT validation. I'll adapt it for our endpoints."
**Context-dependent patterns:**
Learning style often varies with domain expertise. A developer may be self-directed in familiar domains but guided or example-driven in new ones. Report the split if detected: "context-dependent: self-directed for TypeScript/Node, example-driven for Rust/systems programming."
---
## Evidence Curation
### Evidence Format
Use the combined format for each evidence entry:
**Signal:** [pattern interpretation -- what the quote demonstrates] / **Example:** "[trimmed quote, ~100 characters]" -- project: [project name]
### Evidence Targets
- **3 evidence quotes per dimension** (24 total across all 8 dimensions)
- Select quotes that best illustrate the rated pattern
- Prefer quotes from different projects to demonstrate cross-project consistency
- When fewer than 3 relevant quotes exist, include what is available and note the evidence count
### Quote Truncation
- Trim quotes to the behavioral signal -- the part that demonstrates the pattern
- Target approximately 100 characters per quote
- Preserve the meaningful fragment, not the full message
- If the signal is in the middle of a long message, use "..." to indicate trimming
- Never include the full 500-character message when 50 characters capture the signal
### Project Attribution
- Every evidence quote must include the project name
- Project attribution enables verification and shows cross-project patterns
- Format: `-- project: [name]`
### Sensitive Content Exclusion (Layer 1)
The profiler agent must never select quotes containing any of the following patterns:
- `sk-` (API key prefixes)
- `Bearer ` (auth tokens)
- `password` (credentials)
- `secret` (secrets)
- `token` (when used as a credential value, not a concept discussion)
- `api_key` or `API_KEY` (API key references)
- Full absolute file paths containing usernames (e.g., `/Users/john/...`, `/home/john/...`)
**When sensitive content is found and excluded**, report as metadata in the analysis output:
```json
{
"sensitive_excluded": [
{ "type": "api_key_pattern", "count": 2 },
{ "type": "file_path_with_username", "count": 1 }
]
}
```
This metadata enables defense-in-depth auditing. Layer 2 (regex filter in the write-profile step) provides a second pass, but the profiler should still avoid selecting sensitive quotes.
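As a rough sketch, a mechanical screen over a candidate quote could look like this (the file name is hypothetical, and contextual rules such as the `token`-as-credential check still require judgment):
```bash
# Reject the candidate quote if it matches any mechanical sensitive pattern
if grep -Eq "sk-|Bearer |password|secret|api_key|API_KEY|/Users/[^/ ]+/|/home/[^/ ]+/" candidate_quote.txt; then
  echo "sensitive content detected - exclude this quote and record it in sensitive_excluded"
fi
```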
### Natural Language Priority
Weight natural language messages higher than:
- Pasted log output (detected by timestamps, repeated format strings, `[DEBUG]`, `[INFO]`, `[ERROR]`)
- Session context dumps (messages starting with "This session is being continued from a previous conversation")
- Large code pastes (messages where > 80% of content is inside code fences)
These message types are genuine but carry less behavioral signal. Deprioritize them when selecting evidence quotes.
---
## Recency Weighting
### Guideline
Recent sessions (last 30 days) should be weighted approximately 3x compared to older sessions when analyzing patterns.
### Rationale
Developer styles evolve. A developer who was terse six months ago may now provide detailed structured context. Recent behavior is a more accurate reflection of current working style.
### Application
1. When counting signals for confidence scoring, recent signals count 3x (e.g., 4 recent signals = 12 weighted signals)
2. When selecting evidence quotes, prefer recent quotes over older ones when both demonstrate the same pattern
3. When patterns conflict between recent and older sessions, the recent pattern takes precedence for the rating, but note the evolution: "recently shifted from terse-direct to conversational"
4. The 30-day window is relative to the analysis date, not a fixed date
### Edge Cases
- If ALL sessions are older than 30 days, apply no weighting (all sessions are equally stale)
- If ALL sessions are within the last 30 days, apply no weighting (all sessions are equally recent)
- The 3x weight is a guideline, not a hard multiplier -- use judgment when the weighted count changes a confidence threshold
---
## Thin Data Handling
### Message Thresholds
| Total Genuine Messages | Mode | Behavior |
|------------------------|------|----------|
| > 50 | `full` | Full analysis across all 8 dimensions. Questionnaire optional (user can choose to supplement). |
| 20-50 | `hybrid` | Analyze available messages. Score each dimension with confidence. Supplement with questionnaire for LOW/UNSCORED dimensions. |
| < 20 | `insufficient` | All dimensions scored LOW or UNSCORED. Recommend questionnaire fallback as primary profile source. Note: "insufficient session data for behavioral analysis." |
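The mode selection is a simple threshold check; as a sketch (the variable name is illustrative):
```bash
# Pick the analysis mode from the count of genuine user messages
if [ "$MESSAGE_COUNT" -gt 50 ]; then
  MODE="full"
elif [ "$MESSAGE_COUNT" -ge 20 ]; then
  MODE="hybrid"
else
  MODE="insufficient"
fi
```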
### Handling Insufficient Dimensions
When a specific dimension has insufficient data (even if total messages exceed thresholds):
- Set confidence to `UNSCORED`
- Set summary to: "Insufficient data -- no clear signals detected for this dimension."
- Set claude_instruction to a neutral fallback: "No strong preference detected. Ask the developer when this dimension is relevant."
- Set evidence_quotes to empty array `[]`
- Set evidence_count to `0`
### Questionnaire Supplement
When operating in `hybrid` mode, the questionnaire fills gaps for dimensions where session analysis produced LOW or UNSCORED confidence. The questionnaire-derived ratings use:
- **MEDIUM** confidence for strong, definitive picks
- **LOW** confidence for "it varies" or ambiguous selections
If session analysis and questionnaire agree on a dimension, confidence can be elevated (e.g., session LOW + questionnaire MEDIUM agreement = MEDIUM).
---
## Output Schema
The profiler agent must return JSON matching this exact schema, wrapped in `<analysis>` tags.
```json
{
"profile_version": "1.0",
"analyzed_at": "ISO-8601 timestamp",
"data_source": "session_analysis",
"projects_analyzed": ["project-name-1", "project-name-2"],
"messages_analyzed": 0,
"message_threshold": "full|hybrid|insufficient",
"sensitive_excluded": [
{ "type": "string", "count": 0 }
],
"dimensions": {
"communication_style": {
"rating": "terse-direct|conversational|detailed-structured|mixed",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [
{
"signal": "Pattern interpretation describing what the quote demonstrates",
"quote": "Trimmed quote, approximately 100 characters",
"project": "project-name"
}
],
"summary": "One to two sentence description of the observed pattern",
"claude_instruction": "Imperative directive for the agent: 'Match structured communication style' not 'You tend to provide structured context'"
},
"decision_speed": {
"rating": "fast-intuitive|deliberate-informed|research-first|delegator",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"explanation_depth": {
"rating": "code-only|concise|detailed|educational",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"debugging_approach": {
"rating": "fix-first|diagnostic|hypothesis-driven|collaborative",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"ux_philosophy": {
"rating": "function-first|pragmatic|design-conscious|backend-focused",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"vendor_philosophy": {
"rating": "pragmatic-fast|conservative|thorough-evaluator|opinionated",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"frustration_triggers": {
"rating": "scope-creep|instruction-adherence|verbosity|regression",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"learning_style": {
"rating": "self-directed|guided|documentation-first|example-driven",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
}
}
}
```
### Schema Notes
- **`profile_version`**: Always `"1.0"` for this schema version
- **`analyzed_at`**: ISO-8601 timestamp of when the analysis was performed
- **`data_source`**: `"session_analysis"` for session-based profiling, `"questionnaire"` for questionnaire-only, `"hybrid"` for combined
- **`projects_analyzed`**: List of project names that contributed messages
- **`messages_analyzed`**: Total number of genuine user messages processed
- **`message_threshold`**: Which threshold mode was triggered (`full`, `hybrid`, `insufficient`)
- **`sensitive_excluded`**: Array of excluded sensitive content types with counts (empty array if none found)
- **`claude_instruction`**: Must be written in imperative form directed at the agent. This field is how the profile becomes actionable.
- Good: "Provide structured responses with headers and numbered lists to match this developer's communication style."
- Bad: "You tend to like structured responses."
- Good: "Ask before making changes beyond the stated request -- this developer values bounded execution."
- Bad: "The developer gets frustrated when you do extra work."
---
## Cross-Project Consistency
### Assessment
For each dimension, assess whether the observed pattern is consistent across the projects analyzed:
- **`cross_project_consistent: true`** -- Same rating would apply regardless of which project is analyzed. Evidence from 2+ projects shows the same pattern.
- **`cross_project_consistent: false`** -- Pattern varies by project. Include a context-dependent note in the summary.
### Reporting Splits
When `cross_project_consistent` is false, the summary must describe the split:
- "Context-dependent: terse-direct for CLI/backend projects (gsd-tools, api-server), detailed-structured for frontend projects (dashboard, landing-page)."
- "Context-dependent: fast-intuitive for familiar tech (React, Node), research-first for new domains (Rust, ML)."
The rating field should reflect the **dominant** pattern (most evidence). The summary describes the nuance.
### Phase 3 Resolution
Context-dependent splits are resolved during Phase 3 orchestration. The orchestrator presents the split to the developer and asks which pattern represents their general preference. Until resolved, the agent uses the dominant pattern with awareness of the context-dependent variation.
---
*Reference document version: 1.0*
*Dimensions: 8*
*Schema: profile_version 1.0*

View File

@@ -0,0 +1,612 @@
# Verification Patterns
How to verify different types of artifacts are real implementations, not stubs or placeholders.
<core_principle>
**Existence ≠ Implementation**
A file existing does not mean the feature works. Verification must check:
1. **Exists** - File is present at expected path
2. **Substantive** - Content is real implementation, not placeholder
3. **Wired** - Connected to the rest of the system
4. **Functional** - Actually works when invoked
Levels 1-3 can be checked programmatically. Level 4 often requires human verification.
</core_principle>
<stub_detection>
## Universal Stub Patterns
These patterns indicate placeholder code regardless of file type:
**Comment-based stubs:**
```bash
# Grep patterns for stub comments
grep -E "(TODO|FIXME|XXX|HACK|PLACEHOLDER)" "$file"
grep -E "implement|add later|coming soon|will be" "$file" -i
grep -E "// \.\.\.|/\* \.\.\. \*/|# \.\.\." "$file"
```
**Placeholder text in output:**
```bash
# UI placeholder patterns
grep -E "placeholder|lorem ipsum|coming soon|under construction" "$file" -i
grep -E "sample|example|test data|dummy" "$file" -i
grep -E "\[.*\]|<.*>|\{.*\}" "$file" # Template brackets left in
```
**Empty or trivial implementations:**
```bash
# Functions that do nothing
grep -E "return null|return undefined|return \{\}|return \[\]" "$file"
grep -E "pass$|\.\.\.|\bnothing\b" "$file"
grep -E "console\.(log|warn|error).*only" "$file" # Log-only functions
```
**Hardcoded values where dynamic expected:**
```bash
# Hardcoded IDs, counts, or content
grep -E "id.*=.*['\"].*['\"]" "$file" # Hardcoded string IDs
grep -E "count.*=.*\d+|length.*=.*\d+" "$file" # Hardcoded counts
grep -E "\\\$\d+\.\d{2}|\d+ items" "$file" # Hardcoded display values
```
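These patterns can be combined into a single pass over a source tree. A rough sketch, assuming GNU grep and a `src/` layout; expect false positives and review hits manually:
```bash
# Flag files that match any universal stub pattern
stub_re='TODO|FIXME|XXX|HACK|PLACEHOLDER|not implemented|coming soon|lorem ipsum|under construction'
grep -rliE --include="*.ts" --include="*.tsx" --include="*.js" "$stub_re" src/ \
  | while read -r file; do echo "POSSIBLE STUB: $file"; done
```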
</stub_detection>
<react_components>
## React/Next.js Components
**Existence check:**
```bash
# File exists and exports component
[ -f "$component_path" ] && grep -E "export (default |)function|export const.*=.*\(" "$component_path"
```
**Substantive check:**
```bash
# Returns actual JSX, not placeholder
grep -E "return.*<" "$component_path" | grep -v "return.*null" | grep -v "placeholder" -i
# Has meaningful content (not just wrapper div)
grep -E "<[A-Z][a-zA-Z]+|className=|onClick=|onChange=" "$component_path"
# Uses props or state (not static)
grep -E "props\.|useState|useEffect|useContext|\{.*\}" "$component_path"
```
**Stub patterns specific to React:**
```javascript
// RED FLAGS - These are stubs:
return <div>Component</div>
return <div>Placeholder</div>
return <div>{/* TODO */}</div>
return <p>Coming soon</p>
return null
return <></>
// Also stubs - empty handlers:
onClick={() => {}}
onChange={() => console.log('clicked')}
onSubmit={(e) => e.preventDefault()} // Only prevents default, does nothing
```
**Wiring check:**
```bash
# Component imports what it needs
grep -E "^import.*from" "$component_path"
# Props are actually used (not just received)
# Look for destructuring or props.X usage
grep -E "\{ .* \}.*props|\bprops\.[a-zA-Z]+" "$component_path"
# API calls exist (for data-fetching components)
grep -E "fetch\(|axios\.|useSWR|useQuery|getServerSideProps|getStaticProps" "$component_path"
```
**Functional verification (human required):**
- Does the component render visible content?
- Do interactive elements respond to clicks?
- Does data load and display?
- Do error states show appropriately?
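Before handing the component to a human, a crude automated pre-check can catch the worst cases: render the page and grep the HTML. A sketch, assuming a dev server is already running on port 3000 and `/dashboard` is the route under test (both are assumptions):
```bash
# Crude render check -- not a substitute for human verification
html=$(curl -sf http://localhost:3000/dashboard) || { echo "FAIL: page did not render"; exit 1; }
echo "$html" | grep -qiE "placeholder|coming soon" && echo "WARN: placeholder text in rendered output"
echo "$html" | grep -qE "<nav|<aside" || echo "WARN: expected navigation markup not found"
```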
</react_components>
<api_routes>
## API Routes (Next.js App Router / Express / etc.)
**Existence check:**
```bash
# Route file exists
[ -f "$route_path" ]
# Exports HTTP method handlers (Next.js App Router)
grep -E "export (async )?(function|const) (GET|POST|PUT|PATCH|DELETE)" "$route_path"
# Or Express-style handlers
grep -E "\.(get|post|put|patch|delete)\(" "$route_path"
```
**Substantive check:**
```bash
# Has actual logic, not just return statement
wc -l "$route_path" # More than 10-15 lines suggests real implementation
# Interacts with data source
grep -E "prisma\.|db\.|mongoose\.|sql|query|find|create|update|delete" "$route_path" -i
# Has error handling
grep -E "try|catch|throw|error|Error" "$route_path"
# Returns meaningful response
grep -E "Response\.json|res\.json|res\.send|return.*\{" "$route_path" | grep -v "message.*not implemented" -i
```
**Stub patterns specific to API routes:**
```typescript
// RED FLAGS - These are stubs:
export async function POST() {
return Response.json({ message: "Not implemented" })
}
export async function GET() {
return Response.json([]) // Empty array with no DB query
}
export async function PUT() {
return new Response() // Empty response
}
// Console log only:
export async function POST(req) {
console.log(await req.json())
return Response.json({ ok: true })
}
```
**Wiring check:**
```bash
# Imports database/service clients
grep -E "^import.*prisma|^import.*db|^import.*client" "$route_path"
# Actually uses request body (for POST/PUT)
grep -E "req\.json\(\)|req\.body|request\.json\(\)" "$route_path"
# Validates input (not just trusting request)
grep -E "schema\.parse|validate|zod|yup|joi" "$route_path"
```
**Functional verification (human or automated):**
- Does GET return real data from database?
- Does POST actually create a record?
- Does error response have correct status code?
- Are auth checks actually enforced?
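Much of this can be automated with curl against a running dev server. A sketch, assuming the server is on port 3000 and the route is `/api/messages` (endpoint, field names, and payload are assumptions):
```bash
base="http://localhost:3000/api/messages"

# GET should return a non-empty JSON body, not a placeholder
curl -sf "$base" | jq -e 'length > 0' >/dev/null || echo "WARN: GET returned empty or non-JSON body"

# POST should create a record and echo it back
curl -sf -X POST "$base" -H "Content-Type: application/json" -d '{"content":"smoke test"}' \
  | jq -e '.id' >/dev/null || echo "WARN: POST did not return a created record"

# Invalid input should be rejected with a 4xx, not a 200
status=$(curl -s -o /dev/null -w "%{http_code}" -X POST "$base" -H "Content-Type: application/json" -d '{}')
[ "$status" -ge 400 ] && [ "$status" -lt 500 ] || echo "WARN: invalid input returned $status"
```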
</api_routes>
<database_schema>
## Database Schema (Prisma / Drizzle / SQL)
**Existence check:**
```bash
# Schema file exists
[ -f "prisma/schema.prisma" ] || [ -f "drizzle/schema.ts" ] || [ -f "src/db/schema.sql" ]
# Model/table is defined
grep -E "^model $model_name|CREATE TABLE $table_name|export const $table_name" "$schema_path"
```
**Substantive check:**
```bash
# Has expected fields (not just id)
grep -A 20 "model $model_name" "$schema_path" | grep -E "^\s+\w+\s+\w+"
# Has relationships if expected
grep -E "@relation|REFERENCES|FOREIGN KEY" "$schema_path"
# Has appropriate field types (not all String)
grep -A 20 "model $model_name" "$schema_path" | grep -E "Int|DateTime|Boolean|Float|Decimal|Json"
```
**Stub patterns specific to schemas:**
```prisma
// RED FLAGS - These are stubs:
model User {
id String @id
// TODO: add fields
}
model Message {
id String @id
content String // Only one real field
}
// Missing critical fields:
model Order {
id String @id
// No: userId, items, total, status, createdAt
}
```
**Wiring check:**
```bash
# Migrations exist and are applied
ls prisma/migrations/ 2>/dev/null | wc -l # Should be > 0
npx prisma migrate status 2>/dev/null | grep -v "pending"
# Client is generated
[ -d "node_modules/.prisma/client" ]
```
**Functional verification:**
```bash
# Can query the table (automated)
npx prisma db execute --stdin <<< "SELECT COUNT(*) FROM $table_name"
```
</database_schema>
<hooks_utilities>
## Custom Hooks and Utilities
**Existence check:**
```bash
# File exists and exports function
[ -f "$hook_path" ] && grep -E "export (default )?(function|const)" "$hook_path"
```
**Substantive check:**
```bash
# Hook uses React hooks (for custom hooks)
grep -E "useState|useEffect|useCallback|useMemo|useRef|useContext" "$hook_path"
# Has meaningful return value
grep -E "return \{|return \[" "$hook_path"
# More than trivial length
[ $(wc -l < "$hook_path") -gt 10 ]
```
**Stub patterns specific to hooks:**
```typescript
// RED FLAGS - These are stubs:
export function useAuth() {
return { user: null, login: () => {}, logout: () => {} }
}
export function useCart() {
const [items, setItems] = useState([])
return { items, addItem: () => console.log('add'), removeItem: () => {} }
}
// Hardcoded return:
export function useUser() {
return { name: "Test User", email: "test@example.com" }
}
```
**Wiring check:**
```bash
# Hook is actually imported somewhere
grep -r "import.*$hook_name" src/ --include="*.tsx" --include="*.ts" | grep -v "$hook_path"
# Hook is actually called
grep -r "$hook_name()" src/ --include="*.tsx" --include="*.ts" | grep -v "$hook_path"
```
</hooks_utilities>
<environment_config>
## Environment Variables and Configuration
**Existence check:**
```bash
# .env file exists
[ -f ".env" ] || [ -f ".env.local" ]
# Required variable is defined
grep -E "^$VAR_NAME=" .env .env.local 2>/dev/null
```
**Substantive check:**
```bash
# Variable has actual value (not placeholder)
grep -E "^$VAR_NAME=.+" .env .env.local 2>/dev/null | grep -v "your-.*-here|xxx|placeholder|TODO" -i
# Value looks valid for type:
# - URLs should start with http
# - Keys should be long enough
# - Booleans should be true/false
```
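The "looks valid for type" checks can be scripted as well. A sketch with illustrative variable names and thresholds (the names below are assumptions, not a required convention):
```bash
# URL-typed variables should carry a scheme (DATABASE_URL is often postgres://, not http://)
grep -E "^(DATABASE_URL|NEXT_PUBLIC_API_URL)=" .env .env.local 2>/dev/null \
  | grep -vE "=[a-z+]+://" && echo "WARN: URL variable without a scheme"

# Key-typed variables should be long enough to be real (threshold illustrative)
key=$(grep -E "^STRIPE_SECRET_KEY=" .env .env.local 2>/dev/null | head -n1 | cut -d= -f2-)
[ -n "$key" ] && [ "${#key}" -lt 20 ] && echo "WARN: STRIPE_SECRET_KEY looks too short to be real"

# Boolean-typed variables should be exactly true or false (FEATURE_FLAG_X is a hypothetical name)
grep -E "^FEATURE_FLAG_X=" .env .env.local 2>/dev/null | grep -vE "=(true|false)$" \
  && echo "WARN: boolean variable has a non-boolean value"
```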
**Stub patterns specific to env:**
```bash
# RED FLAGS - These are stubs:
DATABASE_URL=your-database-url-here
STRIPE_SECRET_KEY=sk_test_xxx
API_KEY=placeholder
NEXT_PUBLIC_API_URL=http://localhost:3000 # Still pointing to localhost in prod
```
**Wiring check:**
```bash
# Variable is actually used in code
grep -r "process\.env\.$VAR_NAME|env\.$VAR_NAME" src/ --include="*.ts" --include="*.tsx"
# Variable is in validation schema (if using zod/etc for env)
grep -E "$VAR_NAME" src/env.ts src/env.mjs 2>/dev/null
```
</environment_config>
<wiring_verification>
## Wiring Verification Patterns
Wiring verification checks that components actually communicate. This is where most stubs hide.
### Pattern: Component → API
**Check:** Does the component actually call the API?
```bash
# Find the fetch/axios call
grep -E "fetch\(['\"].*$api_path|axios\.(get|post).*$api_path" "$component_path"
# Verify it's not commented out
grep -E "fetch\(|axios\." "$component_path" | grep -v "^.*//.*fetch"
# Check the response is used
grep -E "await.*fetch|\.then\(|setData|setState" "$component_path"
```
**Red flags:**
```typescript
// Fetch exists but response ignored:
fetch('/api/messages') // No await, no .then, no assignment
// Fetch in comment:
// fetch('/api/messages').then(r => r.json()).then(setMessages)
// Fetch to wrong endpoint:
fetch('/api/message') // Typo - should be /api/messages
```
### Pattern: API → Database
**Check:** Does the API route actually query the database?
```bash
# Find the database call
grep -E "prisma\.$model|db\.query|Model\.find" "$route_path"
# Verify it's awaited
grep -E "await.*prisma|await.*db\." "$route_path"
# Check result is returned
grep -E "return.*json.*data|res\.json.*result" "$route_path"
```
**Red flags:**
```typescript
// Query exists but result not returned:
await prisma.message.findMany()
return Response.json({ ok: true }) // Returns static, not query result
// Query not awaited:
const messages = prisma.message.findMany() // Missing await
return Response.json(messages) // Returns Promise, not data
```
### Pattern: Form → Handler
**Check:** Does the form submission actually do something?
```bash
# Find onSubmit handler
grep -E "onSubmit=\{|handleSubmit" "$component_path"
# Check handler has content
grep -A 10 "onSubmit.*=" "$component_path" | grep -E "fetch|axios|mutate|dispatch"
# Verify not just preventDefault
grep -A 5 "onSubmit" "$component_path" | grep -v "only.*preventDefault" -i
```
**Red flags:**
```typescript
// Handler only prevents default:
onSubmit={(e) => e.preventDefault()}
// Handler only logs:
const handleSubmit = (data) => {
console.log(data)
}
// Handler is empty:
onSubmit={() => {}}
```
### Pattern: State → Render
**Check:** Does the component render state, not hardcoded content?
```bash
# Find state usage in JSX
grep -E "\{.*messages.*\}|\{.*data.*\}|\{.*items.*\}" "$component_path"
# Check map/render of state
grep -E "\.map\(|\.filter\(|\.reduce\(" "$component_path"
# Verify dynamic content
grep -E "\{[a-zA-Z_]+\." "$component_path" # Variable interpolation
```
**Red flags:**
```tsx
// Hardcoded instead of state:
return <div>
<p>Message 1</p>
<p>Message 2</p>
</div>
// State exists but not rendered:
const [messages, setMessages] = useState([])
return <div>No messages</div> // Always shows "no messages"
// Wrong state rendered:
const [messages, setMessages] = useState([])
return <div>{otherData.map(...)}</div> // Uses different data
```
</wiring_verification>
<verification_checklist>
## Quick Verification Checklist
For each artifact type, run through this checklist:
### Component Checklist
- [ ] File exists at expected path
- [ ] Exports a function/const component
- [ ] Returns JSX (not null/empty)
- [ ] No placeholder text in render
- [ ] Uses props or state (not static)
- [ ] Event handlers have real implementations
- [ ] Imports resolve correctly
- [ ] Used somewhere in the app
### API Route Checklist
- [ ] File exists at expected path
- [ ] Exports HTTP method handlers
- [ ] Handlers have more than 5 lines
- [ ] Queries database or service
- [ ] Returns meaningful response (not empty/placeholder)
- [ ] Has error handling
- [ ] Validates input
- [ ] Called from frontend
### Schema Checklist
- [ ] Model/table defined
- [ ] Has all expected fields
- [ ] Fields have appropriate types
- [ ] Relationships defined if needed
- [ ] Migrations exist and applied
- [ ] Client generated
### Hook/Utility Checklist
- [ ] File exists at expected path
- [ ] Exports function
- [ ] Has meaningful implementation (not empty returns)
- [ ] Used somewhere in the app
- [ ] Return values consumed
### Wiring Checklist
- [ ] Component → API: fetch/axios call exists and uses response
- [ ] API → Database: query exists and result returned
- [ ] Form → Handler: onSubmit calls API/mutation
- [ ] State → Render: state variables appear in JSX
</verification_checklist>
<automated_verification_script>
## Automated Verification Approach
For the verification subagent, use this pattern:
```bash
# 1. Check existence
check_exists() {
[ -f "$1" ] && echo "EXISTS: $1" || echo "MISSING: $1"
}
# 2. Check for stub patterns
check_stubs() {
local file="$1"
# grep -c prints 0 and exits 1 when there are no matches, so don't append '|| echo 0' (it would add a second 0)
local stubs=$(grep -c -E "TODO|FIXME|placeholder|not implemented" "$file" 2>/dev/null)
stubs=${stubs:-0}
[ "$stubs" -gt 0 ] && echo "STUB_PATTERNS: $stubs in $file"
}
# 3. Check wiring (component calls API)
check_wiring() {
local component="$1"
local api_path="$2"
grep -q "$api_path" "$component" && echo "WIRED: $component$api_path" || echo "NOT_WIRED: $component$api_path"
}
# 4. Check substantive (more than N lines, has expected patterns)
check_substantive() {
local file="$1"
local min_lines="$2"
local pattern="$3"
local lines=$(wc -l < "$file" 2>/dev/null || echo 0)
local has_pattern=$(grep -c -E "$pattern" "$file" 2>/dev/null)
has_pattern=${has_pattern:-0}
[ "$lines" -ge "$min_lines" ] && [ "$has_pattern" -gt 0 ] && echo "SUBSTANTIVE: $file" || echo "THIN: $file ($lines lines, $has_pattern matches)"
}
```
Run these checks against each must-have artifact. Aggregate results into VERIFICATION.md.
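A rough sketch of that aggregation loop, assuming the helpers above are sourced and the must-have artifacts are listed one path per line in a manifest file (the manifest name is an assumption):
```bash
{
  echo "# VERIFICATION"
  echo ""
  while read -r artifact; do
    echo "## $artifact"
    check_exists "$artifact"
    check_stubs "$artifact"
    check_substantive "$artifact" 10 "export|return"
    echo ""
  done < must-have-artifacts.txt
} > VERIFICATION.md
```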
</automated_verification_script>
<human_verification_triggers>
## When to Require Human Verification
Some things can't be verified programmatically. Flag these for human testing:
**Always human:**
- Visual appearance (does it look right?)
- User flow completion (can you actually do the thing?)
- Real-time behavior (WebSocket, SSE)
- External service integration (Stripe, email sending)
- Error message clarity (is the message helpful?)
- Performance feel (does it feel fast?)
**Human if uncertain:**
- Complex wiring that grep can't trace
- Dynamic behavior depending on state
- Edge cases and error states
- Mobile responsiveness
- Accessibility
**Format for human verification request:**
```markdown
## Human Verification Required
### 1. Chat message sending
**Test:** Type a message and click Send
**Expected:** Message appears in list, input clears
**Check:** Does message persist after refresh?
### 2. Error handling
**Test:** Disconnect network, try to send
**Expected:** Error message appears, message not lost
**Check:** Can retry after reconnect?
```
</human_verification_triggers>
<checkpoint_automation_reference>
## Pre-Checkpoint Automation
For automation-first checkpoint patterns, server lifecycle management, CLI installation handling, and error recovery protocols, see:
**@.pi/gsd/references/checkpoints.md** → `<automation_reference>` section
Key principles:
- the agent sets up the verification environment BEFORE presenting checkpoints
- Users never run CLI commands (visit URLs only)
- Server lifecycle: start before checkpoint, handle port conflicts, keep running for duration
- CLI installation: auto-install where safe, checkpoint for user choice otherwise
- Error handling: fix broken environment before checkpoint, never present checkpoint with failed setup
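A minimal sketch of the server-lifecycle principle, assuming a Next.js-style dev script that accepts `--port` and a default port of 3000 (both assumptions):
```bash
# Start the dev server before the checkpoint, falling back to another port if 3000 is taken
port=3000
lsof -i :"$port" >/dev/null 2>&1 && port=3001
npm run dev -- --port "$port" > /tmp/dev-server.log 2>&1 &
server_pid=$!

# Wait until the server answers before presenting the checkpoint
for _ in $(seq 1 30); do
  curl -sf "http://localhost:$port" >/dev/null && break
  sleep 1
done
echo "Dev server ready at http://localhost:$port (pid $server_pid)"
```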
</checkpoint_automation_reference>

View File

@@ -0,0 +1,58 @@
# Workstream Flag (`--ws`)
## Overview
The `--ws <name>` flag scopes GSD operations to a specific workstream, enabling
parallel milestone work by multiple Claude Code instances on the same codebase.
## Resolution Priority
1. `--ws <name>` flag (explicit, highest priority)
2. `GSD_WORKSTREAM` environment variable (per-instance)
3. `.planning/active-workstream` file (shared, last-writer-wins)
4. `null` - flat mode (no workstreams)
## Routing Propagation
All workflow routing commands include `${GSD_WS}` which:
- Expands to `--ws <name>` when a workstream is active
- Expands to empty string in flat mode (backward compatible)
This ensures workstream scope chains automatically through the workflow:
`new-milestone → discuss-phase → plan-phase → execute-phase → transition`
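A sketch of how a wrapper might resolve the active workstream and build `${GSD_WS}` following this priority order -- the logic inside gsd-tools.cjs is the source of truth; this is illustrative only:
```bash
resolve_workstream() {
  # 1. Explicit --ws value passed by the caller
  if [ -n "$1" ]; then echo "$1"; return; fi
  # 2. Per-instance environment variable
  if [ -n "$GSD_WORKSTREAM" ]; then echo "$GSD_WORKSTREAM"; return; fi
  # 3. Shared file (last-writer-wins)
  if [ -f ".planning/active-workstream" ]; then cat ".planning/active-workstream"; return; fi
  # 4. Flat mode: no workstream
  echo ""
}

ws=$(resolve_workstream "$1")
GSD_WS=${ws:+--ws $ws}                  # `--ws <name>` when a workstream is active, empty string in flat mode
node gsd-tools.cjs state json $GSD_WS   # left unquoted on purpose so it expands to zero or two words
```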
## Directory Structure
```
.planning/
├── PROJECT.md # Shared
├── config.json # Shared
├── milestones/ # Shared
├── codebase/ # Shared
├── active-workstream # Points to current ws
└── workstreams/
├── feature-a/ # Workstream A
│ ├── STATE.md
│ ├── ROADMAP.md
│ ├── REQUIREMENTS.md
│ └── phases/
└── feature-b/ # Workstream B
├── STATE.md
├── ROADMAP.md
├── REQUIREMENTS.md
└── phases/
```
## CLI Usage
```bash
# All gsd-tools commands accept --ws
node gsd-tools.cjs state json --ws feature-a
node gsd-tools.cjs find-phase 3 --ws feature-b
# Workstream CRUD
node gsd-tools.cjs workstream create <name>
node gsd-tools.cjs workstream list
node gsd-tools.cjs workstream status <name>
node gsd-tools.cjs workstream complete <name>
```