feat: basecamp-project skill

This commit is contained in:
m3tm3re
2026-04-24 20:00:33 +02:00
parent 0ad41acb03
commit 6e0e847299
211 changed files with 46029 additions and 2592 deletions

View File

@@ -0,0 +1,778 @@
<overview>
Plans execute autonomously. Checkpoints formalize interaction points where human verification or decisions are needed.
**Core principle:** the agent automates everything with CLI/API. Checkpoints are for verification and decisions, not manual work.
**Golden rules:**
1. **If the agent can run it, the agent runs it** - Never ask user to execute CLI commands, start servers, or run builds
2. **the agent sets up the verification environment** - Start dev servers, seed databases, configure env vars
3. **User only does what requires human judgment** - Visual checks, UX evaluation, "does this feel right?"
4. **Secrets come from user, automation comes from the agent** - Ask for API keys, then the agent uses them via CLI
5. **Auto-mode bypasses verification/decision checkpoints** - When `workflow._auto_chain_active` or `workflow.auto_advance` is true in config: human-verify auto-approves, decision auto-selects first option, human-action still stops (auth gates cannot be automated)
</overview>
<checkpoint_types>
<type name="human-verify">
## checkpoint:human-verify (Most Common - 90%)
**When:** the agent completed automated work, human confirms it works correctly.
**Use for:**
- Visual UI checks (layout, styling, responsiveness)
- Interactive flows (click through wizard, test user flows)
- Functional verification (feature works as expected)
- Audio/video playback quality
- Animation smoothness
- Accessibility testing
**Structure:**
```xml
<task type="checkpoint:human-verify" gate="blocking">
<what-built>[What the agent automated and deployed/built]</what-built>
<how-to-verify>
[Exact steps to test - URLs, commands, expected behavior]
</how-to-verify>
<resume-signal>[How to continue - "approved", "yes", or describe issues]</resume-signal>
</task>
```
**Example: UI Component (shows key pattern: the agent starts server BEFORE checkpoint)**
```xml
<task type="auto">
<name>Build responsive dashboard layout</name>
<files>src/components/Dashboard.tsx, src/app/dashboard/page.tsx</files>
<action>Create dashboard with sidebar, header, and content area. Use Tailwind responsive classes for mobile.</action>
<verify>npm run build succeeds, no TypeScript errors</verify>
<done>Dashboard component builds without errors</done>
</task>
<task type="auto">
<name>Start dev server for verification</name>
<action>Run `npm run dev` in background, wait for "ready" message, capture port</action>
<verify>fetch http://localhost:3000 returns 200</verify>
<done>Dev server running at http://localhost:3000</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Responsive dashboard layout - dev server running at http://localhost:3000</what-built>
<how-to-verify>
Visit http://localhost:3000/dashboard and verify:
1. Desktop (>1024px): Sidebar left, content right, header top
2. Tablet (768px): Sidebar collapses to hamburger menu
3. Mobile (375px): Single column layout, bottom nav appears
4. No layout shift or horizontal scroll at any size
</how-to-verify>
<resume-signal>Type "approved" or describe layout issues</resume-signal>
</task>
```
**Example: Xcode Build**
```xml
<task type="auto">
<name>Build macOS app with Xcode</name>
<files>App.xcodeproj, Sources/</files>
<action>Run `xcodebuild -project App.xcodeproj -scheme App build`. Check for compilation errors in output.</action>
<verify>Build output contains "BUILD SUCCEEDED", no errors</verify>
<done>App builds successfully</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Built macOS app at DerivedData/Build/Products/Debug/App.app</what-built>
<how-to-verify>
Open App.app and test:
- App launches without crashes
- Menu bar icon appears
- Preferences window opens correctly
- No visual glitches or layout issues
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
</type>
<type name="decision">
## checkpoint:decision (9%)
**When:** Human must make choice that affects implementation direction.
**Use for:**
- Technology selection (which auth provider, which database)
- Architecture decisions (monorepo vs separate repos)
- Design choices (color scheme, layout approach)
- Feature prioritization (which variant to build)
- Data model decisions (schema structure)
**Structure:**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>[What's being decided]</decision>
<context>[Why this decision matters]</context>
<options>
<option id="option-a">
<name>[Option name]</name>
<pros>[Benefits]</pros>
<cons>[Tradeoffs]</cons>
</option>
<option id="option-b">
<name>[Option name]</name>
<pros>[Benefits]</pros>
<cons>[Tradeoffs]</cons>
</option>
</options>
<resume-signal>[How to indicate choice]</resume-signal>
</task>
```
**Example: Auth Provider Selection**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>Select authentication provider</decision>
<context>
Need user authentication for the app. Three solid options with different tradeoffs.
</context>
<options>
<option id="supabase">
<name>Supabase Auth</name>
<pros>Built-in with Supabase DB we're using, generous free tier, row-level security integration</pros>
<cons>Less customizable UI, tied to Supabase ecosystem</cons>
</option>
<option id="clerk">
<name>Clerk</name>
<pros>Beautiful pre-built UI, best developer experience, excellent docs</pros>
<cons>Paid after 10k MAU, vendor lock-in</cons>
</option>
<option id="nextauth">
<name>NextAuth.js</name>
<pros>Free, self-hosted, maximum control, widely adopted</pros>
<cons>More setup work, you manage security updates, UI is DIY</cons>
</option>
</options>
<resume-signal>Select: supabase, clerk, or nextauth</resume-signal>
</task>
```
**Example: Database Selection**
```xml
<task type="checkpoint:decision" gate="blocking">
<decision>Select database for user data</decision>
<context>
App needs persistent storage for users, sessions, and user-generated content.
Expected scale: 10k users, 1M records first year.
</context>
<options>
<option id="supabase">
<name>Supabase (Postgres)</name>
<pros>Full SQL, generous free tier, built-in auth, real-time subscriptions</pros>
<cons>Vendor lock-in for real-time features, less flexible than raw Postgres</cons>
</option>
<option id="planetscale">
<name>PlanetScale (MySQL)</name>
<pros>Serverless scaling, branching workflow, excellent DX</pros>
<cons>MySQL not Postgres, no foreign keys in free tier</cons>
</option>
<option id="convex">
<name>Convex</name>
<pros>Real-time by default, TypeScript-native, automatic caching</pros>
<cons>Newer platform, different mental model, less SQL flexibility</cons>
</option>
</options>
<resume-signal>Select: supabase, planetscale, or convex</resume-signal>
</task>
```
</type>
<type name="human-action">
## checkpoint:human-action (1% - Rare)
**When:** Action has NO CLI/API and requires human-only interaction, OR the agent hit an authentication gate during automation.
**Use ONLY for:**
- **Authentication gates** - the agent tried CLI/API but needs credentials (this is NOT a failure)
- Email verification links (clicking email)
- SMS 2FA codes (phone verification)
- Manual account approvals (platform requires human review)
- Credit card 3D Secure flows (web-based payment authorization)
- OAuth app approvals (web-based approval)
**Do NOT use for pre-planned manual work:**
- Deploying (use CLI - auth gate if needed)
- Creating webhooks/databases (use API/CLI - auth gate if needed)
- Running builds/tests (use Bash tool)
- Creating files (use Write tool)
**Structure:**
```xml
<task type="checkpoint:human-action" gate="blocking">
<action>[What human must do - the agent already did everything automatable]</action>
<instructions>
[What the agent already automated]
[The ONE thing requiring human action]
</instructions>
<verification>[What the agent can check afterward]</verification>
<resume-signal>[How to continue]</resume-signal>
</task>
```
**Example: Email Verification**
```xml
<task type="auto">
<name>Create SendGrid account via API</name>
<action>Use SendGrid API to create subuser account with provided email. Request verification email.</action>
<verify>API returns 201, account created</verify>
<done>Account created, verification email sent</done>
</task>
<task type="checkpoint:human-action" gate="blocking">
<action>Complete email verification for SendGrid account</action>
<instructions>
I created the account and requested verification email.
Check your inbox for SendGrid verification link and click it.
</instructions>
<verification>SendGrid API key works: curl test succeeds</verification>
<resume-signal>Type "done" when email verified</resume-signal>
</task>
```
**Example: Authentication Gate (Dynamic Checkpoint)**
```xml
<task type="auto">
<name>Deploy to Vercel</name>
<files>.vercel/, vercel.json</files>
<action>Run `vercel --yes` to deploy</action>
<verify>vercel ls shows deployment, fetch returns 200</verify>
</task>
<!-- If vercel returns "Error: Not authenticated", the agent creates checkpoint on the fly -->
<task type="checkpoint:human-action" gate="blocking">
<action>Authenticate Vercel CLI so I can continue deployment</action>
<instructions>
I tried to deploy but got authentication error.
Run: vercel login
This will open your browser - complete the authentication flow.
</instructions>
<verification>vercel whoami returns your account email</verification>
<resume-signal>Type "done" when authenticated</resume-signal>
</task>
<!-- After authentication, the agent retries the deployment -->
<task type="auto">
<name>Retry Vercel deployment</name>
<action>Run `vercel --yes` (now authenticated)</action>
<verify>vercel ls shows deployment, fetch returns 200</verify>
</task>
```
**Key distinction:** Auth gates are created dynamically when the agent encounters auth errors. NOT pre-planned - the agent automates first, asks for credentials only when blocked.
</type>
</checkpoint_types>
<execution_protocol>
When the agent encounters `type="checkpoint:*"`:
1. **Stop immediately** - do not proceed to next task
2. **Display checkpoint clearly** using the format below
3. **Wait for user response** - do not hallucinate completion
4. **Verify if possible** - check files, run tests, whatever is specified
5. **Resume execution** - continue to next task only after confirmation
**For checkpoint:human-verify:**
```
╔═══════════════════════════════════════════════════════╗
║ CHECKPOINT: Verification Required ║
╚═══════════════════════════════════════════════════════╝
Progress: 5/8 tasks complete
Task: Responsive dashboard layout
Built: Responsive dashboard at /dashboard
How to verify:
1. Visit: http://localhost:3000/dashboard
2. Desktop (>1024px): Sidebar visible, content fills remaining space
3. Tablet (768px): Sidebar collapses to icons
4. Mobile (375px): Sidebar hidden, hamburger menu appears
────────────────────────────────────────────────────────
→ YOUR ACTION: Type "approved" or describe issues
────────────────────────────────────────────────────────
```
**For checkpoint:decision:**
```
╔═══════════════════════════════════════════════════════╗
║ CHECKPOINT: Decision Required ║
╚═══════════════════════════════════════════════════════╝
Progress: 2/6 tasks complete
Task: Select authentication provider
Decision: Which auth provider should we use?
Context: Need user authentication. Three options with different tradeoffs.
Options:
1. supabase - Built-in with our DB, free tier
Pros: Row-level security integration, generous free tier
Cons: Less customizable UI, ecosystem lock-in
2. clerk - Best DX, paid after 10k users
Pros: Beautiful pre-built UI, excellent documentation
Cons: Vendor lock-in, pricing at scale
3. nextauth - Self-hosted, maximum control
Pros: Free, no vendor lock-in, widely adopted
Cons: More setup work, DIY security updates
────────────────────────────────────────────────────────
→ YOUR ACTION: Select supabase, clerk, or nextauth
────────────────────────────────────────────────────────
```
**For checkpoint:human-action:**
```
╔═══════════════════════════════════════════════════════╗
║ CHECKPOINT: Action Required ║
╚═══════════════════════════════════════════════════════╝
Progress: 3/8 tasks complete
Task: Deploy to Vercel
Attempted: vercel --yes
Error: Not authenticated. Please run 'vercel login'
What you need to do:
1. Run: vercel login
2. Complete browser authentication when it opens
3. Return here when done
I'll verify: vercel whoami returns your account
────────────────────────────────────────────────────────
→ YOUR ACTION: Type "done" when authenticated
────────────────────────────────────────────────────────
```
</execution_protocol>
<authentication_gates>
**Auth gate = the agent tried CLI/API, got auth error.** Not a failure - a gate requiring human input to unblock.
**Pattern:** the agent tries automation → auth error → creates checkpoint:human-action → user authenticates → the agent retries → continues
**Gate protocol:**
1. Recognize it's not a failure - missing auth is expected
2. Stop current task - don't retry repeatedly
3. Create checkpoint:human-action dynamically
4. Provide exact authentication steps
5. Verify authentication works
6. Retry the original task
7. Continue normally
**Key distinction:**
- Pre-planned checkpoint: "I need you to do X" (wrong - the agent should automate)
- Auth gate: "I tried to automate X but need credentials" (correct - unblocks automation)
</authentication_gates>
<automation_reference>
**The rule:** If it has CLI/API, the agent does it. Never ask human to perform automatable work.
## Service CLI Reference
| Service | CLI/API | Key Commands | Auth Gate |
| ----------- | -------------- | ----------------------------------------- | -------------------- |
| Vercel | `vercel` | `--yes`, `env add`, `--prod`, `ls` | `vercel login` |
| Railway | `railway` | `init`, `up`, `variables set` | `railway login` |
| Fly | `fly` | `launch`, `deploy`, `secrets set` | `fly auth login` |
| Stripe | `stripe` + API | `listen`, `trigger`, API calls | API key in .env |
| Supabase | `supabase` | `init`, `link`, `db push`, `gen types` | `supabase login` |
| Upstash | `upstash` | `redis create`, `redis get` | `upstash auth login` |
| PlanetScale | `pscale` | `database create`, `branch create` | `pscale auth login` |
| GitHub | `gh` | `repo create`, `pr create`, `secret set` | `gh auth login` |
| Node | `npm`/`pnpm` | `install`, `run build`, `test`, `run dev` | N/A |
| Xcode | `xcodebuild` | `-project`, `-scheme`, `build`, `test` | N/A |
| Convex | `npx convex` | `dev`, `deploy`, `env set`, `env get` | `npx convex login` |
## Environment Variable Automation
**Env files:** Use Write/Edit tools. Never ask human to create .env manually.
**Dashboard env vars via CLI:**
| Platform | CLI Command | Example |
| -------- | ----------------------- | ------------------------------------------ |
| Convex | `npx convex env set` | `npx convex env set OPENAI_API_KEY sk-...` |
| Vercel | `vercel env add` | `vercel env add STRIPE_KEY production` |
| Railway | `railway variables set` | `railway variables set API_KEY=value` |
| Fly | `fly secrets set` | `fly secrets set DATABASE_URL=...` |
| Supabase | `supabase secrets set` | `supabase secrets set MY_SECRET=value` |
**Secret collection pattern:**
```xml
<!-- WRONG: Asking user to add env vars in dashboard -->
<task type="checkpoint:human-action">
<action>Add OPENAI_API_KEY to Convex dashboard</action>
<instructions>Go to dashboard.convex.dev → Settings → Environment Variables → Add</instructions>
</task>
<!-- RIGHT: the agent asks for value, then adds via CLI -->
<task type="checkpoint:human-action">
<action>Provide your OpenAI API key</action>
<instructions>
I need your OpenAI API key for Convex backend.
Get it from: https://platform.openai.com/api-keys
Paste the key (starts with sk-)
</instructions>
<verification>I'll add it via `npx convex env set` and verify</verification>
<resume-signal>Paste your API key</resume-signal>
</task>
<task type="auto">
<name>Configure OpenAI key in Convex</name>
<action>Run `npx convex env set OPENAI_API_KEY {user-provided-key}`</action>
<verify>`npx convex env get OPENAI_API_KEY` returns the key (masked)</verify>
</task>
```
## Dev Server Automation
| Framework | Start Command | Ready Signal | Default URL |
| --------- | ---------------------------- | ------------------------------ | --------------------- |
| Next.js | `npm run dev` | "Ready in" or "started server" | http://localhost:3000 |
| Vite | `npm run dev` | "ready in" | http://localhost:5173 |
| Convex | `npx convex dev` | "Convex functions ready" | N/A (backend only) |
| Express | `npm start` | "listening on port" | http://localhost:3000 |
| Django | `python manage.py runserver` | "Starting development server" | http://localhost:8000 |
**Server lifecycle:**
```bash
# Run in background, capture PID
npm run dev &
DEV_SERVER_PID=$!
# Wait for ready (max 30s) - uses fetch() for cross-platform compatibility
timeout 30 bash -c 'until node -e "fetch(\"http://localhost:3000\").then(r=>{process.exit(r.ok?0:1)}).catch(()=>process.exit(1))" 2>/dev/null; do sleep 1; done'
```
**Port conflicts:** Kill stale process (`lsof -ti:3000 | xargs kill`) or use alternate port (`--port 3001`).
**Server stays running** through checkpoints. Only kill when plan complete, switching to production, or port needed for different service.
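A minimal sketch of the port-conflict path (port 3000 is assumed from the Next.js default; adjust for your framework):
```bash
# Option 1: free the default port if a stale dev server is holding it, then restart
if lsof -ti:3000 >/dev/null 2>&1; then
  lsof -ti:3000 | xargs kill
fi
npm run dev &
DEV_SERVER_PID=$!
# Option 2: leave the stale process alone and start on an alternate port
# npm run dev -- --port 3001 &
```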
## CLI Installation Handling
| CLI | Auto-install? | Command |
| ------------- | ------------- | ----------------------------------------------------- |
| npm/pnpm/yarn | No - ask user | User chooses package manager |
| vercel | Yes | `npm i -g vercel` |
| gh (GitHub) | Yes | `brew install gh` (macOS) or `apt install gh` (Linux) |
| stripe | Yes | `npm i -g stripe` |
| supabase | Yes | `npm i -g supabase` |
| convex | No - use npx | `npx convex` (no install needed) |
| fly | Yes | `brew install flyctl` or curl installer |
| railway | Yes | `npm i -g @railway/cli` |
**Protocol:** Try command → "command not found" → auto-installable? → yes: install silently, retry → no: checkpoint asking user to install.
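A minimal sketch of that protocol, using the Vercel CLI as the illustrative case:
```bash
# Try the command; if missing and auto-installable, install silently and retry.
if ! command -v vercel >/dev/null 2>&1; then
  if ! npm i -g vercel >/dev/null 2>&1; then
    echo "Cannot auto-install vercel - create a checkpoint asking the user to install it"
    exit 1
  fi
fi
vercel --yes
```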
## Pre-Checkpoint Automation Failures
| Failure | Response |
| ------------------ | ----------------------------------------------------------- |
| Server won't start | Check error, fix issue, retry (don't proceed to checkpoint) |
| Port in use | Kill stale process or use alternate port |
| Missing dependency | Run `npm install`, retry |
| Build error | Fix the error first (bug, not checkpoint issue) |
| Auth error | Create auth gate checkpoint |
| Network timeout | Retry with backoff, then checkpoint if persistent |
**Never present a checkpoint with broken verification environment.** If the local server isn't responding, don't ask user to "visit localhost:3000".
> **Cross-platform note:** Use `node -e "fetch('http://localhost:3000').then(r=>console.log(r.status))"` instead of `curl` for health checks. `curl` is broken on Windows MSYS/Git Bash due to SSL/path mangling issues.
```xml
<!-- WRONG: Checkpoint with broken environment -->
<task type="checkpoint:human-verify">
<what-built>Dashboard (server failed to start)</what-built>
<how-to-verify>Visit http://localhost:3000...</how-to-verify>
</task>
<!-- RIGHT: Fix first, then checkpoint -->
<task type="auto">
<name>Fix server startup issue</name>
<action>Investigate error, fix root cause, restart server</action>
<verify>fetch http://localhost:3000 returns 200</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Dashboard - server running at http://localhost:3000</what-built>
<how-to-verify>Visit http://localhost:3000/dashboard...</how-to-verify>
</task>
```
## Automatable Quick Reference
| Action | Automatable? | The agent does it? |
| -------------------------------- | -------------------------- | ------------------ |
| Deploy to Vercel | Yes (`vercel`) | YES |
| Create Stripe webhook | Yes (API) | YES |
| Write .env file | Yes (Write tool) | YES |
| Create Upstash DB | Yes (`upstash`) | YES |
| Run tests | Yes (`npm test`) | YES |
| Start dev server | Yes (`npm run dev`) | YES |
| Add env vars to Convex | Yes (`npx convex env set`) | YES |
| Add env vars to Vercel | Yes (`vercel env add`) | YES |
| Seed database | Yes (CLI/API) | YES |
| Click email verification link | No | NO |
| Enter credit card with 3DS | No | NO |
| Complete OAuth in browser | No | NO |
| Visually verify UI looks correct | No | NO |
| Test interactive user flows | No | NO |
</automation_reference>
<writing_guidelines>
**DO:**
- Automate everything with CLI/API before checkpoint
- Be specific: "Visit https://myapp.vercel.app" not "check deployment"
- Number verification steps
- State expected outcomes: "You should see X"
- Provide context: why this checkpoint exists
**DON'T:**
- Ask human to do work the agent can automate ❌
- Assume knowledge: "Configure the usual settings" ❌
- Skip steps: "Set up database" (too vague) ❌
- Mix multiple verifications in one checkpoint ❌
**Placement:**
- **After automation completes** - not before the agent does the work
- **After UI buildout** - before declaring phase complete
- **Before dependent work** - decisions before implementation
- **At integration points** - after configuring external services
**Bad placement:** Before automation ❌ | Too frequent ❌ | Too late (dependent tasks already needed the result) ❌
</writing_guidelines>
<examples>
### Example 1: Database Setup (No Checkpoint Needed)
```xml
<task type="auto">
<name>Create Upstash Redis database</name>
<files>.env</files>
<action>
1. Run `upstash redis create myapp-cache --region us-east-1`
2. Capture connection URL from output
3. Write to .env: UPSTASH_REDIS_URL={url}
4. Verify connection with test command
</action>
<verify>
- upstash redis list shows database
- .env contains UPSTASH_REDIS_URL
- Test connection succeeds
</verify>
<done>Redis database created and configured</done>
</task>
<!-- NO CHECKPOINT NEEDED - the agent automated everything and verified programmatically -->
```
### Example 2: Full Auth Flow (Single checkpoint at end)
```xml
<task type="auto">
<name>Create user schema</name>
<files>src/db/schema.ts</files>
<action>Define User, Session, Account tables with Drizzle ORM</action>
<verify>npm run db:generate succeeds</verify>
</task>
<task type="auto">
<name>Create auth API routes</name>
<files>src/app/api/auth/[...nextauth]/route.ts</files>
<action>Set up NextAuth with GitHub provider, JWT strategy</action>
<verify>TypeScript compiles, no errors</verify>
</task>
<task type="auto">
<name>Create login UI</name>
<files>src/app/login/page.tsx, src/components/LoginButton.tsx</files>
<action>Create login page with GitHub OAuth button</action>
<verify>npm run build succeeds</verify>
</task>
<task type="auto">
<name>Start dev server for auth testing</name>
<action>Run `npm run dev` in background, wait for ready signal</action>
<verify>fetch http://localhost:3000 returns 200</verify>
<done>Dev server running at http://localhost:3000</done>
</task>
<!-- ONE checkpoint at end verifies the complete flow -->
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Complete authentication flow - dev server running at http://localhost:3000</what-built>
<how-to-verify>
1. Visit: http://localhost:3000/login
2. Click "Sign in with GitHub"
3. Complete GitHub OAuth flow
4. Verify: Redirected to /dashboard, user name displayed
5. Refresh page: Session persists
6. Click logout: Session cleared
</how-to-verify>
<resume-signal>Type "approved" or describe issues</resume-signal>
</task>
```
</examples>
<anti_patterns>
### ❌ BAD: Asking user to start dev server
```xml
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Dashboard component</what-built>
<how-to-verify>
1. Run: npm run dev
2. Visit: http://localhost:3000/dashboard
3. Check layout is correct
</how-to-verify>
</task>
```
**Why bad:** the agent can run `npm run dev`. User should only visit URLs, not execute commands.
### ✅ GOOD: The agent starts the server, user visits
```xml
<task type="auto">
<name>Start dev server</name>
<action>Run `npm run dev` in background</action>
<verify>fetch http://localhost:3000 returns 200</verify>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>Dashboard at http://localhost:3000/dashboard (server running)</what-built>
<how-to-verify>
Visit http://localhost:3000/dashboard and verify:
1. Layout matches design
2. No console errors
</how-to-verify>
</task>
```
### ❌ BAD: Asking human to deploy / ✅ GOOD: The agent automates
```xml
<!-- BAD: Asking user to deploy via dashboard -->
<task type="checkpoint:human-action" gate="blocking">
<action>Deploy to Vercel</action>
<instructions>Visit vercel.com/new → Import repo → Click Deploy → Copy URL</instructions>
</task>
<!-- GOOD: the agent deploys, user verifies -->
<task type="auto">
<name>Deploy to Vercel</name>
<action>Run `vercel --yes`. Capture URL.</action>
<verify>vercel ls shows deployment, fetch returns 200</verify>
</task>
<task type="checkpoint:human-verify">
<what-built>Deployed to {url}</what-built>
<how-to-verify>Visit {url}, check homepage loads</how-to-verify>
<resume-signal>Type "approved"</resume-signal>
</task>
```
### ❌ BAD: Too many checkpoints / ✅ GOOD: Single checkpoint
```xml
<!-- BAD: Checkpoint after every task -->
<task type="auto">Create schema</task>
<task type="checkpoint:human-verify">Check schema</task>
<task type="auto">Create API route</task>
<task type="checkpoint:human-verify">Check API</task>
<task type="auto">Create UI form</task>
<task type="checkpoint:human-verify">Check form</task>
<!-- GOOD: One checkpoint at end -->
<task type="auto">Create schema</task>
<task type="auto">Create API route</task>
<task type="auto">Create UI form</task>
<task type="checkpoint:human-verify">
<what-built>Complete auth flow (schema + API + UI)</what-built>
<how-to-verify>Test full flow: register, login, access protected page</how-to-verify>
<resume-signal>Type "approved"</resume-signal>
</task>
```
### ❌ BAD: Vague verification / ✅ GOOD: Specific steps
```xml
<!-- BAD -->
<task type="checkpoint:human-verify">
<what-built>Dashboard</what-built>
<how-to-verify>Check it works</how-to-verify>
</task>
<!-- GOOD -->
<task type="checkpoint:human-verify">
<what-built>Responsive dashboard - server running at http://localhost:3000</what-built>
<how-to-verify>
Visit http://localhost:3000/dashboard and verify:
1. Desktop (>1024px): Sidebar visible, content area fills remaining space
2. Tablet (768px): Sidebar collapses to icons
3. Mobile (375px): Sidebar hidden, hamburger menu in header
4. No horizontal scroll at any size
</how-to-verify>
<resume-signal>Type "approved" or describe layout issues</resume-signal>
</task>
```
### ❌ BAD: Asking user to run CLI commands
```xml
<task type="checkpoint:human-action">
<action>Run database migrations</action>
<instructions>Run: npx prisma migrate deploy && npx prisma db seed</instructions>
</task>
```
**Why bad:** the agent can run these commands. User should never execute CLI commands.
### ❌ BAD: Asking user to copy values between services
```xml
<task type="checkpoint:human-action">
<action>Configure webhook URL in Stripe</action>
<instructions>Copy deployment URL → Stripe Dashboard → Webhooks → Add endpoint → Copy secret → Add to .env</instructions>
</task>
```
**Why bad:** Stripe has an API. The agent should create the webhook via the API and write the secret to .env directly.
</anti_patterns>
<summary>
Checkpoints formalize human-in-the-loop points for verification and decisions, not manual work.
**The golden rule:** If the agent CAN automate it, the agent MUST automate it.
**Checkpoint priority:**
1. **checkpoint:human-verify** (90%) - the agent automated everything, human confirms visual/functional correctness
2. **checkpoint:decision** (9%) - Human makes architectural/technology choices
3. **checkpoint:human-action** (1%) - Truly unavoidable manual steps with no API/CLI
**When NOT to use checkpoints:**
- Things the agent can verify programmatically (tests, builds)
- File operations (the agent can read files)
- Code correctness (tests and static analysis)
- Anything automatable via CLI/API
</summary>

View File

@@ -0,0 +1,249 @@
# Continuation Format
Standard format for presenting next steps after completing a command or workflow.
## Core Structure
```
---
## ▶ Next Up
**{identifier}: {name}** - {one-line description}
`{command to copy-paste}`
<sub>`/new` first → fresh context window</sub>
---
**Also available:**
- `{alternative option 1}` - description
- `{alternative option 2}` - description
---
```
## Format Rules
1. **Always show what it is** - name + description, never just a command path
2. **Pull context from source** - ROADMAP.md for phases, PLAN.md `<objective>` for plans
3. **Command in inline code** - backticks, easy to copy-paste, renders as clickable link
4. **`/new` explanation** - always include, keeps it concise but explains why
5. **"Also available" not "Other options"** - sounds more app-like
6. **Visual separators** - `---` above and below to make it stand out
## Variants
### Execute Next Plan
```
---
## ▶ Next Up
**02-03: Refresh Token Rotation** - Add /api/auth/refresh with sliding expiry
`/gsd-execute-phase 2`
<sub>`/new` first → fresh context window</sub>
---
**Also available:**
- Review plan before executing
- `/gsd-list-phase-assumptions 2` - check assumptions
---
```
### Execute Final Plan in Phase
Add note that this is the last plan and what comes after:
```
---
## ▶ Next Up
**02-03: Refresh Token Rotation** - Add /api/auth/refresh with sliding expiry
<sub>Final plan in Phase 2</sub>
`/gsd-execute-phase 2`
<sub>`/new` first → fresh context window</sub>
---
**After this completes:**
- Phase 2 → Phase 3 transition
- Next: **Phase 3: Core Features** - User dashboard and settings
---
```
### Plan a Phase
```
---
## ▶ Next Up
**Phase 2: Authentication** - JWT login flow with refresh tokens
`/gsd-plan-phase 2`
<sub>`/new` first → fresh context window</sub>
---
**Also available:**
- `/gsd-discuss-phase 2` - gather context first
- `/gsd-research-phase 2` - investigate unknowns
- Review roadmap
---
```
### Phase Complete, Ready for Next
Show completion status before next action:
```
---
## ✓ Phase 2 Complete
3/3 plans executed
## ▶ Next Up
**Phase 3: Core Features** - User dashboard, settings, and data export
`/gsd-plan-phase 3`
<sub>`/new` first → fresh context window</sub>
---
**Also available:**
- `/gsd-discuss-phase 3` - gather context first
- `/gsd-research-phase 3` - investigate unknowns
- Review what Phase 2 built
---
```
### Multiple Equal Options
When there's no clear primary action:
```
---
## ▶ Next Up
**Phase 3: Core Features** - User dashboard, settings, and data export
**To plan directly:** `/gsd-plan-phase 3`
**To discuss context first:** `/gsd-discuss-phase 3`
**To research unknowns:** `/gsd-research-phase 3`
<sub>`/new` first → fresh context window</sub>
---
```
### Milestone Complete
```
---
## 🎉 Milestone v1.0 Complete
All 4 phases shipped
## ▶ Next Up
**Start v1.1** - questioning → research → requirements → roadmap
`/gsd-new-milestone`
<sub>`/new` first → fresh context window</sub>
---
```
## Pulling Context
### For phases (from ROADMAP.md):
```markdown
### Phase 2: Authentication
**Goal**: JWT login flow with refresh tokens
```
Extract: `**Phase 2: Authentication** - JWT login flow with refresh tokens`
### For plans (from ROADMAP.md):
```markdown
Plans:
- [ ] 02-03: Add refresh token rotation
```
Or from PLAN.md `<objective>`:
```xml
<objective>
Add refresh token rotation with sliding expiry window.
Purpose: Extend session lifetime without compromising security.
</objective>
```
Extract: `**02-03: Refresh Token Rotation** - Add /api/auth/refresh with sliding expiry`
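A rough extraction sketch (assumes the ROADMAP.md shape shown above - a `### Phase N: Name` heading followed by a `**Goal**:` line):
```bash
# Pull the heading and goal for a phase and format them for the Next Up block.
PHASE_TITLE=$(grep -m1 '^### Phase 2:' .planning/ROADMAP.md | sed 's/^### //')
GOAL=$(grep -A2 '^### Phase 2:' .planning/ROADMAP.md | grep -m1 '^\*\*Goal\*\*:' | sed 's/^\*\*Goal\*\*: //')
echo "**${PHASE_TITLE}** - ${GOAL}"
# → **Phase 2: Authentication** - JWT login flow with refresh tokens
```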
## Anti-Patterns
### Don't: Command-only (no context)
```
## To Continue
Run `/new`, then paste:
/gsd-execute-phase 2
```
User has no idea what 02-03 is about.
### Don't: Missing /new explanation
```
`/gsd-plan-phase 3`
Run /new first.
```
Doesn't explain why. User might skip it.
### Don't: "Other options" language
```
Other options:
- Review roadmap
```
Sounds like an afterthought. Use "Also available:" instead.
### Don't: Fenced code blocks for commands
````
```
/gsd-plan-phase 3
```
````
Fenced blocks inside templates create nesting ambiguity. Use inline backticks instead.

View File

@@ -0,0 +1,64 @@
# Decimal Phase Calculation
Calculate the next decimal phase number for urgent insertions.
## Using gsd-tools
```bash
# Get next decimal phase after phase 6
node ".pi/gsd/bin/gsd-tools.cjs" phase next-decimal 6
```
Output:
```json
{
"found": true,
"base_phase": "06",
"next": "06.1",
"existing": []
}
```
With existing decimals:
```json
{
"found": true,
"base_phase": "06",
"next": "06.3",
"existing": ["06.1", "06.2"]
}
```
## Extract Values
```bash
DECIMAL_PHASE=$(node ".pi/gsd/bin/gsd-tools.cjs" phase next-decimal "${AFTER_PHASE}" --pick next)
BASE_PHASE=$(node ".pi/gsd/bin/gsd-tools.cjs" phase next-decimal "${AFTER_PHASE}" --pick base_phase)
```
Or with --raw flag:
```bash
DECIMAL_PHASE=$(node ".pi/gsd/bin/gsd-tools.cjs" phase next-decimal "${AFTER_PHASE}" --raw)
# Returns just: 06.1
```
## Examples
| Existing Phases | Next Phase |
|-----------------|------------|
| 06 only | 06.1 |
| 06, 06.1 | 06.2 |
| 06, 06.1, 06.2 | 06.3 |
| 06, 06.1, 06.3 (gap) | 06.4 |
## Directory Naming
Decimal phase directories use the full decimal number:
```bash
SLUG=$(node ".pi/gsd/bin/gsd-tools.cjs" generate-slug "$DESCRIPTION" --raw)
PHASE_DIR=".planning/phases/${DECIMAL_PHASE}-${SLUG}"
mkdir -p "$PHASE_DIR"
```
Example: `.planning/phases/06.1-fix-critical-auth-bug/`

View File

@@ -0,0 +1,295 @@
<overview>
Git integration for GSD framework.
</overview>
<core_principle>
**Commit outcomes, not process.**
The git log should read like a changelog of what shipped, not a diary of planning activity.
</core_principle>
<commit_points>
| Event | Commit? | Why |
| ----------------------- | ------- | ------------------------------------------- |
| BRIEF + ROADMAP created | YES | Project initialization |
| PLAN.md created | NO | Intermediate - commit with plan completion |
| RESEARCH.md created | NO | Intermediate |
| DISCOVERY.md created | NO | Intermediate |
| **Task completed** | YES | Atomic unit of work (1 commit per task) |
| **Plan completed** | YES | Metadata commit (SUMMARY + STATE + ROADMAP) |
| Handoff created | YES | WIP state preserved |
</commit_points>
<git_check>
```bash
[ -d .git ] && echo "GIT_EXISTS" || echo "NO_GIT"
```
If NO_GIT: Run `git init` silently. GSD projects always get their own repo.
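A minimal combined sketch:
```bash
# Initialize a repo only when one doesn't already exist; stay quiet either way.
if [ ! -d .git ]; then
  git init --quiet
fi
```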
</git_check>
<commit_formats>
<format name="initialization">
## Project Initialization (brief + roadmap together)
```
docs: initialize [project-name] ([N] phases)
[One-liner from PROJECT.md]
Phases:
1. [phase-name]: [goal]
2. [phase-name]: [goal]
3. [phase-name]: [goal]
```
What to commit:
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "docs: initialize [project-name] ([N] phases)" --files .planning/
```
</format>
<format name="task-completion">
## Task Completion (During Plan Execution)
Each task gets its own commit immediately after completion.
> **Parallel agents:** When running as a parallel executor (spawned by execute-phase),
> use `--no-verify` on all commits to avoid pre-commit hook lock contention.
> The orchestrator validates hooks once after all agents complete.
```
{type}({phase}-{plan}): {task-name}
- [Key change 1]
- [Key change 2]
- [Key change 3]
```
**Commit types:**
- `feat` - New feature/functionality
- `fix` - Bug fix
- `test` - Test-only (TDD RED phase)
- `refactor` - Code cleanup (TDD REFACTOR phase)
- `perf` - Performance improvement
- `chore` - Dependencies, config, tooling
**Examples:**
```bash
# Standard task
git add src/api/auth.ts src/types/user.ts
git commit -m "feat(08-02): create user registration endpoint
- POST /auth/register validates email and password
- Checks for duplicate users
- Returns JWT token on success
"
# TDD task - RED phase
git add src/__tests__/jwt.test.ts
git commit -m "test(07-02): add failing test for JWT generation
- Tests token contains user ID claim
- Tests token expires in 1 hour
- Tests signature verification
"
# TDD task - GREEN phase
git add src/utils/jwt.ts
git commit -m "feat(07-02): implement JWT generation
- Uses jose library for signing
- Includes user ID and expiry claims
- Signs with HS256 algorithm
"
```
</format>
<format name="plan-completion">
## Plan Completion (After All Tasks Done)
After all tasks committed, one final metadata commit captures plan completion.
```
docs({phase}-{plan}): complete [plan-name] plan
Tasks completed: [N]/[N]
- [Task 1 name]
- [Task 2 name]
- [Task 3 name]
SUMMARY: .planning/phases/XX-name/{phase}-{plan}-SUMMARY.md
```
What to commit:
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "docs({phase}-{plan}): complete [plan-name] plan" --files .planning/phases/XX-name/{phase}-{plan}-PLAN.md .planning/phases/XX-name/{phase}-{plan}-SUMMARY.md .planning/STATE.md .planning/ROADMAP.md
```
**Note:** Code files NOT included - already committed per-task.
</format>
<format name="handoff">
## Handoff (WIP)
```
wip: [phase-name] paused at task [X]/[Y]
Current: [task name]
[If blocked:] Blocked: [reason]
```
What to commit:
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "wip: [phase-name] paused at task [X]/[Y]" --files .planning/
```
</format>
</commit_formats>
<example_log>
**Old approach (per-plan commits):**
```
a7f2d1 feat(checkout): Stripe payments with webhook verification
3e9c4b feat(products): catalog with search, filters, and pagination
8a1b2c feat(auth): JWT with refresh rotation using jose
5c3d7e feat(foundation): Next.js 15 + Prisma + Tailwind scaffold
2f4a8d docs: initialize ecommerce-app (5 phases)
```
**New approach (per-task commits):**
```
# Phase 04 - Checkout
1a2b3c docs(04-01): complete checkout flow plan
4d5e6f feat(04-01): add webhook signature verification
7g8h9i feat(04-01): implement payment session creation
0j1k2l feat(04-01): create checkout page component
# Phase 03 - Products
3m4n5o docs(03-02): complete product listing plan
6p7q8r feat(03-02): add pagination controls
9s0t1u feat(03-02): implement search and filters
2v3w4x feat(03-01): create product catalog schema
# Phase 02 - Auth
5y6z7a docs(02-02): complete token refresh plan
8b9c0d feat(02-02): implement refresh token rotation
1e2f3g test(02-02): add failing test for token refresh
4h5i6j docs(02-01): complete JWT setup plan
7k8l9m feat(02-01): add JWT generation and validation
0n1o2p chore(02-01): install jose library
# Phase 01 - Foundation
3q4r5s docs(01-01): complete scaffold plan
6t7u8v feat(01-01): configure Tailwind and globals
9w0x1y feat(01-01): set up Prisma with database
2z3a4b feat(01-01): create Next.js 15 project
# Initialization
5c6d7e docs: initialize ecommerce-app (5 phases)
```
Each plan produces 2-4 commits (tasks + metadata). Clear, granular, bisectable.
</example_log>
<anti_patterns>
**Still don't commit (intermediate artifacts):**
- PLAN.md creation (commit with plan completion)
- RESEARCH.md (intermediate)
- DISCOVERY.md (intermediate)
- Minor planning tweaks
- "Fixed typo in roadmap"
**Do commit (outcomes):**
- Each task completion (feat/fix/test/refactor)
- Plan completion metadata (docs)
- Project initialization (docs)
**Key principle:** Commit working code and shipped outcomes, not planning process.
</anti_patterns>
<commit_strategy_rationale>
## Why Per-Task Commits?
**Context engineering for AI:**
- Git history becomes the primary context source for future agent sessions
- `git log --grep="{phase}-{plan}"` shows all work for a plan
- `git diff <hash>^..<hash>` shows exact changes per task
- Less reliance on parsing SUMMARY.md = more context for actual work
**Failure recovery:**
- Task 1 committed ✅, Task 2 failed ❌
- The agent in the next session sees task 1 complete and can retry task 2
- Can `git reset --hard` to the last successful task
**Debugging:**
- `git bisect` finds exact failing task, not just failing plan
- `git blame` traces line to specific task context
- Each commit is independently revertable
**Observability:**
- A solo developer + agent workflow benefits from granular attribution
- Atomic commits are git best practice
- "Commit noise" is irrelevant when the consumer is the agent, not a human
</commit_strategy_rationale>
<sub_repos_support>
## Multi-Repo Workspace Support (sub_repos)
For workspaces with separate git repos (e.g., `backend/`, `frontend/`, `shared/`), GSD routes commits to each repo independently.
### Configuration
In `.planning/config.json`, list sub-repo directories under `planning.sub_repos`:
```json
{
"planning": {
"commit_docs": false,
"sub_repos": ["backend", "frontend", "shared"]
}
}
```
Set `commit_docs: false` so planning docs stay local and are not committed to any sub-repo.
### How It Works
1. **Auto-detection:** During `/gsd-new-project`, directories with their own `.git` folder are detected and offered for selection as sub-repos. On subsequent runs, `loadConfig` auto-syncs the `sub_repos` list with the filesystem - adding newly created repos and removing deleted ones. This means `config.json` may be rewritten automatically when repos change on disk.
2. **File grouping:** Code files are grouped by their sub-repo prefix (e.g., `backend/src/api/users.ts` belongs to the `backend/` repo).
3. **Independent commits:** Each sub-repo receives its own atomic commit via `gsd-tools.cjs commit-to-subrepo`. File paths are made relative to the sub-repo root before staging.
4. **Planning stays local:** The `.planning/` directory is not committed; it serves as the cross-repo coordination layer.
### Commit Routing
Instead of the standard `commit` command, use `commit-to-subrepo` when `sub_repos` is configured:
```bash
node .pi/gsd/bin/gsd-tools.cjs commit-to-subrepo "feat(02-01): add user API" \
--files backend/src/api/users.ts backend/src/types/user.ts frontend/src/components/UserForm.tsx
```
This stages `src/api/users.ts` and `src/types/user.ts` in the `backend/` repo, and `src/components/UserForm.tsx` in the `frontend/` repo, then commits each independently with the same message.
Files that don't match any configured sub-repo are reported as unmatched.
</sub_repos_support>

View File

@@ -0,0 +1,38 @@
# Git Planning Commit
Commit planning artifacts using the gsd-tools CLI, which automatically checks `commit_docs` config and gitignore status.
## Commit via CLI
Always use `gsd-tools.cjs commit` for `.planning/` files - it handles `commit_docs` and gitignore checks automatically:
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "docs({scope}): {description}" --files .planning/STATE.md .planning/ROADMAP.md
```
The CLI will return `skipped` (with reason) if `commit_docs` is `false` or `.planning/` is gitignored. No manual conditional checks needed.
## Amend previous commit
To fold `.planning/` file changes into the previous commit:
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "" --files .planning/codebase/*.md --amend
```
## Commit Message Patterns
| Command | Scope | Example |
| ------------- | --------- | ----------------------------------------------- |
| plan-phase | phase | `docs(phase-03): create authentication plans` |
| execute-phase | phase | `docs(phase-03): complete authentication phase` |
| new-milestone | milestone | `docs: start milestone v1.1` |
| remove-phase | chore | `chore: remove phase 17 (dashboard)` |
| insert-phase | phase | `docs: insert phase 16.1 (critical fix)` |
| add-phase | phase | `docs: add phase 07 (settings page)` |
## When to Skip
- `commit_docs: false` in config
- `.planning/` is gitignored
- No changes to commit (check with `git status --porcelain .planning/`)
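For the last case, a minimal pre-commit check (commit message is illustrative):
```bash
# Skip the commit when nothing under .planning/ has changed.
if [ -z "$(git status --porcelain .planning/)" ]; then
  echo "No planning changes - skipping commit"
else
  node ".pi/gsd/bin/gsd-tools.cjs" commit "docs(phase-03): update planning state" --files .planning/STATE.md .planning/ROADMAP.md
fi
```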

View File

@@ -0,0 +1,36 @@
# Model Profile Resolution
Resolve model profile once at the start of orchestration, then use it for all Task spawns.
## Resolution Pattern
```bash
MODEL_PROFILE=$(grep -o '"model_profile"[[:space:]]*:[[:space:]]*"[^"]*"' .planning/config.json 2>/dev/null | grep -o '"[^"]*"$' | tr -d '"')
MODEL_PROFILE=${MODEL_PROFILE:-balanced}
```
Default: `balanced` if not set or config missing.
## Lookup Table
@.pi/gsd/references/model-profiles.md
Look up the agent in the table for the resolved profile. Pass the model parameter to Task calls:
```
Task(
prompt="...",
subagent_type="gsd-planner",
model="{resolved_model}" # "inherit", "sonnet", or "haiku"
)
```
**Note:** Opus-tier agents resolve to `"inherit"` (not `"opus"`). This causes the agent to use the parent session's model, avoiding conflicts with organization policies that may block specific opus versions.
If `model_profile` is `"inherit"`, all agents resolve to `"inherit"` (useful for OpenCode `/model`).
## Usage
1. Resolve once at orchestration start
2. Store the profile value
3. Look up each agent's model from the table when spawning
4. Pass model parameter to each Task call (values: `"inherit"`, `"sonnet"`, `"haiku"`)

View File

@@ -0,0 +1,146 @@
<!-- AUTO-GENERATED - do not edit by hand.
Source of truth: get-shit-done/bin/lib/model-profiles.cjs
Regenerate with: node get-shit-done/bin/gsd-tools.cjs generate-model-profiles-md --harness agent
-->
# Model Profiles
Model profiles control which Claude model each GSD agent uses. This allows balancing quality vs token spend, or inheriting the currently selected session model.
## Profile Definitions
| Agent | `quality` | `balanced` | `budget` | `inherit` |
|-------|-----------|------------|----------|-----------|
| gsd-planner | opus | opus | sonnet | inherit |
| gsd-roadmapper | opus | sonnet | sonnet | inherit |
| gsd-executor | opus | sonnet | sonnet | inherit |
| gsd-phase-researcher | opus | sonnet | haiku | inherit |
| gsd-project-researcher | opus | sonnet | haiku | inherit |
| gsd-research-synthesizer | sonnet | sonnet | haiku | inherit |
| gsd-debugger | opus | sonnet | sonnet | inherit |
| gsd-codebase-mapper | sonnet | haiku | haiku | inherit |
| gsd-verifier | sonnet | sonnet | haiku | inherit |
| gsd-plan-checker | sonnet | sonnet | haiku | inherit |
| gsd-integration-checker | sonnet | sonnet | haiku | inherit |
| gsd-nyquist-auditor | sonnet | sonnet | haiku | inherit |
| gsd-ui-researcher | opus | sonnet | haiku | inherit |
| gsd-ui-checker | sonnet | sonnet | haiku | inherit |
| gsd-ui-auditor | sonnet | sonnet | haiku | inherit |
## Profile Philosophy
**quality** - Maximum reasoning power
- Opus for all decision-making agents
- Sonnet for read-only verification
- Use when: quota available, critical architecture work
**balanced** (default) - Smart allocation
- Opus only for planning (where architecture decisions happen)
- Sonnet for execution and research (follows explicit instructions)
- Sonnet for verification (needs reasoning, not just pattern matching)
- Use when: normal development, good balance of quality and cost
**budget** - Minimal Opus usage
- Sonnet for anything that writes code
- Haiku for research and verification
- Use when: conserving quota, high-volume work, less critical phases
**inherit** - Follow the current session model
- All agents resolve to `inherit`
- Best when you switch models interactively (for example OpenCode `/model`)
- **Required when using non-Anthropic providers** (OpenRouter, local models, etc.) - otherwise GSD may call Anthropic models directly, incurring unexpected costs
- Use when: you want GSD to follow your currently selected runtime model
## Using Non-Claude Runtimes (Codex, OpenCode, Gemini CLI)
When installed for a non-Claude runtime, the GSD installer sets `resolve_model_ids: "omit"` in `~/.gsd/defaults.json`. This returns an empty model parameter for all agents, so each agent uses the runtime's default model. No manual setup is needed.
To assign different models to different agents, add `model_overrides` with model IDs your runtime recognizes:
```json
{
"resolve_model_ids": "omit",
"model_overrides": {
"gsd-planner": "o3",
"gsd-executor": "o4-mini",
"gsd-debugger": "o3",
"gsd-codebase-mapper": "o4-mini"
}
}
```
The same tiering logic applies: stronger models for planning and debugging, cheaper models for execution and mapping.
## Using Claude Code with Non-Anthropic Providers (OpenRouter, Local)
If you're using Claude Code with OpenRouter, a local model, or any non-Anthropic provider, set the `inherit` profile to prevent GSD from calling Anthropic models for subagents:
```bash
# Via settings command
/gsd-settings
# → Select "Inherit" for model profile
# Or manually in .planning/config.json
{
"model_profile": "inherit"
}
```
Without `inherit`, GSD's default `balanced` profile spawns specific Anthropic models (`opus`, `sonnet`, `haiku`) for each agent type, which can result in additional API costs through your non-Anthropic provider.
## Resolution Logic
Orchestrators resolve model before spawning:
```
1. Read .planning/config.json
2. Check model_overrides for agent-specific override
3. If no override, look up agent in profile table
4. Pass model parameter to Task call
```
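A rough bash sketch of that order (assumes `jq` is available; `resolve_from_profile` is a hypothetical stand-in for the profile-table lookup):
```bash
AGENT="gsd-planner"
OVERRIDE=$(jq -r --arg a "$AGENT" '.model_overrides[$a] // empty' .planning/config.json 2>/dev/null)
PROFILE=$(jq -r '.model_profile // "balanced"' .planning/config.json 2>/dev/null)
PROFILE=${PROFILE:-balanced}
if [ -n "$OVERRIDE" ]; then
  MODEL="$OVERRIDE"                                   # per-agent override wins
else
  MODEL=$(resolve_from_profile "$AGENT" "$PROFILE")   # look up the profile table above
fi
# MODEL is then passed as the model parameter of the Task call
```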
## Per-Agent Overrides
Override specific agents without changing the entire profile:
```json
{
"model_profile": "balanced",
"model_overrides": {
"gsd-executor": "opus",
"gsd-planner": "haiku"
}
}
```
Overrides take precedence over the profile. Valid values: `opus`, `sonnet`, `haiku`, `inherit`, or any fully-qualified model ID (e.g., `"o3"`, `"openai/o3"`, `"google/gemini-2.5-pro"`).
## Switching Profiles
Runtime: `/gsd-set-profile <profile>`
Per-project default: Set in `.planning/config.json`:
```json
{
"model_profile": "balanced"
}
```
## Design Rationale
**Why Opus for gsd-planner?**
Planning involves architecture decisions, goal decomposition, and task design. This is where model quality has the highest impact.
**Why Sonnet for gsd-executor?**
Executors follow explicit PLAN.md instructions. The plan already contains the reasoning; execution is implementation.
**Why Sonnet (not Haiku) for verifiers in balanced?**
Verification requires goal-backward reasoning - checking if code *delivers* what the phase promised, not just pattern matching. Sonnet handles this well; Haiku may miss subtle gaps.
**Why Haiku for gsd-codebase-mapper?**
Read-only exploration and pattern extraction. No reasoning required, just structured output from file contents.
**Why `inherit` instead of passing `opus` directly?**
Claude Code's `"opus"` alias maps to a specific model version. Organizations may block older opus versions while allowing newer ones. GSD returns `"inherit"` for opus-tier agents, causing them to use whatever opus version the user has configured in their session. This avoids version conflicts and silent fallbacks to Sonnet.
**Why `inherit` profile?**
Some runtimes (including OpenCode) let users switch models at runtime (`/model`). The `inherit` profile keeps all GSD subagents aligned to that live selection.

View File

@@ -0,0 +1,61 @@
# Phase Argument Parsing
Parse and normalize phase arguments for commands that operate on phases.
## Extraction
From `$ARGUMENTS`:
- Extract phase number (first numeric argument)
- Extract flags (prefixed with `--`)
- Remaining text is description (for insert/add commands)
## Using gsd-tools
The `find-phase` command handles normalization and validation in one step:
```bash
PHASE_INFO=$(node ".pi/gsd/bin/gsd-tools.cjs" find-phase "${PHASE}")
```
Returns JSON with:
- `found`: true/false
- `directory`: Full path to phase directory
- `phase_number`: Normalized number (e.g., "06", "06.1")
- `phase_name`: Name portion (e.g., "foundation")
- `plans`: Array of PLAN.md files
- `summaries`: Array of SUMMARY.md files
## Manual Normalization (Legacy)
Zero-pad integer phases to 2 digits. Preserve decimal suffixes.
```bash
# Normalize phase number
if [[ "$PHASE" =~ ^[0-9]+$ ]]; then
# Integer: 8 → 08
PHASE=$(printf "%02d" "$PHASE")
elif [[ "$PHASE" =~ ^([0-9]+)\.([0-9]+)$ ]]; then
# Decimal: 2.1 → 02.1
PHASE=$(printf "%02d.%s" "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}")
fi
```
## Validation
Use `roadmap get-phase` to validate phase exists:
```bash
PHASE_CHECK=$(node ".pi/gsd/bin/gsd-tools.cjs" roadmap get-phase "${PHASE}" --pick found)
if [ "$PHASE_CHECK" = "false" ]; then
echo "ERROR: Phase ${PHASE} not found in roadmap"
exit 1
fi
```
## Directory Lookup
Use `find-phase` for directory lookup:
```bash
PHASE_DIR=$(node ".pi/gsd/bin/gsd-tools.cjs" find-phase "${PHASE}" --raw)
```

View File

@@ -0,0 +1,202 @@
<planning_config>
Configuration options for `.planning/` directory behavior.
<config_schema>
```json
"planning": {
"commit_docs": true,
"search_gitignored": false
},
"git": {
"branching_strategy": "none",
"phase_branch_template": "gsd/phase-{phase}-{slug}",
"milestone_branch_template": "gsd/{milestone}-{slug}",
"quick_branch_template": null
}
```
| Option | Default | Description |
| ------------------------------- | ---------------------------- | ------------------------------------------------------------- |
| `commit_docs` | `true` | Whether to commit planning artifacts to git |
| `search_gitignored` | `false` | Add `--no-ignore` to broad rg searches |
| `git.branching_strategy` | `"none"` | Git branching approach: `"none"`, `"phase"`, or `"milestone"` |
| `git.phase_branch_template` | `"gsd/phase-{phase}-{slug}"` | Branch template for phase strategy |
| `git.milestone_branch_template` | `"gsd/{milestone}-{slug}"` | Branch template for milestone strategy |
| `git.quick_branch_template` | `null` | Optional branch template for quick-task runs |
</config_schema>
<commit_docs_behavior>
**When `commit_docs: true` (default):**
- Planning files committed normally
- SUMMARY.md, STATE.md, ROADMAP.md tracked in git
- Full history of planning decisions preserved
**When `commit_docs: false`:**
- Skip all `git add`/`git commit` for `.planning/` files
- User must add `.planning/` to `.gitignore`
- Useful for: OSS contributions, client projects, keeping planning private
**Using gsd-tools.cjs (preferred):**
```bash
# Commit with automatic commit_docs + gitignore checks:
node ".pi/gsd/bin/gsd-tools.cjs" commit "docs: update state" --files .planning/STATE.md
# Load config via state load (returns JSON):
INIT=$(node ".pi/gsd/bin/gsd-tools.cjs" state load)
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
# commit_docs is available in the JSON output
# Or use init commands which include commit_docs:
INIT=$(node ".pi/gsd/bin/gsd-tools.cjs" init execute-phase "1")
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
# commit_docs is included in all init command outputs
```
**Auto-detection:** If `.planning/` is gitignored, `commit_docs` is automatically `false` regardless of config.json. This prevents git errors when users have `.planning/` in `.gitignore`.
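A minimal sketch of what that check amounts to (the CLI does this internally; callers don't need to run it):
```bash
# commit_docs is forced to false whenever git ignores the .planning/ directory.
if git check-ignore -q .planning 2>/dev/null; then
  COMMIT_DOCS=false
fi
```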
**Commit via CLI (handles checks automatically):**
```bash
node ".pi/gsd/bin/gsd-tools.cjs" commit "docs: update state" --files .planning/STATE.md
```
The CLI checks `commit_docs` config and gitignore status internally - no manual conditionals needed.
</commit_docs_behavior>
<search_behavior>
**When `search_gitignored: false` (default):**
- Standard rg behavior (respects .gitignore)
- Direct path searches work: `rg "pattern" .planning/` finds files
- Broad searches skip gitignored: `rg "pattern"` skips `.planning/`
**When `search_gitignored: true`:**
- Add `--no-ignore` to broad rg searches that should include `.planning/`
- Only needed when searching entire repo and expecting `.planning/` matches
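For example, a broad repo-wide search that should still surface `.planning/` matches:
```bash
# Only when search_gitignored is true - include gitignored paths in a broad search
rg --no-ignore "refresh token" .
```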
**Note:** Most GSD operations use direct file reads or explicit paths, which work regardless of gitignore status.
</search_behavior>
<setup_uncommitted_mode>
To use uncommitted mode:
1. **Set config:**
```json
"planning": {
"commit_docs": false,
"search_gitignored": true
}
```
2. **Add to .gitignore:**
```
.planning/
```
3. **Existing tracked files:** If `.planning/` was previously tracked:
```bash
git rm -r --cached .planning/
git commit -m "chore: stop tracking planning docs"
```
4. **Branch merges:** When using `branching_strategy: phase` or `milestone`, the `complete-milestone` workflow automatically strips `.planning/` files from staging before merge commits when `commit_docs: false`.
</setup_uncommitted_mode>
<branching_strategy_behavior>
**Branching Strategies:**
| Strategy | When branch created | Branch scope | Merge point |
| ----------- | ------------------------------------- | ---------------- | ----------------------- |
| `none` | Never | N/A | N/A |
| `phase` | At `execute-phase` start | Single phase | User merges after phase |
| `milestone` | At first `execute-phase` of milestone | Entire milestone | At `complete-milestone` |
**When `git.branching_strategy: "none"` (default):**
- All work commits to current branch
- Standard GSD behavior
**When `git.branching_strategy: "phase"`:**
- `execute-phase` creates/switches to a branch before execution
- Branch name from `phase_branch_template` (e.g., `gsd/phase-03-authentication`)
- All plan commits go to that branch
- User merges branches manually after phase completion
- `complete-milestone` offers to merge all phase branches
**When `git.branching_strategy: "milestone"`:**
- First `execute-phase` of milestone creates the milestone branch
- Branch name from `milestone_branch_template` (e.g., `gsd/v1.0-mvp`)
- All phases in milestone commit to same branch
- `complete-milestone` offers to merge milestone branch to main
**Template variables:**
| Variable | Available in | Description |
| ------------- | ------------------------- | ------------------------------------- |
| `{phase}` | phase_branch_template | Zero-padded phase number (e.g., "03") |
| `{slug}` | Both | Lowercase, hyphenated name |
| `{milestone}` | milestone_branch_template | Milestone version (e.g., "v1.0") |
**Checking the config:**
Use `init execute-phase` which returns all config as JSON:
```bash
INIT=$(node ".pi/gsd/bin/gsd-tools.cjs" init execute-phase "1")
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
# JSON output includes: branching_strategy, phase_branch_template, milestone_branch_template
```
Or use `state load` for the config values:
```bash
INIT=$(node ".pi/gsd/bin/gsd-tools.cjs" state load)
if [[ "$INIT" == @file:* ]]; then INIT=$(cat "${INIT#@file:}"); fi
# Parse branching_strategy, phase_branch_template, milestone_branch_template from JSON
```
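One way to extract those values from the JSON before branch creation (a sketch that assumes `jq` is available; the field names follow the comments above):
```bash
BRANCHING_STRATEGY=$(echo "$INIT" | jq -r '.branching_strategy // "none"')
PHASE_BRANCH_TEMPLATE=$(echo "$INIT" | jq -r '.phase_branch_template')
MILESTONE_BRANCH_TEMPLATE=$(echo "$INIT" | jq -r '.milestone_branch_template')
```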
**Branch creation:**
```bash
# For phase strategy
if [ "$BRANCHING_STRATEGY" = "phase" ]; then
PHASE_SLUG=$(echo "$PHASE_NAME" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//')
BRANCH_NAME=$(echo "$PHASE_BRANCH_TEMPLATE" | sed "s/{phase}/$PADDED_PHASE/g" | sed "s/{slug}/$PHASE_SLUG/g")
git checkout -b "$BRANCH_NAME" 2>/dev/null || git checkout "$BRANCH_NAME"
fi
# For milestone strategy
if [ "$BRANCHING_STRATEGY" = "milestone" ]; then
MILESTONE_SLUG=$(echo "$MILESTONE_NAME" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//')
BRANCH_NAME=$(echo "$MILESTONE_BRANCH_TEMPLATE" | sed "s/{milestone}/$MILESTONE_VERSION/g" | sed "s/{slug}/$MILESTONE_SLUG/g")
git checkout -b "$BRANCH_NAME" 2>/dev/null || git checkout "$BRANCH_NAME"
fi
```
**Merge options at complete-milestone:**
| Option | Git command | Result |
| -------------------------- | -------------------- | -------------------------------- |
| Squash merge (recommended) | `git merge --squash` | Single clean commit per branch |
| Merge with history | `git merge --no-ff` | Preserves all individual commits |
| Delete without merging | `git branch -D` | Discard branch work |
| Keep branches | (none) | Manual handling later |
Squash merge is recommended - keeps main branch history clean while preserving the full development history in the branch (until deleted).
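As a sketch, squash-merging a finished phase branch looks like this (branch name and commit message are illustrative):
```bash
git checkout main
git merge --squash gsd/phase-03-authentication
git commit -m "feat(03): authentication phase"

# Optional: remove the branch once its work is on main
git branch -D gsd/phase-03-authentication
```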
**Use cases:**
| Strategy | Best for |
| ----------- | ------------------------------------------------------------ |
| `none` | Solo development, simple projects |
| `phase` | Code review per phase, granular rollback, team collaboration |
| `milestone` | Release branches, staging environments, PR per version |
</branching_strategy_behavior>
</planning_config>

View File

@@ -0,0 +1,162 @@
<questioning_guide>
Project initialization is dream extraction, not requirements gathering. You're helping the user discover and articulate what they want to build. This isn't a contract negotiation - it's collaborative thinking.
<philosophy>
**You are a thinking partner, not an interviewer.**
The user often has a fuzzy idea. Your job is to help them sharpen it. Ask questions that make them think "oh, I hadn't considered that" or "yes, that's exactly what I mean."
Don't interrogate. Collaborate. Don't follow a script. Follow the thread.
</philosophy>
<the_goal>
By the end of questioning, you need enough clarity to write a PROJECT.md that downstream phases can act on:
- **Research** needs: what domain to research, what the user already knows, what unknowns exist
- **Requirements** needs: clear enough vision to scope v1 features
- **Roadmap** needs: clear enough vision to decompose into phases, what "done" looks like
- **plan-phase** needs: specific requirements to break into tasks, context for implementation choices
- **execute-phase** needs: success criteria to verify against, the "why" behind requirements
A vague PROJECT.md forces every downstream phase to guess. The cost compounds.
</the_goal>
<how_to_question>
**Start open.** Let them dump their mental model. Don't interrupt with structure.
**Follow energy.** Whatever they emphasized, dig into that. What excited them? What problem sparked this?
**Challenge vagueness.** Never accept fuzzy answers. "Good" means what? "Users" means who? "Simple" means how?
**Make the abstract concrete.** "Walk me through using this." "What does that actually look like?"
**Clarify ambiguity.** "When you say Z, do you mean A or B?" "You mentioned X - tell me more."
**Know when to stop.** When you understand what they want, why they want it, who it's for, and what done looks like - offer to proceed.
</how_to_question>
<question_types>
Use these as inspiration, not a checklist. Pick what's relevant to the thread.
**Motivation - why this exists:**
- "What prompted this?"
- "What are you doing today that this replaces?"
- "What would you do if this existed?"
**Concreteness - what it actually is:**
- "Walk me through using this"
- "You said X - what does that actually look like?"
- "Give me an example"
**Clarification - what they mean:**
- "When you say Z, do you mean A or B?"
- "You mentioned X - tell me more about that"
**Success - how you'll know it's working:**
- "How will you know this is working?"
- "What does done look like?"
</question_types>
<using_askuserquestion>
Use AskUserQuestion to help users think by presenting concrete options to react to.
**Good options:**
- Interpretations of what they might mean
- Specific examples to confirm or deny
- Concrete choices that reveal priorities
**Bad options:**
- Generic categories ("Technical", "Business", "Other")
- Leading options that presume an answer
- Too many options (2-4 is ideal)
- Headers longer than 12 characters (hard limit - validation will reject them)
**Example - vague answer:**
User says "it should be fast"
- header: "Fast"
- question: "Fast how?"
- options: ["Sub-second response", "Handles large datasets", "Quick to build", "Let me explain"]
**Example - following a thread:**
User mentions "frustrated with current tools"
- header: "Frustration"
- question: "What specifically frustrates you?"
- options: ["Too many clicks", "Missing features", "Unreliable", "Let me explain"]
**Tip for users - modifying an option:**
Users who want a slightly modified version of an option can select "Other" and reference the option by number: `#1 but for finger joints only` or `#2 with pagination disabled`. This avoids retyping the full option text.
</using_askuserquestion>
<freeform_rule>
**When the user wants to explain freely, STOP using AskUserQuestion.**
If a user selects "Other" and their response signals they want to describe something in their own words (e.g., "let me describe it", "I'll explain", "something else", or any open-ended reply that isn't choosing/modifying an existing option), you MUST:
1. **Ask your follow-up as plain text** - NOT via AskUserQuestion
2. **Wait for them to type at the normal prompt**
3. **Resume AskUserQuestion** only after processing their freeform response
The same applies if YOU include a freeform-indicating option (like "Let me explain" or "Describe in detail") and the user selects it.
**Wrong:** User says "let me describe it" → AskUserQuestion("What feature?", ["Feature A", "Feature B", "Describe in detail"])
**Right:** User says "let me describe it" → "Go ahead - what are you thinking?"
</freeform_rule>
<context_checklist>
Use this as a **background checklist**, not a conversation structure. Check these mentally as you go. If gaps remain, weave questions naturally.
- [ ] What they're building (concrete enough to explain to a stranger)
- [ ] Why it needs to exist (the problem or desire driving it)
- [ ] Who it's for (even if just themselves)
- [ ] What "done" looks like (observable outcomes)
Four things. If they volunteer more, capture it.
</context_checklist>
<decision_gate>
When you could write a clear PROJECT.md, offer to proceed:
- header: "Ready?"
- question: "I think I understand what you're after. Ready to create PROJECT.md?"
- options:
- "Create PROJECT.md" - Let's move forward
- "Keep exploring" - I want to share more / ask me more
If "Keep exploring" - ask what they want to add or identify gaps and probe naturally.
Loop until "Create PROJECT.md" selected.
</decision_gate>
<anti_patterns>
- **Checklist walking** - Going through domains regardless of what they said
- **Canned questions** - "What's your core value?" "What's out of scope?" regardless of context
- **Corporate speak** - "What are your success criteria?" "Who are your stakeholders?"
- **Interrogation** - Firing questions without building on answers
- **Rushing** - Minimizing questions to get to "the work"
- **Shallow acceptance** - Taking vague answers without probing
- **Premature constraints** - Asking about tech stack before understanding the idea
- **User skills** - NEVER ask about the user's technical experience. The agent builds.
</anti_patterns>
</questioning_guide>

263
.pi/gsd/references/tdd.md Normal file
View File

@@ -0,0 +1,263 @@
<overview>
TDD is about design quality, not coverage metrics. The red-green-refactor cycle forces you to think about behavior before implementation, producing cleaner interfaces and more testable code.
**Principle:** If you can describe the behavior as `expect(fn(input)).toBe(output)` before writing `fn`, TDD improves the result.
**Key insight:** TDD work is fundamentally heavier than standard tasks: it requires 2-3 execution cycles (RED → GREEN → REFACTOR), each with file reads, test runs, and potential debugging. TDD features get dedicated plans to ensure full context is available throughout the cycle.
</overview>
<when_to_use_tdd>
## When TDD Improves Quality
**TDD candidates (create a TDD plan):**
- Business logic with defined inputs/outputs
- API endpoints with request/response contracts
- Data transformations, parsing, formatting
- Validation rules and constraints
- Algorithms with testable behavior
- State machines and workflows
- Utility functions with clear specifications
**Skip TDD (use standard plan with `type="auto"` tasks):**
- UI layout, styling, visual components
- Configuration changes
- Glue code connecting existing components
- One-off scripts and migrations
- Simple CRUD with no business logic
- Exploratory prototyping
**Heuristic:** Can you write `expect(fn(input)).toBe(output)` before writing `fn`?
→ Yes: Create a TDD plan
→ No: Use standard plan, add tests after if needed
</when_to_use_tdd>
<tdd_plan_structure>
## TDD Plan Structure
Each TDD plan implements **one feature** through the full RED-GREEN-REFACTOR cycle.
```markdown
---
phase: XX-name
plan: NN
type: tdd
---
<objective>
[What feature and why]
Purpose: [Design benefit of TDD for this feature]
Output: [Working, tested feature]
</objective>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@relevant/source/files.ts
</context>
<feature>
<name>[Feature name]</name>
<files>[source file, test file]</files>
<behavior>
[Expected behavior in testable terms]
Cases: input → expected output
</behavior>
<implementation>[How to implement once tests pass]</implementation>
</feature>
<verification>
[Test command that proves feature works]
</verification>
<success_criteria>
- Failing test written and committed
- Implementation passes test
- Refactor complete (if needed)
- All 2-3 commits present
</success_criteria>
<output>
After completion, create SUMMARY.md with:
- RED: What test was written, why it failed
- GREEN: What implementation made it pass
- REFACTOR: What cleanup was done (if any)
- Commits: List of commits produced
</output>
```
**One feature per TDD plan.** If features are trivial enough to batch, they're trivial enough to skip TDD; use a standard plan and add tests after.
</tdd_plan_structure>
<execution_flow>
## Red-Green-Refactor Cycle
**RED - Write failing test:**
1. Create test file following project conventions
2. Write test describing expected behavior (from `<behavior>` element)
3. Run test - it MUST fail
4. If test passes: feature exists or test is wrong. Investigate.
5. Commit: `test({phase}-{plan}): add failing test for [feature]`
**GREEN - Implement to pass:**
1. Write minimal code to make test pass
2. No cleverness, no optimization - just make it work
3. Run test - it MUST pass
4. Commit: `feat({phase}-{plan}): implement [feature]`
**REFACTOR (if needed):**
1. Clean up implementation if obvious improvements exist
2. Run tests - MUST still pass
3. Only commit if changes made: `refactor({phase}-{plan}): clean up [feature]`
**Result:** Each TDD plan produces 2-3 atomic commits.
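A condensed sketch of the cycle for a Node project (file names, test command, and plan numbers are placeholders):
```bash
# RED: write the test, confirm it fails, commit
npm test && echo "Test passed unexpectedly - feature may already exist, investigate"
git add src/email.test.ts
git commit -m "test(08-02): add failing test for email validation"

# GREEN: implement minimally, confirm it passes, commit
npm test || echo "Still failing - keep iterating before committing"
git add src/email.ts
git commit -m "feat(08-02): implement email validation"

# REFACTOR (only if needed): clean up, confirm tests still pass, commit
npm test && git commit -am "refactor(08-02): extract regex to constant"
```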
</execution_flow>
<test_quality>
## Good Tests vs Bad Tests
**Test behavior, not implementation:**
- Good: "returns formatted date string"
- Bad: "calls formatDate helper with correct params"
- Tests should survive refactors
**One concept per test:**
- Good: Separate tests for valid input, empty input, malformed input
- Bad: Single test checking all edge cases with multiple assertions
**Descriptive names:**
- Good: "should reject empty email", "returns null for invalid ID"
- Bad: "test1", "handles error", "works correctly"
**No implementation details:**
- Good: Test public API, observable behavior
- Bad: Mock internals, test private methods, assert on internal state
</test_quality>
<framework_setup>
## Test Framework Setup (If None Exists)
When executing a TDD plan but no test framework is configured, set it up as part of the RED phase:
**1. Detect project type:**
```bash
# JavaScript/TypeScript
if [ -f package.json ]; then echo "node"; fi
# Python
if [ -f requirements.txt ] || [ -f pyproject.toml ]; then echo "python"; fi
# Go
if [ -f go.mod ]; then echo "go"; fi
# Rust
if [ -f Cargo.toml ]; then echo "rust"; fi
```
**2. Install minimal framework:**
| Project | Framework | Install |
| -------------- | ---------- | ----------------------------------------- |
| Node.js | Jest | `npm install -D jest @types/jest ts-jest` |
| Node.js (Vite) | Vitest | `npm install -D vitest` |
| Python | pytest | `pip install pytest` |
| Go | testing | Built-in |
| Rust | cargo test | Built-in |
**3. Create config if needed:**
- Jest: `jest.config.js` with ts-jest preset
- Vitest: `vitest.config.ts` with test globals
- pytest: `pytest.ini` or `pyproject.toml` section
**4. Verify setup:**
```bash
# Run empty test suite - should pass with 0 tests
npm test # Node
pytest # Python
go test ./... # Go
cargo test # Rust
```
**5. Create first test file:**
Follow project conventions for test location:
- `*.test.ts` / `*.spec.ts` next to source
- `__tests__/` directory
- `tests/` directory at root
Framework setup is a one-time cost included in the first TDD plan's RED phase.
</framework_setup>
<error_handling>
## Error Handling
**Test doesn't fail in RED phase:**
- Feature may already exist - investigate
- Test may be wrong (not testing what you think)
- Fix before proceeding
**Test doesn't pass in GREEN phase:**
- Debug implementation
- Don't skip to refactor
- Keep iterating until green
**Tests fail in REFACTOR phase:**
- Undo refactor
- Commit was premature
- Refactor in smaller steps
**Unrelated tests break:**
- Stop and investigate
- May indicate coupling issue
- Fix before proceeding
</error_handling>
<commit_pattern>
## Commit Pattern for TDD Plans
TDD plans produce 2-3 atomic commits (one per phase):
```
test(08-02): add failing test for email validation
- Tests valid email formats accepted
- Tests invalid formats rejected
- Tests empty input handling
feat(08-02): implement email validation
- Regex pattern matches RFC 5322
- Returns boolean for validity
- Handles edge cases (empty, null)
refactor(08-02): extract regex to constant (optional)
- Moved pattern to EMAIL_REGEX constant
- No behavior changes
- Tests still pass
```
**Comparison with standard plans:**
- Standard plans: 1 commit per task, 2-4 commits per plan
- TDD plans: 2-3 commits for single feature
Both follow the same format: `{type}({phase}-{plan}): {description}`
**Benefits:**
- Each commit independently revertable
- Git bisect works at commit level
- Clear history showing TDD discipline
- Consistent with overall commit strategy
</commit_pattern>
<context_budget>
## Context Budget
TDD plans target **~40% context usage** (lower than standard plans' ~50%).
Why lower:
- RED phase: write test, run test, potentially debug why it didn't fail
- GREEN phase: implement, run test, potentially iterate on failures
- REFACTOR phase: modify code, run tests, verify no regressions
Each phase involves reading files, running commands, analyzing output. The back-and-forth is inherently heavier than linear task execution.
Single feature focus ensures full quality throughout the cycle.
</context_budget>

View File

@@ -0,0 +1,165 @@
<ui_patterns>
Visual patterns for user-facing GSD output. Orchestrators @-reference this file.
<core>
<!-- Loaded by ALL commands via ui-brand-core.md -->
## Stage Banners
Use for major workflow transitions.
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GSD ► {STAGE NAME}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
**Stage names (uppercase):**
- `QUESTIONING`
- `RESEARCHING`
- `DEFINING REQUIREMENTS`
- `CREATING ROADMAP`
- `PLANNING PHASE {N}`
- `EXECUTING WAVE {N}`
- `VERIFYING`
- `PHASE {N} COMPLETE ✓`
- `MILESTONE COMPLETE 🎉`
---
## Status Symbols
```
✓ Complete / Passed / Verified
✗ Failed / Missing / Blocked
◆ In Progress
○ Pending
⚡ Auto-approved
⚠ Warning
🎉 Milestone complete (only in banner)
```
---
## Progress Display
**Phase/milestone level:**
```
Progress: ████████░░ 80%
```
**Task level:**
```
Tasks: 2/4 complete
```
**Plan level:**
```
Plans: 3/5 complete
```
---
## Next Up Block
Always at end of major completions.
```
───────────────────────────────────────────────────────────────
## ▶ Next Up
**{Identifier}: {Name}** - {one-line description}
`{copy-paste command}`
<sub>`/new` first → fresh context window</sub>
───────────────────────────────────────────────────────────────
**Also available:**
- `/gsd-alternative-1` - description
- `/gsd-alternative-2` - description
───────────────────────────────────────────────────────────────
```
---
## Tables
```
| Phase | Status | Plans | Progress |
| ----- | ------ | ----- | -------- |
| 1 | ✓ | 3/3 | 100% |
| 2 | ◆ | 1/4 | 25% |
| 3 | ○ | 0/2 | 0% |
```
---
## Anti-Patterns
- Varying box/banner widths
- Mixing banner styles (`===`, `---`, `***`)
- Skipping `GSD ►` prefix in banners
- Random emoji (`🚀`, `✨`, `💫`)
- Missing Next Up block after completions
</core>
<!-- Execution-only: loaded by gsd-execute-phase, gsd-ui-phase, gsd-ui-review -->
## Checkpoint Boxes
User action required. 62-character width.
```
╔══════════════════════════════════════════════════════════════╗
║ CHECKPOINT: {Type} ║
╚══════════════════════════════════════════════════════════════╝
{Content}
──────────────────────────────────────────────────────────────
→ {ACTION PROMPT}
──────────────────────────────────────────────────────────────
```
**Types:**
- `CHECKPOINT: Verification Required` - `→ Type "approved" or describe issues`
- `CHECKPOINT: Decision Required` - `→ Select: option-a / option-b`
- `CHECKPOINT: Action Required` - `→ Type "done" when complete`
---
## Spawning Indicators
```
◆ Spawning researcher...
◆ Spawning 4 researchers in parallel...
→ Stack research
→ Features research
→ Architecture research
→ Pitfalls research
✓ Researcher complete: STACK.md written
```
---
## Error Box
```
╔══════════════════════════════════════════════════════════════╗
║ ERROR ║
╚══════════════════════════════════════════════════════════════╝
{Error description}
**To fix:** {Resolution steps}
```
</ui_patterns>

View File

@@ -0,0 +1,681 @@
# User Profiling: Detection Heuristics Reference
This reference document defines detection heuristics for behavioral profiling across 8 dimensions. The gsd-user-profiler agent applies these rules when analyzing extracted session messages. Do not invent dimensions or scoring rules beyond what is defined here.
## How to Use This Document
1. The gsd-user-profiler agent reads this document before analyzing any messages
2. For each dimension, the agent scans messages for the signal patterns defined below
3. The agent applies the detection heuristics to classify the developer's pattern
4. Confidence is scored using the thresholds defined per dimension
5. Evidence quotes are curated using the rules in the Evidence Curation section
6. Output must conform to the JSON schema in the Output Schema section
---
## Dimensions
### 1. Communication Style
`dimension_id: communication_style`
**What we're measuring:** How the developer phrases requests, instructions, and feedback -- the structural pattern of their messages to the agent.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `terse-direct` | Short, imperative messages with minimal context. Gets to the point immediately. |
| `conversational` | Medium-length messages mixing instructions with questions and thinking-aloud. Natural, informal tone. |
| `detailed-structured` | Long messages with explicit structure -- headers, numbered lists, problem statements, pre-analysis. |
| `mixed` | No dominant pattern; style shifts based on task type or project context. |
**Signal patterns:**
1. **Message length distribution** -- Average word count across messages. Terse < 50 words, conversational 50-200 words, detailed > 200 words.
2. **Imperative-to-interrogative ratio** -- Ratio of commands ("fix this", "add X") to questions ("what do you think?", "should we?"). High imperative ratio suggests terse-direct.
3. **Structural formatting** -- Presence of markdown headers, numbered lists, code blocks, or bullet points within messages. Frequent formatting suggests detailed-structured.
4. **Context preambles** -- Whether the developer provides background/context before making a request. Preambles suggest conversational or detailed-structured.
5. **Sentence completeness** -- Whether messages use full sentences or fragments/shorthand. Fragments suggest terse-direct.
6. **Follow-up pattern** -- Whether the developer provides additional context in subsequent messages (multi-message requests suggest conversational).
**Detection heuristics:**
1. If average message length < 50 words AND predominantly imperative mood AND minimal formatting --> `terse-direct`
2. If average message length 50-200 words AND mix of imperative and interrogative AND occasional formatting --> `conversational`
3. If average message length > 200 words AND frequent structural formatting AND context preambles present --> `detailed-structured`
4. If message length variance is high (std dev > 60% of mean) AND no single pattern dominates (< 60% of messages match one style) --> `mixed`
5. If pattern varies systematically by project type (e.g., terse in CLI projects, detailed in frontend) --> `mixed` with context-dependent note
**Confidence scoring:**
- **HIGH:** 10+ messages showing consistent pattern (> 70% match), same pattern observed across 2+ projects
- **MEDIUM:** 5-9 messages showing pattern, OR pattern consistent within 1 project only
- **LOW:** < 5 messages with relevant signals, OR mixed signals (contradictory patterns observed in similar contexts)
- **UNSCORED:** 0 messages with relevant signals for this dimension
**Example quotes:**
- **terse-direct:** "fix the auth bug" / "add pagination to the list endpoint" / "this test is failing, make it pass"
- **conversational:** "I'm thinking we should probably handle the error case here. What do you think about returning a 422 instead of a 500? The client needs to know it was a validation issue."
- **detailed-structured:** "## Context\nThe auth flow currently uses session cookies but we need to migrate to JWT.\n\n## Requirements\n1. Access tokens (15min expiry)\n2. Refresh tokens (7-day)\n3. httpOnly cookies\n\n## What I've tried\nI looked at jose and jsonwebtoken..."
**Context-dependent patterns:**
When communication style varies systematically by project or task type, report the split rather than forcing a single rating. Example: "context-dependent: terse-direct for bug fixes and CLI tooling, detailed-structured for architecture and frontend work." Phase 3 orchestration resolves context-dependent splits by presenting the split to the user.
---
### 2. Decision Speed
`dimension_id: decision_speed`
**What we're measuring:** How quickly the developer makes choices when the agent presents options, alternatives, or trade-offs.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `fast-intuitive` | Decides immediately based on experience or gut feeling. Minimal deliberation. |
| `deliberate-informed` | Requests comparison or summary before deciding. Wants to understand trade-offs. |
| `research-first` | Delays decision to research independently. May leave and return with findings. |
| `delegator` | Defers to the agent's recommendation. Trusts the suggestion. |
**Signal patterns:**
1. **Response latency to options** -- How many messages between the agent presenting options and developer choosing. Immediate (same message or next) suggests fast-intuitive.
2. **Comparison requests** -- Presence of "compare these", "what are the trade-offs?", "pros and cons?" suggests deliberate-informed.
3. **External research indicators** -- Messages like "I looked into X and...", "according to the docs...", "I read that..." suggest research-first.
4. **Delegation language** -- "just pick one", "whatever you recommend", "your call", "go with the best option" suggests delegator.
5. **Decision reversal frequency** -- How often the developer changes a decision after making it. Frequent reversals may indicate fast-intuitive with low confidence.
**Detection heuristics:**
1. If developer selects options within 1-2 messages of presentation AND uses decisive language ("use X", "go with A") AND rarely asks for comparisons --> `fast-intuitive`
2. If developer requests trade-off analysis or comparison tables AND decides after receiving comparison AND asks clarifying questions --> `deliberate-informed`
3. If developer defers decisions with "let me look into this" AND returns with external information AND cites documentation or articles --> `research-first`
4. If developer uses delegation language (> 3 instances) AND rarely overrides the agent's choices AND says "sounds good" or "your call" --> `delegator`
5. If no clear pattern OR evidence is split across multiple styles --> classify as the dominant style with a context-dependent note
**Confidence scoring:**
- **HIGH:** 10+ decision points observed showing consistent pattern, same pattern across 2+ projects
- **MEDIUM:** 5-9 decision points, OR consistent within 1 project only
- **LOW:** < 5 decision points observed, OR mixed decision-making styles
- **UNSCORED:** 0 messages containing decision-relevant signals
**Example quotes:**
- **fast-intuitive:** "Use Tailwind. Next question." / "Option B, let's move on"
- **deliberate-informed:** "Can you compare Prisma vs Drizzle for this use case? I want to understand the migration story and type safety differences before I pick."
- **research-first:** "Hold off on the DB choice -- I want to read the Drizzle docs and check their GitHub issues first. I'll come back with a decision."
- **delegator:** "You know more about this than me. Whatever you recommend, go with it."
**Context-dependent patterns:**
Decision speed often varies by stakes. A developer may be fast-intuitive for styling choices but research-first for database or auth decisions. When this pattern is clear, report the split: "context-dependent: fast-intuitive for low-stakes (styling, naming), deliberate-informed for high-stakes (architecture, security)."
---
### 3. Explanation Depth
`dimension_id: explanation_depth`
**What we're measuring:** How much explanation the developer wants alongside code -- their preference for understanding vs. speed.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `code-only` | Wants working code with minimal or no explanation. Reads and understands code directly. |
| `concise` | Wants brief explanation of approach with code. Key decisions noted, not exhaustive. |
| `detailed` | Wants thorough walkthrough of the approach, reasoning, and code. Appreciates structure. |
| `educational` | Wants deep conceptual explanation. Treats interactions as learning opportunities. |
**Signal patterns:**
1. **Explicit depth requests** -- "just show me the code", "explain why", "teach me about X", "skip the explanation"
2. **Reaction to explanations** -- Does the developer skip past explanations? Ask for more detail? Say "too much"?
3. **Follow-up question depth** -- Surface-level follow-ups ("does it work?") vs. conceptual ("why this pattern over X?")
4. **Code comprehension signals** -- Does the developer reference implementation details in their messages? This suggests they read and understand code directly.
5. **"I know this" signals** -- Messages like "I'm familiar with X", "skip the basics", "I know how hooks work" indicate lower explanation preference.
**Detection heuristics:**
1. If developer says "just the code" or "skip the explanation" AND rarely asks follow-up conceptual questions AND references code details directly --> `code-only`
2. If developer accepts brief explanations without asking for more AND asks focused follow-ups about specific decisions --> `concise`
3. If developer asks "why" questions AND requests walkthroughs AND appreciates structured explanations --> `detailed`
4. If developer asks conceptual questions beyond the immediate task AND uses learning language ("I want to understand", "teach me") --> `educational`
**Confidence scoring:**
- **HIGH:** 10+ messages showing consistent preference, same preference across 2+ projects
- **MEDIUM:** 5-9 messages, OR consistent within 1 project only
- **LOW:** < 5 relevant messages, OR preferences shift between interactions
- **UNSCORED:** 0 messages with relevant signals
**Example quotes:**
- **code-only:** "Just give me the implementation. I'll read through it." / "Skip the explanation, show the code."
- **concise:** "Quick summary of the approach, then the code please." / "Why did you use a Map here instead of an object?"
- **detailed:** "Walk me through this step by step. I want to understand the auth flow before we implement it."
- **educational:** "Can you explain how JWT refresh token rotation works conceptually? I want to understand the security model, not just implement it."
**Context-dependent patterns:**
Explanation depth often correlates with domain familiarity. A developer may want code-only for well-known tech but educational for new domains. Report splits when observed: "context-dependent: code-only for React/TypeScript, detailed for database optimization."
---
### 4. Debugging Approach
`dimension_id: debugging_approach`
**What we're measuring:** How the developer approaches problems, errors, and unexpected behavior when working with the agent.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `fix-first` | Pastes error, wants it fixed. Minimal diagnosis interest. Results-oriented. |
| `diagnostic` | Shares error with context, wants to understand the cause before fixing. |
| `hypothesis-driven` | Investigates independently first, brings specific theories to the agent for validation. |
| `collaborative` | Wants to work through the problem step-by-step with the agent as a partner. |
**Signal patterns:**
1. **Error presentation style** -- Raw error paste only (fix-first) vs. error + "I think it might be..." (hypothesis-driven) vs. "Can you help me understand why..." (diagnostic)
2. **Pre-investigation indicators** -- Does the developer share what they already tried? Do they mention reading logs, checking state, or isolating the issue?
3. **Root cause interest** -- After a fix, does the developer ask "why did that happen?" or just move on?
4. **Step-by-step language** -- "Let's check X first", "what should we look at next?", "walk me through the debugging"
5. **Fix acceptance pattern** -- Does the developer immediately apply fixes or question them first?
**Detection heuristics:**
1. If developer pastes errors without context AND accepts fixes without root cause questions AND moves on immediately --> `fix-first`
2. If developer provides error context AND asks "why is this happening?" AND wants explanation with the fix --> `diagnostic`
3. If developer shares their own analysis AND proposes theories ("I think the issue is X because...") AND asks the agent to confirm or refute --> `hypothesis-driven`
4. If developer uses collaborative language ("let's", "what should we check?") AND prefers incremental diagnosis AND walks through problems together --> `collaborative`
**Confidence scoring:**
- **HIGH:** 10+ debugging interactions showing consistent approach, same approach across 2+ projects
- **MEDIUM:** 5-9 debugging interactions, OR consistent within 1 project only
- **LOW:** < 5 debugging interactions, OR approach varies significantly
- **UNSCORED:** 0 messages with debugging-relevant signals
**Example quotes:**
- **fix-first:** "Getting this error: TypeError: Cannot read properties of undefined. Fix it."
- **diagnostic:** "The API returns 500 when I send a POST to /users. Here's the request body and the server log. What's causing this?"
- **hypothesis-driven:** "I think the race condition is in the useEffect cleanup. I checked and the subscription isn't being cancelled on unmount. Can you confirm?"
- **collaborative:** "Let's debug this together. The test passes locally but fails in CI. What should we check first?"
**Context-dependent patterns:**
Debugging approach may vary by urgency. A developer might be fix-first under deadline pressure but hypothesis-driven during regular development. Note temporal patterns if detected.
---
### 5. UX Philosophy
`dimension_id: ux_philosophy`
**What we're measuring:** How the developer prioritizes user experience, design, and visual quality relative to functionality.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `function-first` | Get it working, polish later. Minimal UX concern during implementation. |
| `pragmatic` | Basic usability from the start. Nothing ugly or broken, but no design obsession. |
| `design-conscious` | Design and UX are treated as important as functionality. Attention to visual detail. |
| `backend-focused` | Primarily builds backend/CLI. Minimal frontend exposure or interest. |
**Signal patterns:**
1. **Design-related requests** -- Mentions of styling, layout, responsiveness, animations, color schemes, spacing
2. **Polish timing** -- Does the developer ask for visual polish during implementation or defer it?
3. **UI feedback specificity** -- Vague ("make it look better") vs. specific ("increase the padding to 16px, change the font weight to 600")
4. **Frontend vs. backend distribution** -- Ratio of frontend-focused requests to backend-focused requests
5. **Accessibility mentions** -- References to a11y, screen readers, keyboard navigation, ARIA labels
**Detection heuristics:**
1. If developer rarely mentions UI/UX AND focuses on logic, APIs, data AND defers styling ("we'll make it pretty later") --> `function-first`
2. If developer includes basic UX requirements AND mentions usability but not pixel-perfection AND balances form with function --> `pragmatic`
3. If developer provides specific design requirements AND mentions polish, animations, spacing AND treats UI bugs as seriously as logic bugs --> `design-conscious`
4. If developer works primarily on CLI tools, APIs, or backend systems AND rarely or never works on frontend AND messages focus on data, performance, infrastructure --> `backend-focused`
**Confidence scoring:**
- **HIGH:** 10+ messages with UX-relevant signals, same pattern across 2+ projects
- **MEDIUM:** 5-9 messages, OR consistent within 1 project only
- **LOW:** < 5 relevant messages, OR philosophy varies by project type
- **UNSCORED:** 0 messages with UX-relevant signals
**Example quotes:**
- **function-first:** "Just get the form working. We'll style it later." / "I don't care how it looks, I need the data flowing."
- **pragmatic:** "Make sure the loading state is visible and the error messages are clear. Standard styling is fine."
- **design-conscious:** "The button needs more breathing room -- add 12px vertical padding and make the hover state transition 200ms. Also check the contrast ratio."
- **backend-focused:** "I'm building a CLI tool. No UI needed." / "Add the REST endpoint, I'll handle the frontend separately."
**Context-dependent patterns:**
UX philosophy is inherently project-dependent. A developer building a CLI tool is necessarily backend-focused for that project. When possible, distinguish between project-driven and preference-driven patterns. If the developer only has backend projects, note that the rating reflects available data: "backend-focused (note: all analyzed projects are backend/CLI -- may not reflect frontend preferences)."
---
### 6. Vendor Philosophy
`dimension_id: vendor_philosophy`
**What we're measuring:** How the developer approaches choosing and evaluating libraries, frameworks, and external services.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `pragmatic-fast` | Uses what works, what the agent suggests, or what's fastest. Minimal evaluation. |
| `conservative` | Prefers well-known, battle-tested, widely-adopted options. Risk-averse. |
| `thorough-evaluator` | Researches alternatives, reads docs, compares features and trade-offs before committing. |
| `opinionated` | Has strong, pre-existing preferences for specific tools. Knows what they like. |
**Signal patterns:**
1. **Library selection language** -- "just use whatever", "is X the standard?", "I want to compare A vs B", "we're using X, period"
2. **Evaluation depth** -- Does the developer accept the first suggestion or ask for alternatives?
3. **Stated preferences** -- Explicit mentions of preferred tools, past experience, or tool philosophy
4. **Rejection patterns** -- Does the developer reject the agent's suggestions? On what basis (popularity, personal experience, docs quality)?
5. **Dependency attitude** -- "minimize dependencies", "no external deps", "add whatever we need" -- reveals philosophy about external code
**Detection heuristics:**
1. If developer accepts library suggestions without pushback AND uses phrases like "sounds good" or "go with that" AND rarely asks about alternatives --> `pragmatic-fast`
2. If developer asks about popularity, maintenance, community AND prefers "industry standard" or "battle-tested" AND avoids new/experimental --> `conservative`
3. If developer requests comparisons AND reads docs before deciding AND asks about edge cases, license, bundle size --> `thorough-evaluator`
4. If developer names specific libraries unprompted AND overrides the agent's suggestions AND expresses strong preferences --> `opinionated`
**Confidence scoring:**
- **HIGH:** 10+ vendor/library decisions observed, same pattern across 2+ projects
- **MEDIUM:** 5-9 decisions, OR consistent within 1 project only
- **LOW:** < 5 vendor decisions observed, OR pattern varies
- **UNSCORED:** 0 messages with vendor-selection signals
**Example quotes:**
- **pragmatic-fast:** "Use whatever ORM you recommend. I just need it working." / "Sure, Tailwind is fine."
- **conservative:** "Is Prisma the most widely used ORM for this? I want something with a large community." / "Let's stick with what most teams use."
- **thorough-evaluator:** "Before we pick a state management library, can you compare Zustand vs Jotai vs Redux Toolkit? I want to understand bundle size, API surface, and TypeScript support."
- **opinionated:** "We're using Drizzle, not Prisma. I've used both and Drizzle's SQL-like API is better for complex queries."
**Context-dependent patterns:**
Vendor philosophy may shift based on project importance or domain. Personal projects may use pragmatic-fast while professional projects use thorough-evaluator. Report the split if detected.
---
### 7. Frustration Triggers
`dimension_id: frustration_triggers`
**What we're measuring:** What causes visible frustration, correction, or negative emotional signals in the developer's messages to the agent.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `scope-creep` | Frustrated when the agent does things that were not asked for. Wants bounded execution. |
| `instruction-adherence` | Frustrated when the agent doesn't follow instructions precisely. Values exactness. |
| `verbosity` | Frustrated when the agent over-explains or is too wordy. Wants conciseness. |
| `regression` | Frustrated when the agent breaks working code while fixing something else. Values stability. |
**Signal patterns:**
1. **Correction language** -- "I didn't ask for that", "don't do X", "I said Y not Z", "why did you change this?"
2. **Repetition patterns** -- Repeating the same instruction with emphasis suggests instruction-adherence frustration
3. **Emotional tone shifts** -- Shift from neutral to terse, use of capitals, exclamation marks, explicit frustration words
4. **"Don't" statements** -- "don't add extra features", "don't explain so much", "don't touch that file" -- what they prohibit reveals what frustrates them
5. **Frustration recovery** -- How quickly the developer returns to neutral tone after a frustration event
**Detection heuristics:**
1. If developer corrects the agent for doing unrequested work AND uses language like "I only asked for X", "stop adding things", "stick to what I asked" --> `scope-creep`
2. If developer repeats instructions AND corrects specific deviations from stated requirements AND emphasizes precision ("I specifically said...") --> `instruction-adherence`
3. If developer asks the agent to be shorter AND skips explanations AND expresses annoyance at length ("too much", "just the answer") --> `verbosity`
4. If developer expresses frustration at broken functionality AND checks for regressions AND says "you broke X while fixing Y" --> `regression`
**Confidence scoring:**
- **HIGH:** 10+ frustration events showing consistent trigger pattern, same trigger across 2+ projects
- **MEDIUM:** 5-9 frustration events, OR consistent within 1 project only
- **LOW:** < 5 frustration events observed (note: low frustration count is POSITIVE -- it means the developer is generally satisfied, not that data is insufficient)
- **UNSCORED:** 0 messages with frustration signals (note: "no frustration detected" is a valid finding)
**Example quotes:**
- **scope-creep:** "I asked you to fix the login bug, not refactor the entire auth module. Revert everything except the bug fix."
- **instruction-adherence:** "I said to use a Map, not an object. I was specific about this. Please redo it with a Map."
- **verbosity:** "Way too much explanation. Just show me the code change, nothing else."
- **regression:** "The search was working fine before. Now after your 'fix' to the filter, search results are empty. Don't touch things I didn't ask you to change."
**Context-dependent patterns:**
Frustration triggers tend to be consistent across projects (personality-driven, not project-driven). However, their intensity may vary with project stakes. If multiple frustration triggers are observed, report the primary (most frequent) and note secondaries.
---
### 8. Learning Style
`dimension_id: learning_style`
**What we're measuring:** How the developer prefers to understand new concepts, tools, or patterns they encounter.
**Rating spectrum:**
| Rating | Description |
|--------|-------------|
| `self-directed` | Reads code directly, figures things out independently. Asks the agent specific questions. |
| `guided` | Asks the agent to explain relevant parts. Prefers guided understanding. |
| `documentation-first` | Reads official docs and tutorials before diving in. References documentation. |
| `example-driven` | Wants working examples to modify and learn from. Pattern-matching learner. |
**Signal patterns:**
1. **Learning initiation** -- Does the developer start by reading code, asking for explanation, requesting docs, or asking for examples?
2. **Reference to external sources** -- Mentions of documentation, tutorials, Stack Overflow, blog posts suggest documentation-first
3. **Example requests** -- "show me an example", "can you give me a sample?", "let me see how this looks in practice"
4. **Code-reading indicators** -- "I looked at the implementation", "I see that X calls Y", "from reading the code..."
5. **Explanation requests vs. code requests** -- Ratio of "explain X" to "show me X" messages
**Detection heuristics:**
1. If developer references reading code directly AND asks specific targeted questions AND demonstrates independent investigation --> `self-directed`
2. If developer asks the agent to explain concepts AND requests walkthroughs AND prefers agent-mediated understanding --> `guided`
3. If developer cites documentation AND asks for doc links AND mentions reading tutorials or official guides --> `documentation-first`
4. If developer requests examples AND modifies provided examples AND learns by pattern matching --> `example-driven`
**Confidence scoring:**
- **HIGH:** 10+ learning interactions showing consistent preference, same preference across 2+ projects
- **MEDIUM:** 5-9 learning interactions, OR consistent within 1 project only
- **LOW:** < 5 learning interactions, OR preference varies by topic familiarity
- **UNSCORED:** 0 messages with learning-relevant signals
**Example quotes:**
- **self-directed:** "I read through the middleware code. The issue is that the token check happens after the rate limiter. Should those be swapped?"
- **guided:** "Can you walk me through how the auth flow works in this codebase? Start from the login request."
- **documentation-first:** "I read the Prisma docs on relations. Can you help me apply the many-to-many pattern from their guide to our schema?"
- **example-driven:** "Show me a working example of a protected API route with JWT validation. I'll adapt it for our endpoints."
**Context-dependent patterns:**
Learning style often varies with domain expertise. A developer may be self-directed in familiar domains but guided or example-driven in new ones. Report the split if detected: "context-dependent: self-directed for TypeScript/Node, example-driven for Rust/systems programming."
---
## Evidence Curation
### Evidence Format
Use the combined format for each evidence entry:
**Signal:** [pattern interpretation -- what the quote demonstrates] / **Example:** "[trimmed quote, ~100 characters]" -- project: [project name]
### Evidence Targets
- **3 evidence quotes per dimension** (24 total across all 8 dimensions)
- Select quotes that best illustrate the rated pattern
- Prefer quotes from different projects to demonstrate cross-project consistency
- When fewer than 3 relevant quotes exist, include what is available and note the evidence count
### Quote Truncation
- Trim quotes to the behavioral signal -- the part that demonstrates the pattern
- Target approximately 100 characters per quote
- Preserve the meaningful fragment, not the full message
- If the signal is in the middle of a long message, use "..." to indicate trimming
- Never include the full 500-character message when 50 characters capture the signal
### Project Attribution
- Every evidence quote must include the project name
- Project attribution enables verification and shows cross-project patterns
- Format: `-- project: [name]`
### Sensitive Content Exclusion (Layer 1)
The profiler agent must never select quotes containing any of the following patterns:
- `sk-` (API key prefixes)
- `Bearer ` (auth tokens)
- `password` (credentials)
- `secret` (secrets)
- `token` (when used as a credential value, not a concept discussion)
- `api_key` or `API_KEY` (API key references)
- Full absolute file paths containing usernames (e.g., `/Users/john/...`, `/home/john/...`)
**When sensitive content is found and excluded**, report as metadata in the analysis output:
```json
{
"sensitive_excluded": [
{ "type": "api_key_pattern", "count": 2 },
{ "type": "file_path_with_username", "count": 1 }
]
}
```
This metadata enables defense-in-depth auditing. Layer 2 (regex filter in the write-profile step) provides a second pass, but the profiler should still avoid selecting sensitive quotes.
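As a rough sketch, a mechanical screen over a candidate quote could look like this (the file name is hypothetical, and contextual rules such as the `token`-as-credential check still require judgment):
```bash
# Reject the candidate quote if it matches any mechanical sensitive pattern
if grep -Eq "sk-|Bearer |password|secret|api_key|API_KEY|/Users/[^/ ]+/|/home/[^/ ]+/" candidate_quote.txt; then
  echo "sensitive content detected - exclude this quote and record it in sensitive_excluded"
fi
```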
### Natural Language Priority
Weight natural language messages higher than:
- Pasted log output (detected by timestamps, repeated format strings, `[DEBUG]`, `[INFO]`, `[ERROR]`)
- Session context dumps (messages starting with "This session is being continued from a previous conversation")
- Large code pastes (messages where > 80% of content is inside code fences)
These message types are genuine but carry less behavioral signal. Deprioritize them when selecting evidence quotes.
---
## Recency Weighting
### Guideline
Recent sessions (last 30 days) should be weighted approximately 3x compared to older sessions when analyzing patterns.
### Rationale
Developer styles evolve. A developer who was terse six months ago may now provide detailed structured context. Recent behavior is a more accurate reflection of current working style.
### Application
1. When counting signals for confidence scoring, recent signals count 3x (e.g., 4 recent signals = 12 weighted signals)
2. When selecting evidence quotes, prefer recent quotes over older ones when both demonstrate the same pattern
3. When patterns conflict between recent and older sessions, the recent pattern takes precedence for the rating, but note the evolution: "recently shifted from terse-direct to conversational"
4. The 30-day window is relative to the analysis date, not a fixed date
### Edge Cases
- If ALL sessions are older than 30 days, apply no weighting (all sessions are equally stale)
- If ALL sessions are within the last 30 days, apply no weighting (all sessions are equally recent)
- The 3x weight is a guideline, not a hard multiplier -- use judgment when the weighted count changes a confidence threshold
---
## Thin Data Handling
### Message Thresholds
| Total Genuine Messages | Mode | Behavior |
|------------------------|------|----------|
| > 50 | `full` | Full analysis across all 8 dimensions. Questionnaire optional (user can choose to supplement). |
| 20-50 | `hybrid` | Analyze available messages. Score each dimension with confidence. Supplement with questionnaire for LOW/UNSCORED dimensions. |
| < 20 | `insufficient` | All dimensions scored LOW or UNSCORED. Recommend questionnaire fallback as primary profile source. Note: "insufficient session data for behavioral analysis." |
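The mode selection is a simple threshold check; as a sketch (the variable name is illustrative):
```bash
# Pick the analysis mode from the count of genuine user messages
if [ "$MESSAGE_COUNT" -gt 50 ]; then
  MODE="full"
elif [ "$MESSAGE_COUNT" -ge 20 ]; then
  MODE="hybrid"
else
  MODE="insufficient"
fi
```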
### Handling Insufficient Dimensions
When a specific dimension has insufficient data (even if total messages exceed thresholds):
- Set confidence to `UNSCORED`
- Set summary to: "Insufficient data -- no clear signals detected for this dimension."
- Set claude_instruction to a neutral fallback: "No strong preference detected. Ask the developer when this dimension is relevant."
- Set evidence_quotes to empty array `[]`
- Set evidence_count to `0`
### Questionnaire Supplement
When operating in `hybrid` mode, the questionnaire fills gaps for dimensions where session analysis produced LOW or UNSCORED confidence. The questionnaire-derived ratings use:
- **MEDIUM** confidence for strong, definitive picks
- **LOW** confidence for "it varies" or ambiguous selections
If session analysis and questionnaire agree on a dimension, confidence can be elevated (e.g., session LOW + questionnaire MEDIUM agreement = MEDIUM).
---
## Output Schema
The profiler agent must return JSON matching this exact schema, wrapped in `<analysis>` tags.
```json
{
"profile_version": "1.0",
"analyzed_at": "ISO-8601 timestamp",
"data_source": "session_analysis",
"projects_analyzed": ["project-name-1", "project-name-2"],
"messages_analyzed": 0,
"message_threshold": "full|hybrid|insufficient",
"sensitive_excluded": [
{ "type": "string", "count": 0 }
],
"dimensions": {
"communication_style": {
"rating": "terse-direct|conversational|detailed-structured|mixed",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [
{
"signal": "Pattern interpretation describing what the quote demonstrates",
"quote": "Trimmed quote, approximately 100 characters",
"project": "project-name"
}
],
"summary": "One to two sentence description of the observed pattern",
"claude_instruction": "Imperative directive for the agent: 'Match structured communication style' not 'You tend to provide structured context'"
},
"decision_speed": {
"rating": "fast-intuitive|deliberate-informed|research-first|delegator",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"explanation_depth": {
"rating": "code-only|concise|detailed|educational",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"debugging_approach": {
"rating": "fix-first|diagnostic|hypothesis-driven|collaborative",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"ux_philosophy": {
"rating": "function-first|pragmatic|design-conscious|backend-focused",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"vendor_philosophy": {
"rating": "pragmatic-fast|conservative|thorough-evaluator|opinionated",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"frustration_triggers": {
"rating": "scope-creep|instruction-adherence|verbosity|regression",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
},
"learning_style": {
"rating": "self-directed|guided|documentation-first|example-driven",
"confidence": "HIGH|MEDIUM|LOW|UNSCORED",
"evidence_count": 0,
"cross_project_consistent": true,
"evidence_quotes": [],
"summary": "string",
"claude_instruction": "string"
}
}
}
```
### Schema Notes
- **`profile_version`**: Always `"1.0"` for this schema version
- **`analyzed_at`**: ISO-8601 timestamp of when the analysis was performed
- **`data_source`**: `"session_analysis"` for session-based profiling, `"questionnaire"` for questionnaire-only, `"hybrid"` for combined
- **`projects_analyzed`**: List of project names that contributed messages
- **`messages_analyzed`**: Total number of genuine user messages processed
- **`message_threshold`**: Which threshold mode was triggered (`full`, `hybrid`, `insufficient`)
- **`sensitive_excluded`**: Array of excluded sensitive content types with counts (empty array if none found)
- **`claude_instruction`**: Must be written in imperative form directed at the agent. This field is how the profile becomes actionable.
- Good: "Provide structured responses with headers and numbered lists to match this developer's communication style."
- Bad: "You tend to like structured responses."
- Good: "Ask before making changes beyond the stated request -- this developer values bounded execution."
- Bad: "The developer gets frustrated when you do extra work."
---
## Cross-Project Consistency
### Assessment
For each dimension, assess whether the observed pattern is consistent across the projects analyzed:
- **`cross_project_consistent: true`** -- Same rating would apply regardless of which project is analyzed. Evidence from 2+ projects shows the same pattern.
- **`cross_project_consistent: false`** -- Pattern varies by project. Include a context-dependent note in the summary.
### Reporting Splits
When `cross_project_consistent` is false, the summary must describe the split:
- "Context-dependent: terse-direct for CLI/backend projects (gsd-tools, api-server), detailed-structured for frontend projects (dashboard, landing-page)."
- "Context-dependent: fast-intuitive for familiar tech (React, Node), research-first for new domains (Rust, ML)."
The rating field should reflect the **dominant** pattern (most evidence). The summary describes the nuance.
### Phase 3 Resolution
Context-dependent splits are resolved during Phase 3 orchestration. The orchestrator presents the split to the developer and asks which pattern represents their general preference. Until resolved, the agent uses the dominant pattern with awareness of the context-dependent variation.
---
*Reference document version: 1.0*
*Dimensions: 8*
*Schema: profile_version 1.0*

View File

@@ -0,0 +1,612 @@
# Verification Patterns
How to verify different types of artifacts are real implementations, not stubs or placeholders.
<core_principle>
**Existence ≠ Implementation**
A file existing does not mean the feature works. Verification must check:
1. **Exists** - File is present at expected path
2. **Substantive** - Content is real implementation, not placeholder
3. **Wired** - Connected to the rest of the system
4. **Functional** - Actually works when invoked
Levels 1-3 can be checked programmatically. Level 4 often requires human verification.
</core_principle>
<stub_detection>
## Universal Stub Patterns
These patterns indicate placeholder code regardless of file type:
**Comment-based stubs:**
```bash
# Grep patterns for stub comments
grep -E "(TODO|FIXME|XXX|HACK|PLACEHOLDER)" "$file"
grep -E "implement|add later|coming soon|will be" "$file" -i
grep -E "// \.\.\.|/\* \.\.\. \*/|# \.\.\." "$file"
```
**Placeholder text in output:**
```bash
# UI placeholder patterns
grep -E "placeholder|lorem ipsum|coming soon|under construction" "$file" -i
grep -E "sample|example|test data|dummy" "$file" -i
grep -E "\[.*\]|<.*>|\{.*\}" "$file" # Template brackets left in
```
**Empty or trivial implementations:**
```bash
# Functions that do nothing
grep -E "return null|return undefined|return \{\}|return \[\]" "$file"
grep -E "pass$|\.\.\.|\bnothing\b" "$file"
grep -E "console\.(log|warn|error).*only" "$file" # Log-only functions
```
**Hardcoded values where dynamic expected:**
```bash
# Hardcoded IDs, counts, or content
grep -E "id.*=.*['\"].*['\"]" "$file" # Hardcoded string IDs
grep -E "count.*=.*\d+|length.*=.*\d+" "$file" # Hardcoded counts
grep -E "\\\$\d+\.\d{2}|\d+ items" "$file" # Hardcoded display values
```
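These patterns can be combined into a single pass over a source tree. A rough sketch, assuming GNU grep and a `src/` layout; expect false positives and review hits manually:
```bash
# Flag files that match any universal stub pattern
stub_re='TODO|FIXME|XXX|HACK|PLACEHOLDER|not implemented|coming soon|lorem ipsum|under construction'
grep -rliE --include="*.ts" --include="*.tsx" --include="*.js" "$stub_re" src/ \
  | while read -r file; do echo "POSSIBLE STUB: $file"; done
```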
</stub_detection>
<react_components>
## React/Next.js Components
**Existence check:**
```bash
# File exists and exports component
[ -f "$component_path" ] && grep -E "export (default |)function|export const.*=.*\(" "$component_path"
```
**Substantive check:**
```bash
# Returns actual JSX, not placeholder
grep -E "return.*<" "$component_path" | grep -v "return.*null" | grep -v "placeholder" -i
# Has meaningful content (not just wrapper div)
grep -E "<[A-Z][a-zA-Z]+|className=|onClick=|onChange=" "$component_path"
# Uses props or state (not static)
grep -E "props\.|useState|useEffect|useContext|\{.*\}" "$component_path"
```
**Stub patterns specific to React:**
```javascript
// RED FLAGS - These are stubs:
return <div>Component</div>
return <div>Placeholder</div>
return <div>{/* TODO */}</div>
return <p>Coming soon</p>
return null
return <></>
// Also stubs - empty handlers:
onClick={() => {}}
onChange={() => console.log('clicked')}
onSubmit={(e) => e.preventDefault()} // Only prevents default, does nothing
```
**Wiring check:**
```bash
# Component imports what it needs
grep -E "^import.*from" "$component_path"
# Props are actually used (not just received)
# Look for destructuring or props.X usage
grep -E "\{ .* \}.*props|\bprops\.[a-zA-Z]+" "$component_path"
# API calls exist (for data-fetching components)
grep -E "fetch\(|axios\.|useSWR|useQuery|getServerSideProps|getStaticProps" "$component_path"
```
**Functional verification (human required):**
- Does the component render visible content?
- Do interactive elements respond to clicks?
- Does data load and display?
- Do error states show appropriately?
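Before handing the component to a human, a crude automated pre-check can catch the worst cases: render the page and grep the HTML. A sketch, assuming a dev server is already running on port 3000 and `/dashboard` is the route under test (both are assumptions):
```bash
# Crude render check -- not a substitute for human verification
html=$(curl -sf http://localhost:3000/dashboard) || { echo "FAIL: page did not render"; exit 1; }
echo "$html" | grep -qiE "placeholder|coming soon" && echo "WARN: placeholder text in rendered output"
echo "$html" | grep -qE "<nav|<aside" || echo "WARN: expected navigation markup not found"
```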
</react_components>
<api_routes>
## API Routes (Next.js App Router / Express / etc.)
**Existence check:**
```bash
# Route file exists
[ -f "$route_path" ]
# Exports HTTP method handlers (Next.js App Router)
grep -E "export (async )?(function|const) (GET|POST|PUT|PATCH|DELETE)" "$route_path"
# Or Express-style handlers
grep -E "\.(get|post|put|patch|delete)\(" "$route_path"
```
**Substantive check:**
```bash
# Has actual logic, not just return statement
wc -l "$route_path" # More than 10-15 lines suggests real implementation
# Interacts with data source
grep -E "prisma\.|db\.|mongoose\.|sql|query|find|create|update|delete" "$route_path" -i
# Has error handling
grep -E "try|catch|throw|error|Error" "$route_path"
# Returns meaningful response
grep -E "Response\.json|res\.json|res\.send|return.*\{" "$route_path" | grep -v "message.*not implemented" -i
```
**Stub patterns specific to API routes:**
```typescript
// RED FLAGS - These are stubs:
export async function POST() {
return Response.json({ message: "Not implemented" })
}
export async function GET() {
return Response.json([]) // Empty array with no DB query
}
export async function PUT() {
return new Response() // Empty response
}
// Console log only:
export async function POST(req) {
console.log(await req.json())
return Response.json({ ok: true })
}
```
**Wiring check:**
```bash
# Imports database/service clients
grep -E "^import.*prisma|^import.*db|^import.*client" "$route_path"
# Actually uses request body (for POST/PUT)
grep -E "req\.json\(\)|req\.body|request\.json\(\)" "$route_path"
# Validates input (not just trusting request)
grep -E "schema\.parse|validate|zod|yup|joi" "$route_path"
```
**Functional verification (human or automated):**
- Does GET return real data from database?
- Does POST actually create a record?
- Does error response have correct status code?
- Are auth checks actually enforced?
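Much of this can be automated with curl against a running dev server. A sketch, assuming the server is on port 3000 and the route is `/api/messages` (endpoint, field names, and payload are assumptions):
```bash
base="http://localhost:3000/api/messages"

# GET should return a non-empty JSON body, not a placeholder
curl -sf "$base" | jq -e 'length > 0' >/dev/null || echo "WARN: GET returned empty or non-JSON body"

# POST should create a record and echo it back
curl -sf -X POST "$base" -H "Content-Type: application/json" -d '{"content":"smoke test"}' \
  | jq -e '.id' >/dev/null || echo "WARN: POST did not return a created record"

# Invalid input should be rejected with a 4xx, not a 200
status=$(curl -s -o /dev/null -w "%{http_code}" -X POST "$base" -H "Content-Type: application/json" -d '{}')
[ "$status" -ge 400 ] && [ "$status" -lt 500 ] || echo "WARN: invalid input returned $status"
```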
</api_routes>
<database_schema>
## Database Schema (Prisma / Drizzle / SQL)
**Existence check:**
```bash
# Schema file exists
[ -f "prisma/schema.prisma" ] || [ -f "drizzle/schema.ts" ] || [ -f "src/db/schema.sql" ]
# Model/table is defined
grep -E "^model $model_name|CREATE TABLE $table_name|export const $table_name" "$schema_path"
```
**Substantive check:**
```bash
# Has expected fields (not just id)
grep -A 20 "model $model_name" "$schema_path" | grep -E "^\s+\w+\s+\w+"
# Has relationships if expected
grep -E "@relation|REFERENCES|FOREIGN KEY" "$schema_path"
# Has appropriate field types (not all String)
grep -A 20 "model $model_name" "$schema_path" | grep -E "Int|DateTime|Boolean|Float|Decimal|Json"
```
**Stub patterns specific to schemas:**
```prisma
// RED FLAGS - These are stubs:
model User {
id String @id
// TODO: add fields
}
model Message {
id String @id
content String // Only one real field
}
// Missing critical fields:
model Order {
id String @id
// No: userId, items, total, status, createdAt
}
```
**Wiring check:**
```bash
# Migrations exist and are applied
ls prisma/migrations/ 2>/dev/null | wc -l # Should be > 0
npx prisma migrate status 2>/dev/null | grep -v "pending"
# Client is generated
[ -d "node_modules/.prisma/client" ]
```
**Functional verification:**
```bash
# Can query the table (automated)
npx prisma db execute --stdin <<< "SELECT COUNT(*) FROM $table_name"
```
</database_schema>
<hooks_utilities>
## Custom Hooks and Utilities
**Existence check:**
```bash
# File exists and exports function
[ -f "$hook_path" ] && grep -E "export (default )?(function|const)" "$hook_path"
```
**Substantive check:**
```bash
# Hook uses React hooks (for custom hooks)
grep -E "useState|useEffect|useCallback|useMemo|useRef|useContext" "$hook_path"
# Has meaningful return value
grep -E "return \{|return \[" "$hook_path"
# More than trivial length
[ $(wc -l < "$hook_path") -gt 10 ]
```
**Stub patterns specific to hooks:**
```typescript
// RED FLAGS - These are stubs:
export function useAuth() {
return { user: null, login: () => {}, logout: () => {} }
}
export function useCart() {
const [items, setItems] = useState([])
return { items, addItem: () => console.log('add'), removeItem: () => {} }
}
// Hardcoded return:
export function useUser() {
return { name: "Test User", email: "test@example.com" }
}
```
**Wiring check:**
```bash
# Hook is actually imported somewhere
grep -r "import.*$hook_name" src/ --include="*.tsx" --include="*.ts" | grep -v "$hook_path"
# Hook is actually called
grep -r "$hook_name()" src/ --include="*.tsx" --include="*.ts" | grep -v "$hook_path"
```
</hooks_utilities>
<environment_config>
## Environment Variables and Configuration
**Existence check:**
```bash
# .env file exists
[ -f ".env" ] || [ -f ".env.local" ]
# Required variable is defined
grep -E "^$VAR_NAME=" .env .env.local 2>/dev/null
```
**Substantive check:**
```bash
# Variable has actual value (not placeholder)
grep -E "^$VAR_NAME=.+" .env .env.local 2>/dev/null | grep -v "your-.*-here|xxx|placeholder|TODO" -i
# Value looks valid for type:
# - URLs should start with http
# - Keys should be long enough
# - Booleans should be true/false
```
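The "looks valid for type" checks can be scripted as well. A sketch with illustrative variable names and thresholds (the names below are assumptions, not a required convention):
```bash
# URL-typed variables should carry a scheme (DATABASE_URL is often postgres://, not http://)
grep -E "^(DATABASE_URL|NEXT_PUBLIC_API_URL)=" .env .env.local 2>/dev/null \
  | grep -vE "=[a-z+]+://" && echo "WARN: URL variable without a scheme"

# Key-typed variables should be long enough to be real (threshold illustrative)
key=$(grep -E "^STRIPE_SECRET_KEY=" .env .env.local 2>/dev/null | head -n1 | cut -d= -f2-)
[ -n "$key" ] && [ "${#key}" -lt 20 ] && echo "WARN: STRIPE_SECRET_KEY looks too short to be real"

# Boolean-typed variables should be exactly true or false (FEATURE_FLAG_X is a hypothetical name)
grep -E "^FEATURE_FLAG_X=" .env .env.local 2>/dev/null | grep -vE "=(true|false)$" \
  && echo "WARN: boolean variable has a non-boolean value"
```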
**Stub patterns specific to env:**
```bash
# RED FLAGS - These are stubs:
DATABASE_URL=your-database-url-here
STRIPE_SECRET_KEY=sk_test_xxx
API_KEY=placeholder
NEXT_PUBLIC_API_URL=http://localhost:3000 # Still pointing to localhost in prod
```
**Wiring check:**
```bash
# Variable is actually used in code
grep -r "process\.env\.$VAR_NAME|env\.$VAR_NAME" src/ --include="*.ts" --include="*.tsx"
# Variable is in validation schema (if using zod/etc for env)
grep -E "$VAR_NAME" src/env.ts src/env.mjs 2>/dev/null
```
</environment_config>
<wiring_verification>
## Wiring Verification Patterns
Wiring verification checks that components actually communicate. This is where most stubs hide.
### Pattern: Component → API
**Check:** Does the component actually call the API?
```bash
# Find the fetch/axios call
grep -E "fetch\(['\"].*$api_path|axios\.(get|post).*$api_path" "$component_path"
# Verify it's not commented out
grep -E "fetch\(|axios\." "$component_path" | grep -v "^.*//.*fetch"
# Check the response is used
grep -E "await.*fetch|\.then\(|setData|setState" "$component_path"
```
**Red flags:**
```typescript
// Fetch exists but response ignored:
fetch('/api/messages') // No await, no .then, no assignment
// Fetch in comment:
// fetch('/api/messages').then(r => r.json()).then(setMessages)
// Fetch to wrong endpoint:
fetch('/api/message') // Typo - should be /api/messages
```
### Pattern: API → Database
**Check:** Does the API route actually query the database?
```bash
# Find the database call
grep -E "prisma\.$model|db\.query|Model\.find" "$route_path"
# Verify it's awaited
grep -E "await.*prisma|await.*db\." "$route_path"
# Check result is returned
grep -E "return.*json.*data|res\.json.*result" "$route_path"
```
**Red flags:**
```typescript
// Query exists but result not returned:
await prisma.message.findMany()
return Response.json({ ok: true }) // Returns static, not query result
// Query not awaited:
const messages = prisma.message.findMany() // Missing await
return Response.json(messages) // Returns Promise, not data
```
### Pattern: Form → Handler
**Check:** Does the form submission actually do something?
```bash
# Find onSubmit handler
grep -E "onSubmit=\{|handleSubmit" "$component_path"
# Check handler has content
grep -A 10 "onSubmit.*=" "$component_path" | grep -E "fetch|axios|mutate|dispatch"
# Verify not just preventDefault
grep -A 5 "onSubmit" "$component_path" | grep -v "only.*preventDefault" -i
```
**Red flags:**
```typescript
// Handler only prevents default:
onSubmit={(e) => e.preventDefault()}
// Handler only logs:
const handleSubmit = (data) => {
console.log(data)
}
// Handler is empty:
onSubmit={() => {}}
```
### Pattern: State → Render
**Check:** Does the component render state, not hardcoded content?
```bash
# Find state usage in JSX
grep -E "\{.*messages.*\}|\{.*data.*\}|\{.*items.*\}" "$component_path"
# Check map/render of state
grep -E "\.map\(|\.filter\(|\.reduce\(" "$component_path"
# Verify dynamic content
grep -E "\{[a-zA-Z_]+\." "$component_path" # Variable interpolation
```
**Red flags:**
```tsx
// Hardcoded instead of state:
return <div>
<p>Message 1</p>
<p>Message 2</p>
</div>
// State exists but not rendered:
const [messages, setMessages] = useState([])
return <div>No messages</div> // Always shows "no messages"
// Wrong state rendered:
const [messages, setMessages] = useState([])
return <div>{otherData.map(...)}</div> // Uses different data
```
</wiring_verification>
<verification_checklist>
## Quick Verification Checklist
For each artifact type, run through this checklist:
### Component Checklist
- [ ] File exists at expected path
- [ ] Exports a function/const component
- [ ] Returns JSX (not null/empty)
- [ ] No placeholder text in render
- [ ] Uses props or state (not static)
- [ ] Event handlers have real implementations
- [ ] Imports resolve correctly
- [ ] Used somewhere in the app
### API Route Checklist
- [ ] File exists at expected path
- [ ] Exports HTTP method handlers
- [ ] Handlers have more than 5 lines
- [ ] Queries database or service
- [ ] Returns meaningful response (not empty/placeholder)
- [ ] Has error handling
- [ ] Validates input
- [ ] Called from frontend
### Schema Checklist
- [ ] Model/table defined
- [ ] Has all expected fields
- [ ] Fields have appropriate types
- [ ] Relationships defined if needed
- [ ] Migrations exist and applied
- [ ] Client generated
### Hook/Utility Checklist
- [ ] File exists at expected path
- [ ] Exports function
- [ ] Has meaningful implementation (not empty returns)
- [ ] Used somewhere in the app
- [ ] Return values consumed
### Wiring Checklist
- [ ] Component → API: fetch/axios call exists and uses response
- [ ] API → Database: query exists and result returned
- [ ] Form → Handler: onSubmit calls API/mutation
- [ ] State → Render: state variables appear in JSX
</verification_checklist>
<automated_verification_script>
## Automated Verification Approach
For the verification subagent, use this pattern:
```bash
# 1. Check existence
check_exists() {
[ -f "$1" ] && echo "EXISTS: $1" || echo "MISSING: $1"
}
# 2. Check for stub patterns
check_stubs() {
local file="$1"
# grep -c prints 0 and exits 1 when there are no matches, so don't append '|| echo 0' (it would add a second 0)
local stubs=$(grep -c -E "TODO|FIXME|placeholder|not implemented" "$file" 2>/dev/null)
stubs=${stubs:-0}
[ "$stubs" -gt 0 ] && echo "STUB_PATTERNS: $stubs in $file"
}
# 3. Check wiring (component calls API)
check_wiring() {
local component="$1"
local api_path="$2"
grep -q "$api_path" "$component" && echo "WIRED: $component$api_path" || echo "NOT_WIRED: $component$api_path"
}
# 4. Check substantive (more than N lines, has expected patterns)
check_substantive() {
local file="$1"
local min_lines="$2"
local pattern="$3"
local lines=$(wc -l < "$file" 2>/dev/null || echo 0)
local has_pattern=$(grep -c -E "$pattern" "$file" 2>/dev/null)
has_pattern=${has_pattern:-0}
[ "$lines" -ge "$min_lines" ] && [ "$has_pattern" -gt 0 ] && echo "SUBSTANTIVE: $file" || echo "THIN: $file ($lines lines, $has_pattern matches)"
}
```
Run these checks against each must-have artifact. Aggregate results into VERIFICATION.md.
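A rough sketch of that aggregation loop, assuming the helpers above are sourced and the must-have artifacts are listed one path per line in a manifest file (the manifest name is an assumption):
```bash
{
  echo "# VERIFICATION"
  echo ""
  while read -r artifact; do
    echo "## $artifact"
    check_exists "$artifact"
    check_stubs "$artifact"
    check_substantive "$artifact" 10 "export|return"
    echo ""
  done < must-have-artifacts.txt
} > VERIFICATION.md
```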
</automated_verification_script>
<human_verification_triggers>
## When to Require Human Verification
Some things can't be verified programmatically. Flag these for human testing:
**Always human:**
- Visual appearance (does it look right?)
- User flow completion (can you actually do the thing?)
- Real-time behavior (WebSocket, SSE)
- External service integration (Stripe, email sending)
- Error message clarity (is the message helpful?)
- Performance feel (does it feel fast?)
**Human if uncertain:**
- Complex wiring that grep can't trace
- Dynamic behavior depending on state
- Edge cases and error states
- Mobile responsiveness
- Accessibility
**Format for human verification request:**
```markdown
## Human Verification Required
### 1. Chat message sending
**Test:** Type a message and click Send
**Expected:** Message appears in list, input clears
**Check:** Does message persist after refresh?
### 2. Error handling
**Test:** Disconnect network, try to send
**Expected:** Error message appears, message not lost
**Check:** Can retry after reconnect?
```
</human_verification_triggers>
<checkpoint_automation_reference>
## Pre-Checkpoint Automation
For automation-first checkpoint patterns, server lifecycle management, CLI installation handling, and error recovery protocols, see:
**@.pi/gsd/references/checkpoints.md** → `<automation_reference>` section
Key principles:
- the agent sets up the verification environment BEFORE presenting checkpoints
- Users never run CLI commands (visit URLs only)
- Server lifecycle: start before checkpoint, handle port conflicts, keep running for duration
- CLI installation: auto-install where safe, checkpoint for user choice otherwise
- Error handling: fix broken environment before checkpoint, never present checkpoint with failed setup
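A minimal sketch of the server-lifecycle principle, assuming a Next.js-style dev script that accepts `--port` and a default port of 3000 (both assumptions):
```bash
# Start the dev server before the checkpoint, falling back to another port if 3000 is taken
port=3000
lsof -i :"$port" >/dev/null 2>&1 && port=3001
npm run dev -- --port "$port" > /tmp/dev-server.log 2>&1 &
server_pid=$!

# Wait until the server answers before presenting the checkpoint
for _ in $(seq 1 30); do
  curl -sf "http://localhost:$port" >/dev/null && break
  sleep 1
done
echo "Dev server ready at http://localhost:$port (pid $server_pid)"
```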
</checkpoint_automation_reference>

View File

@@ -0,0 +1,58 @@
# Workstream Flag (`--ws`)
## Overview
The `--ws <name>` flag scopes GSD operations to a specific workstream, enabling
parallel milestone work by multiple Claude Code instances on the same codebase.
## Resolution Priority
1. `--ws <name>` flag (explicit, highest priority)
2. `GSD_WORKSTREAM` environment variable (per-instance)
3. `.planning/active-workstream` file (shared, last-writer-wins)
4. `null` - flat mode (no workstreams)
## Routing Propagation
All workflow routing commands include `${GSD_WS}` which:
- Expands to `--ws <name>` when a workstream is active
- Expands to empty string in flat mode (backward compatible)
This ensures workstream scope chains automatically through the workflow:
`new-milestone → discuss-phase → plan-phase → execute-phase → transition`
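A sketch of how a wrapper might resolve the active workstream and build `${GSD_WS}` following this priority order -- the logic inside gsd-tools.cjs is the source of truth; this is illustrative only:
```bash
resolve_workstream() {
  # 1. Explicit --ws value passed by the caller
  if [ -n "$1" ]; then echo "$1"; return; fi
  # 2. Per-instance environment variable
  if [ -n "$GSD_WORKSTREAM" ]; then echo "$GSD_WORKSTREAM"; return; fi
  # 3. Shared file (last-writer-wins)
  if [ -f ".planning/active-workstream" ]; then cat ".planning/active-workstream"; return; fi
  # 4. Flat mode: no workstream
  echo ""
}

ws=$(resolve_workstream "$1")
GSD_WS=${ws:+--ws $ws}                  # `--ws <name>` when a workstream is active, empty string in flat mode
node gsd-tools.cjs state json $GSD_WS   # left unquoted on purpose so it expands to zero or two words
```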
## Directory Structure
```
.planning/
├── PROJECT.md # Shared
├── config.json # Shared
├── milestones/ # Shared
├── codebase/ # Shared
├── active-workstream # Points to current ws
└── workstreams/
├── feature-a/ # Workstream A
│ ├── STATE.md
│ ├── ROADMAP.md
│ ├── REQUIREMENTS.md
│ └── phases/
└── feature-b/ # Workstream B
├── STATE.md
├── ROADMAP.md
├── REQUIREMENTS.md
└── phases/
```
## CLI Usage
```bash
# All gsd-tools commands accept --ws
node gsd-tools.cjs state json --ws feature-a
node gsd-tools.cjs find-phase 3 --ws feature-b
# Workstream CRUD
node gsd-tools.cjs workstream create <name>
node gsd-tools.cjs workstream list
node gsd-tools.cjs workstream status <name>
node gsd-tools.cjs workstream complete <name>
```