Context → Plan → Build → Eval.
Most of my time is in Context and Eval. Claude does most of the work in Build. Plan is where the real thinking happens. Claude will give you a plan — don't just auto-accept. Read through it, understand it, and if it's not right, revise.
1. Context
Your PM skills still matter. Maybe more than before.
Before I type anything into Claude, I drop the PM artifacts into the project folder:
- Customer context — who they are, what they do, what their day looks like
- Personas — not the marketing kind, the kind that matters for product decisions
- JTBD — what job are they hiring this thing to do
- Pain points — the specific moments where today's workflow breaks
- Data model — entities, relationships, what the data actually looks like
- Hard constraints — what the app must not do (compliance, scope, dependencies)
These go into a CLAUDE.md at the project root and a /docs/ folder with specs/, design/, architecture/, and context/ subfolders. Claude reads all of it. This is the difference between a generic prototype and one that actually reflects the problem.
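Concretely, the layout looks something like this. Which artifact lives in which subfolder is my own convention — adapt it to your project:

```
project-root/
├── CLAUDE.md          # project-level context Claude reads first
└── docs/
    ├── specs/         # PRDs, feature specs
    ├── design/        # wireframes, UX notes
    ├── architecture/  # data model, entity reference
    └── context/       # personas, JTBD, pain points
```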
The data model is the single biggest thing that takes a project from a toy to something with a path to production. If I had to pick one file to get right, it'd be the entity reference — a table with every entity, its key properties, and what it's used for.
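For illustration only, a minimal entity reference might look like this — the entities and properties here are hypothetical; yours come from your actual data model:

```
| Entity  | Key properties                     | Used for               |
|---------|------------------------------------|------------------------|
| Account | id, name, plan_tier, created_at    | Billing, segmentation  |
| User    | id, account_id, role, last_seen_at | Auth, activity views   |
| Project | id, account_id, status, owner_id   | Core workflow, boards  |
```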
If you skip this stage, you get a demo that looks nice and falls apart the moment someone asks "what about X?"
2. Plan
Don't just type "build me an app."
I start every new build in Plan Mode. This is Claude's read-only mode where it explores the codebase, asks questions, and writes a plan file before touching any code.
What I actually do here:
- Brainstorm the workflow with Claude. Talk through the user journey. Ask it to challenge assumptions.
- Be critical of the first plan. It's almost never right. Push back. Ask it to refine. You can comment directly on the plan markdown file in Claude Code, which I've found useful.
- Check that the plan references existing files and patterns, not made-up ones.
- Keep iterating until the plan actually makes sense. Systems thinking first.
Plan mode is where I catch the "oh wait, we don't need to build that" or "that does not make sense" moments. Time spent here pays back 10x in Build.
If you skip this stage or accept everything Claude recommends, Claude will happily build the wrong thing fast.
3. Build
Switch to Bypass Permissions. It sounds scary. It isn't, for solo prototypes.
Once the plan is solid, I flip to Bypass Permissions so Claude can run without stopping to ask me to approve every file change. For prototypes on my own machine, nothing catastrophic happens. I'll let you know if I ever get burned.
During Build, I try to stay out of the way. My job here is:
- Keep Claude on the plan. If it drifts, pull it back.
- Watch for scope creep. Don't let it add features "while it's in there."
- If it hits a real blocker, pair on it. Otherwise, let it run.
I ship V0 with explicit phase rules:
- Hardcode what you can. Stub what you must.
- Every screen needs realistic data — real-ish companies, plausible metrics. If you have real data (PII removed), great — use it. If you don't, think about the persona and the company you're building for, and let Claude do the research to ground the data and use case.
- Build modules one at a time. If you try to build the entire app in one shot, you'll get mediocre results. Create a shell with the scaffolding you need, then build one core module at a time.
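The "realistic data" rule above can be as simple as a seeded generator, so every screen renders the same plausible numbers on every run. The company names and metric ranges here are made up — ground yours in the persona you're building for:

```python
import random

random.seed(7)  # deterministic: the prototype looks identical on every run

# Hypothetical real-ish companies and metric ranges -- swap in your persona's world.
COMPANIES = ["Northwind Analytics", "Bluepeak Labs", "Harborlight Co"]

def sample_metrics() -> dict:
    """One plausible row of demo data for a screen."""
    return {
        "company": random.choice(COMPANIES),
        "mrr": round(random.uniform(5_000, 50_000), 2),
        "churn_pct": round(random.uniform(0.5, 4.0), 1),
    }

rows = [sample_metrics() for _ in range(3)]
```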
The initial app comes together in surprisingly decent shape when the plan is good.
4. Eval
Start with what you know, review the output, keep refining. That's the job.
If the app involves an LLM or agent, eval is not optional. It's the whole job.
I define "good" on two axes:
- Data structure — does the output match the schema? Are required fields there? Are types right?
- Output quality — is the content actually useful? Is it specific? Does it match the rules I laid down?
Day one, the eval is a spreadsheet and my eyeballs. That's fine. Over time it turns into:
- Layer 1: Schema validation — automated, binary, zero errors tolerated
- Layer 2: Content quality — LLM-as-judge scoring with rubrics
- Layer 3: Cross-item consistency — does the same input produce similar outputs
- Layer 4: Coverage — does the output hit all the categories it should
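A first pass at Layers 1 and 4 can be plain Python before any tooling. The schema fields and required categories below are hypothetical placeholders for whatever your product actually demands:

```python
# Minimal sketch of automated eval layers, assuming outputs are dicts.
# Layer 1: schema validation -- binary, zero errors tolerated.
# Layer 4: coverage -- did the output hit every required category?

REQUIRED_FIELDS = {"title": str, "summary": str, "categories": list}
REQUIRED_CATEGORIES = {"pricing", "onboarding", "support"}  # hypothetical

def validate_schema(output: dict) -> list[str]:
    """Return a list of schema errors; empty list means pass."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            errors.append(f"wrong type for {field}: {type(output[field]).__name__}")
    return errors

def check_coverage(output: dict) -> set[str]:
    """Return the required categories the output failed to cover."""
    return REQUIRED_CATEGORIES - set(output.get("categories", []))

sample = {"title": "Q3 digest", "summary": "...", "categories": ["pricing", "support"]}
print(validate_schema(sample))   # [] -- schema passes
print(check_coverage(sample))    # {'onboarding'} -- coverage gap
```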
The thresholds are grounded in product requirements, not in what the model happens to produce on sample data. If a real user needs the output to be 95% correct, that's the bar. Not "85% because that's where we landed."
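In code terms, the threshold is a constant set by the requirement, and the eval run is measured against it. The numbers here are illustrative:

```python
# The bar is set by the product requirement, not by what the model scored.
REQUIRED_PASS_RATE = 0.95  # hypothetical: "a real user needs 95% correct"

results = [True] * 96 + [False] * 4  # illustrative eval run: 96 of 100 passed
pass_rate = sum(results) / len(results)

if pass_rate >= REQUIRED_PASS_RATE:
    print(f"ship: {pass_rate:.0%} meets the bar")
else:
    print(f"hold: {pass_rate:.0%} is below the {REQUIRED_PASS_RATE:.0%} bar")
```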
The Infrastructure Layer
Underneath the four stages, there's a layer I didn't set out to build but ended up needing:
- Skills — reusable capabilities I call on over and over (PRD generator, JTBD extractor, persona writer, competitive analysis)
- Scheduled tasks — recurring automation (daily knowledge digest, weekly content distillation, folder maintenance) that reads state, does work, and logs feedback so the next run is smarter
- Runbooks — when a system needs many steps and consistent results, ask Claude to create a runbook. There are fancier ways to do it, but if everything is already in your project folder, a runbook with specific steps and checks does the trick.
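The scheduled-task pattern above — read state, do the work, log feedback so the next run is smarter — can be sketched in a few lines. Paths and state fields here are hypothetical:

```python
import datetime
import json
import pathlib

def run_task(state_path: pathlib.Path, log_path: pathlib.Path) -> int:
    """One scheduled run: read state, do work, log feedback for the next run."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {"runs": 0}
    # ... the actual work goes here (digest, distillation, folder maintenance) ...
    state["runs"] += 1
    state_path.write_text(json.dumps(state))
    # Append a feedback entry the next run can read.
    entry = {"ts": datetime.datetime.now().isoformat(), "runs": state["runs"]}
    with log_path.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return state["runs"]
```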
This is what makes Claude Code feel less like a tool and more like a workspace. Not everything needs to be a skill — a one-off is a one-off. But when I catch myself doing the same thing twice, I write a skill for it.
What I'd Tell a Friend Starting Today
- Pick one small real project. Not a toy, not a big bet. Something you'd actually use.
- Spend an hour on Context before you spend a minute on code.
- Start in Plan Mode. Argue with the first plan.
- Let Claude build. Stay out of the way.
- Review the output. Write down what's wrong. Feed it back.
- When you do the same thing twice, make it a skill.
Your reasoning skills are the moat. AI collapses the time from idea to prototype, but it doesn't decide what to build or what "good" means. That part is still yours.