Start here
00 — The Coin Flip
Two engineers open Claude Code on Monday morning. They have the same model weights, the same token budget, the same prompt. One ships a feature by lunch. The other spends the day cleaning up the mess Claude made. Same model. Different output. Why?
Imagine you're a senior engineer with a half-million-line TypeScript codebase — a FastAPI backend, a Next.js frontend, a sidebar you don't dare touch because nobody remembers who wrote it. You open Claude Code on Monday morning and type six words into the prompt box:
"add a notes feature"
Six words. The cursor blinks. You wait.
Now imagine your teammate. Same Monday. Same codebase. Same prompt. Different outcome.
Engineer A (vibe coding) gets:
- A new FastAPI route at
/api/notes that ignores the existing routes/todos.py patterns - A standalone Next.js page that bypasses the sidebar navigation entirely
- Zero tests — not even a
test_notes.py to match the test_todos.py convention - An inconsistent response shape — every other endpoint returns
{ data, error }, this one returns a bare array - Three files modified in the backend, two orphaned files in the frontend, no commit
Engineer B (agentic engineering) gets:
- A new FastAPI route that follows the
routes/todos.py pattern: same router factory, same Pydantic model location, same auth dependency - A new page integrated into the existing sidebar — slot reserved, icon reused, route registered
- A
test_notes.py that mirrors the test_todos.py style: same fixtures, same assertion style - A
{ data, error } response envelope to match every other endpoint - One commit, clean diff, all tests green
Same prompt. Same model. Two completely different outputs.
This isn't a benchmark problem. This isn't a model-capability problem. This is a wrapper problem.
The model is the same; the wrapper isn't
I've watched this asymmetry play out for two years. The first time I noticed, I blamed the model. *Maybe the new release regressed,* I thought. *Maybe the route was too vague.* The standard excuses.
Then a colleague sent me a before/after screenshot of his team's monorepo. Same Claude Code. Same prompt. Same engineer — him. The difference was that two months earlier, he had started writing files in a directory he hadn't known existed: .claude/.
That's when it clicked. The model weights were frozen. The harness wasn't.
Naming the zero state
There's a name for what Engineer A is doing: vibe coding.
todoapp/ # vibe coding (zero state)
├── backend/
│ ├── main.py
│ ├── routes/
│ ├── models/
│ └── tests/
└── frontend/
├── components/
├── pages/
└── lib/
That's the entire repository. No .claude/. No CLAUDE.md. No skills, no agents, no commands. Just code — code that Claude has to discover on every prompt, file by file, search by search, guess by guess.
Every prompt in this state is a coin flip. Sometimes Claude guesses right. Sometimes it invents a pattern that doesn't exist. Sometimes it produces something beautiful. Sometimes it produces something you'll be cleaning up at 11pm.
The state is the same as if you had no project at all. Each session starts from zero. Each prompt is a fresh roll of the dice.
todoapp/ # agentic engineering (destination)
├── .claude/ # Claude Code config
│ ├── agents/ # Custom subagents
│ ├── skills/ # Domain knowledge
│ ├── commands/ # Slash commands
│ ├── hooks/ # Lifecycle scripts
│ ├── rules/ # Modular instructions
│ ├── settings.json # Team settings
│ └── settings.local.json # Personal settings
├── backend/
│ └── CLAUDE.md # Backend instructions
├── frontend/
│ └── CLAUDE.md # Frontend instructions
├── .mcp.json # Managed MCP servers
└── CLAUDE.md # Project instructions
That's the destination. Same backend/, same frontend/, same routes/, same tests/. The codebase didn't change. **The wra
7m / Article + audio
01 — The Wrapper Does the Heavy Lifting
Your teammate opens the same project, types the same prompt, gets a different endpoint. You open your terminal to debug. The endpoint works perfectly. Both of you used Claude Code. Both of you got Claude's "best effort." So why are the two best efforts different?
I used to think the wrapper around Claude Code was theatre.
A skill is just a prompt. A command is just a prompt. A subagent is just a prompt with extra steps. Eventually all of it gets reduced to tokens the model sees, and a strong prompt alone should be enough.
This is the most common reduction among experienced Claude Code users, and it is wrong in a way that matters. Not in a subtle, philosophical way. In a *your output is wrong by an order of magnitude* way.
The reduction is right about one thing: at the moment of inference, the model only sees tokens. That's it. No magic, no hidden state. Tokens in, tokens out.
But that's not the layer where engineering happens. The engineering happens before tokens arrive, after tokens are produced, across sessions, and between contexts. At every one of those layers, the harness is doing work a prompt cannot do — and the most expensive mistake you can make as a practitioner is to mistake the model's view of the world for the system's view of the world.
Let me show you what I mean.
Six tokens
You sit down at your keyboard on a Tuesday afternoon. You open Claude Code in a half-million-line TypeScript monorepo. You type:
add a notes feature
Six tokens. You hit enter.
Now think about what you think Claude sees.
If you're like most engineers I talk to, the answer is some version of: "Claude sees my six tokens, plus maybe a CLAUDE.md if we have one, plus whatever files it reads along the way."
That mental model is roughly 5% of the story.
What the model actually sees
The harness assembles something called the effective context — the actual token stream that reaches the model at inference time. For a six-token user prompt in Claude Code, the effective context looks like this:
flowchart TD
A["User prompt<br/>'add a notes feature'<br/>~6 tokens"] --> B[Effective context at inference<br/>~5,000–50,000+ tokens]
B --> C1["CLAUDE.md<br/>(project conventions)"]
B --> C2[".claude/rules/*.md<br/>(lazy-loaded via paths: frontmatter)"]
B --> C3["Modular system prompt<br/>(110+ fragments, conditionally loaded)"]
B --> C4["Tool definitions<br/>(Read, Write, Bash, Agent, etc.)"]
B --> C5["Environment context<br/>(cwd, git status, platform)"]
B --> C6["Prior turn history"]
B --> C7["Files read via Read/Grep"]
B --> C8["User's 6-token request"]
Notice what's missing from your mental model. The CLAUDE.md is obvious. But the harness also injects matching rules from .claude/rules/ based on which files Claude is touching. It loads modular system prompt fragments — over a hundred of them, conditionally — that you cannot hand-author or swap in. It prepends tool definitions so the model knows what tools it can call. It appends environment context so the model knows what platform it's on. It carries forward every prior turn. It reads files via tools as the agentic loop progresses.
The user types 6 tokens. The model sees somewhere between 5,000 and 50,000 tokens at inference, most of which the user did not write and cannot directly see.
Output quality is a function of the effective context. Output quality is *not* a function of your typed prompt.
That's the asymmetry, and it's not subtle.
The two meanings of "prompt"
Here's where the reduction collapses. The word *prompt* is doing two jobs in everyday speech:
| Meaning | Who controls it | Typical size | |---|---|---| | (a) What the user typed | You | ~6–60 tokens | | (b) What the model sees at inference | The harness | ~5,000–50,000+ tokens |
In a chatbot, (a) and (b) are the same thing. In Claude Code, they are radically different.
A "strong prompt" — meaning, a strong (a) — controls a sliver of (b). The harness constructs the rest. The harness's job is precisely to make (b) much richer than (a).
If you confuse (a) with (b), you'll spend your time rewriting the same six tokens, trying to outflank the model. You won't. The model isn't the bottleneck. The wrapper around the model is the bottleneck — and the wrapper is what you should be writing.
What prompts cannot do
There are ten architectural capabilities of the harness that operate at layers where prompts have no acc
8m / Article + audio
02 — Project Memory: The Most Important File You Haven't Written
It's Monday morning. Your standup asks how you'll onboard the new hire to your monorepo. You pause. The honest answer is: you'll shadow them for a week, point them at five different Notion pages, and hope they absorb the conventions by osmosis. Meanwhile, your Claude Code sessions start every morning from zero — the same questions, the same patterns re-explained, the same mistakes re-discovered. The new hire and Claude have something in common: neither of them gets to keep your context.
There's a paradox at the heart of every Claude Code project that hasn't written a CLAUDE.md.
Claude has perfect recall — until the session ends. Every instruction you've ever typed, every correction, every "no, we don't do it that way" — all of it vanishes the moment you close the terminal. Next session, Claude doesn't know about the auth pattern you spent twenty minutes explaining last Tuesday. Next month, the new hire opens Claude Code, asks for a feature, and Claude invents a convention that doesn't exist — because it has no memory of the conventions you spent years building.
A memory that forgets itself isn't a memory. It's a conversation.
The fix is one file. It's been there the entire time. Most teams still haven't written it.
The file
CLAUDE.md is a markdown file at the root of your repository. When Claude Code starts, it walks the directory tree, finds CLAUDE.md files, and injects them into the model's effective context. Whatever you put in those files is what Claude knows about your project — before the user types a single token.
This is the single most impactful file you can write. Not the most impactful prompt. Not the most impactful skill. The most impactful *file*. Period.
your-repo/
├── CLAUDE.md ← project-level conventions (always loaded)
├── backend/
│ └── CLAUDE.md ← backend-specific conventions (lazy-loaded)
└── frontend/
└── CLAUDE.md ← frontend-specific conventions (lazy-loaded)
The contents aren't magic. They're prose. They describe what the codebase is, what the conventions are, what Claude should and shouldn't do.
A good CLAUDE.md has:
- Project overview — what the codebase is, in 2-4 sentences
- Tech stack — language, framework, key dependencies
- Conventions — naming, file organization, test style, commit format
- Architecture — how the major pieces fit together
- Don'ts — common mistakes, deprecated patterns, things Claude should avoid
- Pointers — where to look for deeper documentation
That's it. No magic incantations. No specific structure Claude requires. Just clear, opinionated prose that tells the model what this project is.
Humanlayer's guide to writing a good CLAUDE.md (humanlayer.dev/blog/writing-a-good-claude-md) is the canonical reference. Boris Cherny's recommendation, documented in the Claude Code repository's own CLAUDE.md, is to keep each CLAUDE.md under 200 lines for reliable adherence. The presentation framework in this repo suggests aiming for 150 lines or fewer. Anything longer and the model starts losing track of the lower-priority rules — the model has a bounded attention budget over long context just like a human does.
The day the file lands
Here's what changes when you finally write CLAUDE.md.
Before (Monday, week one):
You open Claude Code on your half-million-line monorepo. You start typing:
"Before you make any changes, read routes/todos.py to understand our backend conventions. The response envelope is always {data, error}. Auth uses the require_user dependency. Tests mirror test_todos.py style with the same fixtures. The sidebar lives in components/Sidebar.tsx and follows the slot pattern — never bypass it. Commit messages follow conventional-commits format..."
Six hundred tokens of context. Every prompt. Every session.
After (Monday, week two):
You open Claude Code. CLAUDE.md says all of this. You type:
"add a notes feature"
Six tokens. The result matches the conventions because the conventions are in CLAUDE.md — not in your prompt.
The wrapper carries the convention. The convention carries across the team. The team's prompts get better not because anyone rewrote them but because the wrapper grew.
How CLAUDE.md loads in a monorepo
The loading rules are non-obvious. Most engineers — me included — got them wrong for years.
Claude Code uses two distinct mechanisms for loading CLAUDE.md files:
graph TD
Start([cd into monorepo root<br/>run 'claude']) --> A[Walk UP directory tree<br/>ancestor loading]
A --> B[Load every CLAUDE.md<br/>along the path immediately]
B --> C[Effective context<br/>now contains root + every ancestor CLAUDE.md]
Start --> D[Subdirectory CLAUDE.md<br/>NOT loaded at startup]
D --> E[Wait for Claude to<br/>read/edit files in subdirectory]
E --> F[Lazy-load that subdirectory's<br/>CLAUDE.md when matched]
Ancestor loading (UP): Claude walks *upward* from your current working directory toward the filesystem root and loads every CLAUDE.md it finds. Loaded **
9m / Article + audio
Premium chapters
4. The_Three_PrimitivesAvailable after upgrade / 8m
5. Practice_Makes_Claude_PerfectAvailable after upgrade / 10m