Start here
Why Your Coding Agent Forgets Your Design
A coding agent does not have a design system. It has a slot where one might be. DESIGN.md is a format for filling that slot, and the reason it exists is that the slot was empty.
Key Takeaways
- "Modern, clean" is not a specification. It is a region in design space, and a model will average over that region differently every time.
- The failure mode of vibe-coded UI is not bad code. It is a missing persistent anchor between sessions, between agents, and between humans.
- DESIGN.md's load-bearing bet is that intent — written in prose — is what survives across sessions, and token values only secondarily.
- Reading this series will give you a working theory of why AI-generated design drifts, what to do about it, and the specific shape of one proposed solution.
The Drift
Imagine you paste the same prompt into three fresh ChatGPT windows on Monday morning.
"Build a React dashboard for tracking SaaS metrics. Modern, clean, premium, trustworthy."
You get back three apps. The first uses a navy primary, the second a forest green, the third a near-black. The first has 8px rounded corners; the second has 4px; the third is sharp. The first is set in Inter, the second in Plus Jakarta Sans, the third in the model default. None of them are bad. None of them are the same.
I have watched this happen at four different companies. The pattern is the same every time. By the third session, the team has stopped asking the agent to "match what it did before" because the answer is always the same: it cannot. The model's context window is finite. The system prompt is general. The user prompt is one line. None of these things carry the specific design intent from one Tuesday to the next Tuesday.
This is the problem DESIGN.md exists to solve. It is a format specification published by Google's google-labs-code organization, currently at version alpha, that gives coding agents a persistent, structured document describing a visual identity. The README is one line about what it is, and one line about what it is for:
"A format specification for describing a visual identity to coding agents. DESIGN.md gives agents a persistent, structured understanding of a design system."
The shape of that persistent document is what we will spend the next four chapters on. But the *reason* the document has to exist at all — and the reason it has the specific shape it does — is what this chapter is about.
The Failure Mode Is Not In The Model
It is tempting to read the drift problem as a model limitation. The model is not, after all, the same model across sessions; the weights may have been retrained, the inference temperature may differ, the system prompt may be different. There is a real sense in which "the same prompt" is never quite the same prompt.
But the model is not where the variance comes from. The variance comes from the *specification*. A prompt that says "modern, clean" is specifying a region of design space, not a point. A region contains many possible points. The model picks a different point each time, and each pick is internally consistent — each is a reasonable interpretation of "modern, clean." The problem is that the user wanted a *single* point, and the prompt never named it.
This is a category of error that shows up across AI tooling, not just in design. The same failure mode appears in code generation ("write a function that handles errors well"), in data analysis ("clean this dataset"), in product copy ("write a friendly error message"). The model is doing the right thing. The specification is doing the wrong thing. The fix is to write a better specification, not to ask the model to be more deterministic.
I started this analysis expecting the answer to be a model-side trick — a temperature setting, a system prompt template, a fine-tuned checkpoint. The DESIGN.md repository moved
Two Layers, One Document
A DESIGN.md file is two normative documents welded together at the top of a Markdown file. The two halves do categorically different work, and the format's most important rule is that they are not interchangeable.
Key Takeaways
- A DESIGN.md file is a YAML front matter (design tokens) plus a Markdown body (design rationale), separated by --- fences. Both halves are normative; they do not duplicate each other.
- The YAML holds values the model cannot infer (#1A1C1E, Public Sans, 8px). The Markdown holds intent the values cannot express ("architectural minimalism meets journalistic gravitas").
- The Markdown body is sectioned in a fixed canonical order (Overview, Colors, Typography, Layout, Elevation & Depth, Shapes, Components, Do's and Don'ts), and the linter warns when sections are out of order.
- The two layers do not collapse into one. The values are *not* the spec; the prose is. This is the single most surprising rule in the format.
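The ordering rule in the third takeaway can be sketched in a few lines. This is a minimal illustration, not the linter's actual implementation: the function name `out_of_order_sections` is hypothetical, it assumes sections are `##` headings, and it assumes headings outside the canonical list are simply ignored.

```python
import re

# Canonical section order, as listed in the takeaways above.
CANONICAL_ORDER = [
    "Overview", "Colors", "Typography", "Layout",
    "Elevation & Depth", "Shapes", "Components", "Do's and Don'ts",
]


def out_of_order_sections(markdown_body: str) -> list[str]:
    """Return canonical headings that appear after a later-ranked heading."""
    rank = {name: i for i, name in enumerate(CANONICAL_ORDER)}
    # Assumption: sections are level-2 headings ("## Name").
    headings = re.findall(r"^##[ \t]+(.+?)[ \t]*$", markdown_body, flags=re.MULTILINE)
    warnings = []
    highest_seen = -1
    for heading in headings:
        if heading not in rank:
            continue  # assumption: unknown headings are skipped, not flagged
        if rank[heading] < highest_seen:
            warnings.append(heading)
        highest_seen = max(highest_seen, rank[heading])
    return warnings


body = "## Overview\nIntent.\n## Typography\nType.\n## Colors\nPalette.\n"
print(out_of_order_sections(body))
```

Here Colors is flagged because it follows Typography, which ranks later in the canonical order. The result is a list of warnings rather than an exception, mirroring the takeaway's claim that out-of-order sections produce warnings.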
The Document, As It Actually Looks
Open any of the three example files in the repository — say examples/atmospheric-glass/DESIGN.md — and the first thing that strikes you is the structure. A real opening:
---
name: Atmospheric Glass
colors:
  surface: "#0b1326"
  surface-dim: "#0b1326"
  surface-bright: "#31394d"
  primary: "#ffffff"
  on-primary: "#2f3131"
  secondary: "#adc9eb"
  ...
typography:
  display-lg:
    fontFamily: Inter
    fontSize: 84px
    fontWeight: "700"
    lineHeight: 90px
    letterSpacing: -0.04em
  ...
rounded:
  sm: 0.25rem
  md: 0.75rem
  lg: 1rem
  xl: 1.5rem
  full: 9999px
---
## Brand & Style
This design system centers on a high-fidelity Glassmorphism aesthetic...
## Colors
The color strategy prioritizes luminosity and contrast...
The --- fences are not acting as Markdown syntax here. They delimit a YAML block. The parser, when it sees those fences, switches from Markdown to YAML, reads the tokens, and switches back. Two parsers in the same file, with a one-line trigger.
I want to dwell on this for a moment, because the *single file with two parsers* is not a formatting choice. It is a normative claim. The format's authors are saying: these two kinds of information belong in one document, and they belong in one document because they are consumed together. An agent that reads this file will read both halves. A linter that validates this file will check both halves. A diff that compares two versions of this file will report changes in both halves. The two layers share a lifecycle.
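The hand-off between the two parsers can be sketched as a plain string split. This is a minimal sketch under assumptions: `split_design_md` is a hypothetical helper, not a function from the repository, and it only separates the two halves. Parsing the YAML half would be the job of a real YAML library.

```python
def split_design_md(text: str) -> tuple[str, str]:
    """Split a DESIGN.md string into (yaml_front_matter, markdown_body)."""
    lines = text.splitlines()
    # The opening fence must be the very first line of the file.
    if not lines or lines[0].strip() != "---":
        raise ValueError("expected an opening --- fence on the first line")
    # The first later line that is exactly --- closes the YAML block.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front_matter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return front_matter, body
    raise ValueError("no closing --- fence found")


sample = (
    "---\n"
    "name: Atmospheric Glass\n"
    "colors:\n"
    '  primary: "#ffffff"\n'
    "---\n"
    "## Brand & Style\n"
    "This design system centers on a high-fidelity Glassmorphism aesthetic...\n"
)
tokens_yaml, rationale_md = split_design_md(sample)
print(tokens_yaml)
print(rationale_md)
```

Because the split stops at the first closing fence, a later --- inside the Markdown body stays in the body and is not mistaken for a second YAML block.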
What The YAML Layer Is For
The YAML front matter is, at first glance, the obvious half. It is a structured data file. It is a Map<String, Value> that any parser can read. It is the part of the format that looks like a config file. And it is the part the format spec spends the most pages defining.
The schema is published in docs/spec.md and is generated from linter/spec-config.yaml. Five top-level token groups, preceded by a few metadata fields:
version: <string> # optional, current: "alpha"
name: <string>
description: <string> # optional
colors:
  <token-name>: <Color>
typography:
  <token-name>: <Typography>
rounded:
  <scale-level>: <Dimension>
spacing:
  <scale-level>: <Dimension | number>
components:
  <component-name>:
    <token-name>: <string | token reference>
Color values are any valid CSS color string: hex (#1A1C1E), rgb(), oklch(), named colors, or color-mix(). The parser accepts all of them and internally converts to sRGB for the WCAG contrast check. Dimensions are px, em, or rem, and the unit is mandatory. Where the schema also accepts a unitless number, the parser treats it as a multiplier of fontSize (the recommended CSS practice). Token references use the {path.to.token} syntax, with one important restriction: most references must point to a *primitive* value, not a group. Within components, references to composite values, such as an entire typography preset, are allowed.
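The WCAG contrast check mentioned above reduces to a short formula once a color is in sRGB. The sketch below uses the standard WCAG 2.x definitions of relative luminance and contrast ratio. It handles six-digit hex only; the other CSS color forms the parser accepts are out of scope here, and the function names are illustrative rather than taken from the linter.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a six-digit hex color such as "#1A1C1E"."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"this sketch only handles six-digit hex, got {hex_color!r}")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# primary / on-primary pair from the Atmospheric Glass example
print(contrast_ratio("#2f3131", "#ffffff"))
```

Which threshold the linter applies to this ratio is not stated in the text above; WCAG AA's 4.5:1 for normal text is one common choice.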
The schema is small. Five groups, one reference syntax, a fixed set of typography properties (fontFamily, fontSize, fontWeight, lineHeight, letterSpacing, fontFeature, fontVariation), a fixed set of component sub-tokens (backgroundColor, `tex
Five Vocabulary Words and a Resolution Algorithm
The token schema is small on purpose. Five groups, one reference syntax, two depth limits, and a cycle detector. That is the entire machine. The boundedness is not a constraint of the parser — it is a constraint of the format's design philosophy.
Key Takeaways
- The token schema has exactly five top-level groups: colors, typography, rounded, spacing, components. Each is a small, fixed map.
- Token references use one syntax, {path.to.token}, and the resolver walks a graph with two depth limits (max_reference_depth: 10, max_token_nesting_depth: 20).
- References are mostly restricted to primitive values. Composite references (e.g., a whole typography preset) are allowed only inside components.
- The linter treats broken references as errors, not warnings, because a broken reference is a file that the agent cannot faithfully render.
The Reference That Started This Chapter
I was reading examples/paws-and-paths/DESIGN.md when a single line stopped me.
button-primary-hover:
  backgroundColor: "{colors.primary-container}"
  textColor: "{colors.on-primary-container}"
button-primary-hover is a component. Its backgroundColor is set to the string "{colors.primary-container}". The braces are the entire escape syntax. The linter, when it sees this string, treats it as a token reference, walks the YAML tree, finds colors.primary-container, substitutes the value, and uses the substituted value to run the WCAG contrast check against the resolved textColor on the same component.
That is the entire resolution algorithm. I had assumed the format would be more elaborate than this — schemas usually are. I was wrong. The walk is short, the limits are explicit, and the cycle detection is in the same file. Let me show you what the walk actually does, because once you see it, the smallness of the schema stops feeling like an accident.
The Five Groups, In One View
The schema is defined in linter/spec-config.yaml and rendered into docs/spec.md. Five top-level groups, no more. The recommended-but-not-required names in the spec are exactly what the linter suggests, but the format accepts arbitrary keys.
graph TB
R[DESIGN.md root]
R --> C[colors]
R --> T[typography]
R --> RO[rounded]
R --> SP[spacing]
R --> CO[components]
C --> C1[primary / secondary /<br/>tertiary / neutral /<br/>on-surface / error / ...]
T --> T1[headline-* / body-* /<br/>label-* / display / ...]
RO --> R1[sm / md / lg / xl / full / ...]
SP --> S1[unit / base / xs / sm /<br/>md / lg / xl / gutter / margin / ...]
CO --> CO1[button-primary /<br/>button-primary-hover /<br/>card-profile / ...]
style R fill:#1A1C1E,color:#F7F5F2
style C fill:#2d3449,color:#dae2fd
style T fill:#2d3449,color:#dae2fd
style RO fill:#2d3449,color:#dae2fd
style SP fill:#2d3449,color:#dae2fd
style CO fill:#2d3449,color:#dae2fd
The graph has a fixed shape. Every DESIGN.md file is a tree of this shape (with arbitrary keys at the leaves). Anything that does not fit this shape is either accepted with a warning or rejected with an error, and the rules for that decision are in the linter — not in the schema. The schema is small; the *behavior* of the linter on the schema is where the format's design judgment lives.
The Reference Syntax, In Two Lines
A token reference is a string that begins with {, ends with }, and contains a dotted path. The path is a key-walk through the YAML tree:
{colors.primary} → #1A1C1E
{typography.body-md.fontSize} → 16px
{spacing.md} → 16px
{components.button-primary.backgroundColor} → (a reference, walk continues)
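A single step of that key-walk can be sketched as follows. This is an illustration, not the linter's code: `parse_reference` and `walk` are hypothetical names, the set of characters allowed in a key is an assumption, and the token values are the ones used in the examples above.

```python
import re

# Assumption: keys are made of letters, digits, underscores, and hyphens.
REFERENCE = re.compile(r"\{([A-Za-z0-9_-]+(?:\.[A-Za-z0-9_-]+)*)\}")


def parse_reference(value):
    """Return the dotted path as a list of keys, or None if value is not a reference."""
    if not isinstance(value, str):
        return None
    match = REFERENCE.fullmatch(value)
    return match.group(1).split(".") if match else None


def walk(tokens: dict, path: list[str]):
    """Follow the path key by key through the parsed YAML tree."""
    node = tokens
    for key in path:
        if not isinstance(node, dict) or key not in node:
            raise KeyError("broken reference: " + ".".join(path))
        node = node[key]
    return node


tokens = {
    "colors": {"primary": "#1A1C1E"},
    "typography": {"body-md": {"fontSize": "16px"}},
    "spacing": {"md": "16px"},
}
print(walk(tokens, parse_reference("{colors.primary}")))
print(walk(tokens, parse_reference("{typography.body-md.fontSize}")))
```

A path that does not exist raises immediately, which matches the takeaway that broken references are errors rather than warnings.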
The walk is recursive. If a reference points to another reference, the linter follows. The limits are in linter/spec-config.yaml:
limits:
  max_token_nesting_depth: 20
  max_reference_depth: 10
Two limits, doing two different jobs. max_token_nesting_depth: 20 is how deep a single reference can reach into the tree (so {a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t} is the deepest legal reference). max_reference_depth: 10 is how many chained references the resolver will follow (so a reference that points to a reference that points to a reference, ten times, is the deepest legal chain).
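The two limits can be made concrete with a small resolver. This is a sketch under assumptions, not the linter's implementation: `resolve` and `TokenReferenceError` are hypothetical names, and the exact off-by-one behavior at each limit is inferred from the description above (twenty path segments and ten chained hops are both legal).

```python
MAX_TOKEN_NESTING_DEPTH = 20  # segments allowed in one dotted path
MAX_REFERENCE_DEPTH = 10      # chained references the resolver will follow


class TokenReferenceError(ValueError):
    """Raised for broken, too-deep, or too-long reference chains."""


def _is_reference(value) -> bool:
    return isinstance(value, str) and value.startswith("{") and value.endswith("}")


def resolve(tokens: dict, value):
    """Follow chained references until a non-reference value is reached."""
    hops = 0
    while _is_reference(value):
        if hops == MAX_REFERENCE_DEPTH:
            raise TokenReferenceError("reference chain exceeds max_reference_depth")
        path = value[1:-1].split(".")
        if len(path) > MAX_TOKEN_NESTING_DEPTH:
            raise TokenReferenceError("path exceeds max_token_nesting_depth")
        node = tokens
        for key in path:
            if not isinstance(node, dict) or key not in node:
                raise TokenReferenceError(f"broken reference: {value}")
            node = node[key]
        value = node
        hops += 1
    return value


tokens = {
    "colors": {"primary": "#1A1C1E"},
    "components": {"button-primary": {"backgroundColor": "{colors.primary}"}},
}
print(resolve(tokens, "{components.button-primary.backgroundColor}"))
```

In this sketch a circular reference is never detected as such; it simply runs out of hops and fails on the chain limit. That is a simplification of the sketch, separate from the dedicated cycle detection the format describes.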
The cycle detection sits on top of these limits. A reference tha
Premium chapters
4. E03_Specific_References_Beat_Adjectives
5. E04_Nine_Rules_Three_Exporters