
Chapter 2 of 5

From a Domain Sentence to a Working Team in Seven Steps

Harness's seven-phase pipeline is a manufacturing process, not a workflow diagram. Counting the Phase 0 audit, six of its phases are design-time work and only one runs agents. That inversion of the usual ratio is the factory's defining move.

When you read the SKILL.md of revfactory/harness for the first time, the most surprising thing is not any single phase. It is the ratio. The plugin's headline workflow is "Domain Analysis → Team Architecture Design → Agent Definition Generation → Skill Generation → Integration & Orchestration → Validation & Testing." That is six phases. The README mentions six. Most readers stop counting there.

The numbered pipeline actually runs to Phase 7, and a Phase 0 sits in front of Phase 1. Phase 0 is an audit, Phase 7 is evolution, and both are load-bearing. The audit phase decides whether you are doing a *new build*, an *extension*, or *maintenance*. The evolution phase decides whether the team survives its first run. The 458-line SKILL.md treats both as non-optional.

This chapter is about the shape of that pipeline. The argument I will defend: the seven-phase pipeline is a design-time assembly line, and the factory's value comes from how much of agent team construction is pushed upstream, into writing files, before any agent ever runs.

Key Takeaways

Counting Phase 0, the pipeline splits into three design-time phases (0–2), three artifact-generation phases (3–5), and two lifecycle phases (6–7). The lifecycle phases do most of the unseen work.
Phase 3-0 and Phase 4-0 are *deduplication gates* added in the [Unreleased] section of CHANGELOG.md. They exist because repeated use of Harness accumulates duplicated agents and skills under different names.
All agents are forced to model: opus. This is a hard-coded quality floor — Harness will not let you build a team from a weaker model.
.claude/commands/ is forbidden. Harness never generates slash commands; the entire plugin produces files under .claude/agents/ and .claude/skills/ only.
The plugin's CLAUDE.md integration is *pointer-only* — it records a trigger rule and a change log, never a directory listing.

The pipeline as a contract

flowchart TD
    A["Phase 0<br/>Audit<br/>(read existing<br/>.claude/agents, skills,<br/>CLAUDE.md)"]
    B["Phase 1<br/>Domain Analysis<br/>(user request,<br/>conflict check,<br/>skill level detect)"]
    C["Phase 2<br/>Team Architecture<br/>Design<br/>(mode: teams/subs/hybrid<br/>+ 6 patterns)"]
    D["Phase 3<br/>Agent Definitions<br/>(after 3-0 dedup gate,<br/>all to .claude/agents/)"]
    E["Phase 4<br/>Skill Generation<br/>(after 4-0 dedup gate,<br/>all to .claude/skills/)"]
    F["Phase 5<br/>Integration &<br/>Orchestration<br/>(data flow +<br/>error handling)"]
    G["Phase 6<br/>Validation &<br/>Testing<br/>(with-skill vs<br/>without-skill A/B)"]
    H["Phase 7<br/>Evolution<br/>(feedback loop +<br/>change log)"]

    A -->|new build| B
    A -->|extension| D
    A -->|maintenance| H
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    H -. feedback .-> A
    H -. new request .-> B

Each phase has an explicit contract. Phase 1 outputs "core task types, conflict analysis, tech stack." Phase 2 outputs "execution mode + architecture pattern + agent separation rationale." Phase 3 outputs ".claude/agents/{name}.md files with required sections." Phase 4 outputs ".claude/skills/{name}/SKILL.md with YAML frontmatter and progressive-disclosure budget." Phase 5 outputs the orchestrator skill plus the CLAUDE.md pointer. Phase 6 outputs structural-validation reports and assertion-based grading. Phase 7 outputs a change-log row in CLAUDE.md.
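The diagram's solid edges and the contracts above can be written down as plain data. The sketch below is a hypothetical encoding, not code from the plugin. `PHASE_OUTPUTS` paraphrases the contracts, and Phase 0's entry is the routing decision described in the audit section. `forward_path` follows the solid edges from each Phase 0 branch and ignores the dotted feedback edges.

```python
# Hypothetical encoding of the flowchart; contracts paraphrased from the text.
PHASE_OUTPUTS = {
    0: "routing decision: new build, extension, or maintenance",
    1: "core task types, conflict analysis, tech stack",
    2: "execution mode + architecture pattern + agent separation rationale",
    3: ".claude/agents/{name}.md files with required sections",
    4: ".claude/skills/{name}/SKILL.md with YAML frontmatter and disclosure budget",
    5: "orchestrator skill + CLAUDE.md pointer",
    6: "structural-validation reports + assertion-based grading",
    7: "change-log row in CLAUDE.md",
}

# Phase 0 is the only fork: each branch names the phase it jumps to.
ENTRY_FROM_AUDIT = {"new build": 1, "extension": 3, "maintenance": 7}


def forward_path(branch: str) -> list[int]:
    """Phases visited on one forward pass along the solid edges (no feedback loops)."""
    start = ENTRY_FROM_AUDIT[branch]
    return [0] + list(range(start, max(PHASE_OUTPUTS) + 1))


for branch in ENTRY_FROM_AUDIT:
    print(f"{branch:12} -> {forward_path(branch)}")
```

An extension enters at Phase 3 in the diagram. The Phase Selection Matrix can trim that path further for specific change types.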

The contracts are not optional. The final 산출물 체크리스트 (deliverable checklist) at the bottom of SKILL.md has 16 items; the plugin will not declare a harness "done" until every box is checked. That is the discipline the factory enforces.

Six design-time phases; one runs the team

If you walk through what each phase actually does, the effort is heavily skewed toward design and artifact generation. Phase 0 reads the project. Phase 1 reads and analyzes. Phase 2 is design. Phases 3, 4, and 5 *write files*: agents, skills, the orchestrator, and a two-part CLAUDE.md pointer. Phase 6 runs the new team against test prompts. Phase 7 listens.

That ratio, six design-time phases (0–5) to one runtime phase (6), is the structural claim the plugin makes. The harness is mostly *not* about running agents. It is about getting the design right so that running agents is straightforward.

The reason this matters is that most multi-agent failures are *not* runtime failures. They are design failures: the wrong agent boundaries, the wrong pattern, the wrong execution mode, the wrong description in a skill frontmatter. By the time Phase 6 runs the team, the surface area for those errors has shrunk dramatically. The pipeline has already enforced:

A defined execution mode (Phase 2-1)
A defined pattern (Phase 2-2)
An agent-separation rationale across four axes (Phase 2-3: expertise, parallelism, context, reusability)
A deduplication check against existing agents (Phase 3-0)
A deduplication check against existing skills (Phase 4-0)
A quality floor (model: opus on every agent)
A progressive-disclosure budget (skills <500 lines, references on demand)
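Several of these constraints are mechanical enough to check from the files alone. The sketch below is minimal and rests on three assumptions:

- Each agent file records its model in YAML frontmatter (Claude Code subagent files accept a `model` field).
- Each skill lives at `.claude/skills/{name}/SKILL.md`.
- The parser only needs to handle flat `key: value` lines, not full YAML.

`check_harness` and `read_frontmatter` are hypothetical helpers, not part of the plugin. The sketch also flags anything under `.claude/commands/`, mirroring the prohibition discussed later in this chapter.

```python
from pathlib import Path

SKILL_LINE_BUDGET = 500  # "skills <500 lines" from the progressive-disclosure rule


def read_frontmatter(path: Path) -> dict[str, str]:
    """Parse flat `key: value` lines between the opening and closing '---'."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def check_harness(root: Path) -> list[str]:
    """Return human-readable violations; an empty list means every check passed."""
    claude = root / ".claude"
    problems = []
    # Quality floor: every agent must declare model: opus.
    for agent in sorted((claude / "agents").glob("*.md")):
        if read_frontmatter(agent).get("model") != "opus":
            problems.append(f"{agent.name}: model is not opus")
    # Progressive-disclosure budget: each SKILL.md stays under the line limit.
    for skill in sorted((claude / "skills").glob("*/SKILL.md")):
        n = len(skill.read_text(encoding="utf-8").splitlines())
        if n >= SKILL_LINE_BUDGET:
            problems.append(f"{skill.parent.name}/SKILL.md: {n} lines (budget {SKILL_LINE_BUDGET})")
    # Command prohibition: nothing may be generated under .claude/commands/.
    commands = claude / "commands"
    if commands.exists() and any(commands.iterdir()):
        problems.append(".claude/commands/ is not empty")
    return problems
```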

Imagine running claude "build a harness for X". Your project grows two directories, .claude/agents/ and .claude/skills/ (the orchestrator skill lives in the latter), plus a short CLAUDE.md pointer. That is the visible output. The invisible output is the eight to twelve decisions the pipeline has already made and frozen into your project.

Phase 0 — the audit that prevents rework

The audit phase is short to read and expensive to skip. When the user types "build a harness for X" against a project that already contains .claude/agents/, Phase 0 reads every file under .claude/, cross-checks it against the user's request, and routes to one of three branches:

New build (no agents/skills directory) → run all phases from 1.
Extension (existing harness + new request) → consult the Phase Selection Matrix and run only the phases the matrix specifies.
Maintenance ("하네스 점검", "harness audit") → run the Phase 7-5 operations workflow: audit, incremental change, CLAUDE.md sync, change verification.
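The three-way routing can be sketched as a small function. Everything here is an illustrative assumption:

- `route` is a hypothetical name.
- A harness is taken to exist when `.claude/agents/` or `.claude/skills/` is present.
- Maintenance is detected by plain substring matching on the two trigger phrases quoted above. This is a crude stand-in for however the skill actually interprets the request.

```python
from pathlib import Path

# Trigger phrases quoted in the text; real matching would be far looser.
AUDIT_PHRASES = ("하네스 점검", "harness audit")


def route(project_root: Path, request: str) -> str:
    """Hypothetical Phase 0 router: pick one of the three branches."""
    claude = project_root / ".claude"
    has_harness = (claude / "agents").is_dir() or (claude / "skills").is_dir()
    if not has_harness:
        return "new build"      # run every phase from Phase 1
    if any(phrase in request.lower() for phrase in AUDIT_PHRASES):
        return "maintenance"    # Phase 7-5 operations workflow
    return "extension"          # consult the Phase Selection Matrix
```

The function checks the empty-project case first, since a maintenance request makes no sense when there is no harness to maintain.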

The Phase Selection Matrix is a six-column table that maps change type (agent add / skill add / architecture change) to the phases that must run. Adding a skill skips Phase 2 entirely. Adding an agent skips Phases 1 and 2 and runs only Phases 3-0, 3, 4 (if the agent needs a dedicated skill), 5, and 6. The factory's goal here is the smallest possible run.
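The agent-add row is concrete enough to encode directly. The sketch below uses hypothetical names. It covers only the two rows the text spells out (skill add and agent add), not the full six-column matrix.

```python
# Phases each change type skips, per the two rows described in the text.
SKIPPED_PHASES = {
    "skill add": {"2"},
    "agent add": {"1", "2"},
}


def phases_for_agent_add(needs_dedicated_skill: bool) -> list[str]:
    """Phases to run when adding one agent to an existing harness."""
    phases = ["3-0", "3"]          # dedup gate first, then the agent definition
    if needs_dedicated_skill:
        phases.append("4")         # only if the new agent needs its own skill
    phases += ["5", "6"]           # re-integrate, then validate
    return phases


print(phases_for_agent_add(False))  # ['3-0', '3', '5', '6']
```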

The dedup gates have an empirical origin. The [Unreleased] section of CHANGELOG.md adds Phase 3-0 and Phase 4-0 as *deduplication gates* before Phase 3 and Phase 4. Before that change, repeated use of the plugin on the same project would quietly accumulate overlapping agents (researcher.md, analyst.md, data-gatherer.md) and overlapping skills (summarize/, synthesize/, extract/). Phase 3-0 forces a check before each new agent is created. Phase 4-0 forces a check before each new skill is created.
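The source does not say how the gates decide that two agents overlap. As a purely illustrative stand-in, the sketch below compares a proposed agent description against existing ones by word-set (Jaccard) overlap. It returns the names whose overlap exceeds a threshold. `dedup_gate`, the tokenizer, and the 0.5 threshold are all assumptions. A real gate would need semantic rather than lexical matching.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words: a crude lexical stand-in for semantic matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dedup_gate(proposed: str, existing: dict[str, str], threshold: float = 0.5) -> list[str]:
    """Names of existing agents whose descriptions overlap the proposed one.

    A non-empty result means: reuse or extend one of these instead of adding a new agent.
    """
    new = tokens(proposed)
    return [name for name, desc in existing.items() if jaccard(new, tokens(desc)) >= threshold]


existing = {
    "researcher": "gathers sources and summarizes research findings",
    "reviewer": "checks code for style and correctness",
}
print(dedup_gate("gathers research sources and summarizes findings", existing))  # ['researcher']
```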

This is one of the few places in the source where the factory explicitly admits to a learning cost. The plugin shipped with the dedup problem, observed it in use, and added gates. The CHANGELOG preserves both halves of that lesson.

The hard-coded `model: opus` and the empty commands directory

Two decisions in the spec are worth isolating because they constrain the output sharply.

First, the model parameter. The agent-design-patterns reference is explicit: *"모든 에이전트는 model: "opus"를 사용한다. Agent 도구 호출 시 반드시 model: "opus" 파라미터를 명시한다."* (Every agent uses model: "opus". When calling the Agent tool, always specify the model: "opus" parameter.) Every agent, every team, every orchestrator call runs on Opus. The factory does not let you build a team from Sonnet or Haiku. The justification is in the source: *"opus가 최고 품질을 보장한다"* (Opus guarantees the highest quality). Whether that is a defensible trade-off at scale is a question for Chapter 3; what is not negotiable is that the factory enforces it as a hard rule.

Second, the absence of commands. The deliverable checklist contains this item in its negative form: *".claude/commands/ — 아무것도 생성하지 않음"* (.claude/commands/ — do not generate anything). Harness will not produce slash commands. The entire plugin surface is agents and skills. That single line cuts off a category of failure that other agent scaffolding tools allow: the user commands that drift out of sync with the agents they are supposed to invoke.

These two constraints — model floor, command prohibition — are not features. They are the factory's discipline. They are how the assembly line stays consistent across projects.

Phase 5's pointer, not its contents

The CLAUDE.md integration is the most surprising design choice in the pipeline. A user might expect the plugin to write a long CLAUDE.md block listing every agent, every skill, every directory. It does not. The template in Phase 5-4 has two parts: a one-line trigger rule and a change-log table.

## Harness: {domain name}

**Trigger:** For {domain}-related task requests, use the `{orchestrator-skill-name}` skill.

**Change log:**
| Date | Change | Target | Reason |
|------|--------|--------|--------|
| {YYYY-MM-DD} | Initial setup | All | - |

CLAUDE.md is loaded on every session. If it contained every agent and skill, it would consume tokens for nothing — those files are already discoverable from .claude/agents/ and .claude/skills/. So Harness strips them out and registers only the pointer. Single source of truth, zero duplication.

The change-log table is the second load-bearing element. Every Phase 7 modification writes a row. The table becomes the audit trail for the team's evolution. A new session reads it, sees what changed and why, and starts from there. That is the closest the factory has to a *state machine* across sessions, and it lives entirely in plain Markdown.
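Both halves of the pointer are simple text operations. In the minimal sketch below, `POINTER_TEMPLATE` is an English rendering of the Phase 5-4 template; the template in the source is in Korean. `render_pointer` and `append_change_row` are hypothetical helpers, not plugin code. The row-append assumes the change-log table follows the `**Change log:**` line.

```python
POINTER_TEMPLATE = """## Harness: {domain}

**Trigger:** For {domain}-related task requests, use the `{orchestrator}` skill.

**Change log:**
| Date | Change | Target | Reason |
|------|--------|--------|--------|
| {date} | Initial setup | All | - |
"""


def render_pointer(domain: str, orchestrator: str, date: str) -> str:
    """Phase 5 (sketch): the whole CLAUDE.md contribution is a trigger rule plus a change log."""
    return POINTER_TEMPLATE.format(domain=domain, orchestrator=orchestrator, date=date)


def append_change_row(claude_md: str, date: str, change: str, target: str, reason: str) -> str:
    """Phase 7 (sketch): insert one row at the end of the change-log table."""
    lines = claude_md.rstrip("\n").split("\n")
    i = lines.index("**Change log:**") + 1
    while i < len(lines) and not lines[i].startswith("|"):
        i += 1                                   # skip blank lines before the table
    while i < len(lines) and lines[i].startswith("|"):
        i += 1                                   # walk past the existing rows
    lines.insert(i, f"| {date} | {change} | {target} | {reason} |")
    return "\n".join(lines) + "\n"
```

Appending inside the table rather than at the end of the file keeps the audit trail intact even when other sections follow the harness block.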

Phase 6 — the with-skill vs without-skill A/B

The validation phase is the one runtime-heavy step, and it is more rigorous than the README suggests. The test methodology in references/skill-testing-guide.md instructs the user to:

1. Spawn two agents per test prompt: one with the skill, one without.
2. Capture timing data at the moment the subagent completes (the values are unrecoverable after that point).
3. Define *discriminating* assertions: they must fail on a baseline run, otherwise they prove nothing about the skill's value.
4. Iterate the skill based on the gaps between with-skill and baseline outputs.
5. Run 20 description-trigger evals (10 should-trigger, 10 should-NOT-trigger) on the skill's frontmatter.
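The assertion rule sorts into a two-by-two grid of with-skill versus baseline outcomes. The sketch below is illustrative only: `AssertionResult`, `triage`, and `trigger_accuracy` are hypothetical names, and the sample data is invented. Only assertions that fail on the baseline can show the skill's value. Those that pass on both sides are the non-discriminating ones the guide warns against.

```python
from dataclasses import dataclass


@dataclass
class AssertionResult:
    name: str
    with_skill: bool  # passed on the with-skill run?
    baseline: bool    # passed on the without-skill run?


def triage(results: list[AssertionResult]) -> dict[str, list[str]]:
    """Bucket assertions by their (with-skill, baseline) outcome."""
    buckets = {"wins": [], "gaps": [], "non_discriminating": [], "regressions": []}
    for r in results:
        if r.baseline and r.with_skill:
            buckets["non_discriminating"].append(r.name)  # proves nothing; rewrite it
        elif r.baseline:
            buckets["regressions"].append(r.name)         # the skill made things worse
        elif r.with_skill:
            buckets["wins"].append(r.name)                # evidence for the skill
        else:
            buckets["gaps"].append(r.name)                # where to iterate the skill
    return buckets


def trigger_accuracy(evals: list[tuple[bool, bool]]) -> float:
    """evals holds (should_trigger, did_trigger) pairs, e.g. 10 positive and 10 negative prompts."""
    return sum(should == did for should, did in evals) / len(evals)


# Invented sample data for illustration.
sample = [
    AssertionResult("cites sources", with_skill=True, baseline=False),
    AssertionResult("non-empty output", with_skill=True, baseline=True),
    AssertionResult("uses house format", with_skill=False, baseline=False),
]
print(triage(sample))
```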

The discipline of paired A/B is unusual. Most skill authoring guides recommend smoke tests; Harness requires a comparative test against the no-skill baseline, and warns against assertions that pass on both sides ("non-discriminating"). The plugin's README cites a single A/B study on a sister repo (claude-code-harness, n=15): *"Average Quality Score 49.5 → 79.3, +60%; Win Rate 100% (15/15); Output Variance −32%."* The FAQ is unusually honest that this is *author-measured, third-party replications pending* — and recommends a 2–4 week internal pilot before adoption.
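The headline figure can be checked from the two reported averages. The snippet below recomputes only what the README states and adds no new data.

```python
before, after = 49.5, 79.3   # reported average quality scores
relative_gain = (after - before) / before
win_rate = 15 / 15           # reported wins out of n=15
print(f"quality gain {relative_gain:.1%}, win rate {win_rate:.0%}")
# quality gain 60.2%, win rate 100%
```

The README's +60% is this 60.2% rounded to the nearest whole percent. The variance figure cannot be rederived without the raw scores.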

The factory, in other words, does not trust its own claims. It ships a methodology for falsifying them.

Why the pipeline looks the way it does

The seven phases are not an arbitrary partition. They encode a single bet: that multi-agent failures are predominantly design failures, and that pushing decisions upstream into codified phases is cheaper than discovering them at runtime. The bet shows up in the proportions (six design-time, one runtime) and in the hard rules (Opus, no commands, pointer-only CLAUDE.md).

If the next chapter argues that the six patterns are the design vocabulary the factory exposes, this chapter argues that the seven phases are the production line. Patterns are the choices; phases are the discipline. Either alone would be insufficient — patterns without phases produce inconsistent teams, and phases without patterns produce empty teams. The factory's distinctive move is having both.

---

References:

skills/harness/SKILL.md — 7-phase workflow with sub-phases (3-0, 4-0, 5-5, 7-5)
skills/harness/references/agent-design-patterns.md — Execution mode decision tree, agent separation axes, model: opus rule
skills/harness/references/skill-testing-guide.md — With-skill vs baseline A/B methodology, assertion quality rules
CHANGELOG.md — [Unreleased]: Phase 3-0 / 4-0 dedup gates; v1.1.0: Phase 0 audit and Phase 7 evolution

---

If the seven-phase pipeline is so well-defined, why does the plugin forbid slash commands and refuse to write any file under .claude/commands/? The answer is a category decision the factory has made about what it is not, and that decision becomes clearer once you look at the six patterns it *does* ship.