

revfactory/harness

A meta-skill that designs domain-specific agent teams, defines specialized agents, and generates the skills they use.



1. introduction

The Factory That Builds Teams

A meta-skill is a skill whose output is more skills. Harness, a Claude Code plugin, is a meta-skill whose output is an entire agent team — agents, skills, orchestrator, change log — generated from one sentence.

A Korean fintech developer types 하네스 구성해줘 — 핀테크 리스크 평가 팀 ("set up a harness: fintech risk assessment team") into Claude Code. Forty-five seconds later, the working directory contains a .claude/agents/ folder with risk-analyst.md, compliance-reviewer.md, and portfolio-monitor.md, a .claude/skills/ folder with their supporting skills, and one orchestrator skill that knows how to route a credit memo through them. The developer did not write any of those files. The plugin did. The developer typed one sentence in Korean and got a team in return.

That sentence is not a magic spell. It is a trigger that activates the meta-skill, which runs a seven-phase pipeline codified in skills/harness/SKILL.md at version 1.2.0. The pipeline is the point. The plugin's name is harness, its repository is revfactory/harness, and its position in the broader Claude Code ecosystem is something the maintainers have started to call an L3 Meta-Factory: a layer that does not solve problems itself, but generates other layers that do.

This series is about what that means in practice. Not what harness *claims* to do — that is one short README read away — but what its design choices reveal about how multi-agent systems are actually built, versioned, and evolved when you take the category seriously.

What "build a harness" actually means

The revfactory/harness README opens with a striking sentence: *"Harness is a team-architecture factory for Claude Code."* Most readers will skim past that. The phrase looks like product marketing. It is not. It is a category claim, and the rest of the repository is the evidence.

A factory is not a builder. A factory is a process that produces builders — repeatedly, at known cost, against a known specification. Archon, in the same L3 layer of the Claude Code ecosystem, is a *runtime-configuration factory*: you describe a project, and Archon produces a deterministic runtime configuration you can run today. Harness is a *team-architecture factory*: you describe a domain in one sentence, and Harness produces the agents, their skills, their communication protocols, and the orchestrator that binds them. Archon and Harness are neighbors, not competitors. They sit at different sub-layers of L3, and the README is explicit about the boundary: *"Pick Archon for runtime determinism, Harness for team architecture, or combine them."*

The series-level claim I will defend across these chapters: Harness's distinctive contribution is not any single agent pattern or workflow — it is the discipline of treating team design itself as a manufactured artifact. The repo codifies that discipline across seven phases, six patterns, and three execution modes, and it does so with a strict refusal to generate anything else (no commands, no MCP servers, no shared agents — only .claude/agents/ and .claude/skills/ files).

I came into this expecting another agent framework. After reading the source, I had to revise that framing. Frameworks give you primitives; Harness gives you a production line. The distinction matters, because primitives are something you assemble yourself, and a production line is something you specify once and then run.

The three claims the source has to back up

A team-architecture fac


2. seven_phases

From a Domain Sentence to a Working Team in Seven Steps

Harness's seven-phase pipeline is a manufacturing process, not a workflow diagram. Six of its phases are design-time work that reads, decides, and writes files before any agent runs; only one phase runs the team. That inversion of the usual ratio is the factory's defining move.

When you read the SKILL.md of revfactory/harness for the first time, the most surprising thing is not any single phase. It is the ratio. The plugin's headline workflow is "Domain Analysis → Team Architecture Design → Agent Definition Generation → Skill Generation → Integration & Orchestration → Validation & Testing." That is six phases. The README mentions six. Most readers stop counting there.

The actual pipeline is longer: seven numbered phases, 1 through 7, preceded by a Phase 0 audit. Phase 0 and Phase 7 (evolution) are the two the README leaves out, and they are both load-bearing. The audit phase decides whether you are doing a *new build*, an *extension*, or *maintenance*. The evolution phase decides whether the team survives its first run. The 458-line SKILL.md treats both as non-optional.

This chapter is about the shape of that pipeline. The argument I will defend: the seven-phase pipeline is a design-time assembly line, and the factory's value comes from how much of agent team construction is pushed upstream, into writing files, before any agent ever runs.

Key Takeaways

Counting the Phase 0 audit alongside the seven numbered phases, the pipeline splits into three design-time phases (0–2), three artifact-generation phases (3–5), and two lifecycle phases (6–7). The lifecycle phases do most of the unseen work.
Phase 3-0 and Phase 4-0 are *deduplication gates* added in the [Unreleased] section of CHANGELOG.md. They exist because repeated use of Harness accumulates duplicated agents and skills under different names.
All agents are forced to model: opus. This is a hard-coded quality floor — Harness will not let you build a team from a weaker model.
.claude/commands/ is forbidden. Harness never generates slash commands; the entire plugin produces files under .claude/agents/ and .claude/skills/ only.
The plugin's CLAUDE.md integration is *pointer-only* — it records a trigger rule and a change log, never a directory listing.
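The output constraints in the takeaways above can be read as checks on what the factory is allowed to emit. The following is a minimal sketch of such checks, not code from the repository: the function names `classify_output` and `meets_model_floor` are hypothetical, the frontmatter is assumed to be already parsed into a dict, and the CLAUDE.md pointer is deliberately left out of scope.

```python
from pathlib import PurePosixPath

# Roots and the model name come from the text; the helper names are hypothetical.
ALLOWED_ROOTS = (".claude/agents", ".claude/skills")
FORBIDDEN_ROOTS = (".claude/commands",)
REQUIRED_MODEL = "opus"


def _under(path: str, root: str) -> bool:
    """True if `path` sits inside the directory `root` (component-wise match)."""
    root_parts = PurePosixPath(root).parts
    return PurePosixPath(path).parts[: len(root_parts)] == root_parts


def classify_output(path: str) -> str:
    """Label a generated path as 'forbidden', 'allowed', or 'outside'."""
    if any(_under(path, root) for root in FORBIDDEN_ROOTS):
        return "forbidden"
    if any(_under(path, root) for root in ALLOWED_ROOTS):
        return "allowed"
    return "outside"


def meets_model_floor(frontmatter: dict) -> bool:
    """True only if the agent's (already parsed) frontmatter sets model: opus."""
    return frontmatter.get("model") == REQUIRED_MODEL
```

Matching on path components rather than string prefixes is a deliberate choice here: a directory such as `.claude/agents-old/` does not accidentally count as `.claude/agents/`.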

The pipeline as a contract

flowchart TD
    A["Phase 0<br/>Audit<br/>(read existing<br/>.claude/agents, skills,<br/>CLAUDE.md)"]
    B["Phase 1<br/>Domain Analysis<br/>(user request,<br/>conflict check,<br/>skill level detect)"]
    C["Phase 2<br/>Team Architecture<br/>Design<br/>(mode: teams/subs/hybrid<br/>+ 6 patterns)"]
    D["Phase 3<br/>Agent Definitions<br/>(after 3-0 dedup gate,<br/>all to .claude/agents/)"]
    E["Phase 4<br/>Skill Generation<br/>(after 4-0 dedup gate,<br/>all to .claude/skills/)"]
    F["Phase 5<br/>Integration &<br/>Orchestration<br/>(data flow +<br/>error handling)"]
    G["Phase 6<br/>Validation &<br/>Testing<br/>(with-skill vs<br/>without-skill A/B)"]
    H["Phase 7<br/>Evolution<br/>(feedback loop +<br/>change log)"]

    A -->|new build| B
    A -->|extension| D
    A -->|maintenance| H
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    H -. feedback .-> A
    H -. new request .-> B
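The solid edges of the flowchart above amount to a small routing table: the audit result picks an entry phase, and the pipeline then proceeds linearly to Phase 7. The sketch below encodes only that reading of the diagram. The names `AuditResult`, `ENTRY_PHASE`, and `phases_to_run` are hypothetical, and the dotted feedback edges are not modeled.

```python
from enum import Enum


class AuditResult(Enum):
    """The three outcomes of the Phase 0 audit named in the text."""
    NEW_BUILD = "new build"
    EXTENSION = "extension"
    MAINTENANCE = "maintenance"


# Entry phase per audit outcome, read off the solid edges leaving Phase 0.
ENTRY_PHASE = {
    AuditResult.NEW_BUILD: 1,    # Phase 0 -> Phase 1 (Domain Analysis)
    AuditResult.EXTENSION: 3,    # Phase 0 -> Phase 3 (Agent Definitions)
    AuditResult.MAINTENANCE: 7,  # Phase 0 -> Phase 7 (Evolution)
}
LAST_PHASE = 7


def phases_to_run(audit: AuditResult) -> list[int]:
    """Phase 0 always runs; the rest run in order from the entry phase."""
    return [0] + list(range(ENTRY_PHASE[audit], LAST_PHASE + 1))


if __name__ == "__main__":
    for outcome in AuditResult:
        print(outcome.value, phases_to_run(outcome))
```

Under this reading, an extension skips domain analysis and architecture design entirely, which is why the audit has to be reliable.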

Each phase has an explicit contract. Phase 1 outputs "core task types, conflict analysis, tech stack." Phase 2 outputs "execution mode + architecture pattern + agent separation rationale." Phase 3 outputs ".claude/agents/{name}.md files with required sections." Phase 4 outputs ".claude/skills/{name}/SKILL.md with YAML frontmatter and progressive-disclosure budget." Phase 5 outputs the orchestrator skill plus the CLAUDE.md pointer. Phase 6 outputs structural-validation reports and assertion-based grading. Phase 7 outputs a change-log row in CLAUDE.md.

The contracts are not optional. The final 산출물 체크리스트 (deliverable checklist) at the bottom of SKILL.md has 16 items; the plugin will not declare a harness "done" until every box is checked. That is the discipline the factory enforces.
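The "not done until every box is checked" rule can be pictured as a simple gate. This is an illustrative sketch only: the source gives the checklist size (16) but not the items, so items are referred to by hypothetical numbers 1 to 16, and the class `DeliverableChecklist` is an invented name rather than anything in SKILL.md.

```python
from dataclasses import dataclass, field

CHECKLIST_SIZE = 16  # per the text; the individual items are not listed there


@dataclass
class DeliverableChecklist:
    """Tracks which of the (hypothetically numbered) items are checked."""
    checked: set[int] = field(default_factory=set)

    def check(self, item: int) -> None:
        if not 1 <= item <= CHECKLIST_SIZE:
            raise ValueError(f"item must be in 1..{CHECKLIST_SIZE}, got {item}")
        self.checked.add(item)

    def remaining(self) -> list[int]:
        return [i for i in range(1, CHECKLIST_SIZE + 1) if i not in self.checked]

    def is_done(self) -> bool:
        """The harness counts as done only when nothing remains."""
        return not self.remaining()


if __name__ == "__main__":
    checklist = DeliverableChecklist()
    for item in range(1, 16):
        checklist.check(item)
    print(checklist.is_done(), checklist.remaining())
```

The gate is all-or-nothing by design: fifteen checked items report the same "not done" as zero.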

Six phases write files; one runs the team

Counting actual code paths, the distribution of effort is heavily skewed toward artifact generation. Phase 0 reads the project. Phase 1 reads and analyzes. Phase 2 is design. Phases 3, 4, and 5 *write files* — agents, skills, the orchestrator, and a two-row CLAUDE.md pointer. Phase 6 runs the new team against test prompts. Phase 7 listens.

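That ratio is the structural claim the plugin makes: six phases (0 through 5) finish before any agent runs, against one runtime phase (6). Only Phases 3, 4, and 5 write files, but the three phases before them exist to decide what gets written. The harness is mostly *not* about running agents. It is about getting the design right so that running agents is straightforward.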

The reason this matters is that most multi-agent failures are *not* runtime failures. They are design failures: the wrong agent boundaries, the wrong pattern, the wrong execution mode, the wrong description in a skill frontmatter. By the time Phase 6 runs the team, the surface area for those errors has shrunk dramatically. The pipeline has already enforced:

A defined execution mode (Phase 2-1)
A defined pattern (Phase 2-2)
An agent-separation rationale across four axes (Phase 2-3: expertise, parallelism, context, reusability)
A deduplication check against existing agents (Phase 3-0)
A deduplication check against existing skills (Phase 4-0)
A quality floor (model: opus on every agent)
A progressive-disclosure budget (skills <500 lines, references o


3. six_patterns

Six Patterns, Six Failure Modes

The six architecture patterns in Harness — Pipeline, Fan-out/Fan-in, Expert Pool, Producer-Reviewer, Supervisor, Hierarchical Delegation — look interchangeable in the README table. They are not. Each pattern encodes a specific failure mode it is good at preventing, and the wrong choice is the single most expensive error a Harness user can make.

Most readers encounter the six patterns in the README's Architecture Patterns table. It is a tidy, six-row block with one-sentence descriptions. Pipeline is "Sequential dependent tasks." Fan-out/Fan-in is "Parallel independent tasks." Expert Pool is "Context-dependent selective invocation." Producer-Reviewer is "Generation followed by quality review." Supervisor is "Central agent with dynamic task distribution." Hierarchical Delegation is "Top-down recursive delegation."

That table is a vocabulary list. It does not yet tell you which one to pick. To get that, you have to walk into skills/harness/references/agent-design-patterns.md, where the six patterns are unpacked against the failure mode each one is built to defend against.

The argument of this chapter is straightforward: the six patterns are not team shapes. They are defenses against specific failure modes, and each defense happens to have a shape. Treating them as shapes produces wrong choices. Treating them as defenses produces better ones.

Key Takeaways

Pipeline is the only pattern where Harness explicitly notes that team mode is *limited in benefit* — the sequential dependency absorbs the advantage of shared task lists.
Fan-out/Fan-in is the most team-mode-natural pattern. The spec says: *"반드시 에이전트 팀으로 구성해야 한다"* (you must compose it as an agent team).
Producer-Reviewer is the only pattern where Harness enforces a retry cap (2–3 iterations). Without it, the loop has no guaranteed exit.
Hierarchical Delegation is capped at two levels. Three or more levels cost latency and context without buying quality.
Composite patterns (Fan-out + Producer-Reviewer; Pipeline + Fan-out) are the rule, not the exception, in production teams.
The execution-mode coupling (team vs sub-agent) is part of each pattern's spec, not a separate decision.
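The retry cap on Producer-Reviewer is easiest to see as a bounded loop. The following is a minimal sketch under stated assumptions: `produce` and `review` are hypothetical callables standing in for the two agents, a review result of `None` is assumed to mean "accepted", and the default cap of 3 sits at the top of the 2–3 range quoted above.

```python
from typing import Callable, Optional


def produce_with_review(
    produce: Callable[[Optional[str]], str],
    review: Callable[[str], Optional[str]],
    max_iterations: int = 3,
) -> tuple[str, bool]:
    """Run a producer/reviewer loop with a hard iteration cap.

    `produce` receives the previous feedback (None on the first pass).
    `review` returns None to accept, or a feedback string to reject.
    Returns the last draft and whether it was accepted.
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_iterations):
        draft = produce(feedback)
        feedback = review(draft)
        if feedback is None:
            return draft, True
    # Cap reached: hand back the last draft, flagged as unaccepted.
    return draft, False
```

Returning the unaccepted draft with an explicit flag, rather than raising, leaves the escalation decision to the caller; that choice is an assumption of this sketch, not something the source specifies.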

What the patterns are actually defending against

| Pattern | Failure mode defended | Team-mode verdict |
|---------|----------------------|-------------------|
| Pipeline | Mid-pipeline drift (B depends on A's output; if A is incomplete, B produces nothing useful) | Limited team-mode benefit |
| Fan-out/Fan-in | Siloed discovery (researchers don't share findings until late) | Must be agent team |
| Expert Pool | Over-eager routing (router calls the wrong specialist) | Sub-agents sufficient |
| Producer-Reviewer | Acceptance of unverified output (no quality gate) | Team useful |
| Supervisor | Underutilized workers / starved workers | Team useful (shared task list) |
| Hierarchical Delegation | Premature decomposition (sub-tasks broken before enough is known) | Team mode limited (nested teams forbidden) |

The table is not in the source as a single block — it is synthesized from the per-pattern prose in agent-design-patterns.md. But the substance is in the source. Each pattern section opens with "적합한 경우" (when to use) and "주의" (caution), and the cautions are the failure modes. Pipeline's caution is *bottleneck propagation*: each stage can stall the whole line. Fan-out/Fan-in's caution is *integration quality*: the consolidation step determines the whole result. Expert Pool's caution is *router accuracy*: misclassification is the failure mode. Producer-Reviewer's caution is *infinite loop*: the retry cap is the defense. Supervisor's caution is *supervisor bottleneck*: the central agent becomes the system's choke point. Hierarchical Delegation's caution is *context loss at depth*: three or more levels lose too much.

The patterns, in other words, are defined against their own failure surface. A correct pattern choice is one whose defense matches the dominant failure risk of the task. An incorrect choice is one that defends against a failure you do not have, while leaving the one you do have undefended.
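One way to make "pattern choice as diagnosis" concrete is to index the patterns by the failure they defend against and look them up by risk rather than by shape. The sketch below is illustrative only: the failure-mode and verdict strings are condensed from the table in this chapter, while the `PATTERNS` dict and `patterns_defending` helper are hypothetical and use naive substring matching.

```python
# pattern name -> (failure mode defended, team-mode verdict), condensed from the table
PATTERNS = {
    "Pipeline": ("mid-pipeline drift", "limited team-mode benefit"),
    "Fan-out/Fan-in": ("siloed discovery", "must be agent team"),
    "Expert Pool": ("over-eager routing", "sub-agents sufficient"),
    "Producer-Reviewer": ("acceptance of unverified output", "team useful"),
    "Supervisor": ("underutilized or starved workers", "team useful"),
    "Hierarchical Delegation": ("premature decomposition", "team mode limited"),
}


def patterns_defending(risk: str) -> list[str]:
    """Return patterns whose defended failure mode mentions the given risk."""
    needle = risk.lower()
    return [name for name, (failure, _) in PATTERNS.items() if needle in failure]


if __name__ == "__main__":
    # Start from the dominant failure risk of the task, not from a team shape.
    for risk in ("routing", "unverified", "siloed"):
        print(risk, "->", patterns_defending(risk))
```

An empty result is informative too: it suggests the risk has not been named in terms any single pattern defends against, which is often the cue to consider a composite.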

A worked example — code review

You have a code-review task that needs security, performance, and test-coverage perspectives. The obvious choice is to spin up three reviewers. The question is *how*.

The naïve reading says "Fan-out/Fan-in: three reviewers in parallel, then one integrator." That looks right. It is also wrong in a specific way: security issues and performance issues are correlated. An SQL injection flaw is often also a performance risk, because a query built from unvalidated input can run unbounded. If security-reviewer and performance-reviewer are isolated, neither will see the cross-domain pattern.

The Harness-spec-correct answer is a composite of Fan-out/Fan-in and Producer-Reviewer. Three reviewers fan out. Each reviewer's output is itself reviewed by a peer. The integrator collects both passes. The spec calls this Fan-out + Producer-Reviewer and lists it as one of the standard composite patterns in team-examples.md §복합 패턴 (composite patterns).

The point is not that you must use composite patterns. The point is that *the pattern choice is a diagnosis*, and the diagnosis is wrong as soon as you treat the patterns as shapes.
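The composite described above (fan out, peer-review each finding, then integrate both passes) can be sketched with ordinary functions standing in for agents. Everything here is an assumption for illustration: reviewers are plain callables, the peer pass is a placeholder that pairs each finding with the next reviewer in order, and threads stand in for whatever parallelism the real team mode provides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Reviewer = Callable[[str], str]  # hypothetical stand-in for a reviewer agent


def fan_out(reviewers: dict[str, Reviewer], code: str) -> dict[str, str]:
    """Run every reviewer on the same input in parallel (needs >= 1 reviewer)."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(fn, code) for name, fn in reviewers.items()}
        return {name: fut.result() for name, fut in futures.items()}


def peer_review(findings: dict[str, str]) -> dict[str, str]:
    """Placeholder second pass: each finding is assigned to the next reviewer."""
    names = list(findings)
    notes = {}
    for i, name in enumerate(names):
        peer = names[(i + 1) % len(names)]
        notes[name] = f"{peer} cross-checked: {findings[name]}"
    return notes


def integrate(findings: dict[str, str], notes: dict[str, str]) -> str:
    """Collect both passes into one report, one block per reviewer."""
    lines = []
    for name, finding in findings.items():
        lines.append(f"[{name}] {finding}")
        lines.append(f"  peer: {notes[name]}")
    return "\n".join(lines)
```

The round-robin peer assignment is the simplest scheme that guarantees no reviewer checks its own output; a real team would presumably pair reviewers by how strongly their domains correlate.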

flowchart TD
    Q["Code Review Task"]
    Q --> SR["Security Reviewer"]
    Q --> PR["Performance Reviewer"]
    Q --> TR["Test Coverage Reviewer"]
    SR <-->|SendMessage| PR
    PR <-->



4. execution_modes


5. limits_and_evolution
