Anatomy of a Seed Job

The empty payload is not "no work" — it is the only job class where every byte of the output is contract and nothing is content.

Key Takeaways

A seed job's envelope is a packing slip, not a prompt. Every field is a constraint.
The four required chapter files are the *only* honest output the worker produces. The prose is decoration.
An empty sources array is a designed signal. It forces the worker to invent a topic from the userPrompt and the contract.
The safety_input.json cross-reference shows that even test jobs are inspected. The safety gate is a precondition, not a side-effect.
The id prefix (pg_whmax_, as in pg_whmax_ai_job_45) is evidence of a job-class taxonomy. A seed job belongs to a family; it is not a one-off.

Here is the line I used to type into my terminal, and the line I now refuse to type: "the payload is empty, so the worker has nothing to do." That sentence was wrong, and the cost of being wrong was high. It led me, for a long time, to under-invest in the very part of the pipeline that determines whether a real job ever ships.

Imagine you ship a worker and the only thing you can change is the output path. The model is fixed, the prompt template is fixed, the safety gate is fixed, the orchestrator is fixed. The only thing you control is which directory the worker writes to, and what it names the files. In that world, a seed job becomes the most informative test you can run — because the variable the test is exercising *is the variable you control*. If the four files appear in the right place with the right names, the contract holds. If they do not, no amount of prompt engineering will save you.

This chapter is about reading a seed job's envelope as a packing slip. Not as a prompt, not as a research brief, not as instructions. A packing slip is the piece of paper inside a box that tells you what should be in the box. If the contents do not match the slip, the box was packed wrong. The seed job is the same idea, but inverted: the contents are supposed to be the proof that the packer followed the slip.

The Envelope, Field by Field

Let me walk through a real seed-job envelope — the one this very article series was generated from. The fields below are exactly what arrived in the worker's hand.

{
  "id": "pg_whmax_ai_job_45",
  "created_at": "2026-06-18T01:16:47.906Z",
  "updated_at": "2026-06-18T01:16:47.906Z",
  "sources": [],
  "userPrompt": "Seed job for AI Create pagination verification.",
  "emphasis": [],
  "de_emphasis": [],
  "status": "processing",
  "progress": 0,
  "taskStatusMsg": "",
  "taskStatusErrorMsg": ""
}

There are eleven fields. Two of them are bookkeeping (created_at and updated_at). The remaining nine are the contract, including the taskStatusMsg/taskStatusErrorMsg pair, which is empty here and becomes useful only in failure. Let me go through them.

id. The prefix pg_whmax_ is a job-class tag. It tells the worker — and anyone reading the logs later — that this is a parameterized job from a known family. The 45 is a sequence number. The worker does not need to act on the sequence number, but its presence tells you that this seed job is one of many, and that the family is being run repeatedly. That is the first signal that the seed job is an audit, not a one-off curiosity.
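To make the taxonomy point concrete, here is a minimal sketch of splitting such an id into its parts. Only the pg_whmax_ prefix and the trailing sequence number come from the envelope; the regular expression, the "kind" label for the middle segment, and the function name parse_job_id are my own assumptions for illustration.

```python
import re

# Hypothetical pattern: family prefix, a free-form middle segment, numeric suffix.
# Only "pg_whmax_" and the trailing sequence number are taken from the envelope.
JOB_ID = re.compile(r"^(?P<family>pg_whmax_)(?P<kind>.+)_(?P<seq>\d+)$")


def parse_job_id(job_id: str) -> dict:
    """Split a job id into family prefix, middle segment, and sequence number."""
    match = JOB_ID.match(job_id)
    if match is None:
        raise ValueError(f"not a pg_whmax_ job id: {job_id!r}")
    return {
        "family": match["family"],
        "kind": match["kind"],
        "sequence": int(match["seq"]),
    }


parsed = parse_job_id("pg_whmax_ai_job_45")
print(parsed)
```

A log reader can group runs by the family value and order them by sequence, which is all the worker's audience needs from the id.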

sources. The empty array. The most important field in the envelope. It is the field whose emptiness carries the most meaning. A real job has at least one source: a URL, a repository, an attachment. A seed job has none. The empty array is a promise: *the worker must produce output without any external material to lean on*. This is the design.

userPrompt. The single line "Seed job for AI Create pagination verification." This is the only natural-language field in the envelope. It identifies the job class (seed job), the target system (AI Create), and the verification target (pagination). It is also, as the article you are reading demonstrates, the topic anchor. When the worker has no sources, it has to invent a topic. The userPrompt is the only material it can use to do so. If the resulting article series is about anything coherent, it is because the worker extracted a topic from this one line and a half-dozen contract rules.

emphasis and de_emphasis. Both empty. A real job carries the user's leanings: "go deep on X," "skip Y." A seed job carries no leanings. The output is unbiased by design. This is one of the seed job's most underappreciated features: it tests the *unbiased* behavior of the worker, the behavior that emerges when the user has expressed no preference.

status and progress. The seed job ships with status: "processing" and progress: 0. The worker is expected to transition status through "running" to "completed" (or to "error" with a populated taskStatusErrorMsg) and to advance progress from 0 to 100. This is the state machine. It is part of the contract, and it is exercised in full by a seed job — the worker has to honor the transitions even when there is nothing real to make progress on.
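The state machine described above can be sketched as a small transition table. The status names come from the envelope discussion; the rule that progress never decreases, the direct processing-to-error edge, and the running-to-running self-loop (for progress updates) are assumptions I am adding, and the function name advance is hypothetical.

```python
# Transitions named in the text: processing -> running -> completed, or -> error.
# The direct processing -> error edge and the running self-loop are assumptions.
ALLOWED = {
    "processing": {"running", "error"},
    "running": {"running", "completed", "error"},
    "completed": set(),
    "error": set(),
}


def advance(task: dict, new_status: str, progress: int, error_msg: str = "") -> dict:
    """Return a copy of task moved to new_status, or raise if the move is illegal."""
    current = task["status"]
    if new_status not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current!r} -> {new_status!r}")
    # Assumption: progress stays within 0..100 and never moves backwards.
    if not 0 <= progress <= 100 or progress < task["progress"]:
        raise ValueError("progress must stay within 0..100 and never decrease")
    if new_status == "completed" and progress != 100:
        raise ValueError("completed requires progress == 100")
    if new_status == "error" and not error_msg:
        raise ValueError("error requires a populated taskStatusErrorMsg")
    if new_status != "error" and error_msg:
        raise ValueError("taskStatusErrorMsg must stay empty outside the error state")
    return {
        **task,
        "status": new_status,
        "progress": progress,
        "taskStatusErrorMsg": error_msg,
    }


task = {"status": "processing", "progress": 0, "taskStatusErrorMsg": ""}
running = advance(task, "running", 25)
done = advance(running, "completed", 100)
print(done["status"], done["progress"])
```

The point of the table is that a seed job walks every edge a real job would, even though there is no real work behind the progress numbers.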

taskStatusMsg and taskStatusErrorMsg. Empty in the input. The worker is expected to leave them empty on success. On failure, taskStatusErrorMsg becomes the diagnostic. The empty-input / empty-success / populated-failure pattern is itself a contract rule.
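Reading the envelope as a packing slip means each field can be checked mechanically. The sketch below does that for the envelope shown earlier. The function name seed_envelope_problems and the wording of the messages are hypothetical; the rules themselves are the ones walked through above.

```python
import json

# The envelope from this chapter, reproduced verbatim.
ENVELOPE = json.loads("""
{
  "id": "pg_whmax_ai_job_45",
  "created_at": "2026-06-18T01:16:47.906Z",
  "updated_at": "2026-06-18T01:16:47.906Z",
  "sources": [],
  "userPrompt": "Seed job for AI Create pagination verification.",
  "emphasis": [],
  "de_emphasis": [],
  "status": "processing",
  "progress": 0,
  "taskStatusMsg": "",
  "taskStatusErrorMsg": ""
}
""")


def seed_envelope_problems(envelope: dict) -> list[str]:
    """List every way the envelope departs from the seed-job shape."""
    problems = []
    # The designed-empty fields: no sources, no leanings.
    for key in ("sources", "emphasis", "de_emphasis"):
        if envelope.get(key) != []:
            problems.append(f"{key} must be an empty list")
    # The only natural-language field, and the topic anchor.
    prompt = envelope.get("userPrompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("userPrompt must be a non-empty string")
    # The starting point of the state machine.
    if envelope.get("status") != "processing":
        problems.append("status must start at 'processing'")
    if envelope.get("progress") != 0:
        problems.append("progress must start at 0")
    # Both message fields are empty on input.
    for key in ("taskStatusMsg", "taskStatusErrorMsg"):
        if envelope.get(key) != "":
            problems.append(f"{key} must be empty on input")
    return problems


problems = seed_envelope_problems(ENVELOPE)
print(problems)
```

An empty list means the slip is well-formed; anything else means the job that arrived is not a seed job, whatever its id claims.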

flowchart TB
  subgraph envelope["Job envelope (task.json)"]
    A1["id: pg_whmax_ai_job_45"]
    A2["userPrompt: 'Seed job for AI Create pagination verification.'"]
    A3["sources: [] (designed empty)"]
    A4["emphasis/de_emphasis: []"]
    A5["status: processing → running → completed"]
  end
  subgraph safety["Safety gate (safety_input.json)"]
    B1["description scanned"]
    B2["emphasis/de_emphasis scanned"]
    B3["precondition: must pass before worker writes"]
  end
  subgraph worker["Worker process"]
    C1["derive topic from userPrompt"]
    C2["plan 4 chapters"]
    C3["write 00_ … 03_ files"]
    C4["update task.json status"]
  end
  envelope --> safety --> worker
  worker --> D1["article/00_*.md"]
  worker --> D2["article/01_*.md"]
  worker --> D3["article/02_*.md"]
  worker --> D4["article/03_*.md"]
  style A3 fill:#fff3cd,stroke:#333
  style B3 fill:#e8f4e8,stroke:#333
  style D1 fill:#e8f4e8,stroke:#333
  style D4 fill:#e8f4e8,stroke:#333

The diagram above is the entire seed-job pipeline: three stages and four output files. The yellow box is the designed-empty sources field. The green boxes are the success conditions. The pipeline is small enough to fit on a screen and complex enough to fail in at least six different ways.

The Four Files, Considered as Proof

A seed job's chapter files are not articles in the ordinary sense. They are receipts. The 00_ through 03_ prefix is the receipt numbering. The titles are arbitrary (any non-empty string is acceptable, as long as the slug is filesystem-safe). The body is arbitrary. The thing that is *not* arbitrary is that exactly four files exist, in the article/ directory, with the right names, and that task.json has been transitioned to status: "completed" with progress: 100.

Here is what the orchestrator does, in order, when it polls for completion:

1. It checks task.json for status: "completed" and progress: 100. If either is wrong, the job is treated as failed.
2. It lists the article/ directory and counts files matching the XX_*.md pattern, where XX is the two-digit chapter index. If the count is not four, the job is treated as failed.
3. It sorts the files lexicographically and concatenates them. The zero-padding on the indices is what makes the lexicographic sort match the intended order.
4. It reads taskStatusErrorMsg. If non-empty, the job is treated as failed.
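The four polling steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the orchestrator's actual code: I am assuming task.json sits beside the article/ directory, that the filename pattern means two digits, an underscore, a slug, and ".md", and the function name check_completion is hypothetical.

```python
import json
import re
import tempfile
from pathlib import Path

# Assumed reading of the filename pattern: two digits, underscore, slug, ".md".
CHAPTER = re.compile(r"^\d{2}_.+\.md$")


def check_completion(job_dir: Path) -> tuple[bool, str]:
    """Return (ok, detail). On success, detail is the concatenated chapters."""
    task = json.loads((job_dir / "task.json").read_text(encoding="utf-8"))

    # Step 1: status and progress must both be closed out.
    if task.get("status") != "completed" or task.get("progress") != 100:
        return False, "status/progress not closed out"

    # Step 2: exactly four chapter files in article/.
    article = job_dir / "article"
    names = []
    if article.is_dir():
        names = [p.name for p in article.iterdir()
                 if p.is_file() and CHAPTER.match(p.name)]
    if len(names) != 4:
        return False, f"expected 4 chapter files, found {len(names)}"

    # Step 3: zero-padded indices let a plain string sort match chapter order.
    combined = "\n\n".join(
        (article / name).read_text(encoding="utf-8") for name in sorted(names)
    )

    # Step 4: a populated error message fails the job even if the files exist.
    if task.get("taskStatusErrorMsg"):
        return False, "taskStatusErrorMsg is populated"
    return True, combined


# Demo against a throwaway directory: first a clean job, then one missing file.
with tempfile.TemporaryDirectory() as tmp:
    job = Path(tmp)
    (job / "article").mkdir()
    for i in range(4):
        path = job / "article" / f"{i:02d}_chapter.md"
        path.write_text(f"chapter {i}", encoding="utf-8")
    (job / "task.json").write_text(
        json.dumps({"status": "completed", "progress": 100, "taskStatusErrorMsg": ""}),
        encoding="utf-8",
    )
    ok, detail = check_completion(job)
    (job / "article" / "03_chapter.md").unlink()
    ok_missing, detail_missing = check_completion(job)

print(ok, ok_missing, detail_missing)
```

The sketch copies the chapter bodies in step 3 but never examines them, which is the property the rest of this section turns on.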

Notice that *none of these steps inspects the content of any chapter file*. Step 3 copies the bodies without examining them. The orchestrator does not care whether the prose is good, whether the Mermaid diagrams render, or whether the citations are accurate. It cares that there are four files, named correctly, in the right directory, with a closed-out status. The contract is about the wrapper, not the wrapped.

This is the part of the contract most engineers under-appreciate. I used to think the chapter body mattered to the orchestrator. It does not. The body is for the human reader. The wrapper is for the orchestrator. A seed job tests the wrapper, because that is the part that, if it fails, makes the entire job look broken even when the body is excellent.

What the Empty Sources Force

I keep returning to the empty sources field, because it is where the seed job's design is most clearly visible. With sources, the worker can fall back on the prompt template's "if sources exist, summarize them" branch. With no sources, that branch is unreachable. The worker has to take the "if no sources, derive a topic from the userPrompt" branch. That branch is rarely tested. The seed job tests it on every run.

A common failure mode I have seen, in workers that handle seed jobs poorly, is that they hit the "no sources" branch, panic, and do one of three things:

Emit a single generic chapter ("Introduction") and stop, leaving progress at 25.
Loop the same chapter four times under different filenames.
Write the four files but never update status, leaving the orchestrator to time out.

Each of these failures is a contract violation, and each one is *only visible in a seed job*. A real job, with rich sources, never triggers the "no sources" branch, so the worker can ship for months with a broken fallback before the first empty-payload job exposes it. The seed job is the only thing that exercises the branch in production. That alone justifies running them.
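A test harness can name these three failures explicitly rather than waiting for a timeout. The sketch below is one hypothetical way to do it; the function name diagnose, the message wording, and the use of a SHA-256 digest to spot repeated bodies are my own choices. Note that this goes beyond what the orchestrator does, since it looks inside the chapter bodies.

```python
import hashlib


def diagnose(task: dict, chapters: dict[str, str]) -> list[str]:
    """Name the failure modes present. chapters maps filename -> body text."""
    findings = []

    # Failure 1: the worker wrote fewer than four chapters and stopped.
    if len(chapters) < 4:
        findings.append(f"stopped early: {len(chapters)} of 4 chapters written")

    # Failure 2: the same body was written under more than one filename.
    first_seen = {}
    for name in sorted(chapters):
        digest = hashlib.sha256(chapters[name].strip().encode("utf-8")).hexdigest()
        if digest in first_seen:
            findings.append(f"duplicate body: {name} repeats {first_seen[digest]}")
        else:
            first_seen[digest] = name

    # Failure 3: all four files exist but the status was never closed out.
    if len(chapters) == 4 and task.get("status") != "completed":
        findings.append("files written but status never closed out")
    return findings


stalled = diagnose(
    {"status": "running", "progress": 25},
    {"00_introduction.md": "Introduction"},
)
print(stalled)
```

Running the same check against a healthy job returns an empty list, which makes it cheap to attach to every seed run.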

The user is, of course, a test runner. The seed job is the canary in the coal mine. The four files are the canary's heartbeat.

Why Four, Specifically

The next chapter will dig into the contract pieces in detail, but I want to close this one with the question of *why* the contract specifies exactly four chapter files. The number is not arbitrary. It is chosen to be the smallest number that makes every common failure mode loud.

One chapter is not enough — a worker that writes a single chapter and stops will appear to have "succeeded" if the orchestrator only checks for non-empty output. Two chapters is borderline. Three is where the off-by-one error (writing 00_, 01_, 03_ and skipping 02_) becomes visible. Four is where *every* common failure mode is loud: missing chapter, duplicate chapter, wrong index, wrong directory, status not updated. Five would test the same things, at higher cost. Four is the minimum number that maximizes signal per chapter written.
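The index-related failures in that list (missing chapter, duplicate index, wrong index) can be made loud with a check on the filenames alone. This is a sketch under the assumption that a valid name starts with a two-digit index followed by an underscore; the function name index_problems and the message wording are hypothetical.

```python
import re


def index_problems(filenames: list[str], expected: int = 4) -> list[str]:
    """Report missing, duplicate, or out-of-range chapter indices."""
    seen = {}
    problems = []
    for name in sorted(filenames):
        match = re.match(r"^(\d{2})_", name)
        if match is None:
            problems.append(f"wrong name: {name}")
            continue
        index = int(match.group(1))
        if index in seen:
            problems.append(f"duplicate index {index:02d}: {seen[index]} and {name}")
        elif index >= expected:
            problems.append(f"index out of range: {name}")
        seen.setdefault(index, name)
    # Any index in 0..expected-1 that never appeared is a missing chapter.
    for index in range(expected):
        if index not in seen:
            problems.append(f"missing index {index:02d}")
    return problems


skipped = index_problems(["00_intro.md", "01_body.md", "03_end.md"])
print(skipped)
```

A bare file count would miss some of these cases on its own: a directory holding 00_, 00_, 01_, 02_ still contains four files, so a check like this complements the count rather than replacing it.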

The contract is the product. The four files are the proof. The empty sources are the design. The orchestrator is the auditor. The worker is the one being audited.

The filenames are only half the story. The other half is what the orchestrator does with them — and that is the subject of the next chapter.