When an Agent Earns the Broker

The hard part of giving an LLM a trading account is not the LLM — it is the trust boundary. And trust is built in layers, the way you build a flight control system.

Key Takeaways

- Connecting an LLM agent to a broker is an authority-delegation problem, not a tool-use problem.
- Vibe-Trading answers it with five interlocking layers (data, model, broker, research workflow, and audit), each assuming the layer outside it can fail.
- The series below traces those five layers in order of how foundational they are, from a single chokepoint in the data layer up to a hypothesis-shaped research workflow and the audit trail that closes the loop.
- The pattern is portable. If you swap "broker" for "production database," "compliance team," or "customer-facing action," the same five layers apply.

---

A research analyst at a quant desk runs a fifty-worker investment committee on NVDA in June 2026. The bull agent makes the AI capex case, the bear agent counters on valuation, the risk reviewer pulls recent volatility, and the portfolio manager drafts a position. Mid-run, the analyst notices the equity researcher cited a price that's nine trading days stale. The volatility worker cited a volume number from a malformed bar where volume was present but OHLC was empty. The risk reviewer pulled a price from a fallback source that returned NaN on close, which leaked into a non-strict JSON field and arrived at the PM as the literal string NaN, a value the LLM happily quoted.

The root cause was not a bad model. It was not a slow broker. It was not even a missing tool. The swarm workers had each written their own ad-hoc yfinance snippet, each trusted a different malformed bar, each leaked NaN a different way. The agent loop did not centralize market data, so every worker re-implemented it badly, in fifty different shapes.
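The three failures in that incident (a stale price, a bar with volume but no OHLC, and a NaN that survived serialization) can all be refused at one place. What follows is a minimal sketch of such a chokepoint, not Vibe-Trading's actual loader code: the names `Bar`, `BadBar`, `check_bar`, and `to_strict_json` are hypothetical, and the staleness limit is an assumed value measured in calendar days for simplicity.

```python
import json
import math
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Bar:
    """One daily bar as a loader might return it (hypothetical shape)."""
    symbol: str
    day: date
    open: Optional[float]
    high: Optional[float]
    low: Optional[float]
    close: Optional[float]
    volume: Optional[float]


class BadBar(ValueError):
    """Raised when a bar fails the sanity check and must not reach a worker."""


def check_bar(bar: Bar, today: date, max_age_days: int = 5) -> Bar:
    """Return the bar unchanged if it is usable, otherwise raise BadBar."""
    # Volume alone is not enough: every OHLC field must be present and finite.
    for name in ("open", "high", "low", "close"):
        value = getattr(bar, name)
        if value is None or not math.isfinite(value):
            raise BadBar(f"{bar.symbol} {bar.day}: {name} is missing or non-finite")
    # Open and close must sit inside the low/high range.
    if not (bar.low <= min(bar.open, bar.close) <= max(bar.open, bar.close) <= bar.high):
        raise BadBar(f"{bar.symbol} {bar.day}: OHLC values are inconsistent")
    # Assumed staleness rule: calendar days, not trading days.
    if today - bar.day > timedelta(days=max_age_days):
        raise BadBar(f"{bar.symbol} {bar.day}: bar is older than {max_age_days} days")
    return bar


def to_strict_json(bar: Bar) -> str:
    """Serialize a bar; allow_nan=False makes json.dumps raise on NaN or inf."""
    payload = {
        "symbol": bar.symbol,
        "day": bar.day.isoformat(),
        "open": bar.open,
        "high": bar.high,
        "low": bar.low,
        "close": bar.close,
        "volume": bar.volume,
    }
    return json.dumps(payload, allow_nan=False)
```

The design choice worth noticing is that the check raises instead of repairing. A worker that receives an exception has to report a missing input; a worker that receives a silently patched bar will quote it.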

That incident — filed as HKUDS issue #198, fixed in pull request #199 on 2026-06-11 — is where this series starts. Because the same shape of failure shows up everywhere you connect an LLM to something consequential: a broker, a production database, a customer-facing action. The bug is not the model. The bug is the gap between "tool use" and "delegated authority."

Consider what "delegated authority" actually means. When you wire an LLM into a brokerage API, you are not registering a tool. You are appointing an agent to act on your behalf in a system where errors have monetary cost. The agent can place orders. It can read your positions. It can move money between accounts, to whatever extent the broker permits. Every broker SDK that exposes place_order is, in the language of agency law, extending authority. Most agent frameworks hide that fact behind a function signature. Vibe-Trading does not hide it. It builds the entire architecture around the assumption that the signature is the smallest part of the problem.

This series is about closing that gap. The artifact under examination is Vibe-Trading, an MIT-licensed research workspace built by HKU Data Science at the University of Hong Kong, currently shipping version 0.1.10. It is, by a comfortable margin, the most carefully engineered bounded-autonomy agent I have read in 2026. It is also the one that has taught me the most about how to think about an LLM touching something that matters.

The series-level claim, delivered up front so you can decide whether to read further: trust is the product. Not the model. Not the data. Not the UI. Trust — built as five interlocking layers, each assuming the layer outside it can fail, each structurally independent of the others. The five layers, in the order they appear when an LLM touches your financial life, are:

1. The data layer: one chokepoint, every tool goes through it.
2. The model layer: no shared shim, capability-per-provider.
3. The broker layer: bounded autonomy, not tool use.
4. The research workflow: hypothesis-shaped, not transaction-shaped.
5. The audit trail: every step produces an inspectable artifact.

The chapters that follow each take one layer apart. They are not a tour of Vibe-Trading's feature list. They are an argument for a specific way to build any agent that has consequences.

```mermaid
flowchart TB
    L1["Layer 1 — Data<br/>loader registry · sanity check · cache · fallback chains"]
    L2["Layer 2 — Model<br/>capability-per-provider · stream isolation · empty-response surfacing"]
    L3["Layer 3 — Broker<br/>mandate · kill switch · fail-closed gate · audit ledger · structural per-broker guard"]
    L4["Layer 4 — Research workflow<br/>Goal → Hypothesis → Signal → Backtest → Attribution"]
    L5["Layer 5 — Audit trail<br/>run cards · config_hash · strategy_hash · session fsync · call_id correlation"]

    L1 --> L2 --> L3 --> L4 --> L5
    L5 -.closes the loop.-> L1
```

I want to flag the loop on the right of that diagram. It is not decorative. Layer 5 feeds back into Layer 1 because every audit artifact is, in the next run, a piece of training data — a remembered fact, a validated hypothesis, a backtest that the next agent will run with config_hash and strategy_hash already attached. The five layers are not a stack. They are a cycle, and the cycle is the product.
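To make the idea of a run carrying config_hash and strategy_hash concrete, here is a minimal sketch of how such hashes could be derived. It is an assumption about the general technique (hashing a canonical serialization), not a description of how Vibe-Trading computes them; `stable_hash` and `run_card` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict, Mapping


def stable_hash(payload: Mapping[str, Any]) -> str:
    """Hash a JSON-compatible mapping so that key order does not matter."""
    # Canonical form: sorted keys, fixed separators, NaN and inf rejected.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), allow_nan=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_card(config: Mapping[str, Any], strategy_source: str) -> Dict[str, str]:
    """Build the two identifiers a later run could use to find this one."""
    return {
        "config_hash": stable_hash(config),
        "strategy_hash": hashlib.sha256(strategy_source.encode("utf-8")).hexdigest(),
    }
```

Two runs with the same configuration and the same strategy text produce the same pair of hashes, so a later agent can tell whether a stored backtest answers the question it is about to ask.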

A word on scope before we begin. Vibe-Trading is large — 1,431 files across a Python agent core, a React 19 frontend, an MCP server, an interactive CLI, and ten broker connectors. I am not surveying that surface area. I am pulling four threads through it, each thread chosen because it generalizes. If you operate an LLM agent against any consequential system, the patterns below will apply — the names will change, the boundaries will move, the structure will not.

One more thing. The series is opinionated. I am going to make claims. Some of them will turn out to be wrong in five years — that is fine, that is how it should be. But the claims are mine, and I will trace each one to a specific source: a file, a pull request, a CHANGELOG entry, a code path. The "trust is the product" claim, for example, is not a metaphor I am importing. It is an inference from the way the codebase is organized: every privilege is a flag you must set, every dangerous path is default-deny, every result that leaves the agent carries provenance. The architecture speaks before the documentation does, and what it says is *we are afraid of this*.

That fear is the most professional thing about Vibe-Trading. Most agent frameworks treat broker integration as "expose the SDK as a tool." Vibe-Trading treats broker integration as the thing you build an entire bounded-autonomy pattern to defend. The framework's own self-description is worth quoting: "It holds no funds and never trades outside the limits you set, and you can halt it instantly." That sentence is the entire series in eighteen words.

The default alternative, in most agent frameworks I have read, looks like this: a function called place_order(symbol, qty, side) is registered as a tool; the LLM is told, in its system prompt, "be careful with this"; and the framework ships. The first time a real user wires it to a live broker, three things happen, in some order: the prompt gets ignored on a hot take, the order goes through at the wrong size, and the engineer starts writing a kill switch. Vibe-Trading started from the position that the kill switch, the audit ledger, the fail-closed gate, the structural paper/live guard, and the bounded mandate are not retrofits. They are load-bearing. Remove any one and the structure is unsafe; remove any two and it is unimplementable. The five elements were not added to make the system safe. They were specified to make the system possible.
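The contrast with a bare place_order tool can be sketched in a few dozen lines. This is an illustration of the pattern under stated assumptions, not Vibe-Trading's implementation: `Mandate`, `OrderGate`, the `send_to_broker` callback, and the in-memory ledger are all hypothetical, and a real ledger would be durable storage instead of a list.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Mandate:
    """The limits the user has delegated. Live trading is off unless set."""
    allowed_symbols: FrozenSet[str]
    max_order_qty: int
    live_trading: bool = False


@dataclass
class OrderGate:
    """Wraps a broker callback; every order passes through place_order."""
    mandate: Optional[Mandate]
    send_to_broker: Callable[[str, int, str], str]  # hypothetical broker call
    broker_is_live: bool
    halted: bool = False
    ledger: List[str] = field(default_factory=list)

    def halt(self) -> None:
        """Kill switch: once engaged, every later order is refused."""
        self.halted = True

    def _refusal(self, symbol: str, qty: int, side: str) -> Optional[str]:
        # Fail closed: any check that does not pass yields a refusal reason.
        if self.halted:
            return "kill switch engaged"
        if self.mandate is None:
            return "no mandate configured"
        if side not in ("buy", "sell"):
            return "unknown side"
        if symbol not in self.mandate.allowed_symbols:
            return "symbol outside mandate"
        if not 0 < qty <= self.mandate.max_order_qty:
            return "quantity outside mandate"
        if self.broker_is_live and not self.mandate.live_trading:
            return "live broker without live mandate"
        return None

    def place_order(self, symbol: str, qty: int, side: str) -> dict:
        reason = self._refusal(symbol, qty, side)
        entry = {"symbol": symbol, "qty": qty, "side": side,
                 "allowed": reason is None, "reason": reason}
        # The decision is recorded before anything is sent to the broker.
        self.ledger.append(json.dumps(entry, sort_keys=True))
        if reason is None:
            entry["broker_ref"] = self.send_to_broker(symbol, qty, side)
        return entry
```

The model still sees a function with the signature place_order(symbol, qty, side). What changes is that the prompt is no longer the only thing standing between a bad decision and the broker.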

We start in the data layer, because the data layer is the foundation. Every other layer assumes the layer below it produces clean, defensible, reproducible inputs. When that assumption fails — as it did in the NVDA incident — the failures cascade upward through the model, into the swarm, and out the broker side as visible misbehavior. The fix is never at the top of the stack. The fix is at the chokepoint.

One more thing before we get there. You will notice, reading the chapters that follow, that I keep reaching for the same sentence: *the bug moves downward, and you fix it at the lowest layer.* That is the through-line. The data layer's bugs live in loaders. The model layer's bugs live in a single shim. The broker layer's bugs live in the absence of a structural paper/live discriminator. The research workflow's bugs live in the absence of an artifact trail. In every case, the surface symptom is high in the stack — a hallucination, a missed trade, a wrong backtest — and the fix is at the bottom. The series is an exercise in following that gravity downward.

A note on how to read this series. The chapters are technical, but they are not tutorials. I am not going to walk you through the loader registry line by line, or quote the provider capability layer in full. The point is the *shape* of the design decisions and the *reasoning* behind them. If you want to verify a specific claim, every chapter closes with references that point to the file, the pull request, or the CHANGELOG entry. The architecture is the argument; the citations are the receipts.