Start here
When an Agent Earns the Broker
The hard part of giving an LLM a trading account is not the LLM — it is the trust boundary. And trust is built in layers, the way you build a flight control system.
Key Takeaways
- Connecting an LLM agent to a broker is an authority-delegation problem, not a tool-use problem.
- Vibe-Trading answers it with five interlocking layers: data, model, broker, research workflow, and audit — each assuming the layer outside it can fail.
- The series below traces those five layers in order of how foundational they are, from a single chokepoint in the data layer up to a hypothesis-shaped research workflow that closes the loop.
- The pattern is portable. If you swap "broker" for "production database," "compliance team," or "customer-facing action," the same five layers apply.
---
A research analyst at a quant desk runs a fifty-worker investment committee on NVDA in June 2026. The agent bull-cases the AI capex story, the bear counters valuation, the risk reviewer pulls recent volatility, and the portfolio manager drafts a position. Mid-run, the analyst notices the equity researcher cited a price that's nine trading days stale. The volatility worker cited a volume number from a malformed bar where volume was present but OHLC was empty. The risk reviewer pulled a price from a fallback source that returned NaN on close, which leaked into a non-strict JSON field and arrived at the PM as the literal string NaN — an answer the LLM happily quoted.
The root cause was not a bad model. It was not a slow broker. It was not even a missing tool. The swarm workers had each written their own ad-hoc yfinance snippet, each trusted a different malformed bar, each leaked NaN a different way. The agent loop did not centralize market data, so the workers re-centralized it badly, in fifty different shapes.
That incident — filed as HKUDS issue #198, fixed in pull request #199 on 2026-06-11 — is where this series starts. Because the same shape of failure shows up everywhere you connect an LLM to something consequential: a broker, a production database, a customer-facing action. The bug is not the model. The bug is the gap between "tool use" and "delegated authority."
Consider what "delegated authority" actually means. When you wire an LLM into a brokerage API, you are not registering a tool. You are appointing an agent to act on your behalf in a system where errors have monetary cost. The agent can place orders. It can read your positions. It can move money between accounts, through whatever the broker permits. Every broker SDK that exposes place_order is, in the language of agency law, extending authority. Most agent frameworks hide that fact behind a function signature. Vibe-Trading does not hide it. It builds the entire architecture around the assumption that the signature is the smallest part of the problem.
This series is about closing that gap. The artifact under examination is Vibe-Trading, an MIT-licensed research workspace built by HKU Data Science at the University of Hong Kong, currently shipping version 0.1.10. It is, by a comfortable margin, the most carefully engineered bounded-autonomy agent I have read in 2026. It is also the one that has taught me the most about how to think about an LLM touching something that matters.
The series-level claim, delivered up front so you can decide whether to read further: trust is the product. Not the model. Not the data. Not the UI. Trust — built as five interlocking layers, each assuming the layer outside it can fail, each structurally independent of the others. The five layers, in the order they appear when an LLM touches your financial life, are
7m / Article + audio + video
The Loader Registry: How 18 Data Sources Stopped Fighting Each Other
When one tool call can return eighteen different answers for the same symbol, the only safe move is a single chokepoint — and centralizing it eliminates an entire class of bugs.
Key Takeaways
- The bug moves downward. A swarm worker's hallucinated price was a symptom; the cause was fifty uncentralized data fetches; the fix was the loader registry at the bottom.
- A loader registry is not a list of sources. It is a contract: OHLC sanity at the boundary, fallback chains by IP-ban risk, cache with staleness guard, PIT-safe fundamental enrichment.
- Eighteen sources exist not because the author is indecisive but because each market has different failure modes — China A-share has IP-ban risk; US has data fragmentation; crypto has venue fragmentation. One chokepoint absorbs all of that.
- Centralizing market data was the precondition for everything else in this series. Without it, the provider capability layer (next chapter) would have inherited the same fifty-shapes-of-truth problem.
---
Here is a function call that returned the wrong answer without throwing an exception. In the Vibe-Trading source tree under agent/src/market_data.py, the Tushare loader exposes a method called daily() that, for an A-share ETF like 510300.SH, returns an empty DataFrame. No error. No warning. Just an empty frame that the calling code interprets as "no trading happened on those days." The trading day did happen. The loader is wrong.
This is the canonical shape of every data-layer bug I want to discuss in this chapter. The bug is invisible from the call site. It looks like a market that did nothing, when in fact the data source simply did not know how to ask for that instrument. The fix landed on 2026-06-26 in pull request #315, and it is worth reading for its parsimony: ETFs route to fund_daily(), indices to index_daily(), HK equities to hk_daily(). The bug existed for months because nobody had built the routing layer that the call site assumed was there.
The lesson is not "fix the tushare loader." The lesson is: a market-data call site should not be choosing between sources, choosing between endpoints, or choosing between fallback chains. It should be calling one chokepoint. When that chokepoint does not exist, every consumer of market data reimplements it badly, in its own shape. The NVDA incident I opened the series with — fifty swarm workers writing fifty yfinance snippets — was the architectural symptom of the same disease.
Imagine you are operating a quant research platform. Your agent loop calls get_market_data("NVDA", days=30). Behind that single call, the loader registry runs an ordered decision tree: which market? which source is best for that market today? which fallback if that source fails? what is the cache? what does OHLC integrity look like at the boundary? Each of those questions is non-trivial. The right answer is to ask them once, in one place, and to make every consumer — backtests, swarm workers, the Web UI, the MCP server — go through the same gate.
Vibe-Trading does exactly this, and the consequences ripple outward. I want to walk through the design in three steps: the contract the loader enforces, the fallback chains it picks from, and the integrity checks that sit at the boundary. Then I want to show why centralizing this was the precondition for the swarm-grounding fix in PR #199 — because without the registry, you cannot give fifty workers a "use this one tool" instruction that actually means anything.
The contract
Every loader in the registry conforms to a fixed shape: (symbol, start, end, fields=None) -> DataFrame with normalized OHLCV columns, point-in-time-safe enrichment, and strict JSON serialization. Two non-obvious rules govern that shape.
First, the OHLC sanity check sits at the boundary, not in the consumer. Pull request #274, merged 2026-06-20, drops dirty bars (high < low, non-positive prices, bad bracketing) centrally at the loader exit. Before that PR, every consumer had its own definition of "dirty" — a backtest engine might tolerate high == low, a portfolio simulator might not — and the disagreement surfaced as bugs that looked like model failures. Centralizing the check made the contract enforceable: if a row reaches the agent, it has already been vetted.
Second, non-finite floats serialize as null, not as the literal string NaN. This sounds pedantic until you watch a JSON parser crash on a downstream strict-mode validator. Pull request #238 made run-card payloads strict-JSON-clean; pull request #306 made validation JSON sanitize nested NaN/Infinity. The pattern is consistent: every exit point from the loader must produce JSON that survives a strict parser, because the loader's consumers include the Web UI, which speaks strict JSON to the browser, and the swarm workers, which pipe results through tool-call argument previews.
The contract also covers point-in-time safety. When a backtest asks for "the income statement as of yesterday," the loader must not silently use a restated quarterly figure that was only published this morning. Pull request #302 (2026-06-24) made Shadow Account rule extraction see PIT-safe entry context — entry_rsi14 and prior_5d_return fetched through the loader registry as of buy_dt. Pull request #76 (2026-05-08) added fundamental_fields with the same discipline. The contract is the same: the loader knows what "as of" means, and the consumers do not have to argue about it.
flowchart LR
C["Consumer<br/>(backtest · swarm worker · web UI · MCP tool)"]
R["Loader registry<br/>(single chokepoint)"]
OHLC["OHLC sanity gate<br/>PR #274"]
PIT["PIT-safe enrichment<br/>PR #302, #76"]
STRICT["Strict JSON serialization<br/>PR #238, #306"]
CACHE["Opt-in local cache<br/>PR #177"]
SRC["18 sources<br/>tencent · mootdx
11m / Article + audio + video
The OpenAI-Compatible Lie
"OpenAI-compatible" is not a contract. It is a sketch. Treating the sketch as a contract produced four completely different production failures from one shared shim — and the fix was to delete the shim.
Key Takeaways
- "OpenAI-compatible" describes wire format, not behavior. Wire-format compatibility is the easy part. Behavioral compatibility is the part that breaks production.
- A single shim that handles per-provider quirks globally cross-contaminates them. The DeepSeek hang fix broke Kimi. The Kimi workaround broke Gemini. The Gemini fix broke DeepSeek.
- The replacement is an explicit capability layer: each provider's quirk lives in its own file, gated to its own provider, never applied to any other.
- The same lesson applies to brokers, to data sources, to anything with heterogeneous behavior behind a homogeneous API. "Default" is a stance, not a fact — make the stance explicit or it will leak.
---
On the morning of 2026-06-12, four independent production issues landed on the Vibe-Trading issue tracker within ninety minutes of each other. DeepSeek runs were stuck on "Agent is working…" indefinitely (#208). Kimi rejected the client outright (#204). The UI never recovered after a stall (#195). And reached max iterations was masking empty model responses (#203). Four bugs. Four surfaces. Four different reproduction paths. One root cause, diagnosed in the same day, fixed in the same pull request: every OpenAI-compatible provider ran through a single shim that applied DeepSeek/Kimi/Gemini quirks globally and silently swallowed stream failures.
I want to walk through that day in detail because it is the cleanest case study I know for "wire-format compatibility is not behavioral compatibility." The lesson generalizes: anywhere you have a heterogeneous set of systems behind a homogeneous API, a single shim is a single point of cross-contamination. The fix is always to delete the shim and let each system's behavior be its own.
The single shim, in Vibe-Trading, was the OpenAI client wrapper used by LangChain. Every provider — DeepSeek, Kimi, Moonshot, Gemini via OpenAI-compatible, OpenRouter, GLM/Zhipu, Qwen, plus the obvious OpenAI and Anthropic paths — went through it. The shim did four things that turned out to be incompatible. It applied DeepSeek-specific reasoning-content replay to all providers (which broke Kimi). It set a single User-Agent header (which Kimi rejected). It captured Gemini thought signatures only on the in-memory path (which left dict-replayed history without the signature, breaking multi-turn tool calls). And it swallowed stream failures silently, falling back to a slow non-streaming call (which masked reached max iterations errors when the model returned nothing).
Each individual behavior was correct for the provider it was designed for. Each was wrong for every other provider. The bugs were not bugs in any single behavior — they were bugs in the architecture of "one shim handles all."
The four quirks, named
Let me name the four quirks explicitly, because they are general patterns that show up in any system with a homogeneous API and heterogeneous behavior behind it.
Quirk 1: The reasoning-content format is per-provider. DeepSeek, Kimi, and Qwen all support reasoning-mode outputs, but they emit the reasoning payload in different fields, with different replay rules, and with different policies on whether assistant-prefill handoff messages are accepted. A shim that captures reasoning as one canonical format will get two of three providers wrong. The fix, in Vibe-Trading, is reasoning capture and replay gated per-provider — Kimi's path is verified end-to-end against the live API on kimi-k2.6, with tool calls and strict multi-turn reasoning replay, but the same code path does not run for DeepSeek.
Quirk 2: The User-Agent header is a handshake. Some providers fingerprint the client by User-Agent and reject unknown strings. Kimi does this; DeepSeek does not. A shim that sets one header for all providers will be accepted by one and rejected by the other. The fix is a per-provider User-Agent override — MOONSHOT_USER_AGENT is exposed as an env var specifically because the Kimi path needs a different default than the OpenAI path.
Quirk 3: Per-call signatures are per-provider. Gemini 2.5 and 3.x attach a thoughtSignature to each tool call, and the signature must round-trip on the next request or multi-turn tool calling fails with INVALID_ARGUMENT. The signature is preserved on the in-memory path but dropped when the agent loop replays history as OpenAI-format dicts through LangChain. A shim that handles the in-memory case will miss the dict-replay case. The fix lives in a single _convert_input chokepoint where both invoke and stream pass through; the signature is re-attached at that chokepoint, including for parallel calls where only the first of N is signed (pull requests #176 and #184, 2026-06-05 and 2026-06-08).
Quirk 4: Stream-failure handling is per-provider. Some providers return transient connection resets that should be retried; others return deterministic 4xx errors that should fail fast. A shim that retries all stream failures will hammer an already-rejected request. A shim that fails fast on all stream failures will give up on a recoverable one. The fix is a contextual provider_stream_error exception with one automatic retry for t
9m / Article + audio + video
Premium chapters
4. E03_Bounded_AutonomyAvailable after upgrade / 13m
5. E04_Hypothesis_Shaped_ResearchAvailable after upgrade / 14m