The 0.10.0 Pivot: When the Model Eats the Controller

Version 0.10.0 replaced longitudinal Model Predictive Control with an end-to-end plan produced by a "World Model", and quietly changed the lateral training objective too. This is the most consequential architectural change in openpilot's history. It is also more conservative than it sounds.

Key Takeaways

  • RELEASES.md for version 0.10.0 (2025-08-05) reads like a quiet revolution: "Longitudinal MPC replaced by E2E planning from World Model in Experimental Mode" and "Action from lateral MPC as training objective replaced by E2E planning from World Model".
  • The "World Model" is described in comma's CVPR paper "Learning to Drive from a World Model". The 0.10.1 release doubled its parameters and trained it on 4× the segments.
  • Model Predictive Control is a transparent optimization: a model of the car's dynamics, a cost function, a horizon. You can read the cost. You can audit the constraints. E2E is a black box that you trust because the simulator it was trained in is itself a learned model of human driving.
  • The pivot is conservative in a specific way: the safety kernels from E02 still gate everything the model is allowed to do. The E2E plan is *requested*; the watchdog, the state machine, the driver monitor, and the C-level panda validation are *enforced*. The architecture has not changed shape. The thing inside the modeld process has.
  • The bet is on data, not on rules. That is the new thing. The 332 cars × millions of minutes of driving is the asset the World Model is being trained on, and it is an asset the closed-fleet competitors cannot replicate.

You tap Experimental mode on the device. The next time you engage openpilot, the longitudinal planner is no longer an MPC solving a horizon of constraints. It is a single pass through a feed-forward network, and as of 0.10.1 that network is trained on 4× the segments of the previous release. That is the 0.10.0 change in brief. The rest of the chapter is the explanation of why you should not be alarmed, and also why you should not be complacent.

I am going to lead with the contrarian framing, because it is the one I held going into this analysis and the one I suspect most readers will hold too. An end-to-end neural network driving a 4,000-pound car sounds reckless. "Model Predictive Control" is a phrase that contains the words *model* and *predictive* and *control*, and it sounds like the kind of thing you want between you and the next concrete barrier. "End-to-end planning from a World Model" sounds like the kind of thing you read about in a paper and then, six months later, read about in a recall notice. The instinct to be skeptical is not wrong. It is the right starting point. It is also, in the specific case of openpilot's 0.10.0 pivot, an instinct that the codebase is engineered to defuse.

The two control paradigms, side by side

I want to put the two paradigms in front of you, because the difference is real and the choice is real and neither is free. Here is the trade-off, drawn as a diagram rather than argued as philosophy.

flowchart TB
    subgraph MPC[Classical MPC control]
        M1[State estimate<br/>pose, velocity, lead] --> M2[Cost function<br/>J = &#931; ref tracking + smoothness + accel limits]
        M2 --> M3[Optimizer<br/>solve horizon of constraints]
        M3 --> M4[First action<br/>accel, jerk]
        M4 --> M5[Car]
        M5 -->|new state| M1
    end
    subgraph E2E[End-to-end World Model control]
        E1[Camera + pose + desired curvature] --> E2[World Model<br/>2x params, 4x segments<br/>feed-forward net]
        E2 --> E3[Planned trajectory<br/>6 seconds, 16 waypoints]
        E3 --> E4[First action<br/>accel, curvature]
        E4 --> E5[Car]
        E5 -->|new state| E1
    end

Both loops run at the same 20 Hz. Both produce the same kind of output — a desired acceleration and a desired curvature for the next 0.05 seconds. Both are subject to the same downstream safety checks. The difference is the *path* from state to action.

The MPC path is a chain of explicit, inspectable computations. The state estimate is a vector with known meaning. The cost function is human-readable Python that you can print. The optimizer is a numerical solver that you can step through. If the planner does something you do not like, you can read the cost function and ask which term penalised what. This is the *transparency* property of classical control. It is what makes MPC auditable, and it is what makes it appealing to safety engineers.
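To make "ask which term penalised what" concrete, here is a minimal sketch of an inspectable longitudinal cost. It is not openpilot's MPC: the function name `cost_terms`, the weights, the 2.0 m/s² bound, and the three terms (chosen to mirror the diagram's "ref tracking + smoothness + accel limits") are all assumptions made for illustration, and there is no solver here, only the scoring step a solver would minimise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    # Hypothetical weights, not values from openpilot.
    tracking: float = 1.0       # weight on squared speed error
    smoothness: float = 0.5     # weight on squared change in acceleration
    accel_limit: float = 100.0  # weight on squared excess over the accel bound


def cost_terms(accels, v0, v_ref, dt=0.05, a_max=2.0, weights=CostWeights()):
    """Score a candidate acceleration sequence, one named term at a time."""
    speed = v0
    prev_accel = accels[0]
    terms = {"tracking": 0.0, "smoothness": 0.0, "accel_limit": 0.0}
    for accel in accels:
        speed += accel * dt  # simple kinematic model of the car
        terms["tracking"] += weights.tracking * (speed - v_ref) ** 2
        terms["smoothness"] += weights.smoothness * (accel - prev_accel) ** 2
        excess = max(0.0, abs(accel) - a_max)  # zero while inside the bound
        terms["accel_limit"] += weights.accel_limit * excess ** 2
        prev_accel = accel
    return terms


# Two candidate plans over one second at 20 Hz (20 steps of 0.05 s).
gentle = cost_terms([1.0] * 20, v0=10.0, v_ref=11.0)
harsh = cost_terms([3.0] * 20, v0=10.0, v_ref=11.0)
print("gentle:", gentle)
print("harsh: ", harsh)
```

Printing the dictionary is the whole point: the `accel_limit` term is zero for the gentle sequence and nonzero for the harsh one, so you can say exactly why the second plan scores worse.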

The E2E path is a single neural-network forward pass. The network has 2× the parameters of the previous model (per the 0.10.1 release notes) and was trained on 4× the segments. The output is a planned trajectory — 16 waypoints over 6 seconds, in the language of comma's CVPR paper. If the planner does something you do not like, you cannot read the cost function. You can only compare the output to the outputs the model produced on held-out driving data. This is the *opacity* property of learned control. It is what makes E2E uninspectable, and it is what makes regulators nervous.
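Comparing outputs against held-out driving can be sketched as a displacement error between a predicted plan and what the human actually drove. This is an illustration under assumptions: the uniform waypoint spacing, the `(x, y)` representation, and the function names are hypothetical, and the trajectories are made-up numbers, not model output.

```python
import math

HORIZON_S = 6.0   # plan length, from the text
N_WAYPOINTS = 16  # waypoint count, from the text


def waypoint_times(n=N_WAYPOINTS, horizon_s=HORIZON_S):
    """Timestamps for the waypoints; uniform spacing is an assumption."""
    return [horizon_s * i / (n - 1) for i in range(n)]


def average_displacement_error(predicted, recorded):
    """Mean Euclidean distance between matching (x, y) waypoints, in metres."""
    if len(predicted) != len(recorded):
        raise ValueError("trajectories must have the same number of waypoints")
    total = sum(math.dist(p, r) for p, r in zip(predicted, recorded))
    return total / len(predicted)


times = waypoint_times()
# Made-up example: the human drove straight at 10 m/s; the plan sits 0.5 m left.
recorded = [(10.0 * t, 0.0) for t in times]
predicted = [(10.0 * t, 0.5) for t in times]
print(average_displacement_error(predicted, recorded))
```

A metric like this tells you how far off the plan was, but not why. That is the gap between evaluating a learned planner and reading a cost function.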

You can see the trade-off. Transparency buys you auditability at the cost of representational power. Opacity buys you representational power at the cost of auditability. The history of ML in safety-critical systems is the history of the boundary between these two properties moving, slowly, in the direction of opacity, under pressure from data. openpilot 0.10.0 is one of the moments where the boundary moved.

What the World Model actually is

Here is the part I find most interesting and most underdiscussed. The "World Model" in the release notes is not a generic self-driving model. It is a specific architecture, described in a specific paper, that comma published at CVPR. The paper is called "Learning to Drive from a World Model". The architecture is: a learned simulator of human driving, trained on a corpus of recorded routes, that can be queried for the most likely future trajectory given a current observation. The driving policy is then trained inside the simulator, not on the real road.

Why does this matter? Because the driving policy is being trained in a closed-loop simulator that has itself been trained on millions of minutes of human driving. The simulator is a *generative model* of how a road unfolds. The planner is being trained to do well in the simulator. The simulator is not perfect (it is, after all, a learned model), but it is more sample-efficient than training on the real road, because in the simulator you can roll out 6 seconds of hypothetical trajectory in the time it takes the real world to roll out 0.05 seconds.
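The closed-loop idea can be sketched as a rollout in which the policy's own actions determine the next state it sees. Everything named here is a stand-in: `toy_policy` and `toy_simulator_step` are hypothetical, hand-written functions, whereas the simulator the paper describes is a learned model.

```python
def rollout(policy, simulator_step, state, horizon_s=6.0, dt=0.05):
    """Roll a policy forward in a simulator; returns the visited states."""
    steps = round(horizon_s / dt)  # 6 s at 20 Hz is 120 steps
    trajectory = [state]
    for _ in range(steps):
        action = policy(state)                     # policy sees simulated state
        state = simulator_step(state, action, dt)  # simulator reacts to the action
        trajectory.append(state)
    return trajectory


def toy_policy(state, target_speed=20.0, gain=0.5):
    """Hypothetical policy: accelerate in proportion to the speed error."""
    _position, speed = state
    return gain * (target_speed - speed)


def toy_simulator_step(state, accel, dt):
    """Hand-written kinematics standing in for a learned simulator."""
    position, speed = state
    return (position + speed * dt, speed + accel * dt)


trajectory = rollout(toy_policy, toy_simulator_step, state=(0.0, 15.0))
print(len(trajectory) - 1, "simulated steps")
print("final speed:", trajectory[-1][1])
```

Nothing in the loop waits for the real world, so one 6-second rollout costs only the 120 calls it takes to compute it.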

The 0.10.1 release notes give you the scale: "World Model: 2× the number of parameters", "World Model: trained on 4× the number of segments". The 0.10.3 release notes add a "new temporal policy architecture" and a "new on-policy training physics noise model". The 0.11.0 release notes add that the 2026-03-17 model was "fully trained using a learned simulator" and had "improved longitudinal performance in Experimental mode". The trajectory is consistent: each release makes the simulator a bigger and more central part of the system. The released driving model is the *output* of training in the simulator. The simulator is the new *compiler*. The output is the binary.

I want to be specific about what is and is not in the World Model. It is not a giant end-to-end network that turns camera pixels directly into steering commands. There is still a state estimate, a planner, a longitudinal controller, a lateral controller. The World Model replaces the *plan* that the longitudinal and lateral controllers are trying to track. The classical-control stack has not been deleted. Its input has changed. That is the conservative part of the pivot.
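One way to picture "its input has changed" is a controller that tracks whatever plan it is handed. This is a sketch under assumptions: both plan functions are hypothetical stand-ins, with no optimizer or network behind them, and the gain and acceleration bounds are illustrative numbers, not openpilot's.

```python
def ramp_plan(current_speed, target_speed, n=16):
    """Stand-in for an optimizer's output: a linear ramp to the target."""
    step = (target_speed - current_speed) / n
    return [current_speed + step * (i + 1) for i in range(n)]


def jumpy_plan(current_speed, target_speed, n=16):
    """Stand-in for a learned plan that asks for the target immediately."""
    return [target_speed] * n


def track_first_waypoint(plan, current_speed, kp=0.8, a_min=-3.5, a_max=2.0):
    """Proportional controller on the first waypoint, clamped to fixed bounds."""
    accel = kp * (plan[0] - current_speed)
    return max(a_min, min(a_max, accel))


current, target = 10.0, 26.0
for make_plan in (ramp_plan, jumpy_plan):
    plan = make_plan(current, target)
    print(make_plan.__name__, track_first_waypoint(plan, current))
```

The controller never asks where the plan came from. Swapping `ramp_plan` for `jumpy_plan` changes the request, and the clamp inside the controller is unchanged.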

Why this is more conservative than it sounds

If you read the release notes and stop at "E2E planning", the natural reaction is that openpilot has replaced a transparent controller with a black box. That is not what happened. Here is what actually happened.

The state machine from E02 still decides whether openpilot is engaged. The driver monitor from E02 still decides whether the driver is paying attention. The excessive-actuation watchdog from E02 still measures whether the actual car motion is inside twice the safety envelope. The panda C firmware from E01 still validates every actuator command before it leaves the device. None of these layers were touched by the 0.10.0 release. The safety architecture is exactly what it was in 0.9.x. What changed is *what the planner is allowed to request*.

In the old design, the planner was an MPC. The MPC solved a constrained optimization at 20 Hz, in real time, on the device, against a model of the car's dynamics that the openpilot maintainers hand-tuned per platform. If the MPC misbehaved, the watchdog caught it. If the watchdog misbehaved, the C firmware caught it. Three layers of defense around a transparent planner.

In the new design, the planner is an E2E neural network. The network runs at 20 Hz, on the device, and was trained against a learned model of how a road unfolds. If the network misbehaves, the watchdog catches it. If the watchdog misbehaves, the C firmware catches it. Three layers of defense around an opaque planner. The layers did not move. The thing they surround changed.

That is the conservative part. The new thing is not "we removed the safety code". The new thing is "we replaced the input the safety code was checking, with something that is more accurate on average, and the safety code still checks it on the worst case". The 2× acceleration threshold from helpers.py still applies. The 0.9-second lateral limit from docs/SAFETY.md still applies. The watchdog does not know or care whether the planner is an MPC or a neural network. The watchdog measures the *car*.
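A planner-agnostic check of this kind can be sketched as follows. This is not the code in `helpers.py`: the class name, the 2.0 m/s² nominal limit, and the consecutive-frame count are assumptions for illustration. Only the idea of a 2× threshold applied to the measured motion of the car comes from the text.

```python
class ActuationWatchdog:
    """Hypothetical sketch: trips on sustained measured acceleration above
    a multiple of the nominal limit. It never sees the plan itself."""

    def __init__(self, accel_limit=2.0, factor=2.0, frames_to_trip=5):
        self.threshold = accel_limit * factor  # the "2x" envelope
        self.frames_to_trip = frames_to_trip   # assumed debounce length
        self.count = 0

    def update(self, measured_accel):
        """Feed one measured sample; returns True once the watchdog trips."""
        if abs(measured_accel) > self.threshold:
            self.count += 1
        else:
            self.count = 0  # a single in-envelope frame resets the counter
        return self.count >= self.frames_to_trip


watchdog = ActuationWatchdog()
measured = [1.0, 5.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0]  # m/s^2, made-up samples
tripped = [watchdog.update(a) for a in measured]
print(tripped)
```

The input is what the car did, not what the planner asked for, so the same check applies whether the request came from an optimizer or a network.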

The 0.10.0 release notes include a line that closes the argument for me. "Action from lateral MPC as training objective replaced by E2E planning from World Model." Read it carefully. The training *objective* for the lateral controller was changed. Not the inference path. The training path. The lateral controller is still a controller. It is being trained against a different target: the E2E plan, not the lateral MPC's own output. The system is being trained end-to-end but is still being deployed as a hybrid. That is the engineering version of "have your cake and eat it too", and it is the reason I changed my mind about the pivot.

Why this is also a real bet

I have been arguing that the pivot is more conservative than it sounds. I want to be honest about the parts that are not conservative, because they are the parts that matter.

The bet is on data. The 332 cars × millions of minutes of engaged driving is the corpus the World Model is being trained on. It is also the corpus the next World Model will be trained on, and the one after that. The release cadence (release-mici/release-tizi/nightly/nightly-dev) ensures that the corpus is refreshed continuously. Every user who runs the nightly branch and reports issues, every user who opts into driver-camera uploads, every user who lets the device upload on Wi-Fi, is feeding the corpus. The flywheel is the asset. The model is what comes out of the flywheel.

This is fundamentally different from the closed-fleet approach. Tesla FSD is trained on Tesla's fleet. GM Super Cruise is trained on GM's fleet. comma is training on a *user-funded* fleet, with permissive open-source licensing, with a model architecture that is published and a simulator that is published and a release process that is public. The data flywheel is the same in shape. Collecting fleet data is not novel. What is novel is the *scale* for an open-source project, and the *diversity* of the fleet (332 cars from 30+ brands) for a self-driving project of any kind.

The bet has a failure mode, and I want to name it. The failure mode is that the World Model is trained on a corpus that may not adequately cover the user's own car, in the user's own neighborhood, in the user's own weather. The 332-car diversity helps. The millions of minutes help. But the simulator is a learned model, and learned models can have failure modes that are not uniform across the input distribution. The watchdog is the last line of defense. The watchdog is a sample-level safety net, not a distributional one. It catches bad outputs. It does not catch a model that has subtly different failure modes on a new geography or a new weather pattern.
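The difference between a sample-level and a distributional check shows up in a toy example. The numbers below are invented for illustration, and the 1.0 m limit and the two region labels are assumptions. They are not measurements from any openpilot model.

```python
from statistics import mean


def every_sample_ok(errors, limit):
    """Sample-level check: each individual error is inside the limit."""
    return all(abs(e) <= limit for e in errors)


LIMIT_M = 1.0  # hypothetical per-sample tolerance on lateral error, in metres

# Made-up lateral errors (metres) on two slices of driving.
familiar_region = [0.1, -0.1, 0.2, -0.2, 0.0]
novel_region = [0.6, 0.7, 0.5, 0.8, 0.6]

for name, errors in (("familiar", familiar_region), ("novel", novel_region)):
    typical = mean(abs(e) for e in errors)
    print(name, "passes per-sample check:", every_sample_ok(errors, LIMIT_M),
          "| mean absolute error:", round(typical, 2))
```

Both slices pass the per-sample check, yet the typical error on the novel slice is several times larger. Seeing that requires comparing slices against each other, and a per-frame watchdog has no way to do that.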

This is the part of the bet where the openpilot maintainers are betting that the safety layers from E02 are enough, in the worst case, to keep a bad model from killing anyone, while the model gets better with each release. The bet is reasonable. It is not free.

Why the 332 figure is the asset

I want to close this chapter by tying the technical pivot back to the framing figure from E00. Three hundred and thirty-two cars is not a marketing number. It is a *data-asset number*. Each car is a sensor platform that runs openpilot 24/7 (or as long as the user is willing to keep it plugged in), and each minute of engaged driving produces a routable, reproducible artifact (the topic of E04). 332 cars × 1 hour/day of average driving × 365 days = roughly 120,000 hours per year. The actual number is higher, because the openpilot community skews toward the kind of driver who does long road trips with the device plugged in.
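The back-of-envelope figure is easy to reproduce. The inputs are the ones stated above (332 cars, one hour per day, 365 days); the variable names are only for illustration.

```python
cars = 332
hours_per_day = 1   # the average assumed in the text
days_per_year = 365

hours_per_year = cars * hours_per_day * days_per_year
minutes_per_year = hours_per_year * 60

print(f"{hours_per_year:,} hours per year")
print(f"{minutes_per_year:,} minutes per year")
```

The product is 121,180 hours, which rounds to the 120,000 quoted above. It is also about 7.3 million minutes, so the estimate is consistent with "millions of minutes".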

120,000 hours of driving is a corpus you can train a serious model on. It is also a corpus that no closed-fleet competitor can assemble, because the closed-fleet competitors are limited to their own makes and their own geographies. comma gets 30+ brands and 50+ geographies, from a user base that is, by self-selection, technically sophisticated enough to install a comma four and live with alpha-quality software. The data flywheel is the most diverse in the consumer-AV industry, and it is the asset the World Model is being trained on. The 0.10.0 pivot is the moment comma stopped using the data to train a better controller and started using the data to train a better simulator. That is a one-way door.

I came into this chapter believing E2E is reckless compared with MPC. I finished it believing that a policy trained end-to-end in a learned simulator is more sample-efficient than an MPC tuned against a hand-coded dynamics model, and that the safety kernels still gate everything the model is allowed to do. The bet is on data, not on rules. That is the new thing. It is the most consequential change in the project's history, and it is the part of openpilot that, in my view, the rest of the consumer-AV industry has not yet understood.

I keep coming back to the same question. Where does the data go after the minute is over? That is the chapter we have not written yet, and it is the chapter that makes the bet visible.
