commaai/openpilot

openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 300+ supported cars.


Start here

1. E00_Three_Hundred_and_Thirty_Two_Cars_One_Loop

Three Hundred and Thirty-Two Cars, One Loop

openpilot is not a car app bolted onto the dashboard. It is an operating system whose 100 Hz control loop happens to drive a steering wheel.

Three hundred and thirty-two. That is the number you will find in docs/CARS.md if you scroll past the autogenerated header. Three hundred and thirty-two supported cars. The README rounds down to "300+", the marketing copy rounds up to "more supported vehicles than any other open-source ADAS project in the world", and neither of those framings is wrong. But neither of them is the right one either. The right framing is that the 332 figure is a measure of how thinly comma has abstracted the boundary between its software and the wildly different ADAS APIs it speaks to.

That abstraction is the OS metaphor, taken seriously. Not a metaphor in the loose sense — comma actually calls the thing an "operating system for robotics" on the front page of its repository, and the codebase is engineered as if it meant it. A process orchestrator with a safety kernel. A pub/sub bus on which car-specific daemons can be hot-swapped. A package manager that delivers four parallel release trains (release-mici, release-tizi, nightly, nightly-dev) to two device classes (the comma 3X and the newer comma four). If you have ever used a Unix system whose kernel is the smallest possible trusted computing base, and whose userland is where all the interesting — and replaceable — work happens, you already have the mental model. openpilot's panda firmware is the kernel. The Python selfdrived daemon is init. The car-specific firmware ports are drivers.

I want to spend the next five chapters inside that mental model. Not because it is fashionable to call consumer products "operating systems" — that label gets slapped on everything from microwaves to toothbrushes these days — but because, in the case of openpilot, the architecture only becomes legible once you stop thinking of it as a single program and start thinking of it as a small distributed system that happens to live in a windshield-mounted compute box. Once you do, the seemingly strange design choices — why is there a separate process for everything? why are messages serialized through Cap'n Proto? why does the safety-critical code live in C while the rest lives in Python? — fall out as the obvious answers to the obvious questions.

Let me set the scene with the most concrete fact I have. Imagine you are at the comma.ai site, picking a harness for your 2018 Honda Civic. The dropdown knows your car. The dropdown knows the part numbers of the three connectors you need, the OBD-C cable length that fits between the dashboard and the comma four, and which firmware branch your car is currently happiest with. That dropdown is the visible top of the OS. Below it: a per-car fingerprinting module that detects your make, model, and trim from the CAN bus traffic, then negotiates the right process configuration, then engages a state machine that is — and I will get into this in E02 — formally too simple to be wrong. That is the bet comma has made: the cars are drivers, the user is just a privileged process, and the code that talks to the steering wheel is the smallest, most boring, most thoroughly tested layer in the stack.

This is the rare open-source robotics project where the marketing line happens to be the engineering line. Most open-source projects in this space either over-promise (full self-driving) or under-promise (just a visualization library). openpilot promises what it ships: it upgrades the driver assistance system on 332 cars. It does so via a specific, well-bounded, ISO 26262-obsessed design that, if you read the code, has the same texture as a Postgres or an nginx — unglamorous, well-factored, and quietly handling the kind of system that breaks careers when it fails. The 332 number is the proof that the abstraction holds.

Here is what I will show you, in order.

E01 opens up the process architecture. We will look at the


2. E01_A_100Hz_Day_in_the_Life_of_selfdrived

A 100 Hz Day in the Life of `selfdrived`

The reason openpilot ships on 332 cars is not the model. It is the decision to treat every subsystem as an independently faulting process, communicating through Cap'n Proto messages on a fixed-frequency bus.

Key Takeaways

selfdrived is a 100 Hz pub/sub consumer, not a controller — it makes the engagement decision; the controller is downstream.
The process boundary is the fault boundary: a crashing modeld cannot stall the lane-keeping actuator because the two are separate processes on the same device.
Frequency is the discipline. CAN traffic runs at 100 Hz, model inference at 20 Hz, the IMU at 104 Hz, the camera frames at 20 Hz. Each subscriber decides what to do when a message is late.
This is why a Python codebase can ship in a safety-critical system: the safety code is not in Python. It is in the C firmware of the panda device, which is the only thing the car ever listens to.

Here are the eight most important lines in the entire openpilot repository. They live in openpilot/selfdrive/selfdrived/selfdrived.py, between lines 86 and 93. I will show them to you, then I will show you why they are the entire architecture in miniature.

self.sm = messaging.SubMaster(['deviceState', 'pandaStates', 'peripheralState', 'modelV2', 'liveCalibration',
                               'carOutput', 'driverMonitoringState', 'longitudinalPlan', 'livePose', 'liveDelay',
                               'managerState', 'liveParameters', 'radarState', 'liveTorqueParameters',
                               'controlsState', 'carControl', 'driverAssistance', 'alertDebug', 'userBookmark', 'audioFeedback',
                               'lateralManeuverPlan'] + \
                               self.camera_packets + self.sensor_packets + self.gps_packets,
                              ignore_alive=ignore, ignore_avg_freq=ignore,
                              ignore_valid=ignore, frequency=int(1/DT_CTRL))

Read it again. That is a single SubMaster call. It registers a subscription to 21 named message services, three camera state services, three sensor services, and a GPS service. The call is parameterised by frequency=int(1/DT_CTRL), and DT_CTRL is the canonical control-loop period, defined in openpilot/common/realtime.py as 0.01 seconds. One hundred hertz. That single argument means: "every 10 milliseconds, deliver the latest value of each of those services to me, in one batch, or skip this tick if any of them is overdue." That is the entire process architecture of openpilot, in one call. Now let me show you the rest of it.
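
To make the "latest value, or overdue" idea concrete, here is a minimal sketch of a latest-value subscriber with a freshness check. It is not the cereal SubMaster API: the ToySubscriber class, its method names, and the slack factor of 10 periods are all hypothetical, chosen only to illustrate the behaviour described above. The two service names and their rates are the ones quoted in this chapter.

```python
from dataclasses import dataclass, field

DT_CTRL = 0.01  # control-loop period in seconds, as quoted in the text


@dataclass
class ToySubscriber:
    """Hypothetical stand-in for a latest-value subscriber (not the cereal API)."""
    expected_hz: dict                      # service name -> nominal publish rate
    latest: dict = field(default_factory=dict)
    last_seen: dict = field(default_factory=dict)

    def publish(self, service, value, now):
        # Only the newest value is kept; older values are overwritten, never queued.
        self.latest[service] = value
        self.last_seen[service] = now

    def alive(self, service, now, slack=10.0):
        # Assumption: a service is "alive" if heard from within `slack` nominal periods.
        if service not in self.last_seen:
            return False
        return (now - self.last_seen[service]) <= slack / self.expected_hz[service]

    def all_alive(self, now):
        return all(self.alive(name, now) for name in self.expected_hz)


loop_hz = int(1 / DT_CTRL)  # the same expression the SubMaster call uses

sub = ToySubscriber({"modelV2": 20.0, "controlsState": 100.0})
sub.publish("modelV2", "plan", now=0.0)
sub.publish("controlsState", "state", now=0.0)
```

With both services published at t=0, the faster one goes stale first: after 0.2 seconds the 100 Hz service has missed its window while the 20 Hz service is still considered fresh.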

The bus and the kernel

Imagine you are reading telemetry from your own car at 100 Hz. The road-camera frame arrives at 20 Hz. The CAN bus at 100 Hz. The driving model at 20 Hz. The IMU at 104 Hz. They never block each other. If the model is slow on a tick, selfdrived skips the model on that tick and uses the last good plan. If a camera frame drops, the lane-keeping controller uses the last visible lane. If the GPS packets are absent for three seconds, selfdrived keeps driving on the localizer. This is the entire point of the architecture: the slowest producer in the system cannot stall the fastest consumer, because they are separate processes and the bus drops messages rather than blocks.
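
The "drops rather than blocks" property can be sketched in a few lines. This is an illustration under my own assumptions, not the cereal transport: DroppingChannel is a hypothetical class built on the standard library's bounded deque, and the plan strings are made up.

```python
from collections import deque


class DroppingChannel:
    """Hypothetical bounded channel: when full, the oldest entry is discarded."""

    def __init__(self, capacity):
        self._queue = deque(maxlen=capacity)
        self.last_good = None

    def send(self, msg):
        # deque(maxlen=...) evicts the oldest item on overflow, so send never blocks.
        self._queue.append(msg)

    def receive_latest(self):
        # Drain everything that arrived; if nothing new arrived, reuse the last value.
        while self._queue:
            self.last_good = self._queue.popleft()
        return self.last_good


plans = DroppingChannel(capacity=2)
for i in range(5):          # a fast producer overruns a slow consumer
    plans.send(f"plan-{i}")

first = plans.receive_latest()   # newest surviving message
second = plans.receive_latest()  # nothing new arrived, so the last good plan is reused
```

The producer was never made to wait, and the consumer still has something to act on when a tick arrives with no new data.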

The bus is implemented in openpilot/cereal/. It is Cap'n Proto, the same serialization library that Cloudflare and Sandstorm use, chosen because it is small, zero-copy, and has no schema compilation step on the Python side. Each service is declared in services.py with a frequency, a decimation factor, and a queue size. can runs at 100 Hz with a decimation of 2053 (about three messages per logged segment, the rest are dropped to keep disk usage bounded). controlsState runs at 100 Hz with a decimation of 10 and a 2 MB queue. modelV2 runs at 20 Hz. accelerometer and gyroscope run at 104 Hz — the raw sensor rate. The schema is a single Cap'n Proto file (log.capnp) and the service list is a single Python file. The whole bus fits in your head.
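
The "about three messages per logged segment" figure is simple arithmetic, and it is worth checking. The sketch below assumes a 60-second log segment and assumes decimation means "keep one message in every N"; both are my assumptions for illustration, and the function names are hypothetical rather than taken from services.py.

```python
def logged_per_segment(rate_hz, decimation, segment_seconds=60.0):
    """Average number of messages kept per segment under 1-in-N decimation."""
    return rate_hz * segment_seconds / decimation


def keep_for_log(message_index, decimation):
    """Assumed rule: keep every Nth message, counting from zero."""
    return message_index % decimation == 0


can_kept = logged_per_segment(100, 2053)           # roughly 2.9 per segment
controls_kept = logged_per_segment(100, 10)        # 600 per segment
can_kept_exact = sum(keep_for_log(i, 2053) for i in range(6000))
```

Under these assumptions, 6,000 CAN messages per minute shrink to three logged ones, while controlsState keeps one message in ten.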

The kernel is panda: a separate C codebase, maintained outside the openpilot repository and pulled in at panda/ as a submodule. panda is the only component that talks to the car's CAN bus directly. It validates every command the openpilot Python stack sends, against a per-car fingerprint, against the MISRA-C safety rules comma inherited from the automotive supply chain, and against the hardware watchdog that physically cuts the actuator line if the firmware stops responding. openpilot's Python code can issue the command "steer left 3 degrees at 5 Nm". panda can refuse to issue the command. The car never sees the command if panda refuses. That is the safety kernel, and it is not in Python. It is in C, in firmware, on a separate microcontroll


3. E02_Three_Layers_of_Saying_No

Three Layers of Saying No

openpilot's safety is not a single check. It is three independent defensive layers — state machine, excessive-actuation watchdog, driver monitoring — plus a fourth social-contract layer (the fork policy) that nobody writes tests for. Each layer can fail. The system survives.

Key Takeaways

docs/SAFETY.md specifies exactly two non-negotiable requirements: the driver can always retake control, and the vehicle cannot outpace the driver's reaction time. Every line of safety code traces back to one of those two lines.
The state machine in state.py is small enough to be obviously correct: 98 lines, five states, nine event types, a three-second soft-disable timer. The art is the *priority* of the events, not the number of them.
ExcessiveActuationCheck in helpers.py enforces the second safety requirement at runtime with a 2× threshold: twice the maximum allowed longitudinal acceleration, twice the ISO lateral-acceleration limit. The car is allowed to be wrong. It is not allowed to be wrong *enthusiastically*.
The driver-monitoring policy uses the EU's 2025/1899 regulation as its threshold and enforces a 30-minute lockout after two sustained alerts. The numbers are published in the source — and a comment warns that changing them gets your fork banned.
The fork policy is the fourth layer. It is a social contract, not code. But it works.
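
The 2x threshold from the takeaways above is easy to picture as code. What follows is a sketch of the idea only, not the ExcessiveActuationCheck in helpers.py: the ActuationLimits class, the is_excessive function, and the limit values of 2.0 and 3.0 m/s^2 are placeholders I chose for illustration, and any filtering or debouncing the real check applies is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActuationLimits:
    max_long_accel: float  # m/s^2, placeholder value supplied by the caller
    max_lat_accel: float   # m/s^2, placeholder value supplied by the caller


def is_excessive(long_accel, lat_accel, limits, factor=2.0):
    """True when either axis exceeds `factor` times its allowed maximum."""
    too_much_long = abs(long_accel) > factor * limits.max_long_accel
    too_much_lat = abs(lat_accel) > factor * limits.max_lat_accel
    return too_much_long or too_much_lat


# Illustrative limits only; these are not the values used by openpilot.
limits = ActuationLimits(max_long_accel=2.0, max_lat_accel=3.0)

within_margin = is_excessive(3.9, 0.0, limits)    # over the limit, under 2x
over_long = is_excessive(4.1, 0.0, limits)        # more than 2x longitudinal
over_lat = is_excessive(0.0, -6.5, limits)        # more than 2x lateral, either sign
```

The margin between 1x and 2x is the point: a reading somewhat over the limit is tolerated, a reading at double the limit is not.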

It is 7:42 AM on a Tuesday. You are in a 2019 Hyundai Kona, on a four-lane suburban arterial, going 35 mph into a left-hand curve. You are looking at your phone. The comma four's driver-facing camera has lost your face for six seconds, then eight, then ten. Somewhere in the openpilot process tree, three independent code paths are about to make a decision about whether you should still be in control of this 3,200-pound vehicle.

I am going to walk you through all three of those decisions, in the order they happen, with the actual code that makes them. The point of the walkthrough is not to convince you that openpilot is safe. The point is to show you that safety in this codebase is a layered argument, not a single assertion — and that the layers fail in different ways, on different timescales, by different mechanisms.

The two-line safety contract

Before any of the three layers is worth discussing, you have to read the two lines that justify all of them. They are in docs/SAFETY.md, in plain English, in the second section. I am going to quote them verbatim because every line of the safety code traces back to one of them.

1. The driver must always be capable to immediately retake manual control of the vehicle, by stepping on the brake pedal or by pressing the cancel button.

2. The vehicle must not alter its trajectory too quickly for the driver to safely react. This means that while the system is engaged, the actuators are constrained to operate within reasonable limits.

That is the entire safety specification. Two sentences. The first one says: no software can ever interpose itself between the driver's foot and the brake. The second one says: the software is allowed to actuate the car, but only gently enough that a human driver can countermand the actuation on human timescales. ISO 11270 and ISO 15622 give numerical values for "reasonable limits"; the lateral limit works out to 0.9 seconds of maximum actuation to achieve a 1 m lateral deviation, which is roughly the time it takes a human to notice and react to a steering input they didn't ask for.

Everything else in the safety architecture — every state in the state machine, every threshold in the watchdog, every timeout in the driver monitor — is a downstream consequence of these two sentences. That is the discipline. That is the entire reason the safety code is small.

Layer one: the state machine firewall

The first code path that runs in our 7:42 AM scenario is the engagement state machine. It is in openpilot/selfdrive/selfdrived/state.py, and it is 98 lines. I will not quote all 98, but I will show you the shape.

SOFT_DISABLE_TIME = 3  # seconds
ACTIVE_STATES = (State.enabled, State.softDisabling, State.overriding)
ENABLED_STATES = (State.preEnabled, *ACTIVE_STATES)

class StateMachine:
  def __init__(self):
    self.current_alert_types = [ET.PERMANENT]
    self.state = State.disabled
    self.soft_disable_timer = 0

Five states: disabled, preEnabled, enabled, softDisabling, overriding. Nine event types: IMMEDIATE_DISABLE, USER_DISABLE, SOFT_DISABLE, OVERRIDE_LATERAL, OVERRIDE_LONGITUDINAL, ENABLE, NO_ENTRY, PRE_ENABLE, WARNING. The whole transition table fits in your head.

The first design lesson is priority. In any non-disabled state, the update() method checks USER_DISABLE and IMMEDIATE_DISABLE before the state-specific events. USER_DISABLE is the driver saying no: a press of the brake pedal or the cancel button. IMMEDIATE_DISABLE is the system saying no: a fault serious enough that openpilot must let go at once. Both go straight to disabled. Nothing else matters until those are handled.

The second design lesson is the soft-disable timer. When the system wants to disengage but is not in an emergency — say, a driver-monitoring warning that is escalating — the state machine enters softDisabling and starts a 3-second timer. During those 3 seconds, the actuators are still active but a chime is playing and the UI is showing a red banner. If the condition clears (the driver looks at the road, the camera reacquires the face), the system returns to enabled. If the timer runs out, it drops to disabled and the car coasts.
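
Here is a stripped-down sketch of the two lessons together: hard exits first, then the soft-disable countdown. It is not the code in state.py. The SoftDisableSketch class, its boolean arguments, and the reduced three-state enum are my own simplifications; only SOFT_DISABLE_TIME and the 0.01-second tick come from the text.

```python
from enum import Enum

DT_CTRL = 0.01          # seconds per control tick, as quoted earlier
SOFT_DISABLE_TIME = 3   # seconds, as in the excerpt above


class State(Enum):
    disabled = 0
    enabled = 1
    softDisabling = 2


class SoftDisableSketch:
    """Simplified illustration; omits preEnabled, overriding, and alert bookkeeping."""

    def __init__(self):
        self.state = State.enabled
        self.timer_ticks = 0

    def update(self, hard_disable, soft_disable):
        if self.state == State.disabled:
            return self.state
        if hard_disable:
            # User or immediate disable wins over everything else.
            self.state = State.disabled
        elif self.state == State.enabled and soft_disable:
            self.state = State.softDisabling
            self.timer_ticks = round(SOFT_DISABLE_TIME / DT_CTRL)
        elif self.state == State.softDisabling:
            if not soft_disable:
                self.state = State.enabled      # condition cleared in time
            else:
                self.timer_ticks -= 1
                if self.timer_ticks <= 0:
                    self.state = State.disabled  # countdown expired
        return self.state


sm = SoftDisableSketch()
entered = sm.update(hard_disable=False, soft_disable=True)
for _ in range(299):
    still_counting = sm.update(hard_disable=False, soft_disable=True)
expired = sm.update(hard_disable=False, soft_disable=True)
```

At 100 Hz the 3-second timer is 300 ticks, so the sketch stays in softDisabling for 299 further updates and drops to disabled on the 300th.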

You are 4 minutes into a hands-on drive. The driver-facing camera loses your face for 6 seconds. Alert 1: nothing. 8 seconds: chime. 13 seconds: red banner. The state machine begins the soft-disable countdown. The car is not being yanked out of your hands. It is being told, gently, that the operator is no longer in a state to be the operator, and that it has 3 seconds to find someone who is.

stateDiagram-v2
    [*] --> disabled
    disabled --> preEnabled: ENABLE + PRE_ENABLE
    disabled --> overriding: ENABLE + OVERRIDE
    disabled --> enabled: ENABLE
    preEnabled --> enabled: preconditions clear
    preEnabled --> disabled: precondition fails
    enabled --> overriding: driver overrides lat/lon
    enabled --> softDisabling: SOFT_DISABLE event
    overriding --> enabled: override released
    overriding --> softDisabling: SOFT_DISABLE event
    softDisabling --> enabled: condition clears (within 3s)
    softDisabling --> disabled: 3s timer expires
    enabled --> disabled: IMMEDIATE_DISABLE / USER_DISABLE
    preEnabled --> disabled: IMMEDIATE_DISABLE / USER_DISABLE
    overriding --> disabled: IMMEDIATE_DISABLE / USER_DISABLE
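
The diagram above is small enough to write down as a lookup table and check mechanically. The sketch below transcribes only the edges drawn in the diagram; the TRANSITIONS dictionary and both helper functions are hypothetical names of mine, not anything in state.py.

```python
from collections import deque

# Edges copied from the state diagram above, nothing more.
TRANSITIONS = {
    "disabled": {"preEnabled", "overriding", "enabled"},
    "preEnabled": {"enabled", "disabled"},
    "enabled": {"overriding", "softDisabling", "disabled"},
    "overriding": {"enabled", "softDisabling", "disabled"},
    "softDisabling": {"enabled", "disabled"},
}


def is_allowed(src, dst):
    """True if the diagram draws a direct edge from src to dst."""
    return dst in TRANSITIONS.get(src, set())


def can_reach(start, goal):
    """Breadth-first search over the diagram's edges."""
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            return True
        for nxt in TRANSITIONS[state] - seen:
            seen.add(nxt)
            frontier.append(nxt)
    return False


every_state_can_disengage = all(can_reach(s, "disabled") for s in TRANSITIONS)
```

Two properties fall out of the table: every state has a path back to disabled, and there is no direct edge from disabled into softDisabling.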

The state machine is a firewall: it is the only process in the system that gets to issue the IMMEDIATE_DISABLE event, and it is the only process that consults the soft-disable t
