commaai/openpilot / Chapter 3

Three Layers of Saying No

openpilot's safety is not a single check. It is three independent defensive layers — state machine, excessive-actuation watchdog, driver monitoring — plus a fourth social-contract layer (the fork policy) that nobody writes tests for. Each layer can fail. The system survives.

Key Takeaways

  • docs/SAFETY.md specifies exactly two non-negotiable requirements: the driver can always retake control, and the vehicle cannot outpace the driver's reaction time. Every line of safety code traces back to one of those two lines.
  • The state machine in state.py is small enough to be obviously correct: 98 lines, five states, nine event types, and a three-second soft-disable timer. The art is the *priority* of the events, not the number of them.
  • ExcessiveActuationCheck in helpers.py enforces the second safety requirement at runtime with a 2× threshold: twice the maximum allowed longitudinal acceleration, twice the ISO lateral-acceleration limit. The car is allowed to be wrong. It is not allowed to be wrong *enthusiastically*.
  • The driver-monitoring policy uses the EU's 2025/1899 regulation as its threshold and enforces a 30-minute lockout after two sustained alerts. The numbers are published in the source — and a comment warns that changing them gets your fork banned.
  • The fork policy is the fourth layer. It is a social contract, not code. But it works.

It is 7:42 AM on a Tuesday. You are in a 2019 Hyundai Kona, on a four-lane suburban arterial, going 35 mph into a left-hand curve. You are looking at your phone. The comma four's driver-facing camera has lost your face for six seconds, then eight, then ten. Somewhere in the openpilot process tree, three independent code paths are about to make a decision about whether you should still be in control of this 3,200-pound vehicle.

I am going to walk you through all three of those decisions, in the order they happen, with the actual code that makes them. The point of the walkthrough is not to convince you that openpilot is safe. The point is to show you that safety in this codebase is a layered argument, not a single assertion — and that the layers fail in different ways, on different timescales, by different mechanisms.

The two-line safety contract

Before any of the three layers is worth discussing, you have to read the two lines that justify all of them. They are in docs/SAFETY.md, in plain English, in the second section. I am going to quote them verbatim because every line of the safety code traces back to one of them.

1. The driver must always be capable to immediately retake manual control of the vehicle, by stepping on the brake pedal or by pressing the cancel button.
2. The vehicle must not alter its trajectory too quickly for the driver to safely react. This means that while the system is engaged, the actuators are constrained to operate within reasonable limits.

That is the entire safety specification. Two sentences. The first one says: no software can ever interpose itself between the driver's foot and the brake. The second one says: the software is allowed to actuate the car, but only gently enough that a human driver can countermand the actuation on human timescales. ISO 11270 and ISO 15622 give numerical values for "reasonable limits"; the lateral limit works out to 0.9 seconds of maximum actuation to achieve a 1 m lateral deviation, which is roughly the time it takes a human to notice and react to a steering input they didn't ask for.
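As a back-of-the-envelope check on that figure, here is a minimal sketch. It assumes the car starts with zero lateral velocity and that lateral acceleration is constant over the whole interval, so d = a·t²/2. That is a simplification (real limits also constrain jerk), and the function name is illustrative rather than anything from openpilot.

```python
def constant_lateral_accel(deviation_m: float, time_s: float) -> float:
    """Constant acceleration that covers deviation_m in time_s from rest (d = a * t^2 / 2)."""
    return 2.0 * deviation_m / time_s ** 2


# 1 m of lateral deviation in 0.9 s, as quoted above
a_lat = constant_lateral_accel(1.0, 0.9)
print(f"{a_lat:.2f} m/s^2")  # 2.47 m/s^2
```

Under those assumptions, "0.9 seconds to 1 m" corresponds to roughly 2.5 m/s² of sustained lateral acceleration, which gives a sense of the scale of the limit the later layers enforce.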

Everything else in the safety architecture — every state in the state machine, every threshold in the watchdog, every timeout in the driver monitor — is a downstream consequence of these two sentences. That is the discipline. That is the entire reason the safety code is small.

Layer one: the state machine firewall

The first code path that runs in our 7:42 AM scenario is the engagement state machine. It is in openpilot/selfdrive/selfdrived/state.py, and it is 98 lines. I will not quote all 98, but I will show you the shape.

SOFT_DISABLE_TIME = 3  # seconds
ACTIVE_STATES = (State.enabled, State.softDisabling, State.overriding)
ENABLED_STATES = (State.preEnabled, *ACTIVE_STATES)

class StateMachine:
  def __init__(self):
    self.current_alert_types = [ET.PERMANENT]
    self.state = State.disabled
    self.soft_disable_timer = 0

Five states: disabled, preEnabled, enabled, softDisabling, overriding. Nine event types: IMMEDIATE_DISABLE, USER_DISABLE, SOFT_DISABLE, OVERRIDE_LATERAL, OVERRIDE_LONGITUDINAL, ENABLE, NO_ENTRY, PRE_ENABLE, WARNING. The whole transition table fits in your head.

The first design lesson is priority. The update() method checks IMMEDIATE_DISABLE first, then USER_DISABLE, then the state-specific events. IMMEDIATE_DISABLE is the brake pedal — the driver has touched the brake, openpilot must yield. USER_DISABLE is the cancel button. Both go straight to disabled. Nothing else matters until those are handled.
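The priority idea can be sketched as an ordered lookup. This is an illustration of the ordering described above, not the update() method itself: the ET enum here is a hand-written subset, and DISABLE_PRIORITY and first_disable are hypothetical names.

```python
from enum import Enum, auto
from typing import Iterable, Optional


class ET(Enum):
    """Illustrative subset of the event types named in the text."""
    IMMEDIATE_DISABLE = auto()
    USER_DISABLE = auto()
    SOFT_DISABLE = auto()
    ENABLE = auto()


# Earlier entries win. The order follows the description above.
DISABLE_PRIORITY = (ET.IMMEDIATE_DISABLE, ET.USER_DISABLE)


def first_disable(events: Iterable[ET]) -> Optional[ET]:
    """Return the highest-priority hard-disable event present, or None."""
    present = set(events)
    for event_type in DISABLE_PRIORITY:
        if event_type in present:
            return event_type
    return None


# A soft-disable request never outranks a hard disable.
print(first_disable([ET.SOFT_DISABLE, ET.USER_DISABLE]))  # ET.USER_DISABLE
```

The point of the ordering is that state-specific logic only runs when this lookup comes back empty.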

The second design lesson is the soft-disable timer. When the system wants to disengage but is not in an emergency — say, a driver-monitoring warning that is escalating — the state machine enters softDisabling and starts a 3-second timer. During those 3 seconds, the actuators are still active but a chime is playing and the UI is showing a red banner. If the condition clears (the driver looks at the road, the camera reacquires the face), the system returns to enabled. If the timer runs out, it drops to disabled and the car coasts.

You are 4 minutes into a hands-on drive. The driver-facing camera loses your face. At 5 seconds: the first chime. At 8 seconds: a louder chime and a visual alert. At 13 seconds: the red banner, and the state machine begins the soft-disable countdown. The car is not being yanked out of your hands. It is being told, gently, that the operator is no longer in a state to be the operator, and that it has 3 seconds to find someone who is.
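The countdown can be sketched in a few lines. This is a toy model, not the code in state.py: it covers only the enabled, softDisabling and disabled states, the class name SoftDisableSketch is made up, and DT_CTRL = 0.01 (a 100 Hz control loop) is an assumption.

```python
DT_CTRL = 0.01         # assumed control-loop period in seconds (100 Hz)
SOFT_DISABLE_TIME = 3  # seconds, as in the excerpt above


class SoftDisableSketch:
    """Toy model of the enabled -> softDisabling -> enabled/disabled path only."""

    def __init__(self) -> None:
        self.state = "enabled"
        self.timer = 0  # frames left in the soft-disable window

    def update(self, soft_disable_active: bool) -> str:
        # Count down every frame; the timer is reloaded on entry to softDisabling.
        self.timer = max(0, self.timer - 1)
        if self.state == "enabled" and soft_disable_active:
            self.state = "softDisabling"
            self.timer = int(SOFT_DISABLE_TIME / DT_CTRL)
        elif self.state == "softDisabling":
            if not soft_disable_active:
                self.state = "enabled"    # condition cleared in time
            elif self.timer <= 0:
                self.state = "disabled"   # window expired
        return self.state


sm = SoftDisableSketch()
states = [sm.update(True) for _ in range(301)]
print(states[0], states[299], states[300])  # softDisabling softDisabling disabled
```

If the condition clears at any frame inside the window, the same update() call returns the model to enabled instead.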

stateDiagram-v2
    [*] --> disabled
    disabled --> preEnabled: ENABLE + PRE_ENABLE
    disabled --> overriding: ENABLE + OVERRIDE
    disabled --> enabled: ENABLE
    preEnabled --> enabled: preconditions clear
    preEnabled --> disabled: precondition fails
    enabled --> overriding: driver overrides lat/lon
    enabled --> softDisabling: SOFT_DISABLE event
    overriding --> enabled: override released
    overriding --> softDisabling: SOFT_DISABLE event
    softDisabling --> enabled: condition clears (within 3s)
    softDisabling --> disabled: 3s timer expires
    enabled --> disabled: IMMEDIATE_DISABLE / USER_DISABLE
    preEnabled --> disabled: IMMEDIATE_DISABLE / USER_DISABLE
    overriding --> disabled: IMMEDIATE_DISABLE / USER_DISABLE
    softDisabling --> disabled: IMMEDIATE_DISABLE / USER_DISABLE

The state machine is a firewall: it is the only component in the system that gets to act on the IMMEDIATE_DISABLE event, and it is the only component that consults the soft-disable timer. Everything else in the codebase can only *request* a state transition by raising an event. The state machine decides. This is the cleanest separation of concerns I have seen in any open-source safety-critical codebase, and it is 98 lines.

Layer two: the seismograph

The state machine answers "should we be driving?". The excessive-actuation watchdog answers a different, complementary question: "is the way we are driving physically inside the safety envelope?" The watchdog lives in openpilot/selfdrive/selfdrived/helpers.py, in a class called ExcessiveActuationCheck, and it is 55 lines. Here is the part that matters.

MIN_EXCESSIVE_ACTUATION_COUNT = int(0.25 / DT_CTRL)
MIN_LATERAL_ENGAGE_BUFFER = int(1 / DT_CTRL)

# longitudinal
accel_calibrated = calibrated_pose.acceleration.x
excessive_long_actuation = sm['carControl'].longActive and (accel_calibrated > ACCEL_MAX * 2 or accel_calibrated < ACCEL_MIN * 2)

# lateral
yaw_rate = calibrated_pose.angular_velocity.yaw
roll = sm['liveParameters'].roll
roll_compensated_lateral_accel = (CS.vEgo * yaw_rate) - (math.sin(roll) * ACCELERATION_DUE_TO_GRAVITY)

excessive_lat_actuation = False
self._engaged_counter = self._engaged_counter + 1 if sm['carControl'].latActive and not CS.steeringPressed else 0
if self._engaged_counter > MIN_LATERAL_ENGAGE_BUFFER:
  if abs(roll_compensated_lateral_accel) > ISO_LATERAL_ACCEL * 2:
    excessive_lat_actuation = True

Read the thresholds. ACCEL_MAX * 2. ISO_LATERAL_ACCEL * 2. The car is allowed to be wrong. It is not allowed to be wrong *enthusiastically*. A planner bug that pushes the measured acceleration past 4 m/s² when the safety envelope is 2 m/s² will trip the watchdog once it has persisted for 0.25 seconds and force a soft-disable. A controller bug that produces more than twice the ISO lateral-acceleration limit on a curve (exactly the curve you are entering at 7:42 AM) will trip it too, provided lateral control has been engaged, without driver steering input, for at least 1 second.
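To put a number on the lateral threshold for the 7:42 AM curve, here is a small worked example using the same formula as the excerpt. The value ISO_LATERAL_ACCEL = 3.0 m/s² is an assumption for illustration (the excerpt does not give it), as is 9.81 m/s² for gravity, and the road is taken to be flat (roll = 0).

```python
import math

ACCELERATION_DUE_TO_GRAVITY = 9.81  # m/s^2, assumed
ISO_LATERAL_ACCEL = 3.0             # m/s^2, assumed for illustration; not given in the excerpt


def roll_compensated_lateral_accel(v_ego: float, yaw_rate: float, roll: float) -> float:
    """Same formula as the excerpt: v * yaw_rate minus the gravity component from road roll."""
    return v_ego * yaw_rate - math.sin(roll) * ACCELERATION_DUE_TO_GRAVITY


v_ego = 35 * 0.44704                           # 35 mph in m/s
trip_yaw_rate = 2 * ISO_LATERAL_ACCEL / v_ego  # rad/s at which the 2x threshold is reached on a flat road

print(f"v_ego = {v_ego:.2f} m/s")
print(f"trip yaw rate = {math.degrees(trip_yaw_rate):.1f} deg/s")
```

Under these assumptions the car would have to be yawing at roughly 22 degrees per second at 35 mph before the lateral check even starts counting, which is far beyond a normal suburban curve.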

The watchdog is a seismograph. It does not care *why* the car is exceeding the safety envelope. It does not care if the planner has a bug, or the controller has a bug, or the model has learned a bad policy. It measures the actual physical motion of the car (accel_calibrated from the calibrated pose, roll_compensated_lateral_accel from the yaw rate and the roll angle) and compares it to twice the limit. The two constants play different roles: 0.25 seconds is how long the excess has to persist, and 1 second is how long lateral control must have been engaged, with no driver steering input, before the lateral check applies at all. When the excess persists, the watchdog declares an excessive actuation. The state machine receives the event and enters softDisabling. The car is asked to slow down.

I want to flag the design choice that is easy to miss. The watchdog uses *twice* the limit, not the limit itself. This margin-and-debounce engineering is the difference between a watchdog that works and a watchdog that nuisance-trips every time a driver goes over a bump. The 2× factor is the safety margin; the 0.25-second counter is the time-base filter; the cross-check against livePose (not shown in the excerpt above) is the anti-spurious-trip filter. All three are required. Drop any one and you get either a watchdog that never trips (a safety hole) or a watchdog that trips when the car goes over a pothole (a usability disaster that trains users to ignore alerts).
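The margin-plus-counter combination can be sketched as a debounced trip. This is a simplified illustration, not ExcessiveActuationCheck: the class DebouncedTrip and the helper run are hypothetical, DT_CTRL = 0.01 is an assumed 100 Hz loop period, and the livePose cross-check is left out.

```python
DT_CTRL = 0.01  # assumed control-loop period in seconds (100 Hz)
MIN_EXCESSIVE_ACTUATION_COUNT = int(0.25 / DT_CTRL)


class DebouncedTrip:
    """Trips only after the condition has held for more than min_count consecutive frames."""

    def __init__(self, min_count: int) -> None:
        self.min_count = min_count
        self.counter = 0

    def update(self, over_limit: bool) -> bool:
        # Any frame inside the envelope resets the counter to zero.
        self.counter = self.counter + 1 if over_limit else 0
        return self.counter > self.min_count


def run(samples, limit: float) -> bool:
    """Return True if any frame in samples trips the 2x-limit watchdog."""
    trip = DebouncedTrip(MIN_EXCESSIVE_ACTUATION_COUNT)
    return any(trip.update(abs(a) > 2 * limit) for a in samples)


pothole = [0.5] * 50 + [9.0] * 5 + [0.5] * 50  # 50 ms spike, far over the limit
sustained = [0.5] * 50 + [4.5] * 30            # 300 ms just over 2x a 2.0 m/s^2 limit

print(run(pothole, 2.0), run(sustained, 2.0))  # False True
```

The spike is larger than the sustained excess but too short to count, which is exactly the behaviour the counter is there to produce.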

Layer three: the human monitor

The third layer is the one that operates on our 7:42 AM driver. It is the driver-monitoring policy in openpilot/selfdrive/monitoring/policy.py, and the constants are published in the source. I am going to quote the comment above them, because the comment is the fourth layer of the safety argument.

# ******************************************************************************************
#  NOTE: To fork maintainers.
#  Disabling or nerfing safety features will get you and your users banned from our servers.
#  We recommend that you do not change these numbers from the defaults.
# ******************************************************************************************

Below the comment is a class called DRIVER_MONITOR_SETTINGS. The constants that matter here:

  • _ALERT_MIN_SPEED = 2.8 (m/s, about 10 km/h): the speed below which the system does not bother checking.
  • _VISION_POLICY_ALERT_1_TIMEOUT = 5., _ALERT_2_TIMEOUT = 8., _ALERT_3_TIMEOUT = 13. (seconds): the cascading timeouts for an inattentive driver.
  • _NO_RESPONSE_TIMEOUT = 5. (seconds).
  • _MAX_ALERT_3 = 2 and _MAX_NO_RESPONSE = 1.
  • _LOCKOUT_TIME = int(1800 / DT_DMON): 30 minutes, expressed in monitoring-loop frames.

What this means in our scenario: at second 5, an audio chime. At second 8, a louder chime and a visual alert. At second 13, a red banner, and the state machine begins the soft-disable countdown. If the driver lets the alert escalate to the red banner twice (_MAX_ALERT_3 = 2), the system locks openpilot out for 30 minutes. The lockout is not a setting the user can disable. The user can put the device in airplane mode. The user can ignore the alerts. The system will still not engage.
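The escalation ladder can be sketched directly from the quoted constants. The function names are hypothetical, DT_DMON = 0.05 (a 20 Hz monitoring loop) is an assumption, and the exact comparison the real policy uses for the lockout count is not shown in the text, so the `>=` below is a guess labelled as such.

```python
DT_DMON = 0.05  # assumed monitoring-loop period in seconds (20 Hz); not stated in the text

ALERT_TIMEOUTS = (5.0, 8.0, 13.0)     # seconds, from the constants quoted above
MAX_ALERT_3 = 2
LOCKOUT_FRAMES = int(1800 / DT_DMON)  # 30 minutes, counted in monitoring frames


def alert_level(distracted_s: float) -> int:
    """0 = no alert; 1 to 3 = escalation level reached after distracted_s seconds."""
    return sum(distracted_s >= timeout for timeout in ALERT_TIMEOUTS)


def locked_out(alert_3_count: int) -> bool:
    """Assumption: lockout starts once the level-3 count reaches MAX_ALERT_3."""
    return alert_3_count >= MAX_ALERT_3


# The opening scenario: face lost for six, then eight, then ten seconds.
print([alert_level(t) for t in (6, 8, 10, 13)])  # [1, 2, 2, 3]
```

At six seconds the driver in the opening scenario has already heard the first chime, and by ten seconds is three seconds away from the red banner.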

The interesting line is the EU regulation reference. The policy's _ALERT_MIN_SPEED is derived from a specific clause of eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:L_202501899 — the EU's 2025 regulation on driver drowsiness and attention warning systems. comma is not making up the numbers. comma is implementing a regulation. The driver-monitoring policy is, in a literal sense, a regulatory compliance artifact. That is a much stronger safety claim than "we tested it a lot", and it is the kind of claim that the automotive industry normally does not get to make about consumer software.

Layer four: the social contract

There is one more layer, and it is not code. It is in docs/SAFETY.md, in the "Forks of openpilot" section. I will quote it because it is the most underappreciated line in the entire repository.

* Do not disable or nerf [driver monitoring]
* Do not disable or nerf [excessive actuation checks]
* If your fork modifies any of the code in opendbc/safety/:
  * your fork cannot use the openpilot trademark
  * your fork must preserve the full safety test suite and all tests must pass, including any new coverage required by the fork's changes

Failure to comply with these standards will get you and your users banned from comma.ai servers.

comma.ai strongly discourages the use of openpilot forks with safety code either missing or not fully meeting the above requirements.

The social contract is: if you fork openpilot and remove the driver monitoring or weaken the actuation checks, comma will ban you and your users from its servers; and if you touch the code in opendbc/safety/, you also give up the openpilot trademark and must keep the full safety test suite passing. The fourth safety layer is the threat that a forker will be cut off from the upstream server. It works. It is the only social-contract safety mechanism I know of in a consumer-software project that I would describe as effective, and it works because the upstream server is the asset the forkers need.

I am going to be honest about its limits. The social contract does not stop a sufficiently motivated forker from reimplementing the entire stack. It does not stop a hostile actor. It does not stop a regulator in a country where comma has no presence. What it does stop is the boring failure mode: a forker, in good faith, deciding that the driver monitor is annoying and removing it for a paying customer. That decision used to be a one-line PR. Now it is a brand-identity decision, and the fork has to think about what it is.

What to do when the system escalates

The system has four escalation states, and they are designed to be readable from the UI without explanation. Level 1: chime only. Level 2: chime plus visual alert. Level 3: red banner plus soft-disable countdown. Level 4: lockout.

If you are driving and the device plays the level-1 chime, the right response is to look at the road. If you are driving and the device shows the level-2 alert, the right response is to look at the road and confirm your hands are on the wheel. If the device shows the level-3 red banner, the right response is to brake — IMMEDIATE_DISABLE — and take over. The system is asking you to leave. The fastest way out is the brake pedal. The cancel button works too. The state machine will not argue with either. If the device has locked you out for 30 minutes, the right response is to stop and take a break, which is the entire point of the lockout.

I came into this chapter thinking the safety story would be a hard problem. The codebase is small, but the design is layered, and every layer traces to a written specification, and the regulatory layer is not metaphor: it is a published EU regulation. The fact that all of this is open source, forkable, modifiable, and yet *not* routinely modified in the safety code is the strongest single piece of evidence I can offer that the design is right. As far as I can tell, the dangerous edits are not being made as a matter of routine. That is a result. It is the result the social-contract layer is supposed to produce, and it is producing it.

I started this chapter believing safety is one good test. I finished it believing safety is three independent failures you have to survive, plus a social contract about who gets to publish the test suite. The bloat is the budget. The 98 lines of state.py and the 55 lines of helpers.py and the 80 lines of policy.py are the entire safety argument, in code, with the regulatory citations in the comments. Now we know the safety story. The next question is the obvious one: what is the model that the safety layers are protecting, and what changed about it in 2025?
