apple/container

A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.

1. One VM Per Container

A Swift CLI on macOS runs each Linux container in its own lightweight VM — that single decision reshapes the security model, the helper topology, and the macOS-version matrix.

Series in one paragraph

container is Apple's answer to "I want Linux containers that feel native on my Mac." What makes it worth your attention is not the CLI surface or the OCI compatibility — both competent, both unsurprising — but the architectural bet underneath: every container gets its own VM, not a slot in a shared VM. That decision cascades into a fault-isolated helper topology, a separate "container machine" abstraction, and a hard dependency on macOS 26 that you cannot negotiate around. Read the series and you'll know not just what container does, but why it does it that way — and where the model breaks.

This opening chapter sets the scene; the next four walk through the architectural thesis, the runtime topology, the container-machine ergonomics, and the honest list of trade-offs.

The first time you run it

Picture Tuesday morning. You've just installed container on an M-series Mac. You open a terminal and type the command that starts everything:

container system start

The CLI doesn't ask you to choose between a Linux VM backend and a hypervisor framework. It doesn't ask you to allocate a memory pool. On first run it offers to install a recommended Kata Containers kernel, then leaves you at a prompt. You run container run --rm alpine:latest echo hello, and the response comes back. Activity Monitor shows a new process called container-runtime-linux, a few hundred megabytes of memory, and a transient entry for the VM itself. That entry disappears the moment the container exits.

If you've used Docker Desktop on macOS, this experience is mildly disorienting. Docker Desktop owns a long-lived Linux VM that sits behind its menu bar icon: a single QEMU or Virtualization.framework-backed process you can watch grow as you stack up containers. Stop and start one container, and you're reshuffling rooms inside a hotel. Here, the hotel doesn't exist. Each container checks in, gets a room, and checks out, and the rooms are built and demolished around them.

That's the experience. Now the architecture.

What Apple actually shipped

container is a Swift Package Manager project. The container CLI binary is a thin client; the work happens in a launchd-managed launch agent called container-apiserver. That daemon, in turn, spaw

2. The One-VM Thesis

Per-container VMs aren't a waste of resources — they're a fundamentally different isolation posture, and the boot-time/memory evidence inverts the obvious reading.

Key Takeaways

container runs one lightweight VM per container, not many containers in one shared VM. This is its architectural thesis, and every other design choice flows from it.
A per-container VM can boot faster than a cold shared VM, despite higher steady-state overhead, because there is no long-lived shared VM to bring up first.
Isolation moves from a configuration problem (capabilities, seccomp, namespaces) to a structural property (separate kernels, separate init namespaces).
The trade-off is honest: more memory under load, no memory ballooning back to the host, and full VM overhead per container. Apple accepted that cost; the question is whether you should.

The paradox nobody warns you about

I came into this code expecting the per-container VM decision to be obviously wasteful. One kernel per container sounds like overengineering when Docker, Podman, and Lima all run dozens of containers inside a single Linux VM and call it a day. Then I read docs/technical-overview.md and the numbers rearranged my priors:

"Containers created using container require less memory than full VMs, with boot times that are comparable to containers running in a shared VM."

Less memory than full VMs. Boot times comparable to shared-VM containers. Both of those are true because the comparison isn't being made the way I assumed. The relevant comparison is not "per-container VM vs shared VM holding one container." It's "per-container VM that exists for the life of one process vs shared VM that lives forever whether you use it or not." That comparison is much closer than it looks on paper, and once you internalize it, the architectural choice stops looking like waste and starts looking like precision.
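One way to make that comparison concrete is to count memory over time rather than memory at a single instant. The sketch below is a toy model, not a measurement: every number in it (the 2048 MB shared VM, the 200 MB per-VM overhead, the workload sizes) is a hypothetical placeholder, and the function names are invented for illustration.

```swift
import Foundation

// Toy cost model. All figures are hypothetical placeholders, not measurements.
struct Workload {
    let runSeconds: Double   // how long the container's process actually runs
    let processMB: Double    // memory used by the workload itself
}

/// Shared-VM model: the VM stays resident for the whole window, busy or idle.
func sharedVMCost(windowSeconds: Double, vmBaseMB: Double, workloads: [Workload]) -> Double {
    let vmCost = vmBaseMB * windowSeconds
    let workloadCost = workloads.reduce(0.0) { $0 + $1.processMB * $1.runSeconds }
    return vmCost + workloadCost
}

/// Per-container-VM model: each VM exists only while its workload runs.
func perContainerVMCost(perVMOverheadMB: Double, workloads: [Workload]) -> Double {
    workloads.reduce(0.0) { $0 + ($1.processMB + perVMOverheadMB) * $1.runSeconds }
}

// Ten one-minute jobs spread across an eight-hour working day.
let window = 8.0 * 3600.0
let jobs = Array(repeating: Workload(runSeconds: 60, processMB: 100), count: 10)

let shared = sharedVMCost(windowSeconds: window, vmBaseMB: 2048, workloads: jobs)
let perContainer = perContainerVMCost(perVMOverheadMB: 200, workloads: jobs)

print("shared VM:        \(shared) MB-seconds")
print("per-container VM: \(perContainer) MB-seconds")
```

With short, bursty workloads the idle shared VM dominates the total. Under sustained load with many long-running containers the per-VM overhead term grows instead, which is the trade-off the key takeaways flag.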

What the alternatives actually are

To see why Apple rejected the obvious model, it helps to lay the alternatives side by side. There are three reasonable ways to run Linux containers on macOS, and each one spends your resources differently:

Option A: Shared VM (Docker Desktop, OrbStack, Podman Desktop, Colima). A single Linux VM boots when the tool starts and stays up. Containers inside it are Linux processes, isolated by namespaces, cgroups, capabilities, and seccomp profiles. The kernel is shared. The VM's init system is shared. Each container usually gets its own network namespace, but every namespace is implemented by the same kernel's network stack. Boot time per container is essentially "fork a process" (sub-second). Memory overhead per container is essentially the memory of its processes. The cost you pay is structural coupling: a kernel-level bug in one container's syscall path is a kernel-level bug in every container, because they all run in the same kernel.

Option B: Per-container VM (container). Each container run boots a small VM configured for one workload: a minimal Linux rootfs, a minimal set of core utilities, and the dynamic libraries your process actually needs. When the process exits, the VM tears down. There is no long-lived "the VM." Boot time per container is on the order of seconds: slower than fork, faster than spinning up a shared VM from cold. Memory overhead is the cost of one kernel per container, which on Apple silicon with a tuned rootfs is small but not zero. The cost you pay is duplicated kernel state and the absence of memory ballooning back to the host.

Option C: Linux processes on macOS natively. Not a real option. macOS isn't Linux, and it ships no Linux ABI compatibility layer. Even if you ran Linux ELF binaries through a third-party compatibility layer, you wouldn't get a Linux filesystem tree, an init system, or the syscall surface most server software assumes. Drop this from the decision matrix.

The decision is between A and B. They are not interchangeable implementations of the same idea; they are different isolation postures.
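The difference in posture can be stated as a blast-radius question: if one container triggers a kernel-level fault, which containers share the kernel that faulted? The sketch below models only that question. The type and function names are hypothetical and do not come from the container source.

```swift
// Hypothetical model of the two postures; not code from the container project.
enum IsolationModel {
    case sharedVM         // Option A: every container runs on one shared kernel
    case perContainerVM   // Option B: every container has its own kernel
}

/// Returns the containers exposed to a kernel-level fault triggered by `faulty`.
func blastRadius(of faulty: String, among running: Set<String>, model: IsolationModel) -> Set<String> {
    guard running.contains(faulty) else { return [] }
    switch model {
    case .sharedVM:
        return running        // one kernel, so every container is exposed
    case .perContainerVM:
        return [faulty]       // the fault stays inside the faulting container's VM
    }
}

let running: Set<String> = ["web", "db", "cache"]
let sharedImpact = blastRadius(of: "web", among: running, model: .sharedVM)
let isolatedImpact = blastRadius(of: "web", among: running, model: .perContainerVM)

print(sharedImpact.sorted())
print(isolatedImpact.sorted())
```

In the shared model, limiting that set is a matter of configuration (capabilities, seccomp, namespaces); in the per-container model it falls out of the structure.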

graph TB
    subgraph "Option A: Shared VM (Docker Desktop, OrbStack)"
        H1[macOS host] --> VM1[One long-lived Linux VM]
        VM1 --> K1[Single shared kernel]
        K1 --> P1[Container 1<br/>Linux process]

3. The Helper Topology

Four processes after container system start looks like overengineering until you realize each one is the price of fault isolation in a per-container VM world.

Key Takeaways

container is a thin CLI on top of four kinds of process: the apiserver launch agent plus three kinds of XPC helper (images, network, runtime), with the runtime helper spawned once per container.
The per-container helper exists so a stuck or crashing container can't take down siblings — fault isolation follows directly from the per-container VM decision.
The CLI never talks to a VM directly. Every command flows: CLI → apiserver → XPC helper → VM. The XPC boundary is the privileged boundary.
The plugin model (Sources/Plugins/RuntimeLinux, Sources/Plugins/NetworkVmnet, Sources/Plugins/CoreImages, Sources/Plugins/MachineAPIServer) is built on swift-argument-parser's AsyncParsableCommand: each helper is independently testable, shippable, and replaceable.

What you actually see when you start it

You're debugging a stuck container. You open Activity Monitor. The container CLI process has long since exited; in its place you find a different shape:

container-apiserver           (one process, launchd-managed)
container-core-images         (one XPC helper, child of apiserver)
container-network-vmnet       (one XPC helper, child of apiserver)
container-runtime-linux.<id>  (one XPC helper per container)

Four processes at minimum once a container is running: three long-lived services plus one runtime helper for every running container. That looks heavy if you're coming from a single-Docker-VM model where the daemon, the registry client, the network namespace, and the runtime all live inside one Linux userland. It looks correct if you take the per-container VM thesis seriously: each container needs a translator between macOS-land and Linux-VM-land, and isolating that translator per container is the only way to keep one bad container from poisoning the others.
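The listing above implies a simple rule for what to expect in Activity Monitor. As a small sketch (the function name is invented here, and the container IDs are placeholders), the expected process names for a given set of running containers can be derived like this:

```swift
/// Process names one would expect to see, following the listing above.
/// `expectedProcesses` is a hypothetical helper, not part of the container CLI.
func expectedProcesses(forContainers ids: [String]) -> [String] {
    // Long-lived services, independent of how many containers are running.
    let fixed = [
        "container-apiserver",
        "container-core-images",
        "container-network-vmnet",
    ]
    // One runtime helper per running container, suffixed with the container ID.
    let perContainer = ids.map { "container-runtime-linux.\($0)" }
    return fixed + perContainer
}

let processes = expectedProcesses(forContainers: ["web", "db"])
processes.forEach { print($0) }
```

With one container running the count is four, and every further container adds exactly one process.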

I had assumed XPC was just "macOS IPC", a synonym for the Unix socket mechanism other container tools use. After reading the source, I now think of it as a fault-isolation boundary drawn at the process level. Every helper is a separate process tree, with its own privileges, its own audit token, and its own crash domain. When the guest kernel behind a container-runtime-linux.<id> helper panics, the panic is logged and the helper exits. The apiserver notices, marks the container as stopped, and the other containers continue.

That's the architecture in one sentence. The rest of this chapter traces how it's built.
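The crash-domain behaviour described above can be sketched as bookkeeping: one entry per helper, and a helper exit touches only its own entry. This is a hypothetical stand-in for whatever the apiserver actually does internally; HelperSupervisor, ContainerState, and their methods are names invented for this example.

```swift
enum ContainerState: Equatable {
    case running
    case stopped(reason: String)
}

/// Hypothetical stand-in for the apiserver's view of its per-container helpers.
struct HelperSupervisor {
    private(set) var states: [String: ContainerState] = [:]

    /// Record that a per-container helper has been spawned.
    mutating func helperStarted(for id: String) {
        states[id] = .running
    }

    /// Record that one helper process exited; only that container's entry changes.
    mutating func helperExited(for id: String, reason: String) {
        guard states[id] != nil else { return }
        states[id] = .stopped(reason: reason)
    }

    /// IDs of containers whose helper is still alive, in sorted order.
    var runningContainers: [String] {
        states.filter { $0.value == .running }.keys.sorted()
    }
}

var supervisor = HelperSupervisor()
for id in ["A", "B", "N"] {
    supervisor.helperStarted(for: id)
}

// Simulate the helper for container B exiting after its guest panics.
supervisor.helperExited(for: "B", reason: "guest kernel panic")
print(supervisor.runningContainers)
```

Because each helper is its own process, the real system gets this property from the operating system rather than from careful error handling inside a single daemon.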

The process tree, end to end

flowchart LR
    User([User: container CLI]) -->|XPC| API[container-apiserver<br/>launchd launch agent]
    API -->|XPC| CI[container-core-images<br/>image + OCI store]
    API -->|XPC| NV[container-network-vmnet<br/>vmnet framework bridge]
    API -->|XPC| RL1[container-runtime-linux.A<br/>per-container helper]
    API -->|XPC| RL2[container-runtime-linux.B<br/>per-container helper]
    API -->|XPC| RL3[container-runtime-linux.N<br/>per-container helper]
    RL1 --> VM1[VM A<br/>own kernel]
    RL2 --> VM2[VM B<br/>own kernel]
    RL3 --> VMN[VM N<br/>own kernel]
    NV -.allocates IP.-> RL1
    NV -.allocates IP.-> RL2
    NV -.allocates IP.-> RL3

The diagram has more arrows than the simpler "shared VM" view, but every arrow is doing real work. The container-network-vmnet helper, for instance, owns the lifecycle of the vmnet network attachment and the IP allocator. The per-container container-runtime-linux helper owns the VM's virtio devices, the vsock channel to vminitd inside the guest, and the gRPC surface for executing processes.
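To illustrate what "owning the IP allocator" means for the network helper, here is a toy allocator over a single /24. It is an assumption-heavy sketch: the 192.168.64 prefix, the host range, and the lowest-free-address policy are all placeholders chosen for the example, and none of the names come from the container source.

```swift
/// Toy IPv4 allocator over one /24; a stand-in for the allocator the network helper owns.
struct IPv4Allocator {
    let prefix: String                          // first three octets, e.g. "192.168.64"
    private var free: [UInt8]                   // host octets still available, lowest first
    private var leases: [String: UInt8] = [:]   // container ID -> leased host octet

    init(prefix: String, firstHost: UInt8 = 2, lastHost: UInt8 = 254) {
        self.prefix = prefix
        self.free = Array(firstHost...lastHost)
    }

    /// Lease the lowest free address to a container; repeated calls return the same lease.
    mutating func allocate(for containerID: String) -> String? {
        if let existing = leases[containerID] { return "\(prefix).\(existing)" }
        guard !free.isEmpty else { return nil }
        let host = free.removeFirst()
        leases[containerID] = host
        return "\(prefix).\(host)"
    }

    /// Return a container's address to the pool when its VM goes away.
    mutating func release(_ containerID: String) {
        guard let host = leases.removeValue(forKey: containerID) else { return }
        free.append(host)
        free.sort()
    }
}

var allocator = IPv4Allocator(prefix: "192.168.64")
let addressA = allocator.allocate(for: "A")
let addressB = allocator.allocate(for: "B")
allocator.release("A")
let addressC = allocator.allocate(for: "C")
print(addressA ?? "none", addressB ?? "none", addressC ?? "none")
```

Keeping this state in one helper means the per-container runtime helpers never need to coordinate with each other about addresses; they only receive one.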

Where the architecture lives in the source

The plugin directory layout tells the story before you read a single file:

Sources/Plugins/
├── CoreImages/        # container-core-images — image & content store
├── MachineAPIServer/  # the API server's launchd entry point
├── NetworkVmnet/      # container-network-vmnet — vmnet glue
└── RuntimeLinux/      # container-runtime-linux — per-container VM manager

Each plugin is its own Swift module with its own @main AsyncParsableCommand. Look at Sources/Plugins/RuntimeLinux/RuntimeLinuxHelper.swift:

@main
struct RuntimeLinuxHelper: AsyncParsableCommand {
    static let configuration = CommandConfiguration(
        commandName: "container-runtime-linux",
        abstract: "XPC Service for managing a Linux sandbox",
        version: ReleaseVersion.singleLine(appName: "container-runtime-linux"),
        subcommands: [
            Start.self
        ]
    )
}

That's the whole file. The work happens in RuntimeLinuxHelper+Start.swift, which holds the actual Start subcommand. The structural choice — commandName matching the binary's name, abstract describing what the XPC service does — makes each helper individually introspectable with --help. You can run container-runtime-linux --help on its own and see what it can do. The same pattern holds for the network and image helpers. This is what "modular system service" looks like in Swift: not a class hierarchy, but a directory of independently-spawned processes, each one a swift-argument-parser program.

The ContainerConfiguration struct in Sources/ContainerResource/Container/ContainerConfiguration.swift is the wire format for what flows between the CLI, the apiserver, and the per-container helper. Its twenty-five fields describe eve

4. Container Machines

5. The macOS Matrix

6. README