ADR-0009: Sandbox technology is bubblewrap; network allowlist is kaged-managed
- Status: Accepted
- Date: 2026-05-21
- Last amended: 2026-05-26
- Deciders: @karasu
- Supersedes: —
- Superseded by: —
- Follows from: RFC-0002
Context
RFC-0002 explores the Linux sandboxing landscape — bwrap, firejail, podman-rootless, nsjail, systemd-nspawn, and a pluggable scheme — against kaged's constraints:
- The cage's DSL contract (ADR-0006) declares fs allowlist, net allowlist, state ephemerality.
- Must run on small Linux hosts (ARM64, modest resources) and on desktop Linux.
- No Docker daemon, no setuid binaries, no kernel patches, no extras at install time.
- Spawn cost must be low — subagents are units of work, not services.
- The supervisor needs to stream stdout/stderr, signal, reap.
- v0 is Linux-only. macOS/Windows deferred.
The cage is brand-critical: "your agents are caged" is one of the top three pitches. The mechanism has to back that claim.
The RFC's lean is bwrap as the default mechanism, with the network allowlist as a kaged-managed concern on top.
Decision
The [CAGED] mechanism in v0 is bubblewrap (bwrap). The kaged daemon's subagent supervisor spawns every subagent under bwrap with a cage policy compiled from the project DSL. Network allowlisting is enforced by kaged itself via a per-cage network namespace plus a kaged-managed DNS+nftables setup. Resource limits use cgroups via systemd-run when systemd is available, with a documented non-systemd fallback. A default seccomp profile blocks dangerous syscalls; the DSL may relax it explicitly per cage.
What the daemon ships
- bwrap as the spawn mechanism. Required on every kaged host. Documented in install prerequisites.
- A cage compiler (packages/sandbox/) that translates a DSL cage: block into a bwrap argv plus auxiliary network setup.
- A network gatekeeper. A small kaged-internal service (per host, not per cage) that owns DNS resolution and nftables rule generation for every running cage's network namespace.
- A default seccomp profile. Blocks ptrace, kexec_load, init_module, keyctl, mount, kernel module load, and similar host-impacting calls. Lives at a known path inside the daemon and is loaded for every cage.
- A cgroups wrapper. When systemd is detected, spawn through systemd-run --scope --slice=kaged.slice for resource limits. Otherwise, fall back to direct cgroup v2 manipulation (cgroup-tools or raw /sys/fs/cgroup/... writes) with documented limitations.
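The cgroups wrapper in the last bullet can be sketched as a pure planning function. This is a minimal illustration, not the daemon's implementation: `CgroupLimits`, `CgroupPlan`, `planCgroups`, and the `cgroupDir` parameter are hypothetical names, and the choice of `MemoryMax`, `CPUQuota`, and `TasksMax` (systemd) or `memory.max`, `cpu.max`, and `pids.max` (cgroup v2) is an assumption about which limits kaged would set. An unprivileged daemon would probably also need `systemd-run --user`, which the sketch leaves out.

```typescript
// Hypothetical limit shape; this ADR leaves the real DSL field names to the spec.
interface CgroupLimits {
  memoryMaxBytes: number; // hard memory ceiling in bytes
  cpuCores: number;       // 1 means one full CPU
  pidsMax: number;        // cap on tasks inside the cage
}

interface CgroupWrite {
  file: string;  // cgroup v2 interface file, relative to the cage's cgroup directory
  value: string;
}

type CgroupPlan =
  | { kind: "systemd"; argv: string[] }
  | { kind: "raw"; cgroupDir: string; writes: CgroupWrite[]; argv: string[] };

const CPU_PERIOD_US = 100000; // period used for the cpu.max "quota period" pair

function planCgroups(
  childArgv: string[],
  limits: CgroupLimits,
  hasSystemd: boolean,
  cgroupDir: string, // caller-chosen directory under the cgroup v2 mount (fallback only)
): CgroupPlan {
  if (hasSystemd) {
    // Preferred path: let systemd create a transient scope inside kaged.slice.
    return {
      kind: "systemd",
      argv: [
        "systemd-run", "--scope", "--slice=kaged.slice",
        "-p", `MemoryMax=${limits.memoryMaxBytes}`,
        "-p", `CPUQuota=${Math.round(limits.cpuCores * 100)}%`,
        "-p", `TasksMax=${limits.pidsMax}`,
        ...childArgv,
      ],
    };
  }
  // Fallback: describe the raw cgroup v2 writes; the caller performs them.
  const quotaUs = Math.round(limits.cpuCores * CPU_PERIOD_US);
  return {
    kind: "raw",
    cgroupDir,
    writes: [
      { file: "memory.max", value: String(limits.memoryMaxBytes) },
      { file: "cpu.max", value: `${quotaUs} ${CPU_PERIOD_US}` },
      { file: "pids.max", value: String(limits.pidsMax) },
    ],
    argv: childArgv,
  };
}
```

Returning a plan instead of performing the writes keeps the systemd and non-systemd paths testable without root and without a real cgroup hierarchy.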
Network allowlist enforcement (the hard part)
We commit to a phased rollout, documented in the spec (docs/specs/sandbox.md):
- Phase 0 — binary net gate (cuttable for very early v0): A cage with an empty net.allow: gets --unshare-net (no network at all). A cage with net.allow: ["*"] gets the daemon's network. Anything in between is rejected by the parser with "hostname allowlisting is not yet supported in this kaged version." This unblocks the rest of the daemon work.
- Phase 1 — kaged-managed allowlist (v0 release target): Per-cage network namespace + veth pair to the daemon's namespace. A kaged-managed userspace SOCKS5 proxy on the daemon side does hostname-aware filtering. A kaged-managed resolver in the cage's netns only resolves allowlisted glob patterns. nftables drops everything not destined for the proxy.
- Phase 2 — kernel-level filtering (v1): Move hostname filtering closer to the kernel: kaged installs nftables rules driven by SNI/HTTP-Host introspection, eliminating the userspace proxy hop. Stretch goal.
The supervisor never lets a cage run without one of these enforcement states active. Network setup failure = subagent spawn failure.
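A minimal sketch of the Phase 0 gate, assuming the parser sees `net.allow` as a plain string array. `NetDecision` and `compilePhase0Net` are hypothetical names; only the `--unshare-net` flag and the rejection message come from the text above. Throwing on anything in between mirrors the rule that a network setup failure is a spawn failure.

```typescript
type NetDecision =
  | { mode: "none"; bwrapArgs: string[] }  // no network at all
  | { mode: "host"; bwrapArgs: string[] }; // the daemon's network

const PHASE0_REJECTION =
  "hostname allowlisting is not yet supported in this kaged version.";

function compilePhase0Net(allow: readonly string[]): NetDecision {
  if (allow.length === 0) {
    // Empty allowlist: give the cage its own, unconfigured network namespace.
    return { mode: "none", bwrapArgs: ["--unshare-net"] };
  }
  if (allow.length === 1 && allow[0] === "*") {
    // Wildcard: the cage shares the daemon's network, so no extra bwrap flag.
    return { mode: "host", bwrapArgs: [] };
  }
  // Anything in between cannot be enforced in Phase 0, so refuse to spawn.
  throw new Error(PHASE0_REJECTION);
}
```

For example, `compilePhase0Net([])` yields the `--unshare-net` flag, while `compilePhase0Net(["api.example.com"])` throws and the spawn never proceeds.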
Default cage profile
When a subagent declares no cage: block, the implicit cage is maximally restrictive:
- fs: [] (no host filesystem access; subagent sees only a tmpfs /, /tmp, and a minimal /usr if explicitly requested)
- net: [] (no network)
- state: ephemeral
- seccomp: default profile applied
- cgroups: default limits (e.g., 256MB RAM, 1 CPU)
There is no "wide open" implicit cage. Every grant is an explicit decision in the DSL.
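One way to encode the implicit cage is a constant that every declared field overrides explicitly. The sketch below is illustrative only: `CagePolicy`, `DEFAULT_CAGE`, and `resolveCage` are hypothetical names, the state and seccomp value sets are placeholders (final names are an open spec question), and the 256 MB and 1 CPU figures are the example limits quoted above, not settled defaults.

```typescript
interface CagePolicy {
  fs: string[];                       // host paths granted to the cage
  net: string[];                      // network allowlist entries
  state: "ephemeral" | "persistent";  // placeholder names
  seccomp: "default" | "relaxed";     // placeholder names
  cgroups: { memoryMaxBytes: number; cpus: number };
}

// Maximally restrictive: nothing is granted unless the DSL says so.
const DEFAULT_CAGE: Readonly<CagePolicy> = {
  fs: [],
  net: [],
  state: "ephemeral",
  seccomp: "default",
  cgroups: { memoryMaxBytes: 256 * 1024 * 1024, cpus: 1 },
};

function resolveCage(declared?: Partial<CagePolicy>): CagePolicy {
  // Copy arrays and objects so callers can never mutate the shared default.
  return {
    fs: [...(declared?.fs ?? DEFAULT_CAGE.fs)],
    net: [...(declared?.net ?? DEFAULT_CAGE.net)],
    state: declared?.state ?? DEFAULT_CAGE.state,
    seccomp: declared?.seccomp ?? DEFAULT_CAGE.seccomp,
    cgroups: { ...DEFAULT_CAGE.cgroups, ...(declared?.cgroups ?? {}) },
  };
}
```

A subagent with no cage block resolves to `resolveCage()`, so every grant that appears in the effective policy traces back to a line the operator wrote.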
Threat model (the line we're committing to)
In scope (kaged is responsible for):
- A subagent breaking out of its bwrap mounts and reading host paths not in its fs allowlist → P0 bug
- A subagent reaching a network destination not in its allowlist → P0 bug
- A subagent persisting state to host disk when declared ephemeral → P0 bug
- A subagent signaling the daemon or another subagent → P0 bug
- A subagent escalating to root on the host → P0 bug
- A subagent exfiltrating other subagents' memory → P0 bug
- The supervisor failing to apply a cage policy at all → P0 bug
Out of scope (kaged cannot defend against):
- Linux kernel CVEs that allow namespace escape from any sandbox using namespaces. (We document the kernel version baseline.)
- The operator deliberately granting wide-open cages and being surprised when subagents act on that grant.
- LLM-provider-side data leakage (the model exfiltrates secrets through its outputs to its hosted provider). This is a model-trust problem, not a sandbox problem. Documented as a separate concern.
- Side-channel attacks (CPU caching, Spectre-class). We are not building a CPU sandbox.
- DoS by a misbehaving subagent that exhausts its cgroup limits. The supervisor kills and reports; we do not promise it never happens.
Consequences
What this commits us to
- A packages/sandbox/ package that owns the cage compiler, network gatekeeper, seccomp profile, and cgroups wrapper. This is a real component, not a thin shell-out.
- Sustained engineering on the network gatekeeper. The userspace proxy + resolver is non-trivial code. It is a security-critical service.
- bwrap as a runtime dependency. Documented prerequisite. The installer / first-run check verifies bwrap is present.
- Linux-only v0. macOS and Windows are an explicit deferral until we have an analogous sandbox story per platform.
- A kernel version baseline. We document the minimum supported kernel (likely 5.10+ for stable user-namespace support on the small-host hardware we target).
- Sandbox tests on every CI run. Including escape attempts. A test that should fail (the subagent tries to read /etc/shadow) must fail every build.
- A documented "what the cage cannot defend against" page. Operators get an honest threat model, not marketing.
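The "must fail every build" rule for escape attempts can be sketched as an inverted check: a probe command that succeeds inside the cage is a test failure. Everything here is hypothetical (`CageRunner`, `EscapeProbe`, `findEscapes`), and the runner is injected so the sketch does not assume how CI actually spawns a caged process.

```typescript
interface ProbeResult { exitCode: number; stdout: string }

// Runs a command inside a default cage and reports how it exited.
type CageRunner = (command: string[]) => ProbeResult;

interface EscapeProbe { name: string; command: string[] }

const ESCAPE_PROBES: EscapeProbe[] = [
  { name: "read /etc/shadow", command: ["cat", "/etc/shadow"] },
];

// Returns the names of probes that succeeded, i.e. the escapes found.
function findEscapes(run: CageRunner, probes: EscapeProbe[] = ESCAPE_PROBES): string[] {
  return probes
    .filter((probe) => run(probe.command).exitCode === 0)
    .map((probe) => probe.name);
}
```

A CI job would fail the build whenever `findEscapes(realRunner)` returns a non-empty list.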
What this forecloses
- No firejail in v0. Operators who prefer it wait for the pluggable-sandbox story in v1.x.
- No podman-as-cage in v0. Same. The container/image use case is real but adds significant scope.
- No macOS / Windows v0 support. Operators on those platforms run kaged in a Linux VM.
- No skipping the network gatekeeper for "speed." Even when the cage is effectively open, the gatekeeper is in the path. This is a security property, not a performance choice.
- No relying on Docker semantics anywhere. Even when we add a podman adapter later, the bwrap path remains the reference cage. The DSL is bwrap-shaped.
What becomes easier
- The cage block in the DSL maps cleanly to a single mechanism. fs entries → --bind / --ro-bind. net allowlist → gatekeeper config. state ephemeral → --tmpfs. Operator mental model = the implementation.
- Subagent spawn is fast. Tens of milliseconds, which makes the "primary delegates to subagent, subagent finishes, primary reads result" pattern feel responsive.
- Cage policy is auditable. The supervisor logs the effective bwrap argv and gatekeeper rules for every spawn. Operators can grep their audit log to verify the cage they declared is the cage that ran.
- Small-host deployment is realistic. No image registry, no daemon, no setuid surface.
- Brand promise is mechanically backed. "Caged" is not a marketing word; it's the file packages/sandbox/cage.ts.
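The DSL-to-bwrap mapping in the first bullet of this list can be sketched as a small compiler. This is an assumption-laden illustration, not the contents of packages/sandbox/cage.ts: the `FsGrant`, `CageBlock`, and `CompiledCage` shapes are hypothetical, `/tmp` as the ephemeral scratch path is a guess, and how a non-empty allowlist attaches the cage to the gatekeeper's network namespace is left out entirely. The bwrap flags used (`--die-with-parent`, `--unshare-pid`, `--unshare-net`, `--proc`, `--dev`, `--tmpfs`, `--bind`, `--ro-bind`) are real bubblewrap options.

```typescript
// Hypothetical DSL shapes; ADR-0006 owns the real grammar.
interface FsGrant { path: string; mode: "ro" | "rw" }
interface CageBlock {
  fs: FsGrant[];
  net: string[];
  state: "ephemeral" | "persistent"; // final state names are an open spec question
}
interface CompiledCage {
  bwrapArgv: string[];
  gatekeeper: { allow: string[] };
}

function compileCage(cage: CageBlock, command: string[]): CompiledCage {
  const argv: string[] = [
    "bwrap", "--die-with-parent", "--unshare-pid",
    "--proc", "/proc", "--dev", "/dev",
  ];
  // state: ephemeral -> a tmpfs that disappears with the cage (path is an assumption).
  if (cage.state === "ephemeral") argv.push("--tmpfs", "/tmp");
  // fs entries -> one bind mount each, read-only unless granted rw.
  for (const grant of cage.fs) {
    argv.push(grant.mode === "rw" ? "--bind" : "--ro-bind", grant.path, grant.path);
  }
  // Empty allowlist -> no network at all; otherwise the gatekeeper owns enforcement.
  if (cage.net.length === 0) argv.push("--unshare-net");
  argv.push(...command);
  return { bwrapArgv: argv, gatekeeper: { allow: [...cage.net] } };
}

// One audit-log line per spawn: the effective argv and gatekeeper rules.
function auditLine(compiled: CompiledCage): string {
  return JSON.stringify(compiled);
}
```

Because the compiler output is plain data, the same value can be logged for audit and handed to the spawner, which keeps the declared cage and the cage that ran comparable.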
What becomes harder
- The network gatekeeper is real engineering. A correct userspace proxy that hostname-filters TCP+TLS is not trivial. We accept that scope.
- Debugging a misbehaving cage is more involved than debugging a misbehaving process. Operators learn to read bwrap argv and to use the daemon's "inspect cage" UI. Documentation tax.
- Per-cage netns has overhead. Spawning a netns + setting up nftables is on the order of 10-100ms. We design for it but it's a real cost.
- The seccomp default will occasionally bite a legitimate subagent. We need an escape hatch (cage.seccomp: relaxed or per-syscall opt-in) and a clear error message when a subagent is killed by the filter.
- Pluggable sandboxing is now a v1 concern, not v0. Operators who want podman cages have to wait. We accept the pressure.
Open spec questions (not load-bearing; resolved in docs/specs/sandbox.md)
- Exact seccomp profile. Which syscalls block, which warn, which allow. Reference Flatpak's profile and Docker's default profile as starting points.
- cgroups defaults. Memory limit, CPU shares, PIDs limit. v0 defaults to be tuned for low-resource ARM64 hardware as the lower bound.
- DSL net allowlist grammar details. Glob patterns (*.bandcamp.com), exact CIDR support, port restriction (hostname:443 vs hostname).
- Gatekeeper resolver caching. How long does the resolver cache an allowlisted hostname's resolution? Stale cache vs new IP-vs-old-IP edge cases.
- State semantics naming. ephemeral vs scratch vs persistent — final names land in the DSL spec.
- How the operator inspects a running cage. The UI shows effective policy; the spec defines the API endpoint.
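To make the allowlist grammar question concrete, here is one possible reading of glob and port entries. It is a sketch under stated assumptions, not the grammar the spec will settle on: `AllowEntry`, `parseAllowEntry`, `hostMatches`, and `isAllowed` are hypothetical; a leading `*.` is assumed to match any depth of subdomain but not the apex domain; CIDR entries and IPv6 literals are not handled.

```typescript
interface AllowEntry { hostGlob: string; port?: number }

function parseAllowEntry(raw: string): AllowEntry {
  const idx = raw.lastIndexOf(":");
  if (idx === -1) return { hostGlob: raw.toLowerCase() }; // no port restriction
  const port = Number(raw.slice(idx + 1));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port in allowlist entry: ${raw}`);
  }
  return { hostGlob: raw.slice(0, idx).toLowerCase(), port };
}

function hostMatches(glob: string, host: string): boolean {
  const h = host.toLowerCase();
  if (glob.startsWith("*.")) {
    const suffix = glob.slice(1); // ".bandcamp.com"
    // Require at least one character before the suffix, so the apex does not match.
    return h.length > suffix.length && h.endsWith(suffix);
  }
  return h === glob;
}

function isAllowed(entries: AllowEntry[], host: string, port: number): boolean {
  return entries.some(
    (e) => hostMatches(e.hostGlob, host) && (e.port === undefined || e.port === port),
  );
}
```

Matching on the dotted suffix rather than on a bare substring is what keeps a lookalike such as `evilbandcamp.com` from passing a `*.bandcamp.com` entry.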
Alternatives considered
(Full design-space exploration in RFC-0002. Summarized here.)
Alternative A — firejail
Why rejected: Setuid binary is a trust smell for a tool whose pitch is "caged agents." Historical CVE record in the setuid path. firejail's profile grammar is less surgically expressive than bwrap's bind-mount model. The convenience of pre-shipped profiles doesn't apply when we're translating from a kaged-specific DSL anyway.
Alternative B — podman rootless
Why rejected for v0: Spawn time is too slow for our usage profile (subagents as units of work). Each cage becoming a container adds an image-management burden to the operator that the DSL does not currently model. Rootless setup overhead on small Linux hosts (subuid/subgid mappings, fuse-overlayfs) is real friction. The "I want versioned scraper:latest" use case is legitimate but premature.
We will likely add podman as a v1.x cage type for cages that want image semantics. The DSL would gain cage.image: registry.example.com/scraper:1.2.3 as an opt-in. Default cage stays bwrap.
Alternative C — nsjail
Why rejected: No clear advantage over bwrap for our use case. Smaller community. Less commonly packaged on non-Debian distros. The richer policy-file format is nice but we already template a config file (the DSL); we don't need a second one.
Alternative D — systemd-nspawn
Why rejected: Designed for OS-image containers, overkill for our binary-with-mounts profile. systemd-only, cuts off non-systemd hosts. Heavier than bwrap with no compensating benefit for our use case.
Alternative E — Pluggable from day one
Why rejected for v0: "We support all sandboxes" is the most common form of "we don't have a default that works." Three real adapters means three test matrices and three sets of escape-test investigation. v0 ships one cage mechanism, well-tested, with a documented threat model. Plugin sandboxing comes in v1.
Amendments
2026-05-21 — Sandbox is optional; two opt-out paths
The sandbox is enabled by default on every subagent in every project. The original ADR text could be read as making it mandatory. This amendment clarifies: per the project's "we don't lock anyone into anything" posture, the operator may opt out at two levels.
1. Per-subagent: cage: disabled in the DSL.
A subagent may declare cage: disabled instead of a cage block:
    subagents:
      - name: deployer
        model: claude:sonnet-4.6
        system_prompt: ./prompts/deployer.md
        can_be_called_by: [primary]
        cage: disabled
Semantics:
- The supervisor spawns the subagent as the daemon's own UID, with no bwrap wrapper, no network namespace, no seccomp filter, no cgroup limits beyond whatever the daemon itself runs under.
- The subagent has full read-write access to the host filesystem that the daemon user has.
- The subagent can reach any network destination the daemon user can reach.
- The subagent can invoke any host binary, including sudo if the daemon user is in sudoers.
The honest framing in operator docs: a cage: disabled subagent IS your daemon's hands. Same UID, same access, same blast radius. There are no half-measures. If you write cage: disabled on a subagent, you have decided that subagent is your hand-extension and you trust it accordingly.
Use cases:
- A deployer subagent that needs kubectl, git push, signed-image-build access, or anything else requiring real host capabilities.
- A backup subagent that needs to read everywhere and write to an external mount.
backupsubagent that needs to read everywhere and write to an external mount. - Operator's-own-tools workflows where caging would just produce friction without security benefit.
When NOT to use cage: disabled:
- A subagent processing untrusted input (a scraper, a webhook handler, a parser of operator-supplied files). Untrusted input + uncaged subagent = the model is now an attack vector with full host access.
Per-subagent warning UX:
- The web UI's project view shows uncaged subagents with a magenta [UNCAGED] badge (parallel to the standard [CAGED] badge but inverted in tone).
- The DSL validator emits a warning at parse time for every cage: disabled entry: subagents[N].cage: disabled — this subagent runs as the daemon user. see ADR-0009.
- The audit log records subagent.spawn.uncaged for every spawn of an uncaged subagent (in addition to the normal spawn event).
- The session UI shows the cage policy (or "DISABLED") on every subagent invocation.
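The per-subagent warning surfaces above reduce to a few pure functions over the parsed DSL. The sketch below is hypothetical throughout (`SubagentDecl`, `uncagedWarnings`, `badgeFor`, `extraAuditEvents`); only the warning text, the badge labels, and the `subagent.spawn.uncaged` event name are taken from the list.

```typescript
interface SubagentDecl {
  name: string;
  cage?: "disabled" | Record<string, unknown>; // omitted means the implicit default cage
}

// Parse-time warnings, one per cage: disabled entry, indexed by position in the list.
function uncagedWarnings(subagents: SubagentDecl[]): string[] {
  const warnings: string[] = [];
  subagents.forEach((s, i) => {
    if (s.cage === "disabled") {
      warnings.push(
        `subagents[${i}].cage: disabled — this subagent runs as the daemon user. see ADR-0009.`,
      );
    }
  });
  return warnings;
}

function badgeFor(s: SubagentDecl): "[CAGED]" | "[UNCAGED]" {
  return s.cage === "disabled" ? "[UNCAGED]" : "[CAGED]";
}

// Events recorded in addition to the normal spawn event.
function extraAuditEvents(s: SubagentDecl): string[] {
  return s.cage === "disabled" ? ["subagent.spawn.uncaged"] : [];
}
```

Deriving the badge, the warning, and the audit event from the same field keeps the three surfaces from disagreeing about which subagents are uncaged.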
2. Per-daemon: --no-sandbox global flag.
The operator may run the entire daemon with sandboxing disabled:
- CLI: kaged start --no-sandbox (or KAGED_NO_SANDBOX=1 env var).
- Semantics: every subagent in every project on this daemon runs as if it had cage: disabled, regardless of what its DSL declares. The DSL's cage block is parsed and validated (so the file is still portable to a sandboxed daemon) but it is not enforced.
- Use cases: dev machines where the operator is iterating on prompts and doesn't want sandbox-debugging in the loop. Single-user trusted hosts where cage: blocks would just be ceremony.
Warning UX for --no-sandbox (matches the --insecure pattern from ADR-0007):
| Surface | What appears | When |
|---|---|---|
| CLI startup | Multi-line warning block naming --no-sandbox and a link to this ADR | Every daemon start |
| Web UI banner | Persistent magenta banner: SANDBOX DISABLED — ALL SUBAGENTS RUN AS DAEMON USER | Every page render |
| Web UI splash | Modal on first session of each day requiring "I understand" click | First session per day |
| HTTP responses | X-Kaged-Warning: no-sandbox header | Every response |
| Audit log | sandbox.disabled event at startup | Every daemon start |
| Subagent badges | Every subagent shows [UNCAGED] regardless of its DSL | Always |
Combining --insecure (no auth) with --no-sandbox (no cage) is allowed but the banner is very magenta: INSECURE — NO AUTH — NO SANDBOX. The audit log records the combination explicitly. We do not refuse the combination — operator's principal, operator's call — but we make it impossible to miss.
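The daemon-level opt-out can be sketched as flag parsing plus a few derived values. This is illustrative only: `DaemonFlags`, `parseFlags`, `effectiveCage`, `bannerText`, and `warningHeaders` are hypothetical names, and treating exactly `KAGED_NO_SANDBOX=1` as the enabling value is an assumption. The flag names, banner strings, and header come from the text and table above.

```typescript
interface DaemonFlags { insecure: boolean; noSandbox: boolean }

function parseFlags(argv: string[], env: Record<string, string | undefined>): DaemonFlags {
  return {
    insecure: argv.includes("--insecure"),
    noSandbox: argv.includes("--no-sandbox") || env["KAGED_NO_SANDBOX"] === "1",
  };
}

// The DSL cage is still parsed and validated upstream; here it is only not enforced.
function effectiveCage<T>(declared: T | "disabled", flags: DaemonFlags): T | "disabled" {
  return flags.noSandbox ? "disabled" : declared;
}

function bannerText(flags: DaemonFlags): string | null {
  if (flags.insecure && flags.noSandbox) return "INSECURE — NO AUTH — NO SANDBOX";
  if (flags.noSandbox) return "SANDBOX DISABLED — ALL SUBAGENTS RUN AS DAEMON USER";
  return null; // --insecure on its own is covered by ADR-0007
}

function warningHeaders(flags: DaemonFlags): Record<string, string> {
  return flags.noSandbox ? { "X-Kaged-Warning": "no-sandbox" } : {};
}
```

For instance, `parseFlags(["start"], { KAGED_NO_SANDBOX: "1" })` turns the sandbox off without any CLI flag, and every declared cage then resolves to `"disabled"` through `effectiveCage`.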
3. What is NOT in v1.
These were considered and deferred:
- Per-project sandbox: disabled (whole-project opt-out from the project DSL top level). Deferred — the existing per-subagent cage: disabled covers most use cases, and per-project would tempt operators into "I'll just disable it for the whole project while I prototype" patterns that survive into production. If operators ask for it, we add it as a minor.
- Per-task ephemeral override (clicking "run uncaged this once" in the UI). Deferred to v1.x. Adds a third concept (task-level cage override) that complicates the audit story.
- Three-state cage (enabled/relaxed/disabled where relaxed means "default seccomp only, no fs/net cage"). Deferred — disabled is honest, enabled is the contract; we don't ship a half-measure that operators will misread as "partially safe."
Updated threat model:
In-scope items remain the same when the sandbox is enabled. When opted-out (cage: disabled or --no-sandbox), the operator has explicitly removed kaged from the protection path. kaged's only obligation in that mode is to be loud about it — the warnings above. The "caged" brand promise holds when the cage is on; when the operator turns it off, the cage promise does not apply and the operator-facing surface says so plainly.
This is manifesto principle 1 ("The operator is the principal") and principle 5 ("Sandbox by default, escalate by intent") working together: default-on enforcement, explicit opt-out, no surprises either way.
References
- RFC-0002 — design-space exploration
- docs/02-architecture.md — architecture sketch of the sandbox component
- docs/03-glossary.md — cage, subagent, supervisor, insecure mode
- ADR-0006 — the DSL that declares cage policies
- ADR-0007 — parallel opt-out via --insecure
- docs/brand/brand-guide.md — the "caged" brand promise this ADR backs
- bubblewrap: https://github.com/containers/bubblewrap
- Flatpak's bwrap usage: https://docs.flatpak.org/en/latest/sandbox-permissions.html
- nftables: https://wiki.nftables.org/
- Original discussion: design conversation with colleagues, 2026-05-21
- Amendment: colleagues, 2026-05-21
2026-05-26 — Cage location moves to per-agent AgentSpec.cage (ADR-0022)
Sandbox mechanism is unchanged (still bwrap, still the same threat model, same seccomp profiles). Cage location in the DSL moves from subagents.<name>.cage + cage_defaults to per-agent AgentSpec.cage per ADR-0022. The primary agent now has a cage field; cage_defaults is removed. Until the supervisor is extended to spawn the primary in its own bwrap context, the only legal value for the root agent's cage is disabled — the parser emits an error for any other value. A follow-up ADR (or amendment to this one) schedules the supervisor work for primary cage enforcement.
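The interim parser rule for the root agent can be sketched as a single check. `AgentSpec` here is a hypothetical, heavily simplified shape (ADR-0022 defines the real one), and the error wording is invented for illustration.

```typescript
interface AgentSpec {
  name: string;
  cage: "disabled" | Record<string, unknown>; // per-agent cage, per ADR-0022
  subagents?: AgentSpec[];
}

// Returns parse errors; an empty list means the root agent's cage is legal.
function validateRootCage(root: AgentSpec): string[] {
  if (root.cage === "disabled") return [];
  return [
    `${root.name}.cage: the root agent's cage must be "disabled" until the supervisor can cage the primary. see ADR-0009.`,
  ];
}
```

Only the root is checked; subagents nested under it keep their declared cages.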