Spec: Daemon

Status: Draft
Last amended: 2026-06-19 (Notification router + Web Push bridge — ADR-0047)
Constrained by: ADR-0001, ADR-0002, ADR-0004, ADR-0005, ADR-0007, ADR-0008, ADR-0009, ADR-0010, ADR-0011, ADR-0015, ADR-0016, ADR-0028, ADR-0036, ADR-0040, ADR-0041, ADR-0047
Implements: packages/daemon/ (planned)

Purpose

This spec defines the kaged daemon as a process: how it starts, how it's configured, where it lives on disk, what it runs in what order, how it shuts down, and how the operator interacts with it via the CLI.

This document is normative for:

The daemon's process model — single binary, supervised children, no internal forking.
The configuration sources and precedence (env, file, flags).
The filesystem layout under KAGED_HOME.
The startup self-check sequence, including the security gates from ADR-0007 and ADR-0009.
The CLI surface (kaged ...).
Logging streams (operational vs audit) and where they go.
Crash and restart semantics for the daemon, subagents, and plugins.
The systemd integration shape that v0 ships with.

It is not normative for:

The HTTP/WS surface itself (that's http-api.md).
The session state machine and PTY broker internals (that's session-manager.md).
The sandbox compiler and network gatekeeper (that's sandbox.md).
The plugin host wire protocol (that's plugin-host.md).
The DSL format (that's project-dsl.md).

This spec is about the runtime container that hosts all of the above.

Constraints (from ADRs)

Constraint	Source
Daemon is the lifecycle root — only the init system is its parent	ADR-0001
HTTP+WS server is the principal surface; web UI is bundled and served by the daemon	ADR-0002
Runtime is Bun; single-binary deploys via `bun build --compile`	ADR-0004
Default storage is SQLite at a file path; Postgres opt-in via URL	ADR-0005
Default bind is loopback; sidecar required unless `--insecure`	ADR-0007 and its amendment
Plugins are subprocess children supervised by the daemon	ADR-0008
Sandbox is on by default; `--no-sandbox` is a daemon-level opt-out	ADR-0009 and its amendment
Two deployment modes: per-user and system-wide, both first-class	ADR-0010
Projects are portable; operator-local concerns live in local config	ADR-0011
Unified user identity, shared SSO, three-tier roles, and route-scope authorization	ADR-0036

Deployment mode

The daemon runs in one of two modes, picked at startup (per ADR-0010). The mode determines paths, default auth, and the default systemd unit shape — but not behavior.

Mode detection

At the start of phase 1 (bootstrapping), the daemon resolves its mode in this order:

Explicit override. --mode=user or --mode=system CLI flag, or KAGED_MODE=user|system. If set, that's the mode. (Used by tests; rarely set by operators.)
KAGED_HOME set explicitly. The mode is inferred from the path:
- Inside /var/lib, /opt, or any path the operator's UID doesn't own → system.
- Inside $HOME or $XDG_DATA_HOME → user.
UID and ownership check. Daemon running as a dedicated kaged system user or as UID 0 → system. Daemon running as a regular user → user.
XDG path probe. If $XDG_DATA_HOME/kaged or ~/.local/share/kaged exists → user. If /var/lib/kaged exists and is readable by the daemon user → system.
Default fallback: user. The friendliest default for a fresh install.
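
The resolution order above can be sketched as a pure function. This is a minimal illustration rather than the daemon's actual code: the ModeInputs shape and all of its field names are hypothetical, the caller is assumed to have gathered the filesystem and UID probes beforehand, and the UID check is assumed to be able to come back inconclusive (otherwise steps 4 and 5 would never run).

```typescript
type Mode = "user" | "system";

// Hypothetical input shape: probe results are collected by the caller so the
// ordering logic stays pure and easy to test.
interface ModeInputs {
  flagMode?: Mode;                 // --mode=user|system
  envMode?: Mode;                  // KAGED_MODE
  kagedHome?: string;              // KAGED_HOME, only if set explicitly
  homeDir: string;                 // $HOME
  xdgDataHome?: string;            // $XDG_DATA_HOME
  homeOwnedByUid?: boolean;        // does the operator's UID own KAGED_HOME?
  uidClass?: "root" | "kaged" | "regular"; // undefined = inconclusive (assumption)
  userDataDirExists?: boolean;     // $XDG_DATA_HOME/kaged or ~/.local/share/kaged
  systemDataDirReadable?: boolean; // /var/lib/kaged exists and is readable
}

// True when `path` equals `parent` or lies underneath it.
function isInside(path: string, parent: string): boolean {
  const prefix = parent.endsWith("/") ? parent : parent + "/";
  return path === parent || path.startsWith(prefix);
}

function resolveMode(i: ModeInputs): Mode {
  // 1. Explicit override; the CLI flag outranks the env var.
  const explicit = i.flagMode ?? i.envMode;
  if (explicit !== undefined) return explicit;

  // 2. KAGED_HOME set explicitly: infer the mode from the path.
  if (i.kagedHome !== undefined) {
    const h = i.kagedHome;
    if (isInside(h, "/var/lib") || isInside(h, "/opt") || i.homeOwnedByUid === false) {
      return "system";
    }
    if (isInside(h, i.homeDir) || (i.xdgDataHome !== undefined && isInside(h, i.xdgDataHome))) {
      return "user";
    }
  }

  // 3. UID and ownership check.
  if (i.uidClass === "root" || i.uidClass === "kaged") return "system";
  if (i.uidClass === "regular") return "user";

  // 4. XDG path probe.
  if (i.userDataDirExists) return "user";
  if (i.systemDataDirReadable) return "system";

  // 5. Default fallback.
  return "user";
}
```

Keeping the probes outside the function means tests can exercise each step without touching the real filesystem.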

The resolved mode is printed at startup: kaged 0.1.0 starting | mode=user | bind=127.0.0.1:38291 | ....

Mode-determined defaults

Default	`user` mode	`system` mode
`${KAGED_HOME}`	`$XDG_DATA_HOME/kaged` (default `~/.local/share/kaged`)	`/var/lib/kaged`
Operational config	`$XDG_CONFIG_HOME/kaged/config.toml` (default `~/.config/kaged/config.toml`)	`/etc/kaged/config.toml`
Local config (per `local-config.md`)	`$XDG_CONFIG_HOME/kaged/local.toml`	`${KAGED_HOME}/local/<user>.toml` (one file per operator)
Bind	`127.0.0.1:<random-free-port>` (or fixed in config)	`127.0.0.1:7777`
Auth mode	`loopback` (cookie-bound nonce; see ADR-0007 amendment)	`sidecar` (header contract with oauth2-proxy or equivalent)
systemd unit	`~/.config/systemd/user/kaged.service`	`/etc/systemd/system/kaged.service`
Plugin store	`${KAGED_HOME}/plugins/` (in this operator's home)	`${KAGED_HOME}/plugins/` (shared across all operators of this daemon)
Project registry	per-operator in local config	per-operator in local config (each operator has their own list)

The defaults are recommendations; every value is overridable in config or by env var. Mode just picks the starting point.

What's identical across modes

The HTTP+WS API surface (http-api.md).
The DSL format and validation rules (project-dsl.md).
The sandbox mechanism (sandbox.md).
The plugin protocol (plugin-host.md).
Every event in the audit log.
Every CLI subcommand.

The mode is a deployment-shape concern, not a behavior concern. Project authors and plugin authors do not need to know which mode their work will be loaded into.

Process model

The kaged daemon is a single long-lived process.

init system (systemd / launchd)
  └── kaged daemon (one process)
        ├── plugin: oh-my-pi    (supervised subprocess)
        ├── plugin: ollama      (supervised subprocess)
        ├── subagent: scraper   (supervised, in cage)
        ├── subagent: writer    (supervised, in cage)
        └── network gatekeeper  (in-process; not a separate child)

Rules:

The daemon does not fork itself. No worker pool, no master+worker, no preforked workers. Bun's concurrency is async; the daemon is single-process.
Every long-lived child is supervised. Plugins and subagents are subprocesses managed by named supervisors (PluginSupervisor, SubagentSupervisor) inside the daemon. If a child exits, the supervisor records it and (per restart policy) may respawn.
The daemon never execs into another binary. Replacements (upgrades) happen by stopping and starting a fresh process via the init system.
No daemon-internal IPC besides JSON-RPC-over-stdio with plugins. The daemon does not open Unix sockets to itself, does not run an internal HTTP loopback, does not use shared memory.

The "long-lived parent" promise from ADR-0001 is mechanical: the daemon has no parent in user space; the init system is its only parent.

Configuration

Sources, in precedence order

CLI flags (highest precedence)
Environment variables (KAGED_*)
Config file (at the mode-appropriate default path; see Configuration file)
Built-in defaults (lowest)

A value at a higher tier silently shadows lower tiers — the daemon does not error on overlap. The effective config is reported by kaged config show for inspection.
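
As an illustration of the layering, the sketch below merges four tiers and remembers where each value came from, similar in spirit to what kaged config show --source reports. The function name, the flat dotted-key representation, and the Resolved type are assumptions made for this example, not the daemon's internal API.

```typescript
type Source = "flag" | "env" | "file" | "default";

// Each tier is a flat map keyed by dotted config path, e.g. "daemon.bind".
type Layer = Record<string, string | undefined>;

interface Resolved {
  value: string;
  source: Source;
}

// Highest precedence first, matching the list above.
const PRECEDENCE: Source[] = ["flag", "env", "file", "default"];

function mergeConfig(layers: Record<Source, Layer>): Record<string, Resolved> {
  const out: Record<string, Resolved> = {};
  for (const source of PRECEDENCE) {
    for (const [key, value] of Object.entries(layers[source])) {
      // The first tier to define a key wins; lower tiers are shadowed silently.
      if (value !== undefined && !(key in out)) {
        out[key] = { value, source };
      }
    }
  }
  return out;
}

const effective = mergeConfig({
  flag: { "daemon.bind": "127.0.0.1:9000" },
  env: { "daemon.bind": "127.0.0.1:7777", "logging.level": "debug" },
  file: { "logging.level": "warn" },
  default: { "logging.level": "info", "sandbox.mode": "enabled" },
});
console.log(effective);
```

In this example daemon.bind resolves from the flag tier, logging.level from the env tier, and sandbox.mode from the built-in default.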

Configuration file

Lives at the mode-appropriate default path ($XDG_CONFIG_HOME/kaged/config.toml for user mode, /etc/kaged/config.toml for system mode). TOML for the daemon config (not YAML — the DSL is YAML; using a different format here avoids confusion about which file the operator is editing). TOML is also closer to the "configuration not data" feel of this file.

Auto-creation on first run. If no config file exists at the default path (and no --config flag or KAGED_CONFIG env is set), the daemon creates one with mode-appropriate defaults and logs Config created: <path>. This ensures the operator always has a config file to inspect and edit — no silent defaults.

# /etc/kaged/config.toml (system-mode default path)
# Daemon configuration. Reloaded only at restart.
# Generated by kaged on first run with mode-appropriate defaults.

[daemon]
bind = "127.0.0.1:7777"           # listen address
home = "/var/lib/kaged"            # may also be set via KAGED_HOME
public_url = ""                    # browser-reachable origin of THIS daemon; see below (ADR-0040)

[auth]
mode = "secure"                    # "secure" | "insecure"
nonce_file = "/var/lib/kaged/auth-nonce"   # secure mode only

[auth.sharedsso]
enabled          = true                       # default: false
issuer           = "https://sso.kaged.dev"    # no default; required when enabled
public_key       = """-----BEGIN PUBLIC KEY-----
...PEM (P-256)...
-----END PUBLIC KEY-----"""                   # optional; presence means daemon NEVER fetches JWKS
user_creation    = "disabled"                 # "enabled" | "disabled"; default "disabled"
pending_ttl_days = 7                          # reap window for TOFU pending rows

[storage]
url = "sqlite:///var/lib/kaged/kaged.db"   # or "postgres://user@host/db"

[sandbox]
mode = "enabled"                   # "enabled" | "disabled"
default_seccomp = "default"        # see ADR-0009

[logging]
operational = "stderr"             # "stderr" | "file:/path" | "journald"
audit = "file:/var/lib/kaged/audit.log"   # audit log is always file-backed
level = "info"                     # "debug" | "info" | "warn" | "error"

[plugins]
dir = "/var/lib/kaged/plugins"
enabled = ["oh-my-pi", "ollama"]   # subset of installed plugins

[ui]
serve = true                       # set false to disable the UI bundle
url = ""                           # base URL of the UI (for launch URLs); see below

The example above shows system-mode defaults. In user mode, auto-generated configs use ${KAGED_HOME}-relative paths (e.g., ~/.local/share/kaged/kaged.db, ~/.local/share/kaged/audit.log, ~/.local/share/kaged/plugins). Path fields left empty in the config file (or absent entirely) are filled at startup relative to the resolved daemon.home — they never fall back to hardcoded /var/lib/kaged paths.
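
A minimal sketch of the path-filling rule follows. The PathFields type and the fillPathDefaults name are hypothetical, and the sketch assumes daemon.home has already been resolved to an absolute path (no ~ expansion happens here).

```typescript
// Hypothetical subset of the config: only the path-like fields named above.
interface PathFields {
  storageUrl?: string; // storage.url
  auditLog?: string;   // logging.audit
  pluginsDir?: string; // plugins.dir
}

// Fill empty or absent path fields relative to the resolved daemon.home.
// Assumes `home` is absolute.
function fillPathDefaults(home: string, cfg: PathFields): Required<PathFields> {
  const base = home.replace(/\/+$/, ""); // drop trailing slashes
  return {
    storageUrl: cfg.storageUrl || `sqlite://${base}/kaged.db`,
    auditLog: cfg.auditLog || `file:${base}/audit.log`,
    pluginsDir: cfg.pluginsDir || `${base}/plugins`,
  };
}

const userDefaults = fillPathDefaults("/home/alice/.local/share/kaged", {});
console.log(userDefaults.storageUrl);
```

Explicitly configured values pass through untouched; only empty strings and missing fields are filled.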

ui.url — The base URL where the web UI is reachable. Used to construct the launch URL printed at startup in loopback mode. When the UI runs on a separate process (e.g., a dev server or a tunnel), set this to the UI's origin (e.g., http://127.0.0.1:13001 or https://foo.bar.com). When empty (default), the daemon uses its own bind address — appropriate when the daemon serves the UI bundle itself (ui.serve = true). In the split topology (ui.url set) the daemon is in "split mode": it does not serve the SPA from /, it enables CORS for the ui.url origin, and it bootstraps the UI via the /connect redirect (see Root behavior and bootstrap redirect).

daemon.public_url (ADR-0040) — The browser-reachable origin of this daemon, e.g. https://daemon.example.com. Behind a tunnel or reverse proxy the daemon cannot reliably infer its public origin, so it is explicit; when unset, the daemon falls back to Host + X-Forwarded-Proto on the incoming request. It is used to construct (a) the /connect?api=${public_url}/api/v1 bootstrap redirect target and (b) the api= parameter in the printed loopback launch URL. Startup self-check: if ui.url is set and the bind is non-loopback, public_url is required — the daemon refuses to start otherwise (paralleling the sidecar bind-safety self-check), because a tunneled split deployment cannot advertise a correct registry base without it.

Root behavior and bootstrap redirect (ADR-0040)

Co-located (ui.url empty): an unauthenticated GET / with Accept: text/html serves the SPA bundle as today (ui.serve = true). Unchanged.
Split mode (ui.url set): an unauthenticated GET / with Accept: text/html returns 302 to ${ui.url}/connect?api=${public_url}/api/v1. As an invariant, the daemon never serves the SPA from / in split mode. Non-HTML requests to / are unaffected (the API surface lives under /api/v1).
The printed loopback launch URL changes from ${ui.url}/launch?token=… to ${ui.url}/connect?api=${public_url}/api/v1&token=…, so one URL both registers the daemon in the UI's registry and authenticates it. The token is the single-use launch token (immediately rotated on use); the bearer session_token is returned in the /launch response body, never in the URL.
CORS for the ui.url origin is described in http-api.md § Cross-origin (CORS) and advertised base.
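
The root-request rules above can be sketched as a small decision function. Everything here is illustrative: the type and function names are hypothetical, an empty string stands for "unset", the fallback scheme of http when X-Forwarded-Proto is absent is an assumption, and the api parameter is left unencoded to mirror the template as written (a real implementation may prefer to percent-encode it).

```typescript
interface UrlConfig {
  uiUrl: string;     // ui.url, "" when unset (co-located)
  publicUrl: string; // daemon.public_url, "" when unset
}

interface RootRequest {
  acceptsHtml: boolean;    // Accept header includes text/html
  host: string;            // Host header
  forwardedProto?: string; // X-Forwarded-Proto, if present
}

type RootResponse =
  | { kind: "serve_spa" }
  | { kind: "redirect"; status: 302; location: string }
  | { kind: "not_html" };

function stripTrailingSlash(s: string): string {
  return s.replace(/\/+$/, "");
}

// public_url when set; otherwise Host + X-Forwarded-Proto from the request.
function advertisedApiBase(cfg: UrlConfig, req: RootRequest): string {
  const origin =
    cfg.publicUrl !== "" ? cfg.publicUrl : `${req.forwardedProto ?? "http"}://${req.host}`;
  return `${stripTrailingSlash(origin)}/api/v1`;
}

// Models only the unauthenticated GET / case described above.
function unauthenticatedRoot(cfg: UrlConfig, req: RootRequest): RootResponse {
  if (!req.acceptsHtml) return { kind: "not_html" };
  if (cfg.uiUrl === "") return { kind: "serve_spa" };
  const location = `${stripTrailingSlash(cfg.uiUrl)}/connect?api=${advertisedApiBase(cfg, req)}`;
  return { kind: "redirect", status: 302, location };
}
```

With ui.url = https://foo.bar.com and daemon.public_url = https://daemon.example.com, the redirect target produced by this sketch is https://foo.bar.com/connect?api=https://daemon.example.com/api/v1.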

The file is parsed at startup. There is no hot-reload. To change config, edit, then systemctl restart kaged (or equivalent).

Shared SSO Configuration

The [auth.sharedsso] block configures the unified user identity and shared SSO bootstrap path (see ADR-0036).

[auth.sharedsso]
enabled          = true                       # default: false
issuer           = "https://sso.kaged.dev"    # no default; required when enabled
public_key       = """-----BEGIN PUBLIC KEY-----
...PEM (P-256)...
-----END PUBLIC KEY-----"""                   # optional; presence means daemon NEVER fetches JWKS
user_creation    = "disabled"                 # "enabled" | "disabled"; default "disabled"
pending_ttl_days = 7                          # reap window for TOFU pending rows

Bootstrap Orthogonality: Shared SSO is a session bootstrap mechanism, orthogonal to auth.mode. It can be enabled alongside loopback or sidecar modes.
Startup Validation: If enabled is true, the issuer field is required. Omitting it is a startup error. A provided public_key must parse as a valid P-256 SPKI PEM; otherwise, the daemon fails to start.
Zero-Contact (Pinned Key): When public_key is set, the daemon never fetches the JWKS from the issuer's endpoint, eliminating outbound network calls to the relay (see sso-relay.md).
TOFU Lifecycle: Unknown subjects are provisioned TOFU-style as pending rows in the users table (when user_creation = "enabled") and must be activated by an operator (see users.md).
Insecure Mode Interaction: Running the daemon with --insecure (or auth.mode = "insecure") only waives the ambient operator check for requests without a session. Any request carrying a valid kaged_user_session cookie still resolves as that specific user with their assigned role, and credentials (passwords, SSO tokens) are still verified at login.
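
The startup validation rules for this block could look roughly like the sketch below. It is an assumption-laden illustration: the SharedSsoConfig type and validateSharedSso name are hypothetical, and it relies on the runtime's node:crypto module exposing createPublicKey and KeyObject.asymmetricKeyDetails (as Node.js does; OpenSSL reports P-256 under the name prime256v1).

```typescript
import { createPublicKey, generateKeyPairSync } from "node:crypto";

// Hypothetical in-memory form of the [auth.sharedsso] block.
interface SharedSsoConfig {
  enabled: boolean;
  issuer?: string;
  publicKey?: string; // PEM text, if pinned
}

// Returns a list of startup errors; an empty list means the block is acceptable.
function validateSharedSso(cfg: SharedSsoConfig): string[] {
  const errors: string[] = [];
  if (!cfg.enabled) return errors;

  if (!cfg.issuer) {
    errors.push("auth.sharedsso.issuer is required when enabled = true");
  }

  if (cfg.publicKey !== undefined) {
    if (!cfg.publicKey.includes("-----BEGIN PUBLIC KEY-----")) {
      errors.push("auth.sharedsso.public_key is not an SPKI PEM public key");
    } else {
      try {
        const key = createPublicKey({ key: cfg.publicKey, format: "pem" });
        const curve = key.asymmetricKeyDetails?.namedCurve;
        if (key.asymmetricKeyType !== "ec" || curve !== "prime256v1") {
          errors.push("auth.sharedsso.public_key must be a P-256 key");
        }
      } catch {
        errors.push("auth.sharedsso.public_key could not be parsed");
      }
    }
  }
  return errors;
}

// Usage: generate a throwaway P-256 key and validate a config that pins it.
const { publicKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });
const examplePem = publicKey.export({ type: "spki", format: "pem" }).toString();
console.log(validateSharedSso({ enabled: true, issuer: "https://sso.kaged.dev", publicKey: examplePem }));
```

Collecting all errors before refusing to start lets the operator fix the whole block in one pass.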

Environment variables

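Every config field has a corresponding env var. Convention: KAGED_<SECTION>_<KEY>, upper-snake-case. A few common fields use shorter names (for example KAGED_HOME, KAGED_BIND, KAGED_DATABASE_URL, KAGED_LOG_LEVEL); the table below lists the exact names.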

Env var	Config path	Example
`KAGED_HOME`	`daemon.home`	`/var/lib/kaged`
`KAGED_BIND`	`daemon.bind`	`127.0.0.1:7777`
`KAGED_AUTH_MODE`	`auth.mode`	`secure`
`KAGED_INSECURE`	shorthand for `auth.mode=insecure`	`1`
`KAGED_AUTH_NONCE`	sidecar nonce supplied directly (overrides `nonce_file`)	`<random>`
`KAGED_DATABASE_URL`	`storage.url`	`sqlite:///path`
`KAGED_SANDBOX_MODE`	`sandbox.mode`	`enabled`
`KAGED_NO_SANDBOX`	shorthand for `sandbox.mode=disabled`	`1`
`KAGED_LOG_LEVEL`	`logging.level`	`info`
`KAGED_PLUGINS_DIR`	`plugins.dir`	`/var/lib/kaged/plugins`
`KAGED_UI_URL`	`ui.url`	`http://127.0.0.1:13001`
`KAGED_PUBLIC_URL`	`daemon.public_url`	`https://daemon.example.com`
`KAGED_AUTH_SHAREDSSO_ENABLED`	`auth.sharedsso.enabled`	`true`
`KAGED_AUTH_SHAREDSSO_ISSUER`	`auth.sharedsso.issuer`	`https://sso.kaged.dev`
`KAGED_AUTH_SHAREDSSO_PUBLIC_KEY`	`auth.sharedsso.public_key`	`-----BEGIN PUBLIC KEY-----...`
`KAGED_AUTH_SHAREDSSO_USER_CREATION`	`auth.sharedsso.user_creation`	`disabled`
`KAGED_AUTH_SHAREDSSO_PENDING_TTL_DAYS`	`auth.sharedsso.pending_ttl_days`	`7`
`KAGED_NATIVES_PATH`	— (runtime override)	`/opt/kaged/kaged-natives.linux-x64-gnu.node`

KAGED_NATIVES_PATH is not a config-file field — it is a pure runtime override for the @kaged/natives .node binary path. When set, the natives loader uses this path directly. When unset, the loader falls back to resolving the .node beside process.execPath (suitable for compiled-binary distribution where the .node is placed next to the executable). Inside node_modules (development), the existing createRequire(import.meta.url) resolution still works. See ADR-0041 §4 for the rationale.

Env vars matching KAGED_* that don't correspond to a known config path are logged as a warning at startup but do not error. Typos surface visibly; forward-compat env vars don't crash old daemons.
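
The mapping and the unknown-variable warning can be sketched as below. The readEnv function and its return shape are hypothetical. Two behaviors are assumptions that the table does not settle: a shorthand only takes effect when set to "1", and a shorthand wins if both it and the long-form variable are set.

```typescript
type Env = Record<string, string | undefined>;

// Env var -> config path, taken from the table above.
const ENV_TO_CONFIG: Record<string, string> = {
  KAGED_HOME: "daemon.home",
  KAGED_BIND: "daemon.bind",
  KAGED_AUTH_MODE: "auth.mode",
  KAGED_DATABASE_URL: "storage.url",
  KAGED_SANDBOX_MODE: "sandbox.mode",
  KAGED_LOG_LEVEL: "logging.level",
  KAGED_PLUGINS_DIR: "plugins.dir",
  KAGED_UI_URL: "ui.url",
  KAGED_PUBLIC_URL: "daemon.public_url",
  KAGED_AUTH_SHAREDSSO_ENABLED: "auth.sharedsso.enabled",
  KAGED_AUTH_SHAREDSSO_ISSUER: "auth.sharedsso.issuer",
  KAGED_AUTH_SHAREDSSO_PUBLIC_KEY: "auth.sharedsso.public_key",
  KAGED_AUTH_SHAREDSSO_USER_CREATION: "auth.sharedsso.user_creation",
  KAGED_AUTH_SHAREDSSO_PENDING_TTL_DAYS: "auth.sharedsso.pending_ttl_days",
};

// Shorthand variables: [config path, value applied when the variable is "1"].
const SHORTHANDS: Record<string, [string, string]> = {
  KAGED_INSECURE: ["auth.mode", "insecure"],
  KAGED_NO_SANDBOX: ["sandbox.mode", "disabled"],
};

// Recognized variables that are not config-file fields.
const NON_CONFIG = new Set([
  "KAGED_AUTH_NONCE",
  "KAGED_NATIVES_PATH",
  "KAGED_MODE",
  "KAGED_CONFIG",
  "KAGED_INSECURE_BIND",
]);

function readEnv(env: Env): { values: Record<string, string>; warnings: string[] } {
  const values: Record<string, string> = {};
  const warnings: string[] = [];
  for (const [name, raw] of Object.entries(env)) {
    if (!name.startsWith("KAGED_") || raw === undefined) continue;
    const path = ENV_TO_CONFIG[name];
    if (path !== undefined) {
      values[path] = raw;
    } else if (!(name in SHORTHANDS) && !NON_CONFIG.has(name)) {
      // Warn, never error: typos surface, forward-compat vars do not crash.
      warnings.push(`unknown environment variable ${name} (ignored)`);
    }
  }
  // Assumption: shorthands are applied last, so they override the long form.
  for (const [name, [path, value]] of Object.entries(SHORTHANDS)) {
    if (env[name] === "1") values[path] = value;
  }
  return { values, warnings };
}
```

A typo such as KAGED_BNID would yield one warning and no config value, while the daemon continues to start.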

CLI flags

CLI flags mirror env vars and take the highest precedence. They are documented per command in CLI surface.

Bun's `.env` loading

ADR-0004 notes Bun auto-loads .env. For the daemon, this means a .env file in the working directory at startup is read into the process environment before the config layering above runs. This is convenient for development; the production deployment uses systemd EnvironmentFile= instead.

Operational config vs local config

The daemon has two config files with distinct purposes:

File	Owns	Scope
`config.toml` (operational)	Bind address, storage URL, sandbox mode, log destinations, plugin directory	The daemon as a process
`local.toml` (local config)	Model aliases, provider credentials, project registry, operator preferences	The operator

This section is about config.toml. Local config has its own spec: local-config.md.

Loading semantics differ. config.toml is read once at startup and frozen for the daemon's lifetime (changes require restart). Local config is read per request, per operator, cached in memory for the active sessions of that operator, and flushed on SIGHUP. In a per-user deployment they collapse to "this one operator's two files." In a system-wide deployment, every operator has their own local.toml while the daemon shares one config.toml.

Filesystem layout

The same layout applies in both modes; only ${KAGED_HOME}'s default path differs (see Deployment mode).

${KAGED_HOME}/
├── kaged.db                       # SQLite database (default storage)
├── kaged.db-wal                   # SQLite WAL file
├── kaged.db-shm                   # SQLite shared-memory file
├── audit.log                      # audit log (append-only, rotates)
├── local/                         # system-mode only: per-operator local configs
│   ├── operator.toml                #   one file per operator who has used this daemon
│   └── bob.toml                   #   in user-mode, local config lives at $XDG_CONFIG_HOME/kaged/local.toml
├── plugins/                       # local plugin store (installed plugins)
│   ├── oh-my-pi/
│   │   ├── kaged-plugin.yaml
│   │   └── run.sh
│   └── ollama/
│       ├── kaged-plugin.yaml
│       └── main.py
├── runtime/                       # ephemeral runtime state
│   ├── cages/                     # cage scratch dirs (one per live invocation)
│   ├── pids/                      # supervisor PID files
│   └── socks/                     # reserved for future use; empty in v0
└── tmp/                           # daemon-managed scratch; cleaned on start

The operational config (config.toml) and, in user mode, the local config (local.toml) live in $XDG_CONFIG_HOME/kaged/ — NOT inside ${KAGED_HOME}. This separates state (data) from config (operator preferences) per XDG conventions.

System-mode equivalents:

/etc/kaged/                         # config
├── config.toml                     # operational daemon config

/var/lib/kaged/                     # state (= ${KAGED_HOME})
├── kaged.db
├── audit.log
├── auth-nonce                      # sidecar-mode shared secret (mode 0600)
├── launch-url                      # current launch URL (mode 0600); updated on token regeneration
├── local/                          # per-operator local configs (one file per operator)
│   ├── operator.toml
│   └── bob.toml
├── plugins/
├── runtime/
└── tmp/

User-mode equivalents:

~/.config/kaged/                   # = $XDG_CONFIG_HOME/kaged
├── config.toml                    # operational daemon config (auto-created on first run)
└── local.toml                     # this operator's local config

~/.local/share/kaged/              # = $XDG_DATA_HOME/kaged = ${KAGED_HOME}
├── kaged.db
├── audit.log
├── plugins/
├── runtime/
└── tmp/

$XDG_RUNTIME_DIR/kaged/            # ephemeral; cleared on logout
├── auth-cookie                    # per-startup nonce for loopback auth (mode 0600)
└── launch-url                     # current launch URL (mode 0600); updated on token regeneration

Projects do NOT live under ${KAGED_HOME}. Per ADR-0011, projects are operator-owned directories anywhere on the operator's filesystem. The daemon tracks which projects this operator has opened via the project registry in local config (local-config.md). Each project directory contains .kaged/project.yaml and any prompts and project-scoped data the project needs.

Rules:

${KAGED_HOME} is daemon-owned state. The operator may inspect, back up, and clean it, but the operator does not author files inside it directly (except the daemon config.toml if that's the chosen location for it).
runtime/ is ephemeral. Cleaned by the daemon at startup. Operators should not write to it.
auth-nonce (system mode) and auth-cookie (user mode) are mode 0600. The daemon refuses to start if it finds them world-readable (and --insecure is not set). This is a real check, not just convention.
launch-url is mode 0600. Written by the daemon at startup and on every token regeneration. Contains the full launch URL. CLI commands (kaged auth open) read this file directly — no API call required. The file is ephemeral; it is deleted on daemon shutdown and cleared on logout (user mode, via $XDG_RUNTIME_DIR).
The database can live elsewhere. Setting storage.url to a Postgres URL or a SQLite path outside ${KAGED_HOME} is supported; the layout above is the default.
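
The permission rule for auth-nonce and auth-cookie can be illustrated with a small check. This is a sketch under assumptions: the function names are hypothetical, and it is stricter than "world-readable" in that it rejects any group or other permission bits, which is one way to enforce mode 0600.

```typescript
import { statSync } from "node:fs";

// True when no group/other permission bits are set (for example 0600 or 0400).
function isPrivateMode(mode: number): boolean {
  return (mode & 0o077) === 0;
}

// Returns an error message if the secret file is too permissive, else null.
// The check is skipped entirely when --insecure is set.
function checkSecretFile(path: string, insecure: boolean): string | null {
  if (insecure) return null;
  const st = statSync(path); // throws if the file is missing
  if (!isPrivateMode(st.mode)) {
    const octal = (st.mode & 0o777).toString(8).padStart(4, "0");
    return `${path} has mode ${octal}; expected 0600 (run: chmod 600 ${path})`;
  }
  return null;
}
```

Naming the file and the chmod command in the message matches the error style the self-check section asks for.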

Lifecycle

The daemon's life is divided into five phases. Each phase has explicit entry conditions, observable signals, and a defined failure mode.

Phase 1 — `bootstrapping`

From process exec to "config loaded, logger working."

Parse CLI flags.
Load env vars.
Resolve deployment mode (from flags/env only — config file not yet loaded).
Discover the config file at the mode-appropriate default path. If no config file exists, create one at the default path with mode-appropriate defaults and log Config created: <path>.
Load config file. Merge per precedence.
Resolve ${KAGED_HOME} (flags > env > config > defaultHome(mode)).
Fill empty path defaults (storage.url, logging.audit, plugins.dir) relative to ${KAGED_HOME}.
Initialize the operational logger.
Emit daemon.bootstrap event to stderr (and audit log once writable).
Print the startup banner to stderr: kaged 0.1.0 starting | mode=system | auth=secure | sandbox=enabled | bind=127.0.0.1:7777.

Failure mode: any error here goes to stderr and exits non-zero. The daemon has not yet bound a port, has not yet opened the database. Restart is safe.

Phase 2 — `self_check`

Security and integrity gates before opening anything.

In order:

Auth gate (per ADR-0007 amendment):
- If auth.mode == "secure":
  - Verify nonce_file exists and is mode 0600 owned by the daemon user. If not: refuse to start with a clear error pointing at the file path and the chmod command.
  - Read the nonce into memory. Never log it. Never persist it back.
- If auth.mode == "insecure":
  - Emit the multi-line CLI warning block to stderr.
  - Emit audit event auth.insecure_mode with bind address.
  - Do not check the nonce file.
- If auth.sharedsso.enabled == true:
  - Verify auth.sharedsso.issuer is present. If not, refuse to start.
  - If auth.sharedsso.public_key is present, verify it parses as a valid P-256 SPKI PEM. The daemon refuses to start if the key is invalid.
Bind-safety gate:
- If bind is non-loopback (anything other than 127.0.0.1, ::1, or a Unix socket path) AND auth.mode == "secure" AND KAGED_INSECURE_BIND != "1":
  - Refuse to start. The operator must either bind loopback (and front with the sidecar) or set KAGED_INSECURE_BIND=1 to acknowledge the risk.
- If bind is non-loopback AND auth.mode == "insecure":
  - Emit the "doubly-loud" combined warning (per ADR-0007 amendment). Continue.
Advertised-base sub-gate (ADR-0040): if ui.url is set (split mode) AND bind is non-loopback AND daemon.public_url is unset, refuse to start with a message naming daemon.public_url / KAGED_PUBLIC_URL. A tunneled split deployment cannot advertise a correct /connect?api=… base from Host/X-Forwarded-Proto alone, so the public origin must be explicit. (Loopback split-mode dev — ui.url set, loopback bind — does not require public_url; the bind address is a usable fallback.) This sub-gate shares the bind gate's exit code (11).
Sandbox gate (per ADR-0009 amendment and ADR-0041):
- If sandbox.mode == "enabled":
  - Check that bwrap is on PATH and is a recent-enough version. If not: refuse to start with a message naming the package to install.
  - Check kernel-version baseline for user namespaces (5.10+). If not: refuse to start.
  - bwrap/userns availability probe (ADR-0041 §7): attempt a trivial bwrap --unshare-user --unshare-pid --ro-bind / / -- /bin/true execution. If it fails (e.g., inside a container/pod where unprivileged user namespaces are unavailable), the daemon does not refuse to start but transitions to degraded sandbox posture:
    - The daemon sets an internal sandbox.degraded = true flag.
    - The startup banner includes sandbox=degraded (bwrap unavailable).
    - daemon.warnings (returned by getMe) includes sandbox_degraded.
    - An audit event sandbox.degraded is emitted with { reason: "bwrap_probe_failed", error: "<message>" }.
    - ⛔ Invariant: the daemon never silently falls to uncaged execution. When degraded, subagent spawns with cage: enabled are refused unless the operator has explicitly set --no-sandbox or KAGED_NO_SANDBOX=1. The degraded state means "bwrap is broken, you must acknowledge uncaged execution explicitly." This prevents the container/pod scenario where sandbox silently becomes a no-op.
- If sandbox.mode == "disabled":
  - Emit the no-sandbox CLI warning block.
  - Emit audit event sandbox.disabled.
  - Skip the bwrap/kernel checks.
Storage gate:
- For SQLite: ensure the parent directory of the db path exists and is writable. Open in WAL mode. Run pending schema migrations. Refuse to start on migration failure with the migration ID and error.
- For Postgres: connect, version-check, run migrations. Same refusal semantics.
Plugins gate:
- Walk plugins.dir. For each installed plugin, validate its kaged-plugin.yaml manifest. Plugins with invalid manifests are logged and disabled for this daemon run (operator sees them in kaged plugin list); they do not block daemon startup.
- Plugin processes are not started yet — that happens in running.
Filesystem gate:
- Clean runtime/. Create subdirs if missing.

If every gate passes, transition to running. Otherwise, the daemon exits with a clear error and a non-zero exit code mapped to the gate that failed (auth=10, bind=11, sandbox=12, storage=13, plugins=14, filesystem=15). Exit codes are stable; ops tooling may key off them.
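
To make the bind-safety gate and the exit-code mapping concrete, here is a minimal sketch. The names (bindGate, BindInputs, GateResult) are hypothetical, and two parsing details are assumptions: a bind value starting with / is treated as a Unix socket path, and IPv6 binds are written as [::1]:port.

```typescript
// Stable exit codes per gate, as listed above.
const GATE_EXIT_CODES = {
  auth: 10,
  bind: 11,
  sandbox: 12,
  storage: 13,
  plugins: 14,
  filesystem: 15,
} as const;
type Gate = keyof typeof GATE_EXIT_CODES;

type GateResult =
  | { ok: true; warnings: string[] }
  | { ok: false; gate: Gate; exitCode: number; message: string };

interface BindInputs {
  bind: string;                    // e.g. "127.0.0.1:7777"
  authMode: "secure" | "insecure";
  insecureBind: boolean;           // --insecure-bind or KAGED_INSECURE_BIND=1
  uiUrl: string;                   // "" when unset
  publicUrl: string;               // "" when unset
}

function fail(gate: Gate, message: string): GateResult {
  return { ok: false, gate, exitCode: GATE_EXIT_CODES[gate], message };
}

function isLoopbackBind(bind: string): boolean {
  if (bind.startsWith("/")) return true; // assumption: Unix socket path
  if (bind.startsWith("[")) return bind.slice(1, bind.indexOf("]")) === "::1";
  const idx = bind.lastIndexOf(":");
  const host = idx === -1 ? bind : bind.slice(0, idx);
  return host === "127.0.0.1" || host === "::1";
}

function bindGate(i: BindInputs): GateResult {
  const loopback = isLoopbackBind(i.bind);
  const warnings: string[] = [];
  if (!loopback && i.authMode === "secure" && !i.insecureBind) {
    return fail("bind", "non-loopback bind in secure mode; bind loopback or set KAGED_INSECURE_BIND=1");
  }
  if (!loopback && i.authMode === "insecure") {
    warnings.push("non-loopback bind with auth disabled");
  }
  // Advertised-base sub-gate: shares the bind gate's exit code.
  if (i.uiUrl !== "" && !loopback && i.publicUrl === "") {
    return fail("bind", "split mode on a non-loopback bind requires daemon.public_url / KAGED_PUBLIC_URL");
  }
  return { ok: true, warnings };
}
```

A caller would exit with result.exitCode on failure, so ops tooling can tell a bind refusal (11) from, say, a storage refusal (13).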

Phase 3 — `running`

The daemon is doing its job.

Entry actions:

Open the HTTP+WS listener on bind.
Write runtime state files (all mode 0600, created in the mode-appropriate runtime directory — $XDG_RUNTIME_DIR/kaged/ in user mode, ${KAGED_HOME}/ in system mode):
- auth-cookie (user/loopback mode only): the per-startup nonce from which the session cookie is derived. Generated once at daemon start; does not change when launch tokens are regenerated. Per ADR-0007 amendment. Log Nonce written: <path> to stderr.
- launch-url (loopback mode only): the current launch URL ({ui_base_url}/launch?token=<token>). Rewritten whenever the launch token is consumed and regenerated. Also printed to stderr at startup and on each regeneration. The directory is created with mode 0700 if it does not exist. The daemon refuses to start if the directory exists but is not owned by the daemon user.
Mark /readyz ready (per http-api.md).
Spawn each enabled plugin. Failed spawns log and disable the plugin; do not bring down the daemon.
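Load the projects listed in the project registry (local config; see local-config.md). Projects do not live under ${KAGED_HOME}, so there is no projects/ directory to walk. Validate their DSLs; flag invalid ones (visible in API as dsl_status: invalid). Do not auto-start sessions.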
Emit audit event daemon.ready.

In this phase:

HTTP requests are served.
WebSocket connections are accepted.
Subagents are spawned on demand by the supervisor.
Plugins are running and reachable.
The audit log is being written.

Phase 4 — `draining`

Triggered by SIGTERM. Graceful shutdown.

Mark /readyz not-ready (returns 503). The HTTP listener is still bound — load balancers stop sending new traffic.
Reject new WebSocket upgrades with 503.
Send closing { code: "server_shutdown" } to every connected WebSocket, then close after a 1-second flush window.
Send a "shutdown soon" notice to every live subagent. Wait shutdown_grace_sec (default 10) for them to finish.
SIGTERM any subagents still running. Wait shutdown_kill_sec (default 5).
SIGKILL any survivors.
Send shutdown JSON-RPC notification to every plugin. Wait shutdown_grace_sec for them to exit.
SIGTERM, then SIGKILL remaining plugins.
Close the storage connection (SQLite checkpoints WAL; Postgres releases the connection pool).
Emit audit event daemon.shutdown with reason.

If the daemon receives a second SIGTERM during draining: skip to step 6, SIGKILL any survivors (fast shutdown). If it receives SIGKILL: the kernel handles it; the daemon does no cleanup. This is expected to be recoverable: the next startup runs migrations and marks any in-progress runs as failed.
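
The drain sequence and its timing budget can be tabulated as in the sketch below, which is useful when choosing an init-system stop timeout. The step numbers follow the order of the list above. The type and function names are hypothetical, and the wait between SIGTERM and SIGKILL for plugins is assumed to equal shutdown_kill_sec because the text does not state it.

```typescript
interface DrainTimings {
  shutdownGraceSec: number; // shutdown_grace_sec, default 10
  shutdownKillSec: number;  // shutdown_kill_sec, default 5
}

interface DrainStep {
  step: number;
  action: string;
  maxWaitSec: number;
}

function drainSteps(t: DrainTimings = { shutdownGraceSec: 10, shutdownKillSec: 5 }): DrainStep[] {
  return [
    { step: 1, action: "mark /readyz not-ready (503)", maxWaitSec: 0 },
    { step: 2, action: "reject new WebSocket upgrades with 503", maxWaitSec: 0 },
    { step: 3, action: "send closing frame to WebSockets, then close", maxWaitSec: 1 },
    { step: 4, action: "send shutdown notice to subagents", maxWaitSec: t.shutdownGraceSec },
    { step: 5, action: "SIGTERM remaining subagents", maxWaitSec: t.shutdownKillSec },
    { step: 6, action: "SIGKILL surviving subagents", maxWaitSec: 0 },
    { step: 7, action: "send shutdown notification to plugins", maxWaitSec: t.shutdownGraceSec },
    // Assumption: plugins get shutdown_kill_sec between SIGTERM and SIGKILL.
    { step: 8, action: "SIGTERM, then SIGKILL remaining plugins", maxWaitSec: t.shutdownKillSec },
    { step: 9, action: "close the storage connection", maxWaitSec: 0 },
    { step: 10, action: "emit audit event daemon.shutdown", maxWaitSec: 0 },
  ];
}

// A second SIGTERM skips ahead to step 6.
function afterSecondSigterm(steps: DrainStep[]): DrainStep[] {
  return steps.filter((s) => s.step >= 6);
}

// Upper bound on drain time if every wait runs to its limit.
function worstCaseSeconds(steps: DrainStep[]): number {
  return steps.reduce((sum, s) => sum + s.maxWaitSec, 0);
}
```

Under these assumptions the default worst case is 1 + 10 + 5 + 10 + 5 = 31 seconds, so a stop timeout shorter than that could cut the graceful path short.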

Phase 5 — `stopped`

Process exit. The init system observes the exit and decides whether to restart per its policy.

Subsystem dependency order

The daemon's subsystems start in a specific order during self_check → running:

logger
  ↓
config (loaded, validated)
  ↓
audit log writer (so subsequent events are captured)
  ↓
storage (db connection, migrations)
  ↓
network gatekeeper (in-process; sets up nftables rule templates)
  ↓
subagent supervisor (binds to storage; no children yet)
  ↓
plugin host + plugin supervisor (spawns initial plugin processes)
  ↓
session manager (binds to storage; no sessions active yet)
  ↓
HTTP+WS listener (last; opens the door)

Shutdown reverses the order. The HTTP listener stops accepting new connections first, then plugins, then subagents, then everything else, with the audit log writer last so it captures the shutdown of every other subsystem.

The reason this matters: the daemon never accepts a request it cannot service. If the storage layer isn't up, the listener isn't open.
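
The start-in-order, stop-in-reverse rule can be sketched as follows. The Subsystem interface and function names are hypothetical, and the sketch is synchronous for brevity where a real daemon would likely await each step. It also unwinds already-started subsystems when a later one fails, which is an assumption consistent with, but not stated by, the text above.

```typescript
interface Subsystem {
  name: string;
  start(): void;
  stop(): void;
}

// Starts subsystems in the given order (logger first, HTTP+WS listener last).
// If one fails, everything started so far is stopped in reverse before rethrowing.
function startAll(subsystems: Subsystem[]): Subsystem[] {
  const started: Subsystem[] = [];
  try {
    for (const s of subsystems) {
      s.start();
      started.push(s);
    }
  } catch (err) {
    stopAll(started);
    throw err;
  }
  return started;
}

// Stops subsystems in reverse start order, so the listener closes first.
function stopAll(started: Subsystem[]): void {
  for (const s of [...started].reverse()) {
    s.stop();
  }
}
```

Because the listener is last in the start list, it is only opened once every subsystem it depends on has started, and it is the first thing closed on the way down.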

CLI surface

The kaged binary has two modes:

Daemon mode: kaged start ... runs the long-lived process.
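Client mode: most other subcommands make a local call to the running daemon (via its HTTP API, talking to 127.0.0.1 or the configured loopback bind). A few work on local files instead and need no API call, for example kaged config validate, kaged auth nonce, and kaged auth open.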

The CLI is plumbing, not a workflow surface (per ADR-0002). Operators use it to start/stop the daemon, inspect state, manage plugins and DSL files, and emit the auth nonce. They do not use it for project work; that's the web UI.

`kaged start`

Run the daemon in the foreground.

kaged start [flags]

  --config <path>             Path to config.toml (default: mode-appropriate path; see Configuration file)
  --home <path>               Override ${KAGED_HOME}
  --bind <addr>               Override the listen address
  --insecure                  Bypass auth (per ADR-0007). LOUD WARNINGS.
  --no-sandbox                Disable sandboxing (per ADR-0009). LOUD WARNINGS.
  --insecure-bind             Allow non-loopback bind in secure mode. Required for such a bind unless --insecure is set.
  --log-level <level>         debug | info | warn | error
  --foreground                Stay in foreground (default; here for documentation)

systemd unit files invoke kaged start as-is; no daemonizing flag is needed because the daemon is foreground-only.

`kaged status`

Print daemon status: version, mode, bind, uptime, project/session counts, plugin status.

kaged status
  daemon: kaged 0.1.0 (pid 12345, up 3h 42m)
  bind:   127.0.0.1:7777
  auth:   secure (sidecar nonce loaded)
  sandbox: enabled (bwrap 0.8.0)
  storage: sqlite:///var/lib/kaged/kaged.db (ok)
  projects: 4 (3 valid, 1 invalid)
  plugins: 2 enabled (oh-my-pi: running, ollama: running)
  warnings: none

In insecure modes, the warnings line is populated and printed in magenta-equivalent terminal color.

`kaged config show`

Print the effective merged config (after all sources). Useful for debugging precedence.

kaged config show [--source]

--source annotates each value with where it came from (flag, env, file, default).

`kaged config validate`

Parse the config file and report errors without starting the daemon.

`kaged auth nonce`

Print the current sidecar nonce to stdout. Used by sidecar configuration tooling.

kaged auth nonce
  <printed to stdout — no trailing newline interpretation required>

Reads from nonce_file (or env). In --insecure mode, prints nothing and exits non-zero with a message that no nonce exists. Reading this requires read access to the nonce file; operators run it as the daemon user.

`kaged auth rotate`

Generate a new nonce, write it to nonce_file, and signal the running daemon to reload it.

kaged auth rotate
  ✓ new nonce written to /var/lib/kaged/auth-nonce
  ✓ daemon reloaded (SIGHUP)
  → reconfigure your sidecar with the new value

SIGHUP is the daemon's "reload nonce only" signal. Nothing else is reloaded by SIGHUP; the rest of config requires a restart.

`kaged auth open`

Open the current launch URL in the operator's default browser. Used to authenticate a new browser session without copy-pasting the URL from daemon logs.

kaged auth open
  → opening http://127.0.0.1:38291/launch?token=abc123...

Reads the launch URL from the runtime state file ($XDG_RUNTIME_DIR/kaged/launch-url in user mode, ${KAGED_HOME}/launch-url in system mode). Calls xdg-open (Linux) or open (macOS) with the URL. No daemon API call is made — this is a pure file read + subprocess spawn.

Failure modes:

No launch-url file exists → exit non-zero with No running daemon found (missing launch-url file).
--insecure mode → exit non-zero with No launch URL in insecure mode (auth is disabled).
xdg-open / open not found → exit non-zero with Could not open browser: xdg-open not found.

The command does not consume the launch token — it merely opens the URL. The browser visit consumes it. Existing browser sessions are unaffected by token regeneration; the session cookie remains valid.

`kaged plugin list / install / enable / disable / logs`

Per ADR-0008:

kaged plugin list — table of installed plugins, status, last error.
kaged plugin install <path> — copy a plugin directory into plugins.dir, validate the manifest. Does not auto-enable.
kaged plugin enable <name> / disable <name> — toggles the plugin in config (writes to config.toml) and signals the daemon.
kaged plugin logs <name> — tail stderr for the named plugin.

`kaged dsl validate / migrate / schema`

Per project-dsl.md CLI surface:

kaged dsl validate <path> — parse and validate a DSL file. Exits non-zero on failure.
kaged dsl migrate <path> --to <version> — schema-version migration.
kaged dsl schema [--version N] — print published JSON Schema.

These commands work without a running daemon (they're pure file operations); they don't make HTTP calls.

`kaged backup` and `kaged restore`

Per ADR-0005:

kaged backup [--output <path>] — produce a backup of the database, prompts, and projects. For SQLite, runs .dump against a consistent snapshot. For Postgres, runs pg_dump.
kaged restore <path> — restore from a backup. Refuses to run with a daemon active; the operator must stop the daemon first.

`kaged audit`

Tail or query the audit log directly (without going through the HTTP API).

kaged audit tail                   # follow
kaged audit query --since 1d       # last 24h
kaged audit query --event-type 'subagent.spawn.uncaged'

`kaged version`

Print version. The most boring command; included because every CLI needs it.

`kaged help`

Top-level help. Subcommand help via kaged <cmd> --help.

Logging

The daemon writes two logically distinct streams. Per ADR-0007 and the manifesto, the audit log is load-bearing for operator trust; the operational log is for debugging.

Operational log

Free-form structured logs. Default destination: stderr. Configurable to a file or journald.

Format: newline-delimited JSON ({"ts":..., "level":..., "msg":..., ...fields}) when destined for a file or journald; human-friendly text when destined for stderr in a TTY.
Levels: debug, info, warn, error. Default: info.
Contents: request lines, plugin spawn/exit, supervisor decisions, LLM-provider errors, daemon lifecycle events.
Not for audit. The operational log may be discarded, rotated by external tools, or sent to a remote collector. Nothing here is considered a record-of-truth.

Audit log

Append-only record of every load-bearing event.

Destination: file (audit.log in ${KAGED_HOME} by default). The audit log is always file-backed — never stderr only — because losing it on a process crash is unacceptable.
Format: newline-delimited JSON, one event per line. Field schema documented in http-api.md audit endpoint.
Append-only. The daemon never rewrites or deletes audit entries. Log rotation, if configured, archives old files but they remain readable.
fsync policy: the daemon fsyncs the audit log after every write. Slow but correct. Operators who want batched fsync can set logging.audit_sync = "interval:1s" in config (not recommended).
Event taxonomy (initial; extensible):
- daemon.bootstrap, daemon.ready, daemon.shutdown, daemon.crash
- auth.success, auth.failure, auth.insecure_mode, auth.nonce_rotated
- sandbox.disabled
- project.created, project.dsl_updated, project.deleted
- session.created, session.attached, session.detached, session.ended
- run.started, run.ended, run.cancelled
- subagent.spawn, subagent.spawn.uncaged, subagent.exit, subagent.killed
- checkpoint.created, checkpoint.resumed, checkpoint.rollback
- prompt.edit
- plugin.spawned, plugin.exit, plugin.crashed, plugin.enabled, plugin.disabled
- policy.violation (any time a cage limit is hit or a request is denied)

Every audit event carries request_id when applicable, the operator's user_id (or insecure-mode), and a millisecond timestamp.
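
A minimal sketch of how one audit line could be assembled from those three guarantees. The field names ts and type and the helper auditLine are assumptions for illustration; the authoritative field schema is the one in http-api.md. The write-then-fsync step is deliberately left out here.

```typescript
// Hypothetical event shape; see http-api.md for the real schema.
interface AuditEvent {
  ts: number;          // millisecond timestamp
  type: string;        // e.g. "auth.success"
  user_id: string;     // operator id, or "insecure-mode"
  request_id?: string; // present only when the event stems from a request
}

function auditLine(
  type: string,
  userId: string | null,
  nowMs: number,
  requestId?: string,
): string {
  const event: AuditEvent = {
    ts: nowMs,
    type,
    user_id: userId ?? "insecure-mode",
  };
  if (requestId !== undefined) event.request_id = requestId;
  // One JSON object per line; JSON.stringify escapes any embedded newlines,
  // so the only raw newline is the terminator.
  return JSON.stringify(event) + "\n";
}
```

A caller would append the returned string to the audit file and fsync before acknowledging the event, per the fsync policy above.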

Supervisor behavior

The daemon hosts named supervisors. Each owns a class of children.

`PluginSupervisor`

Owns plugin subprocesses (per ADR-0008).

Spawn: at daemon-ready, walks the enabled-plugins list, spawns each.
Restart policy: exponential backoff (1s, 2s, 4s, 8s, capped at 60s) on crash. After 5 consecutive failures within 10 minutes, the plugin is marked failed and disabled; operator re-enables via kaged plugin enable.
Health: the supervisor sends ping JSON-RPC every 30s. No response in 90s → kill and restart.
Shutdown: sends shutdown JSON-RPC; SIGTERM after shutdown_grace_sec; SIGKILL after shutdown_kill_sec.
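
The restart policy above can be sketched as two small pieces: a capped doubling delay and a sliding-window crash counter. The names backoffMs and CrashTracker are hypothetical, and what counts as a "healthy" run that resets the consecutive streak is an assumption this spec does not pin down.

```typescript
const BASE_MS = 1_000;          // first retry after 1s
const CAP_MS = 60_000;          // never wait longer than 60s
const MAX_FAILURES = 5;         // consecutive failures before giving up
const WINDOW_MS = 10 * 60_000;  // ...within 10 minutes

// Delay before restart attempt n (0-based): 1s, 2s, 4s, 8s, ... capped at 60s.
function backoffMs(attempt: number): number {
  return Math.min(BASE_MS * 2 ** attempt, CAP_MS);
}

class CrashTracker {
  private crashes: number[] = [];

  // Records a crash at nowMs; returns true when the plugin should be marked failed.
  recordCrash(nowMs: number): boolean {
    this.crashes.push(nowMs);
    // Drop crashes that fell out of the 10-minute window.
    this.crashes = this.crashes.filter((t) => nowMs - t <= WINDOW_MS);
    return this.crashes.length >= MAX_FAILURES;
  }

  // Assumption: a run the supervisor considers healthy breaks the streak.
  recordHealthy(): void {
    this.crashes = [];
  }
}
```

Once recordCrash returns true the supervisor would stop scheduling restarts and leave the plugin disabled until the operator runs kaged plugin enable.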

`SubagentSupervisor`

Owns subagent invocations (per ADR-0009).

Spawn: on demand from the session manager. Compiles the cage policy from the DSL, sets up the network namespace and gatekeeper rules, then bwraps the subagent process. If cage: disabled or --no-sandbox, spawns as the daemon user directly.
No automatic restart. A failed subagent stays failed. The next operator message can retry.
Resource enforcement: cgroup limits applied at spawn; on limit breach, the supervisor kills and emits policy.violation.
Walltime: the supervisor enforces the walltime_sec from the DSL with its own timer. On expiry: SIGTERM, then SIGKILL after 5s.
Shutdown: during daemon draining, the supervisor SIGTERMs every live subagent and waits. Subagents that ignore the signal are SIGKILLed.
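
The walltime escalation (SIGTERM at expiry, SIGKILL 5s later) can be sketched with an injected scheduler, which keeps the logic testable without real timers. The Child interface, the Schedule type, and enforceWalltime are hypothetical names; a real supervisor would pass something like (fn, ms) => { setTimeout(fn, ms); } and a wrapper around the spawned process.

```typescript
type Signal = "SIGTERM" | "SIGKILL";

// Hypothetical minimal view of a supervised subagent process.
interface Child {
  kill(signal: Signal): void;
  exited(): boolean;
}

// Injected timer so the escalation logic stays free of real clocks.
type Schedule = (fn: () => void, delayMs: number) => void;

const KILL_GRACE_MS = 5_000;

function enforceWalltime(child: Child, walltimeSec: number, schedule: Schedule): void {
  schedule(() => {
    if (child.exited()) return; // finished within its budget
    child.kill("SIGTERM");
    schedule(() => {
      // Only escalate if the subagent ignored SIGTERM.
      if (!child.exited()) child.kill("SIGKILL");
    }, KILL_GRACE_MS);
  }, walltimeSec * 1000);
}
```

The same two-step shape applies to daemon draining, with the first step triggered by shutdown instead of a timer.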

`SessionSupervisor`

Owns session lifecycle. Detailed semantics live in session-manager.md. At daemon level:

Sessions survive operator disconnects (per ADR-0002).
The daemon persists session state continuously (not just on shutdown). A SIGKILL of the daemon loses no committed work — any uncommitted in-flight reasoning is marked as a failed run on next startup.

Project loading

Per ADR-0011, projects are operator-owned directories tracked through the project registry in local config (local-config.md). The daemon does NOT discover projects by scanning the filesystem; it knows about a project only after the operator has explicitly loaded it.

What "loading a project" means

When the operator invokes POST /api/v1/projects/load (or kaged project load <path>):

The daemon reads <path>/.kaged/project.yaml. If absent or unreadable → dsl_invalid error with details.reason: "no_project_yaml".
The daemon validates the DSL (project-dsl.md). On failure → dsl_invalid with line/col details.
The daemon resolves the calling operator's local config (per local-config.md).
The daemon collects:
- Every alias referenced by primary.model and subagents.<name>.model.
- Every plugin in plugins.
- Every prompt file referenced by *.system_prompt.
- Every path referenced by cage.fs[].path.
The daemon checks each against the operator's local config and the project directory:
- Alias is in [aliases]? If not → pending, add to unresolved-aliases list.
- Plugin registry entry's package is registered in the daemon's project-plugin supervisor? If not → pending, add the slot name plus { package, source, status: "missing" } to the missing-plugins list.
- Prompt file exists at <project-root>/<path>? If not → pending, add to missing-prompts list.
The daemon writes the project to the registry (or updates the existing entry) with state ready, pending, or invalid.
The daemon returns the project status and the lists of unresolved items.
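
The check-and-classify steps above can be sketched as a pure function over what the DSL references and what the operator's environment provides. This is an illustration under stated assumptions: evaluateProject and its input shape are hypothetical, missing plugins are reduced to names (the spec's entries also carry package, source, and status), and mapping a DSL validation failure to the invalid state is an assumption.

```typescript
type ProjectState = "ready" | "pending" | "invalid";

interface LoadInput {
  dslValid: boolean;
  referencedAliases: string[];
  referencedPlugins: string[];
  referencedPrompts: string[];
  definedAliases: string[];    // keys of [aliases] in the operator's local config
  registeredPlugins: string[]; // plugins known to the daemon
  existingPrompts: string[];   // prompt paths found under <project-root>
}

interface LoadReport {
  state: ProjectState;
  unresolvedAliases: string[];
  missingPlugins: string[];
  missingPrompts: string[];
}

// Everything in `wanted` that does not appear in `have`.
function missingFrom(wanted: string[], have: string[]): string[] {
  return wanted.filter((name) => have.indexOf(name) === -1);
}

function evaluateProject(input: LoadInput): LoadReport {
  const lists = {
    unresolvedAliases: missingFrom(input.referencedAliases, input.definedAliases),
    missingPlugins: missingFrom(input.referencedPlugins, input.registeredPlugins),
    missingPrompts: missingFrom(input.referencedPrompts, input.existingPrompts),
  };
  const unresolved =
    lists.unresolvedAliases.length + lists.missingPlugins.length + lists.missingPrompts.length;
  // Assumption: a DSL that fails validation is "invalid"; any unresolved item is "pending".
  const state: ProjectState = !input.dslValid ? "invalid" : unresolved > 0 ? "pending" : "ready";
  return { state, ...lists };
}
```

Because the function is pure, the re-evaluation triggers described below can call it again with fresh inputs and compare the resulting state to the stored one.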

The UI then walks the operator through resolution: defining missing aliases, installing missing plugins (with the install prompt per ADR-0008 amendment), or asking the operator to fix missing prompts on disk.

Re-evaluation triggers

A project's state is recomputed when:

The operator edits local config (alias added, plugin installed, etc.) — the daemon recomputes the state of every registered project that mentioned the affected name.
The operator edits the project's DSL on disk (detected by mtime check on the next API call that references the project, or explicitly via POST /api/v1/projects/:slug/dsl per http-api.md).
The daemon restarts. Every registered project is re-evaluated as part of the running phase entry.

State changes emit project.state_change audit events with the old and new state.

Hot-reload

DSL edits applied to a registered project hot-reload at the next session-start, not immediately. Active sessions continue with the DSL they were started under. The UI shows a "this project's DSL changed; restart sessions to pick up the change" indicator.

Hot-reload is intentionally conservative — a subagent mid-task should not have its cage policy change underneath it. Operators wanting immediate apply use the session UI's "restart with new DSL" action.

Loaded vs unloaded projects

A project on disk is a project; a project that has been kaged project load-ed is a registered project for this operator. Sessions can only start against registered projects. The same project directory can be registered by multiple operators on a shared system-wide daemon (each gets their own entry in their own local config; the underlying directory is shared but their alias resolutions and state are per-operator).

Unloading

DELETE /api/v1/projects/:slug (per http-api.md) removes the project from this operator's registry and ends any active sessions for it. It does not delete files on disk. The operator can re-load it later.

A kaged project forget <id> CLI shorthand is equivalent.

Crash semantics

When a plugin crashes

The plugin host detects EOF on the plugin's stdio pipes (stdin/stdout).
PluginSupervisor records plugin.crashed with the exit code and last stderr.
Restart per backoff policy.
API calls that were mid-flight to the plugin return 502 provider_unreachable.
Other plugins are unaffected.
The daemon stays up.

When a subagent crashes

The supervisor detects the process exit.
The relevant run is marked failed. WS subscribers see subagent.end with non-zero exit.
The cage is torn down (network namespace cleaned, scratch dir wiped if ephemeral).
The primary may attempt to handle the failure per its prompt; the daemon does not auto-retry.
Other subagents and the daemon are unaffected.

When the daemon itself crashes

The init system (systemd) decides whether to restart. The blessed unit file sets Restart=on-failure.
On restart, the daemon runs self_check again. Existing data on disk is consistent (SQLite WAL guarantees atomicity for committed transactions).
Any subagents and plugins that were alive at crash time are orphaned. The next self_check cleans runtime/, which includes their PID files; the daemon does NOT kill orphaned processes on startup (they may have detached for legitimate reasons, like a deploy step the operator wants to outlive the daemon). Operators see them as "untracked processes" in kaged status and can clean them manually.
The audit log records daemon.crash (from the recovering instance, not the crashing one — the crashing instance is by definition unable to write its own crash event reliably).

When the host loses power

WAL replay on next SQLite open recovers committed transactions.
The audit log is fsync'd; the last committed line survives.
The daemon comes up via systemd; runs self_check; resumes.

systemd units

Two unit files ship as documentation in the kaged-releases repo. Operators install the one matching their deployment mode (per ADR-0010).

System-wide unit (`/etc/systemd/system/kaged.service`)

systemd/kaged.service from kaged-releases:

[Unit]
Description=kaged daemon
Documentation=https://kaged.dev
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=kaged
Group=kaged
EnvironmentFile=-/etc/kaged/env
ExecStart=/usr/local/bin/kaged start
Restart=on-failure
RestartSec=5s
KillMode=mixed
TimeoutStopSec=30s

# Hardening (systemd-side, complementing kaged's own sandbox)
NoNewPrivileges=yes
ProtectSystem=strict
ReadWritePaths=/var/lib/kaged
ProtectHome=yes
PrivateTmp=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
# The daemon needs cgroup access for subagent limits.
ProtectControlGroups=no
# The daemon needs these namespace types to set up cages.
RestrictNamespaces=user pid net mnt

[Install]
WantedBy=multi-user.target

Notes:

User=kaged runs the daemon as a non-root user. Sandbox features (user namespaces) work fine; the operator does not run kaged as root.
ReadWritePaths=/var/lib/kaged matches ${KAGED_HOME}.
RestrictNamespaces is permissive enough that the daemon can create namespaces for cages. Tightening this further breaks bwrap.
The OAuth sidecar (oauth2-proxy or equivalent) ships as a separate unit file (oauth2-proxy@kaged.service), out of scope for this spec.

launchd (macOS) and OpenRC (Alpine) equivalents will land in v0.x as documentation, not v0.

Per-user unit (`~/.config/systemd/user/kaged.service`)

systemd/kaged-user.service from kaged-releases:

[Unit]
Description=kaged daemon (per-user)
Documentation=https://kaged.dev
After=default.target

[Service]
Type=simple
EnvironmentFile=-%h/.config/kaged/env
ExecStart=%h/.local/bin/kaged start
Restart=on-failure
RestartSec=5s
KillMode=mixed
TimeoutStopSec=30s

# No system-level hardening directives are needed (the user's namespace
# already constrains the process). bwrap inside kaged handles cage isolation.

[Install]
WantedBy=default.target

Install/enable:

mkdir -p ~/.config/systemd/user
curl -fsSL https://raw.githubusercontent.com/kaged-dev/kaged-releases/main/systemd/kaged-user.service \
  -o ~/.config/systemd/user/kaged.service
systemctl --user daemon-reload
systemctl --user enable --now kaged
# To make kaged run when the operator is logged out:
loginctl enable-linger "$USER"

Notes:

The per-user unit runs as the operator's UID. No User= or Group= directive; systemd handles it via the --user instance.
%h is systemd's expansion for the user's home directory.
Hardening directives (ProtectSystem, ReadWritePaths, etc.) are intentionally omitted from the user unit. The kernel's user-namespace boundary plus bwrap inside kaged is the isolation; replicating systemd-side hardening adds friction for limited benefit when the daemon is already constrained by being non-root.
loginctl enable-linger is required for the daemon to run when the operator is logged out (otherwise systemd stops the user's service manager, and the daemon with it, once the operator's last session ends). Documented; not hidden.
No sidecar is installed in this deployment. The daemon uses loopback + cookie-bound nonce auth (per ADR-0007 amendment).

Mixed deployments

An operator running a per-user kaged on their workstation AND interacting with a system-wide kaged on a homelab box is supported — they're independent daemons reachable at different addresses. The CLI can target a specific daemon via KAGED_BIND or --daemon; absent that, it talks to the per-user one if running, else system-wide. (Detailed CLI targeting rules are in CLI surface.)

Migrations

Database schema migrations are applied automatically during self_check.

Format: SQL files in packages/daemon/migrations/ named NNNN_description.sql, where NNNN is a zero-padded sequence number.
Engine portability: every migration has either one SQL file (when portable) or two (NNNN_description.sqlite.sql, NNNN_description.postgres.sql) per ADR-0005.
Tracking: a schema_migrations table records the applied migration IDs and timestamps.
Failure handling: on migration error, the daemon refuses to start and the migration's transaction is rolled back. The daemon does not partially apply migrations.
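
A sketch of how the pending set could be derived from the naming convention above. It assumes a four-digit sequence (read from NNNN), a lowercase description, and that schema_migrations stores the NNNN_description stem as the ID; pendingMigrations and the sample file names are hypothetical.

```typescript
type Engine = "sqlite" | "postgres";

interface Migration {
  id: string;   // NNNN_description (assumed to be what schema_migrations records)
  file: string; // the file to execute for this engine
}

// Matches NNNN_description.sql and NNNN_description.<engine>.sql
const MIGRATION_FILE = /^(\d{4})_([a-z0-9_]+)(?:\.(sqlite|postgres))?\.sql$/;

function pendingMigrations(files: string[], engine: Engine, applied: string[]): Migration[] {
  const chosen: Record<string, string> = {};
  for (const file of files) {
    const m = MIGRATION_FILE.exec(file);
    if (m === null) continue; // not a migration file
    const id = m[1] + "_" + m[2];
    const variant = m[3];
    // Skip the other engine's variant; portable files have no variant.
    if (variant !== undefined && variant !== engine) continue;
    chosen[id] = file;
  }
  return Object.keys(chosen)
    .filter((id) => applied.indexOf(id) === -1)
    .sort() // zero-padded prefixes make lexicographic order match sequence order
    .map((id) => ({ id, file: chosen[id] }));
}
```

Each returned migration would then run inside its own transaction, so a failure rolls back cleanly and the daemon refuses to start.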

The daemon does not support down-migrations. To revert a schema, the operator restores from a backup made before the bad migration. This is documented as a feature, not a bug — automatic down-migrations are dangerous and we don't want operators to develop the reflex.

Resource budgets (v0 defaults)

Resource	Limit	Where enforced
Memory per subagent cage	256 MB	cgroups (configurable via DSL `cage.limits.memory_mb`)
Walltime per subagent	600s	supervisor timer
Concurrent subagents per session	8	session supervisor (rejects with 409 if exceeded)
Concurrent plugins	unlimited (within process FD limits)	none
Audit log file size	100 MB before rotation	logger
Operational log file size	50 MB before rotation	logger
WebSocket buffer per channel	per `http-api.md`	session manager
Database connections	1 (SQLite), pool of 10 (Postgres)	storage layer

These are operator-tunable in config.toml under their respective sections. The defaults are sized for a low-resource Linux host (e.g., 2 GB RAM).

Testing notes

Per ADR-0003:

Self-check tests: each gate (auth, bind, sandbox, storage, plugins, filesystem) has at least one test asserting the exit code and stderr message.
Config precedence tests: every overlap between sources is exercised — flag overrides env, env overrides file, file overrides default.
Lifecycle tests: the daemon starts, reaches running, drains on SIGTERM cleanly. Forced SIGKILL leaves the database consistent.
Supervisor tests: plugin crash → backoff and restart. Subagent crash → cage cleanup. Resource limit breach → policy.violation audit event.
Migration tests: every migration has a forward test against a fixture database. Portability tests run the same migrations against SQLite and Postgres CI containers.
CLI tests: every subcommand exercises a happy path and a documented failure mode.
Audit log tests: every event type listed above is producible by an integration test. The fsync guarantee is validated by a power-loss simulator (kill -9 the daemon mid-write, restart, assert the last committed event is durable).

Open questions

Multi-tenant readiness. Today the daemon assumes one operator (or a trusted group authed through one sidecar). v2 will add per-operator scoping. The audit log already includes user_id; the rest is RBAC work.
Cluster mode. Cross-daemon mesh (a kaged at home talking to a kaged in the office) was sketched in ADR-0001 and the vision doc. v0 is single-daemon; this spec doesn't preclude clustering but doesn't enable it either.
Hot-reload of plugins. Today, enabling a plugin in config.toml requires a daemon restart. kaged plugin enable works without restart by signaling the daemon to spawn the plugin process; full config-driven hot-reload is deferred.
Resource autoscaling. No automatic memory/walltime adjustment per workload. Operators tune defaults globally and per-cage. Reasonable for v0.
macOS support. The daemon process model works fine on macOS (launchd, BSD kqueue). The blocker is the sandbox layer (bwrap is Linux-only). v0 is Linux-only; macOS is "kaged works in a Linux VM."

Amendments

2026-06-30: Natives loader wiring — tool handlers now use three-tier resolution (#47)

The three-tier .node resolution (KAGED_NATIVES_PATH → beside-binary → dev createRequire) was implemented in natives-resolve.ts per the 2026-06-14 amendment below, but the tool handlers (edit-handlers.ts, search-handlers.ts) bypassed it by calling loadNative() directly from @kaged/natives. The npm package's loader uses createRequire(import.meta.url), which resolves from Bun's virtual filesystem (/$bunfs/...) inside a compiled binary — where the .node does not exist. This caused [tool_not_found] errors for all native-backed tools (grep, glob, ast-search, ast-edit) in Docker/compiled-binary deployments.

Fix: native-loader.ts bridges resolveNativesPath() with actual .node loading. It resolves tiers 1–2 to a .node file path, then loads it via require() with the absolute path (bypassing virtual-FS resolution). If resolution fails or returns the non-.node JS entry point (tier 3 dev fallback), it delegates to the npm package's loadNative(). All tool handlers now import loadNativeBindings() from native-loader.ts instead of loadNative() from @kaged/natives.

2026-06-14: Containerised daemon — natives path resolution, sandbox degrade, container runtime knobs (ADR-0041)

Per ADR-0041:

KAGED_NATIVES_PATH env var added to the env var table. Runtime override for the @kaged/natives .node binary path — not a config-file field. When set, the natives loader uses this path directly; when unset, falls back to resolving beside process.execPath (compiled binary distribution), then to the existing createRequire(import.meta.url) resolution (development). This three-tier resolution is robust across bun build --compile and development modes.
Sandbox gate extended with bwrap availability probe. When sandbox.mode == "enabled", the self-check phase now probes actual bwrap functionality (a trivial bwrap --unshare-user ... -- /bin/true invocation). If bwrap is on PATH but cannot establish isolation (common inside containers/pods without unprivileged user namespaces), the daemon enters degraded sandbox posture: sandbox.degraded flag set, startup banner shows sandbox=degraded, daemon.warnings includes sandbox_degraded, audit event sandbox.degraded emitted. ⛔ Invariant: degraded sandbox refuses caged subagent spawns unless the operator explicitly opts into uncaged execution via --no-sandbox.
/healthz and /readyz probe contract documented. These endpoints are the Kubernetes liveness and readiness probe targets. /healthz returns 200 unconditionally (the process is alive). /readyz returns 200 when setReady(true) has been called (all gates passed, listener open), 503 otherwise and during draining. Both return application/json with { status: "ok" | "not_ready" }.
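
The probe contract can be sketched as two pure functions that produce a status code and JSON body. ProbeResponse, healthz, and readyz are hypothetical names; how the handlers are wired to routes and how the ready and draining flags are stored are not shown.

```typescript
interface ProbeResponse {
  status: 200 | 503;
  body: { status: "ok" | "not_ready" };
}

// Liveness: if this code runs at all, the process is alive.
function healthz(): ProbeResponse {
  return { status: 200, body: { status: "ok" } };
}

// Readiness: 200 only after setReady(true) and while not draining.
function readyz(ready: boolean, draining: boolean): ProbeResponse {
  if (ready && !draining) {
    return { status: 200, body: { status: "ok" } };
  }
  return { status: 503, body: { status: "not_ready" } };
}
```

Both bodies would be serialized with a Content-Type of application/json.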

2026-06-14: Multi-daemon operator UI — advertised base, bootstrap redirect, split-mode self-check (ADR-0040)

Per ADR-0040:

daemon.public_url config (env KAGED_PUBLIC_URL) added to [daemon] — the browser-reachable origin of this daemon, used to build the /connect?api=… bootstrap redirect and the api= parameter in the printed loopback launch URL. Falls back to Host + X-Forwarded-Proto when unset.
Root behavior split. With ui.url empty (co-located) the daemon serves the SPA from / as before; with ui.url set (split mode) an unauthenticated GET / (Accept: text/html) 302s to ${ui.url}/connect?api=${public_url}/api/v1 and the daemon never serves the SPA from /.
Printed loopback launch URL changed to the ${ui.url}/connect?api=…&token=… form (was ${ui.url}/launch?token=…).
Self-check sub-gate added under the bind-safety gate: split mode (ui.url set) + non-loopback bind requires daemon.public_url, else refuse to start (exit code 11).
Bearer transport + CORS are specified in http-api.md (bearer identity path, CSRF exemption, per-origin CORS); the daemon's auth resolver gains the Authorization: Bearer path and getLaunch/postAuthSso return session_token.
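
The split-mode sub-gate and the bootstrap redirect target can be sketched as follows. splitModeGate and bootstrapRedirect are hypothetical names, loopback detection is taken as a precomputed boolean, and the api= value is reproduced literally as written above because this spec does not say whether it is URL-encoded.

```typescript
const EXIT_SPLIT_MODE_NEEDS_PUBLIC_URL = 11;

interface GateInput {
  uiUrl: string;          // ui.url; empty means co-located
  publicUrl: string;      // daemon.public_url; empty means unset
  bindIsLoopback: boolean;
}

// Returns the exit code to fail with, or null when the gate passes.
function splitModeGate(input: GateInput): number | null {
  const splitMode = input.uiUrl !== "";
  if (splitMode && !input.bindIsLoopback && input.publicUrl === "") {
    return EXIT_SPLIT_MODE_NEEDS_PUBLIC_URL;
  }
  return null;
}

// Target of the 302 for an unauthenticated GET / in split mode.
// Assumption: no percent-encoding is applied to the api= value.
function bootstrapRedirect(uiUrl: string, publicUrl: string): string {
  return `${uiUrl}/connect?api=${publicUrl}/api/v1`;
}
```

In co-located mode (empty ui.url) the gate always passes and the redirect is never issued, since the daemon serves the SPA from / itself.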

2026-06-11: Unified user identity and shared SSO (Phase 0)

Per ADR-0036:

Shared SSO configuration. Added the [auth.sharedsso] configuration block to the daemon config, allowing the daemon to verify signed ES256 JWTs from a stateless SSO relay.
Startup validation. Added validation rules for auth.sharedsso during the self_check phase: issuer is required if enabled is true, and public_key must parse as a valid P-256 SPKI PEM.
Zero-contact mode. Documented that when public_key is pinned, the daemon never fetches JWKS from the issuer.
Insecure mode interaction. The --insecure flag only waives the ambient operator check; requests with a valid kaged_user_session still resolve as that user.

2026-06-25 — ADR-0049: antigravity-auth module migrates to provider plugin

Driven by ADR-0049 (Accepted 2026-06-25). The daemon-side packages/daemon/src/runtime/antigravity-auth/ module added by the 2026-05-30 amendment (below) is deleted when ADR-0049 implementation lands — its responsibilities migrate wholesale into the Antigravity provider plugin (per the provider-plugin contract in llm.md § Provider plugin contract). (Correction 2026-06-29: ADR-0049 was amended — the plugin is compiled into the daemon binary, not loaded in-process from $KAGED_HOME/providers; that store was retired. See ADR-0049 § Amendments 2026-06-29.)

runtime/antigravity-auth/ module deleted. The PKCE flow, token exchange, persistent token store, proactive/reactive refresh, and resolveCredentials() integration all migrate into the Antigravity plugin's auth module. The daemon no longer owns provider OAuth centrally; each provider plugin owns its own auth.
Token store location unchanged. $XDG_CONFIG_HOME/kaged/oauth/antigravity-tokens.json (renamed from antigravity-tokens.json to the per-provider convention <provider>-tokens.json) — Zod-validated, mode 0600, atomic writes. Schema unchanged.
HTTP endpoints unchanged. POST /login, GET /status, POST /logout under /api/v1/local/providers/antigravity/auth/ keep the same contract (see http-api.md § OAuth provider auth). The daemon delegates each request to the plugin's auth interface rather than handling it inline.
resolveCredentials() update. Previously checked the daemon-owned Antigravity token store; now calls into the Antigravity plugin's auth module via the provider-plugin contract. Resolution order unchanged: plugin token store (fresh) → local.toml static token → null (unresolved).
Implementation-phase note. Per the Antigravity plugin's reference guidance (llm.md § Antigravity as the reference plugin), the plugin should be adapted from the existing opencode-antigravity-auth plugin rather than reinvented.

Until the implementation phase lands, the existing runtime/antigravity-auth/ module remains the operative code path.

2026-05-30 — Antigravity provider auth module

Per ADR-0028:

New runtime module antigravity-auth/. The daemon gains an internal module at packages/daemon/src/runtime/antigravity-auth/ that owns the Antigravity provider's OAuth lifecycle: PKCE-based authorization code grant, token exchange, persistent token storage ($XDG_CONFIG_HOME/kaged/antigravity-tokens.json), proactive token refresh, and integration with the existing resolveCredentials() flow.
Credential resolution extension. resolveCredentials() in primary-runner.ts now checks the Antigravity token store before falling back to local.toml access_token fields. Resolution order: daemon token store (fresh) → local.toml static token → null (unresolved).
Three new HTTP endpoints. POST /login, GET /status, POST /logout under /api/v1/local/providers/antigravity/auth/. See http-api.md § Antigravity provider OAuth for the contract.
Token store. Zod-validated JSON at $XDG_CONFIG_HOME/kaged/antigravity-tokens.json, mode 0600. Contains refresh token, access token, expiry, email, and project ID. Atomic writes via temp file + rename.

2026-05-21 — Deployment modes + project-load flow + local-config split

Significant amendment driven by ADR-0010 and ADR-0011:

New "Deployment mode" section before "Process model." Defines per-user vs system-wide mode detection at startup, the mode-determined defaults table, and the "what's identical across modes" guarantee.
Filesystem layout split by mode. System mode still uses /var/lib/kaged for state and /etc/kaged/ for operational config; user mode uses XDG paths ($XDG_DATA_HOME/kaged for state, $XDG_CONFIG_HOME/kaged/ for config, $XDG_RUNTIME_DIR/kaged/ for the loopback auth cookie). Per-operator local configs added under ${KAGED_HOME}/local/ in system mode.
New "Project loading" section before "Crash semantics." Defines the project-load endpoint flow (validate → resolve aliases → check plugins → register), state re-evaluation triggers, hot-reload conservatism, and unloading.
Per-user systemd unit added alongside the existing system unit, with loginctl enable-linger documented for logged-out operation.
"Operational config vs local config" subsection added to Configuration to disambiguate the two files and their loading semantics.

The spec is also now constrained by ADR-0010 and ADR-0011 (added to the frontmatter).

2026-05-24 — Streaming-first: events channel publishing, abort controller registry

Per ADR-0016:

Events channel publishing from dispatchPrimary. The daemon now publishes lifecycle events on the events WebSocket channel — run.started when a run begins processing, run.ended (with outcome) when a run completes, fails, or is cancelled. These events trigger query cache invalidation in the UI so session state and message lists stay current without manual refresh. The ws-registry module gains a publishSessionEvent function alongside the existing publishHarnessEvent.
Abort controller registry. The daemon maintains a per-run AbortController registry (activeRuns map in primary-runner.ts). When dispatchPrimary starts a run, it registers the controller; when the run ends (any outcome), it deregisters. The existing POST /sessions/:id/runs/:rid/cancel endpoint looks up the controller and calls .abort(), propagating cancellation through the harness to the LLM provider's SSE stream. This gives operators immediate abort capability.
Message ordering. listMessages now orders by created_at ASC (previously id ASC). ULID lexicographic order and creation-time order can diverge when messages are created across async boundaries; created_at is the authoritative timeline.
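
The abort controller registry can be sketched in a few lines. The spec names the activeRuns map; the helper functions registerRun, deregisterRun, and cancelRun are hypothetical, and a plain object stands in for the map to keep the sketch dependency-free.

```typescript
// Per-run registry: run ID -> controller for that run.
const activeRuns: Record<string, AbortController> = {};

// Called when dispatch starts a run; the signal is handed to the harness.
function registerRun(runId: string): AbortSignal {
  const controller = new AbortController();
  activeRuns[runId] = controller;
  return controller.signal;
}

// Called when the run ends, whatever the outcome.
function deregisterRun(runId: string): void {
  delete activeRuns[runId];
}

// Backing logic for the cancel endpoint; false means no such active run.
function cancelRun(runId: string): boolean {
  const controller = activeRuns[runId];
  if (controller === undefined) return false;
  controller.abort();
  return true;
}
```

Anything that accepts an AbortSignal, such as a fetch call streaming from an LLM provider, would observe the abort as soon as cancelRun fires.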

2026-05-23 — Per-session model override dispatch

Model override in dispatch path. dispatchPrimary now checks session.model before alias resolution. When set, it splits the override ("provider:model") into provider name and model ID, resolves the provider's credentials from local config, and constructs the ProviderRoute directly — bypassing alias lookup entirely. When model is null, the existing alias resolution path is used unchanged.
Override persistence via handlePostMessage. When POST /sessions/:id/messages includes a model field, the session record is updated with the override before dispatch begins. This makes the override sticky — subsequent messages use it until changed or cleared.
Override persistence via handleUpdateSession. PUT /sessions/:id now accepts model alongside label. The operator (or UI) can set, change, or clear (null) the override without posting a message.
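
A sketch of the "provider:model" split. It assumes the split happens at the first colon, so model IDs that themselves contain colons survive intact; that choice, the error on malformed input, and the name parseModelOverride are all assumptions, not behavior this spec confirms.

```typescript
interface ModelOverride {
  provider: string;
  modelId: string;
}

// null means "no override": the caller falls through to alias resolution.
function parseModelOverride(override: string | null): ModelOverride | null {
  if (override === null) return null;
  const idx = override.indexOf(":");
  // Both sides of the first colon must be non-empty.
  if (idx <= 0 || idx === override.length - 1) {
    throw new Error(`malformed model override: ${override}`);
  }
  return {
    provider: override.slice(0, idx),
    modelId: override.slice(idx + 1),
  };
}
```

For example, parseModelOverride("ollama:llama3:8b") yields provider "ollama" and modelId "llama3:8b" under the first-colon assumption.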

2026-05-22 — UI URL configuration + launch token regeneration

New ui.url config key added to the [ui] section. Specifies the base URL where the web UI is reachable, used to construct launch URLs in loopback mode. When the UI runs on a separate origin (dev server, tunnel, reverse proxy), the operator sets this to the UI's origin. When empty (default), the daemon uses its own bind address — the correct default when the daemon serves the UI bundle itself.
New KAGED_UI_URL env var added to the env var table. Overrides ui.url per standard precedence (env > config > default).
Launch URL uses the UI base URL. The launch URL printed at startup is {ui_base_url}/launch?token=<token>, where ui_base_url is resolved from KAGED_UI_URL > ui.url > http://{bind}. This points to the UI's /launch route, which handles the token exchange via JSON content negotiation with the daemon's API. The operator's browser must reach the UI origin, not the daemon directly.
Launch token regeneration after invalidation. When the one-time launch token is consumed (operator visits the launch URL), the daemon generates a new token and logs a new launch URL to the operational log. This ensures the operator can always re-authenticate from a new browser without restarting the daemon. The previous session cookie remains valid; the new token is for new browser sessions only.
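The precedence KAGED_UI_URL > ui.url > http://{bind} can be sketched as a small pure function. The option names, the treatment of an empty env value as unset, the trailing-slash trimming, and the URL-encoding of the token are all assumptions made for this example.

```typescript
// Hypothetical inputs: envUiUrl is the value of KAGED_UI_URL, configUiUrl is
// ui.url from the [ui] section, bind is the daemon's bind address (host:port).
interface LaunchUrlInputs {
  envUiUrl?: string;
  configUiUrl?: string;
  bind: string;
  token: string;
}

function resolveLaunchUrl(inputs: LaunchUrlInputs): string {
  // Empty strings count as unset, so an empty ui.url falls through to the
  // daemon's own bind address.
  const base = inputs.envUiUrl || inputs.configUiUrl || `http://${inputs.bind}`;
  // Assumption: tolerate a trailing slash on the configured base URL.
  const trimmed = base.replace(/\/+$/, "");
  return `${trimmed}/launch?token=${encodeURIComponent(inputs.token)}`;
}
```

On regeneration, the same function would be called again with the new token; the base URL inputs do not change.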

2026-05-22 — Config auto-creation, home-relative paths, launch URL fix

Config auto-creation on first run. When no config file exists at the mode-appropriate default path and no --config/KAGED_CONFIG override is set, the daemon creates a config file with mode-appropriate defaults (home-relative paths for storage, audit log, plugins dir, and the correct bind address). Logs Config created: <path>. No more silent defaults — every daemon run has an explicit, editable config file.
Home-relative path defaults. storage.url, logging.audit, and plugins.dir now default relative to ${KAGED_HOME} instead of hardcoding /var/lib/kaged. When these fields are empty (or absent) in the config file, they resolve to sqlite://${home}/kaged.db, file:${home}/audit.log, and ${home}/plugins respectively. Explicitly set values are never overwritten.
Bootstrap phase restructured. Mode is now resolved before config file discovery (from flags/env only). This eliminates the chicken-and-egg problem where the config file path depends on mode but mode might depend on config. The config file is loaded (or created) after mode and home are known.
Launch URL path fixed. Launch URLs now point to /launch?token=<token> (the UI route) instead of /api/v1/launch?token=<token> (the daemon API endpoint). The UI's /launch route handles token exchange via JSON content negotiation with the daemon's API — the operator's browser should never hit the daemon's API directly.
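The home-relative defaults described above amount to filling in empty fields from the resolved home directory while leaving explicit values alone. A minimal sketch, assuming a hypothetical PathConfig shape whose field names stand in for storage.url, logging.audit, and plugins.dir:

```typescript
// Hypothetical subset of the parsed config; undefined or "" means "not set".
interface PathConfig {
  storageUrl?: string; // storage.url
  auditLog?: string;   // logging.audit
  pluginsDir?: string; // plugins.dir
}

// Fill empty fields relative to the resolved KAGED_HOME. Explicitly set
// values are returned unchanged.
function applyHomeDefaults(cfg: PathConfig, home: string): Required<PathConfig> {
  return {
    storageUrl: cfg.storageUrl || `sqlite://${home}/kaged.db`,
    auditLog: cfg.auditLog || `file:${home}/audit.log`,
    pluginsDir: cfg.pluginsDir || `${home}/plugins`,
  };
}
```

Because mode and home are resolved before the config file is loaded, a function like this can run immediately after parsing without any further lookups.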

2026-05-22 — Runtime state files, `kaged auth nonce`, `kaged auth open`

Nonce file written at startup. Per ADR-0007 amendment, the daemon now writes the per-session nonce to a file at startup. In user mode: $XDG_RUNTIME_DIR/kaged/auth-cookie (mode 0600). In system mode: ${KAGED_HOME}/auth-nonce (mode 0600). The nonce is generated once per daemon lifetime and does not change when launch tokens are regenerated. The file path is logged to stderr: Nonce written: <path>.
Launch URL file written at startup and on regeneration. The daemon writes the current launch URL to a file alongside the nonce: $XDG_RUNTIME_DIR/kaged/launch-url (user mode) or ${KAGED_HOME}/launch-url (system mode), mode 0600. The file is rewritten whenever a launch token is consumed and regenerated. CLI commands read this file directly — no daemon API call required.
kaged auth nonce implemented. Reads the nonce directly from the nonce file (no API call) and prints it to stdout. Exits non-zero in insecure mode. Behavior follows the existing CLI surface spec.
kaged auth open added. New CLI command that reads the launch URL from the runtime state file and opens it in the operator's default browser via xdg-open (Linux) or open (macOS). No API call. Exits non-zero in insecure mode or if no launch-url file exists.
CLI subcommand routing expanded. The daemon binary now dispatches kaged auth <subcommand> in addition to kaged start. The auth subcommands (nonce, open) are pure file reads + local actions — they do not require a running daemon's HTTP API.
Filesystem layout updated. launch-url added to both user-mode ($XDG_RUNTIME_DIR/kaged/) and system-mode (${KAGED_HOME}/) layouts. Both runtime state files (the nonce file and launch-url) are mode 0600.
Runtime directory ownership check. The daemon creates $XDG_RUNTIME_DIR/kaged/ (mode 0700) if it does not exist. If the directory exists but is not owned by the daemon user, the daemon refuses to start.
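The per-mode file locations listed above can be summarised in one lookup. This is a sketch only: the function name, the RuntimeFiles shape, and the error thrown when the relevant variable is unset are assumptions; the file names and modes are taken from the entries above.

```typescript
type Mode = "user" | "system";

// Hypothetical description of where the runtime state files live.
interface RuntimeFiles {
  dir: string;
  nonce: string;
  launchUrl: string;
  fileMode: number; // permission bits for both files
}

function runtimeFiles(
  mode: Mode,
  env: { XDG_RUNTIME_DIR?: string; KAGED_HOME?: string },
): RuntimeFiles {
  if (mode === "user") {
    // Assumption: a missing XDG_RUNTIME_DIR is treated as a hard error.
    if (!env.XDG_RUNTIME_DIR) throw new Error("XDG_RUNTIME_DIR is not set");
    const dir = `${env.XDG_RUNTIME_DIR}/kaged`;
    return { dir, nonce: `${dir}/auth-cookie`, launchUrl: `${dir}/launch-url`, fileMode: 0o600 };
  }
  if (!env.KAGED_HOME) throw new Error("KAGED_HOME is not set");
  const dir = env.KAGED_HOME;
  return { dir, nonce: `${dir}/auth-nonce`, launchUrl: `${dir}/launch-url`, fileMode: 0o600 };
}
```

The auth subcommands would use the same lookup to find the files, which is what lets them work without calling the daemon's HTTP API.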

References

ADR-0001 — kaged as lifecycle root
ADR-0002 — web UI is bundled and served by the daemon
ADR-0004 — Bun runtime, bun build --compile
ADR-0005 — SQLite default, Postgres opt-in, portable migrations
ADR-0007 and its amendment — sidecar contract, --insecure
ADR-0008 — plugin host subprocess model
ADR-0009 and its amendment — sandbox enforcement, --no-sandbox
ADR-0028 — Antigravity provider OAuth lifecycle
ADR-0047 — session notifications; daemon-side router, Web Push bridge
http-api.md — the surface this daemon exposes
project-dsl.md — the contract this daemon parses
session-manager.md — internal session state machine
sandbox.md — cage compiler and network gatekeeper
plugin-host.md — plugin JSON-RPC protocol
notifications.md — session notification pipeline (event normalisation, presence gate, channels, Web Push)