ADR-0002: Web-first; terminal is a capability, not the product
- Status: Accepted
- Date: 2026-05-21
- Deciders: @karasu
- Supersedes: —
- Superseded by: —
Context
kaged needs a primary surface. The category we're entering — agentic dev tools and self-hosted operator consoles — is dominated by CLI-first products: opencode, oh-my-pi, and most agentic harnesses ship a terminal as the principal interface, sometimes with a web UI bolted on later as an afterthought.
There are real reasons that pattern exists. Terminals are scriptable, composable, and SSH'able. Developers live in them. Most agentic frameworks emerge from researchers who use the terminal.
But it doesn't fit kaged:
- Remote access from anywhere is core. Sessions need to be reachable from a phone in a car park, not just from an ssh-capable laptop. This is one of opencode's better insights and one of the things adjacent dashboard products market but don't deliver.
- Cloudflare Tunnel + OAuth-proxy is the access pattern. That stack expects HTTP at the edge. A CLI-first product means SSH-over-tunnel or a custom protocol; both are worse experiences than a URL.
- Multi-device session continuity. An operator starts a task on their phone, switches to a laptop, finishes from a tablet. CLI-first can technically support this via tmux + ssh; the UX is hostile for non-power-users and brittle for everyone.
- The terminal is solved in the browser. xterm.js + a PTY broker over WebSocket gives us a real terminal (same ncurses apps, same colors, same keybindings) embedded in the surface we want to ship anyway.
The question is load-bearing because it determines the shape of every component below it: API design (REST/WS vs IPC), auth (HTTP-friendly OAuth vs SSH keys), how plugins expose UI (web components vs nothing), and how mobile-first informs the IA.
Decision
The web UI is the primary product surface. Terminal access is exposed as a capability of the web UI — a PTY broker over WebSocket, rendered by xterm.js (or equivalent) in the browser. kaged does not ship a CLI tool as a principal interface.
A thin CLI may exist for bootstrap and lifecycle operations only (kaged start, kaged status, kaged config). It is not a workflow surface; it is plumbing. The workflow is in the browser.
Consequences
What this commits us to
- A polished, opinionated web UI from day one. There is no "we'll do the UI in v2" escape hatch — the UI is v0.
- A PTY broker subsystem in the daemon: multiplexing N concurrent terminals over WebSocket, each scoped to a session/cage. See future specs/session-manager.md and specs/http-api.md.
- Frontend code is first-class. The web package gets equal weight in the monorepo, not a static/ afterthought.
- Authentication must work over HTTP/WS (OAuth, magic link, session cookie), not over SSH keys.
- Mobile responsiveness is a release gate, not a nice-to-have. The phone is the worst-case viewport and the highest-leverage one.
- We commit to brand and visual design as load-bearing (see docs/brand/brand-guide.md). CLI-first products can skip this; we can't.
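As a rough illustration of what "multiplexing N concurrent terminals over WebSocket" could look like on the wire, the sketch below tags each binary frame with a channel ID so that one socket can carry several PTYs. The frame layout (1-byte type, 4-byte channel ID, payload), the type names, and the function names are all assumptions made for illustration; the actual format is for the future specs/http-api.md to define.

```typescript
// Hypothetical wire format, NOT the kaged spec:
//   byte 0      frame type (0 = data, 1 = resize, 2 = close)
//   bytes 1..4  channel id, unsigned 32-bit big-endian (one channel per PTY)
//   bytes 5..   payload (raw terminal bytes for data frames)
type FrameType = "data" | "resize" | "close";

interface PtyFrame {
  type: FrameType;
  channel: number;
  payload: Uint8Array;
}

const TYPE_CODES: Record<FrameType, number> = { data: 0, resize: 1, close: 2 };
const TYPE_NAMES: FrameType[] = ["data", "resize", "close"];
const HEADER_BYTES = 5;

function encodeFrame(frame: PtyFrame): Uint8Array {
  const out = new Uint8Array(HEADER_BYTES + frame.payload.length);
  const view = new DataView(out.buffer);
  view.setUint8(0, TYPE_CODES[frame.type]);
  view.setUint32(1, frame.channel, false); // false = big-endian
  out.set(frame.payload, HEADER_BYTES);
  return out;
}

function decodeFrame(bytes: Uint8Array): PtyFrame {
  if (bytes.length < HEADER_BYTES) {
    throw new Error("frame shorter than header");
  }
  // Respect byteOffset so views into a larger buffer decode correctly.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const type = TYPE_NAMES[view.getUint8(0)];
  if (type === undefined) {
    throw new Error("unknown frame type");
  }
  return {
    type,
    channel: view.getUint32(1, false),
    payload: bytes.subarray(HEADER_BYTES),
  };
}

// Route a decoded data frame to the terminal that owns its channel.
// Returns false when no terminal is registered or the frame is not data.
function dispatch(
  frame: PtyFrame,
  terminals: Map<number, (data: Uint8Array) => void>,
): boolean {
  const write = terminals.get(frame.channel);
  if (write === undefined || frame.type !== "data") return false;
  write(frame.payload);
  return true;
}
```

In the browser, the writer registered for a channel would plausibly be an xterm.js terminal's write method, which accepts a Uint8Array, with the WebSocket's binaryType set to "arraybuffer". A JSON envelope would be easier to debug, at the cost of encoding terminal bytes as text.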
What this forecloses
- No "kaged is just tmux with extra steps" framing. Power users who only want a terminal will find the web wrapper unnecessary.
- No piping kaged into other shell tools (kaged status | grep ...). The API surface is HTTP/WS; if shell composability is needed, write a CLI that calls the API.
- No "headless-only" deployments where the web UI is disabled. The web UI is the product; running kaged without it is running kaged with the principal interface turned off.
- We cannot rely on the operator already having an ssh workflow. Onboarding flows assume "open this URL in a browser."
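The "write a CLI that calls the API" escape hatch above could be as small as the following sketch, which turns a JSON status response into tab-separated lines that grep, awk, and cut can consume. The /api/sessions path, the SessionStatus shape, and every function name here are placeholders invented for illustration, not part of any kaged API.

```typescript
// Hypothetical response shape; the real one belongs to specs/http-api.md.
interface SessionStatus {
  id: string;
  cage: string;
  state: string;
}

// Injected so the sketch runs without a network; in practice this would
// wrap fetch() and attach whatever credentials the deployment requires.
type GetJson = (url: string) => Promise<unknown>;

function isSessionStatus(value: unknown): value is SessionStatus {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.cage === "string" &&
    typeof v.state === "string"
  );
}

// One tab-separated line per session, so line-oriented shell tools work.
function formatStatusLines(sessions: SessionStatus[]): string[] {
  return sessions.map((s) => [s.id, s.cage, s.state].join("\t"));
}

async function statusLines(baseUrl: string, getJson: GetJson): Promise<string[]> {
  // "/api/sessions" is a placeholder path, not a documented endpoint.
  const body = await getJson(`${baseUrl}/api/sessions`);
  if (!Array.isArray(body) || !body.every(isSessionStatus)) {
    throw new Error("unexpected response shape");
  }
  return formatStatusLines(body);
}
```

Keeping the wrapper outside kaged preserves the decision: the HTTP API stays the only contract, and shell composability becomes a client concern.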
What becomes easier
- Mobile access is built-in by construction, not retrofitted.
- The brand language we've developed (docs/brand/) is applicable everywhere because everywhere is the same surface.
- Plugin UIs can register web components/iframes; they don't need to invent a terminal-rendering convention.
- Auth integrates with the rest of the operator's existing stack (Cloudflare Tunnel + OAuth sidecar) trivially.
- Demos and screenshots are dramatic and shareable.
What becomes harder
- The web UI is now a sustained engineering investment, not a one-off. UI bugs are product bugs.
- Accessibility is a real concern (keyboard nav, screen reader semantics for terminal output, focus management across PTYs).
- We carry frontend dependencies (React, Vite, xterm.js, etc.) and their security/upgrade cadence.
- Browsers and mobile WebViews introduce variance the terminal doesn't: viewport quirks, iOS WebKit pinch behaviors, Android keyboard interactions. Each is a real bug class.
- WebSocket reconnection semantics for the PTY broker are non-trivial — operators dropping Wi-Fi mid-task is the common case, not an edge case.
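To make the reconnection problem concrete, here is a minimal sketch of two pieces a design might combine: client-side exponential backoff with full jitter, and a broker-side buffer that lets a reconnecting client resume from the last byte offset it saw. The names reconnectDelayMs and ReplayBuffer, the default timings, and the offset-based resume scheme are all assumptions for illustration, not a description of how kaged does or will handle this.

```typescript
// Exponential backoff with full jitter: pick a random delay in
// [0, min(cap, base * 2^attempt)). `random` is injectable for testing.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 250,
  capMs: number = 10000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Broker-side buffer of recent PTY output, addressed by absolute byte
// offset, so a reconnecting client can ask for "everything after N".
class ReplayBuffer {
  private chunks: Uint8Array[] = [];
  private start = 0; // absolute offset of the first retained byte
  private end = 0; // absolute offset one past the last byte written
  private maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  // Stores a chunk and returns the new end offset for the client to track.
  append(chunk: Uint8Array): number {
    this.chunks.push(chunk);
    this.end += chunk.length;
    // Evict whole chunks from the front while over budget.
    while (this.end - this.start > this.maxBytes && this.chunks.length > 1) {
      const dropped = this.chunks.shift() as Uint8Array;
      this.start += dropped.length;
    }
    return this.end;
  }

  // Returns the bytes after `offset`, or null if that history is gone
  // (the client should then fall back to a full redraw).
  since(offset: number): Uint8Array | null {
    if (offset < this.start || offset > this.end) return null;
    const out = new Uint8Array(this.end - offset);
    let pos = this.start;
    let written = 0;
    for (const chunk of this.chunks) {
      const chunkEnd = pos + chunk.length;
      if (chunkEnd > offset) {
        const from = Math.max(0, offset - pos);
        out.set(chunk.subarray(from), written);
        written += chunk.length - from;
      }
      pos = chunkEnd;
    }
    return out;
  }
}
```

The jitter matters when many tabs or devices lose the same network at once: without it, every client retries on the same schedule. The null return from since is the case worth designing for explicitly, since a phone that was offline for several minutes will often have fallen behind whatever the broker retained.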
Alternatives considered
Alternative A — CLI-first with web bolt-on (the opencode/oh-my-pi pattern)
What it is: The terminal is the principal interface. A web UI exists for dashboards or visualization, but most operator action happens in a shell.
Why tempting: It's the dominant pattern in the category. Devs know terminals. Lower frontend investment. Scriptable and composable from day one.
Why rejected: Forecloses mobile-first by construction. SSH-over-tunnel is a worse access story than HTTPS-over-tunnel for non-laptop devices. The bolt-on web UI is always an afterthought in this pattern, which is exactly the trap adjacent dashboard products fall into when they market a futuristic operator surface and ship a tile of disk gauges. If we ship a CLI-first product with a half-built dashboard, we are what we're competing against.
Alternative B — Equal CLI + web treatment
What it is: Both surfaces are first-class, with feature parity enforced.
Why tempting: Maximum flexibility. Power users get a terminal; mobile users get a UI; both surfaces feel complete.
Why rejected: Feature parity is an engineering tax we don't have headcount for. Every new feature ships twice. Inevitably one surface lags, and operators get inconsistent behavior between them. The principle "the web is the product" is clearer to design against than "both are the product." We can always add a CLI later that calls the API; we can't easily remove CLI-first assumptions once they're embedded.
Alternative C — Pure CLI, no web at all
What it is: Terminal only. Operators access via SSH or directly on the device.
Why tempting: Simplest implementation. No frontend investment whatsoever.
Why rejected: Kills the mobile-first promise. Kills the "your phone is a first-class device" principle. The product becomes indistinguishable from a dozen other agent-harness CLIs. Brand and UX work has nowhere to live. Fails the primary user need.
Alternative D — Native desktop/mobile apps
What it is: Electron, Tauri, React Native, or native apps as the principal surface.
Why tempting: Better keyboard handling, OS-native integration, push notifications, offline mode.
Why rejected: Each app adds a release surface (App Store, Play Store, signed installers) and a platform-specific code path. Mobile app store policies introduce ongoing friction. The PWA path through the browser gets us 80% of the benefits without the distribution overhead. We can always wrap the web app in Tauri later; we cannot easily port a Tauri-first app back to the browser.
References
- docs/02-architecture.md: the in-front diagram, showing the web UI as principal surface
- docs/01-vision.md: "Three properties that matter," specifically the mobile-first commitment
- ADR-0001: the position decision this builds on
- xterm.js: https://xtermjs.org/
- Cloudflare Tunnel + OAuth sidecar: an existing deployment pattern across the operator's infra