Spec: Agent Harness

Purpose

This spec defines the agent harness — the layer that compiles the operator's project DSL into running Mastra agents, wires up the Processor pipeline for prompt observability and policy enforcement, manages provider routing, and integrates with Langfuse for optional tracing.

This document is normative for:

  • The DSL → Mastra compilation boundary — how project config becomes Agent and Workflow instances.
  • The Processor pipeline — the ordered chain that observes, audits, and enforces policy on every LLM call.
  • The supervisor/sub-agent topology — how the primary agent dispatches to caged subagents.
  • The checkpoint bridge — how Mastra's suspend/resume maps to kaged's checkpoint protocol.
  • The provider configuration — model aliases, fallback arrays, per-call routing.
  • The observability integration — Langfuse tracing via Mastra's native exporter, structured-log fallback.
  • The prompt management lifecycle — file-watched prompts, hot-reload at message boundaries.
  • The cancellation path — abortSignal wiring from operator cancel to in-flight LLM calls.
  • The plugin hook firing points — where in runPrimary and the message-reconstruction path each lifecycle hook fires (ADR-0023).
  • Context compaction — when, where, and how kaged-controlled context-window management runs at the harness boundary (ADR-0024).

It is not normative for:

This spec is about how the agent thinks — the substrate beneath the session manager's run model and above the raw LLM provider calls.

Constraints (from ADRs)

| Constraint | Source |
|---|---|
| Mastra v1.x is the agentic substrate; version-pinned dependency on @mastra/core | ADR-0012 |
| kaged owns all prompts; every system prompt is readable and editable by the operator | ADR-0012 (manifesto) |
| Mastra Cloud, Studio, Workspace, and RAG primitives are excluded | ADR-0012 |
| All provider calls route through @kaged/llm via a LanguageModelV2 shim; no @ai-sdk/<provider> deps | ADR-0014 |
| Langfuse is optional; kaged runs without it; structured-log fallback | ADR-0013 |
| Prompt management is file-based, not Langfuse-hosted | ADR-0013 |
| Project DSL is the portable artifact; the substrate it compiles to is an implementation detail | ADR-0011 |
| Runtime is Bun + TypeScript | ADR-0004 |
| Subagents run in cages; the harness delegates cage spawning to the sandbox subsystem | ADR-0009 |
| Project plugins subscribe to lifecycle hooks (on_session_start, on_session_idle, pre_compact, post_compact); the harness is the firing point | ADR-0023 |
| on_session_start and on_session_idle fire only on the primary agent (sessions are primary-owned); pre_compact and post_compact are per-agent | ADR-0023 |
| Context compaction is kaged-owned at the reconstructMessages() boundary; Mastra's internal trimming is neutralized | ADR-0024 |
| Compaction is pre-call (proactive) with reactive fallback on provider context-length errors | ADR-0024 |
| Compaction is per-agent; subagents inherit defaults from parent | ADR-0024 |
| Compactor plugin failures fall back to the drop strategy; compaction never stalls | ADR-0024 |
| Compaction operates between LLM calls, never during a streaming response | ADR-0024 |
| Tool-call / tool-result message pairs are atomic across compaction | ADR-0024 |

Architecture

                    ┌──────────────────────────────────────┐
                    │          Session Manager             │
                    │  (dispatches runs, owns lifecycle)   │
                    └──────────────┬───────────────────────┘
                                   │ startRun(session, message)
                                   ▼
                    ┌──────────────────────────────────────┐
                    │          Agent Harness               │
                    │                                      │
                    │  ┌────────────────────────────────┐  │
                    │  │  DSL Compiler                  │  │
                    │  │  project.yaml → Mastra config  │  │
                    │  └────────────────────────────────┘  │
                    │                                      │
                    │  ┌────────────────────────────────┐  │
                    │  │  Processor Pipeline            │  │
                    │  │  audit → policy → observe      │  │
                    │  └────────────────────────────────┘  │
                    │                                      │
                    │  ┌────────────────────────────────┐  │
                    │  │  Provider Router               │  │
                    │  │  alias → model → fallback      │  │
                    │  └────────────────────────────────┘  │
                    │                                      │
                    │  ┌────────────────────────────────┐  │
                    │  │  Checkpoint Bridge             │  │
                    │  │  Mastra suspend ↔ kaged pause  │  │
                    │  └────────────────────────────────┘  │
                    │                                      │
                    └───┬──────────┬──────────┬────────────┘
                        │          │          │
               ┌────────▼──┐  ┌───▼────┐  ┌──▼──────────┐
               │ @mastra/  │  │Langfuse│  │  Provider   │
               │   core    │  │Exporter│  │   SDKs      │
               └───────────┘  └────────┘  └─────────────┘

Five subsystems in packages/harness/:

  • DslCompiler — reads the project DSL and produces Mastra Agent configurations (primary + subagents), Workflow definitions, and tool registrations. The DSL is the input; Mastra config objects are the output.
  • ProcessorPipeline — an ordered chain of Mastra Processor instances that intercept every LLM call. Responsible for prompt auditing, policy enforcement, and pre-generation observation.
  • ProviderRouter — resolves model aliases from the operator's local config, constructs fallback arrays, and routes each generate/stream call to the correct provider.
  • CheckpointBridge — translates between Mastra's suspend/resume primitive and kaged's checkpoint protocol (defined in session-manager.md).
  • ObservabilityExporter — conditionally registers Mastra's @mastra/langfuse exporter when Langfuse is configured; falls back to structured JSON logs to stdout.

Storage strategy (v0)

Mastra Agents can be constructed without memory and without a storage adapter. v0 takes that path: every primary Agent is constructed stateless, and kaged's existing storage layer (@kaged/storage, bun:sqlite) is the source of truth for all session state.

What kaged owns (not Mastra)

  • Message history. MessageRecord rows in bun:sqlite. The harness reads them on every run and reconstructs Mastra's messages argument from messages WHERE session_id = ? AND NOT superseded ORDER BY created_at.
  • Run state. RunRecord rows, transitioned by the session-manager state machine.
  • Checkpoints. CheckpointRecord rows. In v0 a checkpoint is a pointer (messageCursor) into the persisted MessageRecord history; no Mastra MessageList snapshot is serialized (see § Checkpoint bridge). The checkpoint lives entirely in kaged's table, not Mastra's.
  • Operator identity. created_by on session records; X-Kaged-User-Id on requests. Mastra never sees operator identity.
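As a minimal sketch of that reconstruction, assuming a simplified in-memory row shape (the `MessageRecord` fields, the `ModelMessage` type, and this `reconstructMessages` signature are illustrative; the real rows come from bun:sqlite through @kaged/storage, and the real function may also apply compaction), the query semantics reduce to a filter and a sort:

```typescript
// Hypothetical, simplified MessageRecord shape for illustration only.
interface MessageRecord {
  id: string;
  sessionId: string;
  role: "user" | "assistant" | "tool";
  content: string;
  createdAt: number;
  superseded: boolean;
}

// Shape handed to agent.stream(...) in this sketch (not Mastra's actual type).
interface ModelMessage {
  role: MessageRecord["role"];
  content: string;
}

// In-memory equivalent of:
//   SELECT ... FROM messages
//   WHERE session_id = ? AND NOT superseded
//   ORDER BY created_at
function reconstructMessages(rows: MessageRecord[], sessionId: string): ModelMessage[] {
  return rows
    .filter((r) => r.sessionId === sessionId && !r.superseded)
    .sort((a, b) => a.createdAt - b.createdAt) // sorts the filtered copy, not `rows`
    .map(({ role, content }) => ({ role, content }));
}
```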

What Mastra owns

  • The agent loop semantics — how multi-turn tool calls, supervisor delegations, and suspend / resume flow within a single agent.stream(...) invocation.
  • The Processor pipeline lifecycle (audit, policy, observability run via Mastra's Processor hook order).
  • The provider call mechanics — but only because we plug @kaged/llm's LanguageModelV2 into Agent.model (see § Provider abstraction).

Why stateless

  • Bun-pure storage. Mastra's default storage adapters (@mastra/libsql, @mastra/pg, etc.) target Node-shaped runtimes. kaged's storage is bun:sqlite. Constructing Agents stateless removes any need to either ship a bun:sqlite-backed MemoryStorage adapter or pull a parallel SQLite driver into the daemon for Mastra's use.
  • Single source of truth. Two storage layers (kaged's and Mastra's) would diverge under the multi-operator, multi-session shapes the daemon already handles. One is simpler and matches how sessions are already modeled.
  • Survives daemon restart. Mastra holds no state across calls; the next agent.stream(...) reconstructs context from MessageRecord rows. Restarting the daemon mid-session loses only the in-flight LLM connection (already handled by run-cancellation), not history.

When this could change

A future ADR or amendment may add Mastra-side memory if v0.x or v1 introduces features that require it (e.g., observational memory across runs that aren't fully expressible through kaged's MessageRecord). Per ADR-0014, any Mastra-side persistence would be bun:sqlite-backed via a custom MemoryStorage adapter, not via @mastra/libsql or any better-sqlite3-based dependency.


DSL compilation

Compilation boundary

The project DSL (project-dsl.md) defines agents, their prompts, their tools, and their cages. The harness compiles this into Mastra runtime objects. The operator never sees Mastra types; the DSL is the interface.

project.yaml                          Mastra runtime
─────────────                          ──────────────

primary:
  model: "fast"                    →   Agent({ model: resolved("fast"), ... })
  system_prompt: ./prompts/pri.md  →   instructions: fileContent("./prompts/pri.md")
  cage: disabled                   →   cagePolicy: null (root agent, interim)
  tools:                           →   tools: { ...toolRegistry.resolve(["file.*", ...]) }
    "file.*": { enabled: true }
    "code.lsp": { enabled: true }
  # kaged.issue.* and kaged.workflow.* enabled by role-based default (root agent)
  subagents:
    researcher:
      model: "smart"               →   Agent({ model: resolved("smart"), ... })
      system_prompt: ./prompts/res.md
      cage:                        →   cagePolicy: { fs: [...], net: [...], ... }
        fs: [{ path: ./src, mode: ro }]
        net: { allow: ["api.github.com:443"] }
      tools:                       →   tools: { ...toolRegistry.resolve(["search.*"]) }
        "search.*": { enabled: true }

What the compiler produces

For each project load (or DSL hot-reload), the compiler emits:

interface CompiledAgentNode {
  config: MastraAgentConfig;           // Mastra Agent config for this node
  cagePolicy: CagePolicy | null;      // null when cage: disabled (root agent interim)
  tools: ResolvedToolSet;             // per-agent resolved tools
  children: Record<string, CompiledAgentNode>;  // recursive subtree
}

interface CompiledProject {
  root: CompiledAgentNode;             // the primary agent (tree root)
  workflows: MastraWorkflowConfig[];   // if DSL declares workflows
  promptFiles: PromptFileRef[];        // paths to watch for hot-reload
  modelAliases: Record<string, ResolvedModel>;   // from local config
}

The CompiledAgentNode is recursive — it mirrors the AgentSpec tree from project-dsl.md. Each node carries its own Mastra config, cage policy, resolved tools, and children. The compiler walks the AgentSpec tree depth-first; project references are flattened into CompiledAgentNode subtrees at compile time (per federated-config.md).

What stays outside Mastra

The compiler deliberately excludes from Mastra config:

  • Cage policies. Mastra never sees CagePolicy. The session manager passes cage policies to the sandbox subsystem when spawning subagent processes. Mastra's agents config receives the sub-agent as a participant; the sandbox enforces isolation independently.
  • kaged-internal state. Session IDs, run IDs, operator identity, audit log handles — these flow through ToolCallContext (per agent-tooling.md), not through Mastra.
  • UI state. The harness emits events to the session manager; the session manager streams to the UI. Mastra has no UI awareness.

Compilation triggers

| Trigger | Action |
|---|---|
| Daemon startup + project load | Full compilation. Agents constructed. |
| DSL file change (file watcher) | Re-compile. New agents available for next session. Active sessions continue with their compiled snapshot (per session-manager.md). |
| Prompt file change (file watcher) | Prompt content reloaded. Active sessions pick up changes at next message boundary. No re-compilation of agent topology. |
| Local config change (model aliases) | Provider router updated. Active sessions pick up changes at next LLM call. |
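The snapshot rule for DSL changes can be pictured as a registry that swaps its current CompiledProject on re-compile while each session keeps the reference it captured when it started. `ProjectRegistry`, `SessionHandle`, and the one-field `CompiledProject` stand-in are hypothetical names used only for this sketch.

```typescript
// Trimmed-down stand-in for CompiledProject; only a version tag is kept.
interface CompiledProject {
  version: number;
}

interface SessionHandle {
  id: string;
  snapshot: CompiledProject; // pinned at session start, never swapped
}

class ProjectRegistry {
  private current: CompiledProject;

  constructor(initial: CompiledProject) {
    this.current = initial;
  }

  // Called by the DSL file watcher after a successful re-compile.
  swap(next: CompiledProject): void {
    this.current = next;
  }

  // New sessions capture whatever is current at the moment they start.
  startSession(id: string): SessionHandle {
    return { id, snapshot: this.current };
  }
}
```

Prompt-file and model-alias changes are not routed through this swap: per the table, they take effect at the next message boundary or LLM call without re-compiling the agent topology.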

Processor pipeline

The Processor pipeline is the core mechanism that satisfies the manifesto principle: every system prompt is readable by the operator. Per ADR-0012's audit, no code path adds messages after the Processor pipeline runs and before the provider call. The message array exiting the pipeline is what the LLM sees.

Pipeline order

Processors execute in registration order. kaged registers three:

1. AuditProcessor       — logs the full pre-generation message array
2. PolicyProcessor      — enforces kaged policies (token budgets, content gates)
3. ObservabilityProcessor — exports trace data to Langfuse (or structured logs)
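The ordering and short-circuit behavior can be sketched without Mastra. In this simplification (the `Input` shape, `Step` type, `runChain`, and `PolicyAbort` are hypothetical; in kaged the chaining is done by Mastra's Processor machinery, not harness code), each processor receives the previous one's output, and an abort stops the chain before anything reaches the provider:

```typescript
// Simplified stand-in for what a Processor receives and returns.
interface Input {
  systemMessages: string[];
  messages: string[];
}

class PolicyAbort extends Error {}

// Each step gets the previous step's output plus an abort callback that never returns.
type Step = (input: Input, abort: (reason: string) => never) => Input;

function runChain(steps: Step[], input: Input): Input {
  const abort = (reason: string): never => {
    throw new PolicyAbort(reason);
  };
  // Registration order is execution order; an abort skips every later step.
  return steps.reduce((acc, step) => step(acc, abort), input);
}

const seen: string[] = [];

const audit: Step = (i) => {
  seen.push("audit");
  return i; // observe only, pass through unmodified
};

const policy: Step = (i, abort) => {
  seen.push("policy");
  if (i.messages.length > 100) abort("Token budget exceeded"); // stand-in for a token estimate
  // Example content gate: redact a marker string before the provider sees it.
  return { ...i, messages: i.messages.map((m) => m.replace(/SECRET/g, "[redacted]")) };
};

const observe: Step = (i) => {
  seen.push("observe");
  return i;
};
```

Because each step returns the array the next step consumes, the value `runChain` returns is the only message array that could reach the provider, which is the property the pipeline relies on.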

AuditProcessor

Fires on every processInput and processInputStep invocation. Records the complete message array (system messages + user/assistant messages) to the kaged audit log.

class AuditProcessor implements Processor {
  // Runs once per execution, before the first LLM call.
  processInput({ messages, systemMessages }) {
    auditLog.write({
      event: "agent.pre_generation",
      system_messages: systemMessages,
      messages,
      timestamp: Date.now(),
    });
    // Pass through unmodified
    return { messages, systemMessages };
  }

  // Runs before every agentic loop step, including tool-call continuations,
  // so tool results appended since the previous step are recorded too.
  processInputStep({ messages, systemMessages, stepNumber, tools }) {
    auditLog.write({
      event: "agent.pre_generation_step",
      step: stepNumber,
      system_messages: systemMessages,
      messages,
      tool_count: Object.keys(tools ?? {}).length,
      timestamp: Date.now(),
    });
    return { messages, systemMessages };
  }
}

This processor is always registered, even when Langfuse is disabled. The kaged audit log is the non-optional record.

PolicyProcessor

Enforces operator-configured policies before the LLM call proceeds. Can modify messages or abort execution.

Responsibilities:

  • Token budget enforcement. If the message array exceeds the configured context window budget, truncate or abort with a clear error. The operator configures budget thresholds in the DSL or local config.
  • Content gates. If the operator configures content restrictions (e.g., "subagent X must not see file Y's content"), the PolicyProcessor strips or redacts matching content from the message array before it reaches the LLM.
  • Abort on violation. If a policy is violated and cannot be remediated by message modification, the processor calls abort() with a descriptive error. The session manager surfaces this as a run failure.
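
class PolicyProcessor implements Processor {
  private readonly budget: { hard_limit: number };

  constructor(budget: { hard_limit: number }) {
    this.budget = budget; // thresholds from the DSL or local config
  }

  processInput({ messages, systemMessages, abort }) {
    const totalTokens = estimateTokens(messages, systemMessages);
    if (totalTokens > this.budget.hard_limit) {
      abort(`Token budget exceeded: ${totalTokens} > ${this.budget.hard_limit}`);
      return { messages, systemMessages };
    }
    // Apply content gates (strip or redact matching content) here...
    return { messages, systemMessages };
  }
}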

ObservabilityProcessor

When Langfuse is configured, exports pre-generation message snapshots as Langfuse trace spans. When Langfuse is not configured, writes structured JSON to stdout (the fallback path from ADR-0013).

class ObservabilityProcessor implements Processor {
  processInput({ messages, systemMessages }) {
    if (this.langfuse) {
      this.langfuse.span({
        name: "pre_generation",
        input: { system_messages: systemMessages, messages },
      });
    } else {
      structuredLog("agent.trace", { system_messages: systemMessages, messages });
    }
    return { messages, systemMessages };
  }
}

Processor visibility guarantees

Per the ADR-0012 audit:

| Hook | When | What kaged sees |
|---|---|---|
| processInput() | Once at start of execution | Full messages + systemMessages (agent instructions, memory context, user-provided) |
| processInputStep() | Every agentic loop step (including tool-call continuations) | Same as above, plus stepNumber; can override tools and toolChoice |
| processOutputStream() | During streaming output | Output chunks from the LLM |

Nothing is injected after the pipeline. The message array the pipeline emits is the message array the LLM receives.


Supervisor pattern (recursive)

Per ADR-0022, the agent tree is recursive. Every agent that has subagents is a Mastra supervisor over its direct children. The tree structure is the call graph: a parent calls its direct children; sibling and cross-tree calls do not exist. There is no can_be_called_by check and no event-routed dispatch (interconnect).

How it works

  1. The DSL compiler walks the AgentSpec tree depth-first, producing a CompiledAgentNode at each level. For every node that has children, the compiler registers the children on the parent's agents config:

// Recursive: called for each node in the AgentSpec tree
function buildAgentNode(
  spec: AgentSpec,
  key: string,
  treePath: string,        // e.g. "primary", "primary.subagents.researcher"
): CompiledAgentNode {
  const children: Record<string, CompiledAgentNode> = {};
  const childAgents: Record<string, Agent> = {};

  // Depth-first: build every child subtree before constructing this node's Agent.
  for (const [childKey, childSpec] of Object.entries(spec.subagents ?? {})) {
    const childPath = `${treePath}.subagents.${childKey}`;
    const childNode = buildAgentNode(childSpec, childKey, childPath);
    children[childKey] = childNode;
    childAgents[childKey] = childNode.config.agent;
  }

  // Resolve once; the same set feeds the Agent and the compiled node.
  const tools = resolveToolsForAgent(spec, treePath);

  const agent = new Agent({
    id: treePath,
    name: key,
    description: spec.description,                 // surfaces as the agent-{key} tool description on the parent
    instructions: () => promptStore.get(treePath), // resolved at call time, so prompt hot-reload applies
    model: kagedModel(resolveRoute(spec.model)),
    tools,
    agents: childAgents,      // direct children only
  });

  return {
    config: { agent },
    cagePolicy: spec.cage === "disabled" ? null : compileCage(spec.cage),
    tools,
    children,
  };
}

  2. Mastra converts each child agent into a synthetic tool: agent-researcher, agent-writer. The tool's description is the child's description field, operator-provided via the DSL.

  3. The parent's instructions contain the routing logic. This is operator-authored in the DSL's system prompt file. No framework-generated routing prompt is injected.

  4. The LLM in the parent decides when to call which agent-{key} tool based on the instructions and descriptions. Delegation is reasoning-driven, not declarative.

  5. Depth limit. The tree is bounded at 16 levels (same as the project-reference depth limit in federated-config.md). The compiler rejects deeper nesting at parse time.
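A sketch of the parse-time depth check, assuming the root agent counts as level 1 (the spec does not say whether the root is level 0 or 1). `AgentSpec` is reduced to its recursive field, and `assertTreeDepth` and `delegationToolNames` are hypothetical helpers; the second only illustrates the agent-{key} naming described above.

```typescript
// Minimal stand-in for the DSL's AgentSpec: only the recursive part matters here.
interface AgentSpec {
  subagents?: Record<string, AgentSpec>;
}

const MAX_TREE_DEPTH = 16;

// Depth-first walk; the root is depth 1 (an assumption). Throws on the first
// path that nests deeper than MAX_TREE_DEPTH, naming the offending tree path.
function assertTreeDepth(spec: AgentSpec, path = "primary", depth = 1): void {
  if (depth > MAX_TREE_DEPTH) {
    throw new Error(`Agent tree too deep at ${path} (depth ${depth} > ${MAX_TREE_DEPTH})`);
  }
  for (const [key, child] of Object.entries(spec.subagents ?? {})) {
    assertTreeDepth(child, `${path}.subagents.${key}`, depth + 1);
  }
}

// Synthetic tool names a parent sees for its direct children.
function delegationToolNames(spec: AgentSpec): string[] {
  return Object.keys(spec.subagents ?? {}).map((key) => `agent-${key}`);
}
```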

Per-agent tool resolution

Each agent in the tree has its own resolved tool set. The resolution chain (from agent-tooling.md):

  1. Built-in registry (all tools exist but are not enabled by default)
  2. Role-based defaults (root agent gets kaged.issue.* and kaged.workflow.*; all others start empty)
  3. Agent's tools: block (operator opts in per agent)
  4. principal_scope enforcement (schema rejects kaged.* on non-root agents)
  5. Cage filter at dispatch time

The resolveToolsForAgent() function applies steps 1–4 at compile time. Step 5 is runtime.
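A compile-time sketch of steps 1 to 4, under stated assumptions: tool names are dotted strings, a trailing `.*` is a prefix wildcard (as in "file.*"), and an entry with `enabled: false` is simply ignored rather than used to remove a role default. The registry contents, `matches`, and this `resolveToolsForAgent` signature are hypothetical; the runtime cage filter (step 5) is not modeled.

```typescript
type ToolName = string;

// The agent's tools: block, keyed by exact name or "prefix.*" pattern.
type ToolsBlock = Record<string, { enabled: boolean }>;

// Step 2 role defaults for the root agent.
const ROOT_DEFAULTS = ["kaged.issue.*", "kaged.workflow.*"];

// "file.*" matches "file.read" but not "filex.read"; exact names match only themselves.
function matches(pattern: string, name: ToolName): boolean {
  return pattern.endsWith(".*") ? name.startsWith(pattern.slice(0, -1)) : pattern === name;
}

function resolveToolsForAgent(
  registry: ToolName[], // step 1: every built-in tool, none enabled yet
  isRoot: boolean,
  tools: ToolsBlock = {},
): ToolName[] {
  const patterns: string[] = isRoot ? [...ROOT_DEFAULTS] : []; // step 2
  for (const [pattern, { enabled }] of Object.entries(tools)) {
    // Step 4: principal_scope, kaged.* is reserved for the root agent.
    if (!isRoot && pattern.startsWith("kaged.")) {
      throw new Error(`kaged.* tools are not allowed on non-root agents: ${pattern}`);
    }
    if (enabled) patterns.push(pattern); // step 3: operator opt-in
  }
  // Keep registry order so the resolved set is deterministic.
  return registry.filter((name) => patterns.some((p) => matches(p, name)));
}
```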

Delegation hooks

kaged registers three hooks on every supervisor (any agent with children):

| Hook | Purpose |
|---|---|
| onDelegationStart | Logs the delegation to the audit log with the full tree-position path. Can modify the prompt sent to the child or abort the delegation (e.g., if the child's cage is not ready). |
| onDelegationComplete | Logs the result. Can provide feedback to the parent or bail on the delegation chain. |
| messageFilter | Controls which parent messages reach each child. kaged uses this to strip messages containing content outside the child's cage allowlist. |

Message hygiene

Mastra's stripParentToolParts() removes parent-tool-call references from messages forwarded to children (preventing children from seeing tools they don't have). kaged's messageFilter adds cage-aware content filtering on top.
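How kaged decides that content lies outside a child's cage allowlist is not specified here. The sketch below assumes, hypothetically, that each message records the file paths whose content it embeds (`sourcePaths`) and keeps a message only if every such path lies under one of the child's `fs` grants. The types and `cageAwareFilter` are illustrative and are not Mastra's messageFilter signature.

```typescript
// Hypothetical message shape: sourcePaths lists files whose content the message embeds.
interface ParentMessage {
  role: "user" | "assistant" | "tool";
  content: string;
  sourcePaths?: string[];
}

interface FsGrant {
  path: string; // normalized absolute path, e.g. "/project/src"
  mode: "ro" | "rw";
}

// Path-boundary check: "/project/srcx" must not match a grant for "/project/src".
function isUnder(path: string, root: string): boolean {
  const base = root.endsWith("/") ? root : root + "/";
  return path === root || path.startsWith(base);
}

// Keep messages that embed no file content, or only content the child may read.
function cageAwareFilter(messages: ParentMessage[], fs: FsGrant[]): ParentMessage[] {
  return messages.filter((m) =>
    (m.sourcePaths ?? []).every((p) => fs.some((g) => isUnder(p, g.path))),
  );
}
```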

Cage integration

When a parent agent delegates to a child, the session manager:

  1. Receives the delegation via onDelegationStart.
  2. Looks up the child's CagePolicy from the CompiledAgentNode.
  3. If cagePolicy is null (child has cage: disabled), the child runs in the daemon's process context — same as the root agent.
  4. If cagePolicy is non-null, spawns the child's process in a cage (per sandbox.md).
  5. The Mastra agent runs inside the caged process; its tools are mediated by ToolPermissions (per agent-tooling.md).

The root agent's cage must be disabled in the current interim state (ADR-0022 § Interim state). The supervisor infrastructure to cage the primary process is scheduled for a follow-up ADR.


Checkpoint bridge

kaged's checkpoint protocol (defined in session-manager.md) enables operator pause, resume, and rollback of agent execution at message boundaries.

v0 model: kaged-native stateless checkpoints

v0 does not use Mastra's Workflow suspend()/resume() primitive. Mastra Agents are constructed stateless (§ Storage strategy), and Mastra Workflow suspend requires Mastra-side snapshot persistence (StorageAdapter) that kaged does not provide. Instead, checkpoints are implemented natively using kaged's existing storage layer.

A checkpoint is a pointer into the persisted message history. No separate snapshot blob is stored. The full state at any checkpoint is reconstructible from MessageRecord rows up to the messageCursor plus the prompt content at that time.

Checkpoint triggers

| Trigger | Mechanism |
|---|---|
| Operator pauses (⏸) | Daemon fires abortController.abort(). In-flight agent.stream() cancels. Harness captures partial output and returns with finishReason: "aborted". Daemon persists the partial assistant message, creates CheckpointRecord, transitions session running → paused. |
| Model calls checkpoint tool | Tool handler sets a checkpointRequested signal on shared run context and returns a result indicating execution will pause ({ status: "checkpoint_taken", message: "Execution paused. Awaiting operator." }). The current generation completes naturally. runPrimary returns with checkpointRequested populated on the result. Daemon creates CheckpointRecord, transitions session running → paused. |

Checkpoint record

Created by the daemon's capture_checkpoint effect handler:

const record: CheckpointRecord = {
  id: generateId(),
  sessionId,
  runId,
  createdAt: Date.now(),
  createdBy: initiator,              // "operator" | "model"
  reason: detail ?? null,
  messageCursor: lastMessageId,      // last MessageRecord.id before the pause
  resumedAt: null,
  rolledBack: false,
  supersededBy: null,
};
storage.createCheckpoint(record);

The messageCursor points to the last MessageRecord.id persisted before the checkpoint. For operator pause, this is the partial assistant message from the aborted stream. For model-requested checkpoints, this is the complete assistant message from the naturally-finished stream.

runPrimary result extension

RunPrimaryResult carries an optional checkpointRequested field so the daemon can distinguish a normal completion from one that should trigger a checkpoint:

interface RunPrimaryResult {
  // ... existing fields ...
  checkpointRequested?: {
    detail?: string;                 // model-provided reason
  };
}

When checkpointRequested is present, the daemon creates a CheckpointRecord with createdBy: "model" and transitions the session to paused instead of idle. When absent, the daemon follows the normal run_completed → idle path.

For operator-initiated pause, the daemon does not wait for the runPrimary result — it fires abortController.abort() immediately and creates the checkpoint in its own handler, independent of the harness return path.

checkpoint tool

The checkpoint tool is a kaged-internal tool registered on the primary agent (defined in agent-tooling.md). Its behavior:

  1. The model calls checkpoint({ reason?: string }).
  2. The tool handler sets a shared checkpointRequested flag accessible to the runPrimary closure.
  3. The tool returns { status: "checkpoint_taken", message: "Execution paused. Awaiting operator." }.
  4. The model sees the result. Since the tool's description instructs it to stop after calling checkpoint, the model produces a final text response and the generation ends naturally.
  5. runPrimary returns with checkpointRequested populated.

The checkpoint tool's description in the tool registry is: "Pause execution and yield control to the operator. Call this when you need human review, approval, or input before proceeding. After calling this tool, produce a brief summary of your current state and stop."
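The flag handoff between the tool handler, runPrimary, and the daemon can be sketched as three small functions. The shapes are hypothetical (`RunContext`, the `text` field, `checkpointTool`, `finishRun`, and `nextSessionState` are illustrative names); creating the CheckpointRecord itself remains the daemon's capture_checkpoint effect and is omitted.

```typescript
// Hypothetical per-run context shared between tool handlers and runPrimary.
interface RunContext {
  checkpointRequested?: { detail?: string };
}

// Reduced RunPrimaryResult; the real one has more fields.
interface RunPrimaryResult {
  text: string;
  checkpointRequested?: { detail?: string };
}

// Tool handler: records the request and tells the model execution will pause.
// It does not stop the generation; the model is expected to wrap up and stop.
function checkpointTool(ctx: RunContext, args: { reason?: string }) {
  ctx.checkpointRequested = { detail: args.reason };
  return { status: "checkpoint_taken", message: "Execution paused. Awaiting operator." };
}

// After the stream finishes naturally, runPrimary copies the flag onto its result.
function finishRun(ctx: RunContext, text: string): RunPrimaryResult {
  return ctx.checkpointRequested ? { text, checkpointRequested: ctx.checkpointRequested } : { text };
}

// Daemon side: a requested checkpoint means paused instead of idle.
function nextSessionState(result: RunPrimaryResult): "paused" | "idle" {
  return result.checkpointRequested ? "paused" : "idle";
}
```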

Resume

  1. Operator sends a resume request (POST /api/v1/sessions/:id/resume).
  2. Daemon loads the CheckpointRecord via storage.getCheckpoint(checkpointId).
  3. Checks for prompt edits via storage.listPromptEdits(checkpointId). If edits exist, the apply_prompt_edits effect updates the prompt store.
  4. Reconstructs the message history from MessageRecord rows — same reconstructMessages() path used by dispatchPrimary, which already filters superseded messages.
  5. Creates a new run.
  6. Calls runPrimary with the reconstructed messages. The model continues naturally from the full context.
  7. Updates the checkpoint: storage.updateCheckpoint(checkpointId, { resumedAt: Date.now() }).
  8. Session transitions paused → running (via session machine resume event).

Resume is a new run, not a continuation of the old one. The model sees the full message history and generates a new response. This is the intended behavior: at a message boundary, the next generation is always a fresh agent.stream() call with reconstructed context.

Rollback

  1. Operator sends a rollback request (POST /api/v1/sessions/:id/rollback).
  2. Session machine emits kill_post_checkpoint_subagents and supersede_messages_after side effects.
  3. The supersede_messages_after effect marks all MessageRecord rows created after the rollback target checkpoint's messageCursor as superseded = true.
  4. Checkpoint's rolledBack flag is set to true.
  5. Session transitions paused → idle.
  6. The next post_message reconstructs messages from non-superseded rows only, effectively rewinding history to the checkpoint.

The rollback target checkpoint is preserved — it is not deleted.
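A sketch of the supersede_messages_after effect over in-memory rows, assuming "created after" is decided by comparing createdAt against the cursor row and that createdAt strictly increases within a session (the row shape and function signature are hypothetical). Rows are flagged rather than deleted, which lets the next reconstruction skip them while history stays inspectable.

```typescript
interface MessageRow {
  id: string;
  sessionId: string;
  createdAt: number;
  superseded: boolean;
}

// Flags every row in the session created after the cursor row; returns how many changed.
function supersedeMessagesAfter(rows: MessageRow[], sessionId: string, messageCursor: string): number {
  const cursor = rows.find((r) => r.id === messageCursor && r.sessionId === sessionId);
  if (!cursor) throw new Error(`messageCursor ${messageCursor} not found in session ${sessionId}`);
  let count = 0;
  for (const row of rows) {
    if (row.sessionId === sessionId && row.createdAt > cursor.createdAt && !row.superseded) {
      row.superseded = true; // kept on disk, excluded from reconstruction
      count++;
    }
  }
  return count;
}
```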

Prompt editing during pause

While paused, the operator can edit agent prompts via the checkpoint inspection API. Edits are stored in the prompt_edits table:

interface PromptEditRecord {
  id: string;
  checkpointId: string;
  sessionId: string;
  target: string;              // agent name ("primary", or subagent name)
  oldHash: string;
  newHash: string;
  newContent: string;
  editedAt: number;
}

On resume, the apply_prompt_edits effect reads the edits for the checkpoint and updates the prompt store (on-disk files or the in-memory prompt cache). The resumed run uses the new prompt content.
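The spec does not state how oldHash and newHash are used. One plausible reading, sketched here purely as an assumption, is an optimistic-concurrency check: an edit applies only if the stored prompt still hashes to oldHash, and newContent must hash to newHash. SHA-256 hex digests, the `Map` standing in for the prompt store, and `applyPromptEdits` are all hypothetical choices.

```typescript
import { createHash } from "node:crypto";

interface PromptEditRecord {
  id: string;
  checkpointId: string;
  sessionId: string;
  target: string;
  oldHash: string;
  newHash: string;
  newContent: string;
  editedAt: number;
}

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

// Applies edits oldest first so chained edits (v1 -> v2 -> v3) line up.
// Rejects an edit whose oldHash no longer matches the stored prompt.
function applyPromptEdits(store: Map<string, string>, edits: PromptEditRecord[]): void {
  for (const edit of [...edits].sort((a, b) => a.editedAt - b.editedAt)) {
    const current = store.get(edit.target) ?? "";
    if (sha256(current) !== edit.oldHash) {
      throw new Error(`Prompt ${edit.target} changed since edit ${edit.id} was made`);
    }
    if (sha256(edit.newContent) !== edit.newHash) {
      throw new Error(`newHash does not match newContent for edit ${edit.id}`);
    }
    store.set(edit.target, edit.newContent);
  }
}
```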

Limitations

  • No mid-generation pause. abortController.abort() cancels the in-flight HTTP connection. Partial output up to the abort point is captured and persisted, but the model's response is incomplete. The operator sees the partial output and can resume (which starts a fresh generation) or rollback.
  • No Mastra-side state preservation. Since Agents are stateless, there is no Mastra MessageList to serialize. Context is reconstructed from kaged's MessageRecord rows on every run. This is by design (§ Storage strategy, § Why stateless).
  • Resume cannot continue mid-tool-loop. If the model was in the middle of a multi-step tool-use loop when paused, resume starts a fresh generation. The model sees the tool calls and results from the partial run in its message history and decides how to proceed. This matches the spec's message-boundary semantics.

When this could change

A future amendment may adopt Mastra Workflow suspend()/resume() if kaged adds a bun:sqlite-backed StorageAdapter for Mastra. This would enable mid-step suspension (between workflow steps) without aborting the HTTP connection. The checkpoint bridge helpers in checkpoint-bridge.ts (suspend(), resume(), serializeSnapshot(), deserializeSnapshot(), snapshotFromMessages()) are retained as pure serialization utilities for that future path.


Provider configuration

Model aliases

Operators configure model aliases in local config (local-config.md). The DSL references aliases; the harness resolves them at runtime.

# local.toml
[models]
fast = "anthropic:claude-sonnet-4-20250514"
smart = "anthropic:claude-opus-4-20250514"
cheap = "openai:gpt-4.1-mini"
local = "ollama:llama-4-scout"
# project.yaml
agents:
  primary:
    model: "smart"        # resolved to anthropic:claude-opus-4-20250514
  subagents:
    researcher:
      model: "fast"       # resolved to anthropic:claude-sonnet-4-20250514
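Alias resolution itself is small. The sketch below assumes the "provider:model" string form shown above, splits on the first colon only (so a model id that itself contains a colon survives), and normalizes both a single string and the array form used for fallback chains into an ordered list. `ResolvedModel`, `AliasTable`, `parseRoute`, and `resolveAlias` are hypothetical names.

```typescript
interface ResolvedModel {
  provider: string;
  model: string;
}

// [models] table from local.toml: a single "provider:model" string or a fallback array.
type AliasTable = Record<string, string | string[]>;

// Split on the first colon only; reject an empty provider or model.
function parseRoute(spec: string): ResolvedModel {
  const i = spec.indexOf(":");
  if (i <= 0 || i === spec.length - 1) throw new Error(`Invalid model spec: ${spec}`);
  return { provider: spec.slice(0, i), model: spec.slice(i + 1) };
}

// Resolve a DSL alias to an ordered list (length 1 for a plain string).
function resolveAlias(aliases: AliasTable, alias: string): ResolvedModel[] {
  const entry = aliases[alias];
  if (entry === undefined) throw new Error(`Unknown model alias: ${alias}`);
  return (Array.isArray(entry) ? entry : [entry]).map((s) => parseRoute(s));
}
```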

Fallback arrays

Operators can configure fallback chains for resilience:

[models]
smart = ["anthropic:claude-opus-4-20250514", "openai:o3", "ollama:llama-4-scout"]

The provider router tries each in order. If the first provider returns an error (rate limit, outage, timeout), the router falls back to the next. Fallback is per-call, not per-session.
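A per-call fallback loop might look like the following sketch. `callWithFallback`, `FallbackExhaustedError`, and the `isRetryable` predicate are hypothetical; which errors count as retryable is left to the caller. One reasonable choice is to treat an operator abort as non-retryable, so cancellation does not trigger a fallback.

```typescript
interface Route {
  provider: string;
  model: string;
}

class FallbackExhaustedError extends Error {
  readonly errors: unknown[];
  constructor(errors: unknown[]) {
    super(`All ${errors.length} fallback routes failed`);
    this.errors = errors;
  }
}

// Tries each route in order for a single call and returns the first success.
// A non-retryable error is rethrown immediately without trying later routes.
async function callWithFallback<T>(
  routes: Route[],
  call: (route: Route) => Promise<T>,
  isRetryable: (err: unknown) => boolean = () => true,
): Promise<T> {
  const errors: unknown[] = [];
  for (const route of routes) {
    try {
      return await call(route); // await inside try so rejections are caught here
    } catch (err) {
      if (!isRetryable(err)) throw err;
      errors.push(err);
    }
  }
  throw new FallbackExhaustedError(errors);
}
```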

Provider credentials

Provider API keys are configured in local config (per local-config.md), never in the project DSL (per ADR-0011 — project files are portable; credentials are operator-local).

# local.toml
[providers.anthropic]
api_key = "${KAGED_ANTHROPIC_API_KEY}"

[providers.openai]
api_key = "${KAGED_OPENAI_API_KEY}"

Environment variable references are resolved at config load time. Raw API keys in config files are supported but discouraged (documented as a security consideration).
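A sketch of `${NAME}` interpolation at load time. `resolveEnvRefs` is a hypothetical helper, and failing on an unset variable (rather than substituting an empty string) is an assumption, chosen so a typo cannot silently produce a blank key. In the daemon, the `env` argument would be the process environment.

```typescript
// Replaces each ${NAME} with env[NAME]; strings without references pass through unchanged.
function resolveEnvRefs(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) throw new Error(`Environment variable ${name} is not set`);
    return resolved;
  });
}
```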

Provider abstraction

Per ADR-0014, all LLM calls route through @kaged/llm. The harness does not depend on @ai-sdk/<provider> packages. Instead, @kaged/llm exposes a LanguageModelV2 factory — the Vercel AI SDK provider interface that Mastra v1.x consumes — which the harness uses as the model field on every Mastra Agent.

import { kagedModel } from "@kaged/llm/mastra";

const agent = new Agent({
  id: "primary",
  name: "Primary",
  instructions: () => promptStore.get("primary"),
  model: kagedModel(resolvedRoute),       // ← LanguageModelV2 backed by @kaged/llm
  tools: { /* ... */ },
});

kagedModel(route) returns a LanguageModelV2 whose doStream / doGenerate methods map Mastra's LanguageModelV2CallOptions to kaged's Context + StreamOptions, call streamModel() / completeModel(), and translate the resulting StreamEvents back into LanguageModelV2StreamParts. The mapping lives in packages/llm/src/mastra-model.ts and is the only Mastra-aware code in @kaged/llm.
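The event translation in mastra-model.ts is not reproduced in this spec. As a rough sketch only, assuming a hypothetical kaged `StreamEvent` union and a hand-written subset of the LanguageModelV2 stream-part union (text-start, text-delta, text-end, finish; stream-start, tool-call, and reasoning parts are omitted, and real usage fields may be undefined), the mapping is a small state machine that opens and closes a text block around deltas. Mapping kaged's "aborted" reason to "other" is also an assumption.

```typescript
// Hypothetical kaged stream events; the real StreamEvent type lives in @kaged/llm.
type StreamEvent =
  | { kind: "text"; delta: string }
  | { kind: "done"; reason: "stop" | "length" | "tool_use" | "aborted"; inputTokens: number; outputTokens: number };

// Simplified subset of LanguageModelV2 stream parts, for illustration.
type StreamPart =
  | { type: "text-start"; id: string }
  | { type: "text-delta"; id: string; delta: string }
  | { type: "text-end"; id: string }
  | {
      type: "finish";
      finishReason: "stop" | "length" | "tool-calls" | "other";
      usage: { inputTokens: number; outputTokens: number; totalTokens: number };
    };

function* toStreamParts(events: Iterable<StreamEvent>): Generator<StreamPart> {
  const id = "text-0"; // single text block in this sketch
  let open = false;
  for (const ev of events) {
    if (ev.kind === "text") {
      if (!open) {
        yield { type: "text-start", id };
        open = true;
      }
      yield { type: "text-delta", id, delta: ev.delta };
    } else {
      if (open) {
        yield { type: "text-end", id };
        open = false;
      }
      const finishReason =
        ev.reason === "tool_use" ? "tool-calls" : ev.reason === "aborted" ? "other" : ev.reason;
      yield {
        type: "finish",
        finishReason,
        usage: {
          inputTokens: ev.inputTokens,
          outputTokens: ev.outputTokens,
          totalTokens: ev.inputTokens + ev.outputTokens,
        },
      };
    }
  }
}
```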

Implications:

  • One provider code path. Agent loop, provider test endpoint, ad-hoc calls — all route through streamModel / completeModel.
  • Custom headers, retry policy, OAuth refresh, telemetry — all extension points live in @kaged/llm.
  • No transitive @ai-sdk/<provider> dependencies (only @ai-sdk/provider-v5 for interface types, pinned to whichever version Mastra v1.x targets).
  • OAuth / subscription providers are supported by design: @kaged/llm is operator-owned code and may ship adapters Mastra / Vercel won't (see ADR-0014 and llm.md for the OAuth strategy). v0 ships API-key providers only.

Observability

Langfuse integration

When the operator configures Langfuse credentials in local.toml, the daemon initializes a Mastra observability instance with a LangfuseExporter. Before each primary run, agent.__registerMastra(mastra) injects the observability context into the agent. Mastra then automatically creates hierarchical traces with correct nesting:

  • invoke_agent span (type: AGENT) — top-level run context
  • chat model span (type: GENERATION) — LLM call with model, provider, usage
  • model_step spans — individual reasoning/generation steps
  • Tool execution spans — tool name, input, output, timing

For multi-step runs (tool calls followed by additional LLM generation), Mastra creates additional generation and tool spans automatically. Tool definitions, provider/model metadata, and token usage are enriched by Mastra without manual code.

When Langfuse is not configured, kaged runs normally and emits no Langfuse traces.

# local.toml
[langfuse]
enabled = true
base_url = "http://langfuse.local:3000"
public_key_env = "KAGED_LANGFUSE_PUBLIC_KEY"
secret_key_env = "KAGED_LANGFUSE_SECRET_KEY"

The observability instance is initialized from daemon-side local config:

import { initMastraObservability } from "@kaged/harness";

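// Resolve credentials from the env vars named in local.toml; `config` is the parsed local config.
const pubKey = process.env[config.langfuse.public_key_env];
const secKey = process.env[config.langfuse.secret_key_env];

if (config.langfuse.enabled && pubKey && secKey) {
  initMastraObservability({
    enabled: true,
    baseUrl: config.langfuse.base_url ?? "https://cloud.langfuse.com",
    publicKey: pubKey,
    secretKey: secKey,
  });
}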

Per ADR-0013, kaged uses Mastra's native observability pipeline rather than maintaining a separate manual Langfuse SDK integration. This ensures correct trace hierarchy, automatic metadata enrichment, and compatibility across Mastra version upgrades without manual tracing code that drifts from the actual execution flow.

Structured-log fallback

When Langfuse is not configured (the default), the ObservabilityProcessor writes structured JSON logs to stdout. These are sufficient for basic debugging and integrate with any log aggregator (Loki, Datadog, etc.).

Log format:

{
  "level": "info",
  "event": "agent.generation",
  "session_id": "01HXAB...",
  "run_id": "01HXAC...",
  "agent": "primary",
  "model": "anthropic:claude-sonnet-4-20250514",
  "tokens_in": 1523,
  "tokens_out": 847,
  "duration_ms": 2340,
  "tool_calls": ["file.read", "code.lsp"],
  "timestamp": "2026-05-22T14:30:00.000Z"
}
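A minimal sketch of how the fallback path might build one such line. The GenerationLogInput interface and formatGenerationLog are hypothetical names; only the emitted field names come from the format above, and the injectable clock exists only to make the output deterministic.

```typescript
// Hypothetical input shape; field names mirror the log format above.
interface GenerationLogInput {
  sessionId: string;
  runId: string;
  agent: string;
  model: string; // "provider:model-id"
  tokensIn: number;
  tokensOut: number;
  durationMs: number;
  toolCalls: string[];
  now?: Date; // injectable clock for deterministic output
}

// Build the single-line JSON record written to stdout for one generation.
function formatGenerationLog(input: GenerationLogInput): string {
  const record = {
    level: "info",
    event: "agent.generation",
    session_id: input.sessionId,
    run_id: input.runId,
    agent: input.agent,
    model: input.model,
    tokens_in: input.tokensIn,
    tokens_out: input.tokensOut,
    duration_ms: input.durationMs,
    tool_calls: input.toolCalls,
    timestamp: (input.now ?? new Date()).toISOString(),
  };
  // One JSON object per line so aggregators can parse each record independently.
  return JSON.stringify(record);
}

// Usage: process.stdout.write(formatGenerationLog({ ... }) + "\n");
```

Writing newline-delimited JSON is what lets aggregators such as Loki or Datadog ingest the records without a custom parser.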

What is NOT in Langfuse

Per ADR-0013:

  • Prompt management. Prompts are files on disk, not Langfuse-hosted.
  • A/B testing. Not a first-party feature. Operators version prompt files and compare traces.
  • Prompt audit log. The git log of prompt files is the audit log.

Prompt management

Prompts are files

All prompts that shape agent behavior live in the project directory as files referenced from the DSL:

# project.yaml
agents:
  primary:
    system_prompt: ./prompts/primary.md
  subagents:
    researcher:
      system_prompt: ./prompts/researcher.md

The harness reads prompt files at compilation time and registers them for file watching.

Hot-reload

The daemon's file watcher (per daemon.md) monitors prompt files referenced by the active project. On change:

  1. The harness re-reads the file content.
  2. At the next message boundary (never mid-generation), the agent's instructions are updated with the new content.
  3. The session manager emits a prompt.reloaded audit event.
  4. The UI shows an indicator: "System prompt updated."

Hot-reload relies on the instructions field of Mastra's Agent, which accepts a function for dynamic instructions. The harness supplies a function that returns the current prompt content:

const agent = new Agent({
  instructions: () => promptStore.get("primary"),
  // ...
});
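The message-boundary rule can be sketched as a small store that stages reloaded file content and only publishes it when the harness reaches a boundary. PromptStore, stage(), and applyPendingAtBoundary() are hypothetical names used for illustration; the real store and its file-watcher wiring live in the harness and daemon.

```typescript
// Hypothetical prompt store: stages reloaded content and exposes it only at a
// message boundary, so a generation never sees its instructions change mid-flight.
class PromptStore {
  private readonly active = new Map<string, string>();
  private readonly pending = new Map<string, string>();

  constructor(initial: Record<string, string>) {
    for (const [agent, text] of Object.entries(initial)) this.active.set(agent, text);
  }

  // Called after the file watcher re-reads a prompt file (step 1).
  stage(agent: string, text: string): void {
    this.pending.set(agent, text);
  }

  // Called by the harness between messages (step 2). Returns the agents whose
  // prompts actually changed, so the caller can emit prompt.reloaded (step 3).
  applyPendingAtBoundary(): string[] {
    const reloaded: string[] = [];
    for (const [agent, text] of this.pending) {
      if (this.active.get(agent) !== text) reloaded.push(agent);
      this.active.set(agent, text);
    }
    this.pending.clear();
    return reloaded;
  }

  // Read by the Agent's dynamic `instructions` function.
  get(agent: string): string {
    const text = this.active.get(agent);
    if (text === undefined) throw new Error(`no prompt registered for ${agent}`);
    return text;
  }
}
```

Separating "staged" from "active" content is what makes the never-mid-generation guarantee cheap: the instructions function only ever reads the active map.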

Prompt visibility

The operator can view all active prompts via the UI or API:

  • GET /api/v1/sessions/:id/prompts — returns the current system prompts for the primary and all active subagents.
  • At a checkpoint, prompts are editable (per session-manager.md).

The AuditProcessor logs the full system message array on every generation. The operator can always see exactly what the LLM received.


Cancellation

abortSignal wiring

When the operator cancels a run (per session-manager.md):

  1. The session manager calls abortController.abort().
  2. The abortSignal propagates to Mastra's in-flight generate/stream call.
  3. Mastra aborts the HTTP connection to the provider.
  4. If sub-agents are active, the session manager also sends SIGTERM to their caged processes.
  5. The harness captures partial output from the aborted generation.

The abortSignal is passed to Mastra's Agent.generate() / Agent.stream():

const result = await primaryAgent.generate(messages, {
  abortSignal: runAbortController.signal,
});
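Step 5 (capturing partial output) can be sketched independently of Mastra: consume chunks, check the signal between chunks, and keep whatever text arrived before the abort. consumeUntilAborted and the plain string chunks are hypothetical simplifications, and the loop is synchronous for brevity; in the harness the chunks come from agent.stream(...).fullStream and Mastra itself closes the provider connection.

```typescript
interface PartialResult {
  text: string;     // text accumulated before the abort (or the full text)
  aborted: boolean;
}

// Hypothetical helper: accumulate text deltas until the run's AbortSignal fires.
// onChunk lets the caller observe progress (and, in this sketch, trigger the abort).
function consumeUntilAborted(
  chunks: Iterable<string>,
  signal: AbortSignal,
  onChunk?: (textSoFar: string) => void,
): PartialResult {
  let text = "";
  for (const chunk of chunks) {
    if (signal.aborted) return { text, aborted: true };
    text += chunk;
    onChunk?.(text);
  }
  return { text, aborted: signal.aborted };
}
```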

Plugin hook firing

Per ADR-0023, the harness is the firing point for project-plugin lifecycle hooks. The plugin host (per plugin-host.md § Lifecycle hooks) defines the wire protocol; this section is normative for where in the run lifecycle each hook fires.

Hook firing summary

| Hook | Where in runPrimary | Scope | Cardinality |
|---|---|---|---|
| on_session_start | Inside reconstructMessages(), after history reconstruction and before the system prompt is finalized | Primary only (sessions are primary-owned) | Once per session (first run only; tracked by the session.recalled_at column) |
| on_session_idle | Outside runPrimary; fired by the session manager's idle detector after a debounce window with no run activity | Primary only | Once per idle event |
| pre_compact | Inside the compaction pipeline, before the strategy runs (see Compaction) | Per-agent | Once per compaction event for the affected agent |
| post_compact | Inside the compaction pipeline, after the strategy runs (see Compaction) | Per-agent | Once per compaction event for the affected agent |

on_session_start firing

The harness fires on_session_start exactly once per session, on the first runPrimary invocation for that session. The session manager tracks this via a recalled_at column on the session record; the harness reads this column in reconstructMessages() and skips firing if non-null.

Firing order inside reconstructMessages():

1. Read MessageRecord rows for this session (non-superseded, ordered)
2. Build the base system prompt from the agent's `instructions` function
3. IF session.recalled_at IS NULL AND agent_path == "primary":
     For each plugin declared on the primary with `on_session_start` in hooks:
       Call plugin.kaged.hook.on_session_start({ _context })
       If result.inject is non-empty:
         Wrap in <plugin:NAME>...</plugin:NAME>
         Append to the system prompt array (after the base instructions)
   Set session.recalled_at = NOW()
4. Continue with normal message reconstruction

Multiple plugins on the primary fire in manifest-declaration order. Each plugin's inject content is appended in order. The audit log (agent.pre_generation event from the AuditProcessor) sees the assembled system prompt — including all <plugin:NAME> blocks — exactly as the LLM will receive it.

Failure handling. A hook that throws, times out, or returns a malformed result is logged (plugin.hook.failed / plugin.hook.timeout per plugin-host.md) and treated as if the plugin returned null. The session continues with no inject from that plugin.
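The inject-assembly part of step 3, together with the failure rule, can be sketched as follows. SessionStartPlugin, SessionState, and assembleSessionStartInjects are hypothetical names, and the hook call is synchronous for brevity; the real call goes over JSON-RPC through the plugin host, which also enforces timeouts.

```typescript
// Result shape of an on_session_start hook (simplified; see plugin-host.md).
interface SessionStartResult {
  inject?: string;
}

// Hypothetical, synchronous stand-in for a plugin declared on the primary.
interface SessionStartPlugin {
  name: string;
  hooks: string[];
  onSessionStart: () => SessionStartResult | null;
}

interface SessionState {
  recalledAt: number | null;
}

// Returns the system prompt array with on_session_start injects appended.
function assembleSessionStartInjects(
  baseInstructions: string,
  agentPath: string,
  session: SessionState,
  plugins: SessionStartPlugin[], // in manifest-declaration order
  now: () => number = Date.now,
  logFailure: (plugin: string, err: unknown) => void = () => {},
): string[] {
  const system = [baseInstructions];
  if (session.recalledAt !== null || agentPath !== "primary") return system;

  for (const plugin of plugins) {
    if (!plugin.hooks.includes("on_session_start")) continue;
    let result: SessionStartResult | null;
    try {
      result = plugin.onSessionStart();
    } catch (err) {
      logFailure(plugin.name, err); // plugin.hook.failed; treated as null
      result = null;
    }
    // A missing or non-string inject is treated the same as a null result.
    const inject = typeof result?.inject === "string" ? result.inject : "";
    if (inject.length > 0) {
      system.push(`<plugin:${plugin.name}>${inject}</plugin:${plugin.name}>`);
    }
  }
  session.recalledAt = now(); // guarantees at most one firing per session
  return system;
}
```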

on_session_idle firing

Idle detection is not part of runPrimary — it is a session-manager concern that fires after the last run completes and the session has been idle for the debounce window.

Firing flow:

1. Run completes; session transitions to idle state
2. Session manager starts an idle timer (debounce window)
3. If a new run starts before the timer fires: cancel the timer
4. If the timer fires:
     For each plugin declared on the primary with `on_session_idle` in hooks:
       Fetch the message transcript (all non-superseded messages since session start
       or since last idle fire — plugins decide via their config)
       Call plugin.kaged.hook.on_session_idle({ _context, transcript })
       Response is not expected; treat as fire-and-forget after JSON-RPC ack
   Mark session.last_idle_fire = NOW()

The debounce window is configurable per plugin in the manifest, since plugins may want different idle thresholds. If a plugin specifies none, the session manager uses a default of 30 seconds.

Restart semantics. Per ADR-0023, pending idle timers are not restored across a daemon restart. After a restart, the session manager re-arms the timer on the next genuine run completion.
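The timer logic in steps 2 to 4 can be sketched with an injectable scheduler so the debounce behaviour is easy to test. IdleDetector and Scheduler are hypothetical names; the sketch uses a single window, whereas the spec allows per-plugin windows (one detector per plugin window would cover that).

```typescript
// Minimal scheduler abstraction; setTimeout/clearTimeout in production.
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Hypothetical idle detector for one session.
class IdleDetector {
  lastIdleFire: number | null = null;
  private handle: unknown = null;
  private readonly windowMs: number;
  private readonly fire: () => void; // fires on_session_idle for subscribed plugins
  private readonly scheduler: Scheduler;
  private readonly now: () => number;

  constructor(windowMs: number, fire: () => void, scheduler: Scheduler = realScheduler, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.fire = fire;
    this.scheduler = scheduler;
    this.now = now;
  }

  // Step 2: a run completed, so (re)start the debounce window.
  onRunCompleted(): void {
    this.cancel();
    this.handle = this.scheduler.set(() => {
      this.handle = null;
      this.fire();
      this.lastIdleFire = this.now(); // session.last_idle_fire
    }, this.windowMs);
  }

  // Step 3: a new run started before the window elapsed.
  onRunStarted(): void {
    this.cancel();
  }

  private cancel(): void {
    if (this.handle !== null) this.scheduler.clear(this.handle);
    this.handle = null;
  }
}
```

Because the timer lives only in memory, a daemon restart naturally drops it, which matches the restart semantics above.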

pre_compact and post_compact firing

pre_compact and post_compact fire inside the compaction pipeline; see Compaction below for the full firing-point detail.

Subagent semantics

Plugins declared on subagents:

  • Receive pre_compact and post_compact when the subagent's context window is compacted (each agent has its own window, per ADR-0024).
  • Never receive on_session_start or on_session_idle — these are primary-only because subagents have no sessions. The plugin host emits a plugin.hook.illegal warning at load time when a subagent's plugin declaration includes these hooks; the daemon still starts but the hooks never fire.

Context populated

For every hook fire, the harness populates the canonical PluginCallContext (per plugin-host.md):

  • operator_id — from the request that started the run (or the session creator for on_session_idle).
  • project_id — the normalized project root of the current project load.
  • agent_path, the canonical path of the agent the hook is firing for. For session-lifecycle hooks this is always "primary"; for pre_compact and post_compact it is the path of the agent whose window is being compacted.
  • session_id — the session ID.
  • request_id — a freshly-generated request ID, distinct from any HTTP request ID. Hook firings have their own trace IDs.

Compaction

Per ADR-0024, context compaction is kaged-owned at the harness boundary. This section is the normative spec for how compaction runs inside runPrimary and reconstructMessages().

Where compaction lives

The harness is the single owner of context-window management. Mastra's internal trimming is neutralized: the harness never lets Mastra's message list exceed the configured threshold, so Mastra has nothing to trim. Specifically:

  1. The harness reconstructs the message list from MessageRecord rows on every run (the existing reconstructMessages() path).
  2. Before the reconstructed list is handed to agent.stream(...), the harness runs the compaction pipeline.
  3. The pipeline either passes the list through unchanged (no compaction needed) or replaces it with a compacted version.
  4. Mastra only ever sees the post-compaction list.

This means Mastra never trims, because the list it receives is always within bounds. If a context-length error still occurs (estimator wrong, model metadata stale, etc.), the harness compacts reactively and retries; see § Reactive fallback retry.

The compaction pipeline

┌───────────────────────────────────────────────────────────┐
│ reconstructMessages() — build candidate list              │
│   (read MessageRecord rows; apply on_session_start;       │
│    drop tool calls per include_tool_results_in_context)   │
└──────────────────────┬────────────────────────────────────┘
                       │
                       ▼
┌───────────────────────────────────────────────────────────┐
│ Tool output pruning pre-pass (per § Tool output pruning)  │
│   replace stale large tool results with short notices     │
│   lightweight — no superseding, no CompactionRecord       │
└──────────────────────┬────────────────────────────────────┘
                       │
                       ▼
┌───────────────────────────────────────────────────────────┐
│ Token estimation (per @kaged/llm)                         │
│   compute total estimated tokens for the candidate list   │
│   + system prompt + model's reserved-output budget        │
│   (uses FALLBACK_CONTEXT_WINDOW when model meta absent)   │
└──────────────────────┬────────────────────────────────────┘
                       │
                  estimate < upper_threshold?
                       │            │
                  yes  │            │  no
                       │            ▼
                       │  ┌──────────────────────────────┐
                       │  │ Compaction triggered          │
                       │  │   audit: compaction.triggered │
                       │  └────────────┬─────────────────┘
                       │               │
                       │               ▼
                       │  ┌──────────────────────────────┐
                       │  │ 1. Fire pre_compact observers │
                       │  │    for each subscribed plugin │
                       │  │    in manifest-declaration    │
                       │  │    order                      │
                       │  │    Apply retain[] / inject    │
                       │  └────────────┬─────────────────┘
                       │               │
                       │               ▼
                       │  ┌──────────────────────────────┐
                       │  │ 2. Apply strategy:            │
                       │  │   drop / summarize / delegate │
                       │  │   / checkpoint                │
                       │  │   (delegate = compactor plug) │
                       │  └────────────┬─────────────────┘
                       │               │
                       │               ▼
                       │  ┌──────────────────────────────┐
                       │  │ 3. No-op guard:               │
                       │  │    if 0 messages superseded   │
                       │  │    → return unchanged         │
                       │  │    audit: compaction.noop     │
                       │  │    (skipped for dry-run)      │
                       │  └────────────┬─────────────────┘
                       │               │
                       │               ▼
                       │  ┌──────────────────────────────┐
                       │  │ 4. Mark superseded; persist   │
                       │  │    CompactionRecord; audit    │
                       │  │    compaction.completed       │
                       │  └────────────┬─────────────────┘
                       │               │
                       └───────────────┘
                                       │
                                       ▼
                       ┌───────────────────────────────────┐
                       │ Hand compacted list to            │
                       │ agent.stream(...)                 │
                       └───────────────────────────────────┘

Trigger semantics

Pre-call (proactive). Before each LLM call, the harness:

  1. Reconstructs the candidate message list.
  2. Runs tool output pruning (see § Tool output pruning below) to reduce stale tool-result bloat.
  3. Estimates token usage via @kaged/llm's estimator (see llm.md § Token estimation). When model metadata is unavailable, the estimator uses a fallback context window of 128,000 tokens, so the usage fraction (estimate divided by context window) always has a positive denominator and proactive compaction can still trigger.
  4. Adds the system prompt size and the model's reserved-output budget (from the model alias's metadata).
  5. Compares against the agent's configured upper threshold (default 0.85 of the model's context window).
  6. If over: fire compaction. Compact until the estimate is below the lower threshold (default 0.60; hysteresis prevents oscillation).
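The threshold arithmetic in steps 3 to 6 can be sketched as pure functions. The 128,000-token fallback and the 0.85 / 0.60 defaults come from the steps above; the function names and the CompactionBudget shape are hypothetical, and token counts are assumed to be already estimated.

```typescript
// Fallback used when model metadata is absent (step 3).
const FALLBACK_CONTEXT_WINDOW = 128_000;

interface CompactionBudget {
  upperThreshold: number; // fraction of the context window; default 0.85
  lowerThreshold: number; // fraction of the context window; default 0.60
}

const DEFAULT_BUDGET: CompactionBudget = { upperThreshold: 0.85, lowerThreshold: 0.6 };

// Steps 3-4: candidate messages + system prompt + the model's reserved-output budget.
function estimateUsage(messageTokens: number, systemPromptTokens: number, reservedOutputTokens: number): number {
  return messageTokens + systemPromptTokens + reservedOutputTokens;
}

// Step 5: compaction fires once the estimate reaches the upper threshold.
function shouldCompact(estimate: number, contextWindow: number | null, budget: CompactionBudget = DEFAULT_BUDGET): boolean {
  const ctxWindow = contextWindow ?? FALLBACK_CONTEXT_WINDOW;
  return estimate / ctxWindow >= budget.upperThreshold;
}

// Step 6: tokens to remove so the estimate ends strictly below the lower threshold.
function tokensToShed(estimate: number, contextWindow: number | null, budget: CompactionBudget = DEFAULT_BUDGET): number {
  const ctxWindow = contextWindow ?? FALLBACK_CONTEXT_WINDOW;
  const maxAllowed = Math.ceil(ctxWindow * budget.lowerThreshold) - 1; // largest integer below the threshold
  return Math.max(0, estimate - maxAllowed);
}
```

Because compaction targets the lower threshold rather than just under the upper one, the conversation can grow by roughly a quarter of the window (0.85 minus 0.60) before the check fires again; that gap is the hysteresis.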

Reactive (post-failure). If the estimate is wrong and the provider returns a context-length error (specific error codes per @kaged/llm), the harness:

  1. Catches the error (detected via isContextLengthError: regex patterns for OpenAI, Anthropic, Google, and Azure error messages, plus HTTP 413).
  2. Checks the reactiveRetryAttempted flag — if already set, re-throws immediately (single retry cap).
  3. Sets reactiveRetryAttempted = true.
  4. Treats the failure as a forced compaction trigger (trigger: "provider_overflow_retry").
  5. Compacts the message list.
  6. No-op guard: if compaction superseded zero messages (compacted: false), the retry is skipped — the original error propagates. Retrying with unchanged context would hit the same error.
  7. Retries the call once with the compacted message list. If the retry also fails on context length, the run is marked failed with a context_overflow error.

Context-overflow error messages from failed provider calls are not persisted to the message history — they would consume context on subsequent runs and provide no value to the model.

The reactiveRetryAttempted flag also gates a secondary detection path: when runPrimary returns finishReason: "error" with a context-length error message (rather than throwing), the same compaction-and-retry flow applies, subject to the same single-attempt cap.

The audit log captures both pre-call and reactive triggers with distinct reason fields (threshold_crossed vs provider_overflow_retry).
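The single-retry rule and the no-op guard can be sketched as a wrapper around one dispatch cycle. The isContextLengthError below is a much-reduced, hypothetical stand-in (one regex plus HTTP 413) for the real detector in @kaged/llm; ContextOverflowError, dispatchWithReactiveRetry, and the call/compact callbacks are also hypothetical, and the sketch is synchronous where the harness is async.

```typescript
// Surfaces as the run's context_overflow failure.
class ContextOverflowError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ContextOverflowError";
  }
}

// Hypothetical, reduced detector; the real one covers several providers' messages.
function isContextLengthError(err: unknown): boolean {
  const status = (err as { status?: number } | null)?.status;
  if (status === 413) return true;
  const message = err instanceof Error ? err.message : String(err);
  return /context length|context window|too many tokens|maximum context/i.test(message);
}

interface CompactOutcome<M> {
  compacted: boolean; // false when zero messages were superseded
  messages: M[];
}

// One dispatch cycle with at most one reactive compaction + retry.
function dispatchWithReactiveRetry<M, R>(
  messages: M[],
  call: (messages: M[]) => R,
  compact: (messages: M[], trigger: "provider_overflow_retry") => CompactOutcome<M>,
): R {
  let reactiveRetryAttempted = false;
  let current = messages;
  for (;;) {
    try {
      return call(current);
    } catch (err) {
      if (!isContextLengthError(err)) throw err;
      // Single retry cap: a second overflow fails the run.
      if (reactiveRetryAttempted) throw new ContextOverflowError("context_overflow after reactive compaction");
      reactiveRetryAttempted = true;
      const outcome = compact(current, "provider_overflow_retry");
      // No-op guard: retrying with unchanged context would hit the same error.
      if (!outcome.compacted) throw err;
      current = outcome.messages;
    }
  }
}
```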

Strategy execution

The four strategies (configured in AgentSpec.compaction.strategy):

drop

The default / fallback. When configured as strategy: "drop", the harness first attempts summarization if both conditions are met: (1) a summarizeFn is available, and (2) the agent's compaction config includes a summarize block. If summarization succeeds, the result is used — this preserves more context than a pure drop. If summarization throws, the pipeline falls back to pure drop silently.

When summarization is not available (no summarizeFn or no summarize config), or when the summarization attempt fails, the harness drops oldest non-superseded, non-always-keep messages until the estimate is below the lower threshold. Drops happen in tool-pair-atomic units (tool call + tool result drop together, never split). The dropped messages are marked superseded = true.

No plugin compactor is called. Observer hooks still fire before the drop.
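The pure-drop branch can be sketched as grouping messages into atomic units (a tool call travels with its results) and dropping the oldest droppable units until the total falls below a target. The Msg shape, the per-message token counts, toUnits, and dropOldest are hypothetical; the sketch assumes tool results follow their call, and it returns superseded IDs rather than mutating storage, whereas the real pipeline marks MessageRecord rows superseded.

```typescript
interface Msg {
  id: string;
  role: "system" | "user" | "assistant" | "tool";
  tokens: number;           // pre-computed estimate for this message
  toolCallIds?: string[];   // assistant messages that issue tool calls
  toolCallId?: string;      // tool-result messages
  alwaysKeep?: boolean;     // always-keep set, already resolved
}

// Group messages into units: an assistant tool-call message plus its tool results.
function toUnits(messages: Msg[]): Msg[][] {
  const units: Msg[][] = [];
  const unitByCallId = new Map<string, Msg[]>();
  for (const m of messages) {
    if (m.role === "tool" && m.toolCallId && unitByCallId.has(m.toolCallId)) {
      unitByCallId.get(m.toolCallId)!.push(m);
      continue;
    }
    const unit = [m];
    units.push(unit);
    for (const id of m.toolCallIds ?? []) unitByCallId.set(id, unit);
  }
  return units;
}

// Drop the oldest droppable units until total tokens < targetTokens.
function dropOldest(messages: Msg[], targetTokens: number): { kept: Msg[]; superseded: string[] } {
  const units = toUnits(messages);
  let total = messages.reduce((sum, m) => sum + m.tokens, 0);
  const dropped = new Set<Msg[]>();
  for (const unit of units) {
    if (total < targetTokens) break;
    // Protected units are skipped whole, so a pair is never split.
    if (unit.some((m) => m.alwaysKeep || m.role === "system")) continue;
    dropped.add(unit);
    total -= unit.reduce((sum, m) => sum + m.tokens, 0);
  }
  const kept = units.filter((u) => !dropped.has(u)).flat();
  const superseded = units.filter((u) => dropped.has(u)).flat().map((m) => m.id);
  return { kept, superseded };
}
```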

summarize

The harness invokes a summarizer model (resolved via the operator's model alias, configured in agent.compaction.summarize.model) with the operator-authored summarizer prompt (referenced via project:/ URI in agent.compaction.summarize.prompt). The summarizer receives the window of messages to compact and returns a summary message.

The returned summary becomes a single new MessageRecord (role: "system", metadata: { "kind": "compaction_summary", ... }). The compacted messages are marked superseded = true. The reconstructed list becomes: [always-keep, summary, recent messages].

Summarizer cost is tracked separately in the session's stats — see § Cost surfacing below.

delegate

The harness calls the designated compactor plugin (named in agent.compaction.delegate.plugin) via pre_compact with role: compactor. The plugin returns a CompactorResult (per plugin-host.md § Plugin roles). The harness replaces the message list with the result's messages, marks result.superseded as superseded, and creates a CompactionRecord with result.summary.

If the plugin fails (throws / times out / returns invalid CompactorResult), the harness falls back to drop and logs the full failure chain.

checkpoint

The harness creates a CheckpointRecord (per session-manager.md § Checkpoint protocol) with reason: compaction_pending, marks the session paused, and ends the run with finishReason: "awaiting_compaction". The operator inspects the proposed compaction (the configured fallback sub-strategy — drop by default) and approves, edits, or rejects via the Compactor UI (per ui/compactor.md). On resume, the operator-approved compaction is applied.

Always-keep set

The following messages are never compacted regardless of strategy:

  1. The system prompt(s) — always passed through.
  2. The first operator message in the session (the initial task).
  3. Messages whose metadata.always_keep = true. Plugins can set this metadata via message annotations (per agent-tooling.md).
  4. Operator-configured always-keep predicates (per agent.compaction.always_keep).

The compaction pipeline filters always-keep messages out of the compaction candidate set before any strategy runs. If a compactor plugin returns a messages list that omits an always-keep message, the harness emits compactor_dropped_always_keep and falls back to drop.

Atomicity of tool-call/tool-result pairs

Tool calls and their results are coupled. The compaction pipeline treats them as atomic units:

  • A drop strategy drops the pair together (never the call without the result, or vice versa).
  • A summarize strategy receives the full pair as one logical unit; the summarizer sees both call and result.
  • A delegate strategy: the messages_being_compacted array sent to the compactor includes the full pair; the compactor must respect the coupling (if it returns a list with a call but no result, the harness emits compactor_split_tool_pair and falls back to drop).
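The two delegate-side checks (compactor_dropped_always_keep and compactor_split_tool_pair) can be sketched as a validator run before a CompactorResult is accepted. The CompactorMsg shape and validateCompactorMessages are hypothetical; a non-null return is the audit reason, after which the harness falls back to drop.

```typescript
interface CompactorMsg {
  id: string;
  alwaysKeep?: boolean;
  toolCallIds?: string[]; // tool calls issued by this (assistant) message
  toolCallId?: string;    // set on tool-result messages
}

type ValidationFailure = "compactor_dropped_always_keep" | "compactor_split_tool_pair";

// Returns null when the compactor's list is acceptable, otherwise the audit reason.
function validateCompactorMessages(original: CompactorMsg[], returned: CompactorMsg[]): ValidationFailure | null {
  const returnedIds = new Set(returned.map((m) => m.id));
  for (const m of original) {
    if (m.alwaysKeep && !returnedIds.has(m.id)) return "compactor_dropped_always_keep";
  }
  // Every returned tool call needs its result, and every returned result its call.
  const calls = new Set(returned.flatMap((m) => m.toolCallIds ?? []));
  const results = new Set(returned.flatMap((m) => (m.toolCallId ? [m.toolCallId] : [])));
  for (const id of calls) if (!results.has(id)) return "compactor_split_tool_pair";
  for (const id of results) if (!calls.has(id)) return "compactor_split_tool_pair";
  return null;
}
```

Dropping a whole pair passes this check; only returning half of one fails it.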

Tracking and persistence

A CompactionRecord is written for every compaction event:

interface CompactionRecord {
  id: string;
  sessionId: string;
  runId: string;            // the run that triggered (or the one immediately after for idle-driven)
  agentPath: string;        // which agent's window was compacted
  createdAt: number;
  trigger: "threshold_crossed" | "provider_overflow_retry" | "operator_manual" | "scheduled";
  strategy: "drop" | "summarize" | "delegate" | "checkpoint";
  thresholdEstimate: number;       // estimated token usage at trigger time
  afterEstimate: number;            // estimated token usage after compaction
  windowUpper: number;              // configured upper threshold
  windowLower: number;              // configured lower threshold
  supersededMessageIds: string[];   // messages marked superseded by this event
  summaryMessageId: string | null;  // new MessageRecord ID, if summarize/delegate
  summary: string | null;           // human-readable summary (audit log + UI)
  pluginsFired: { name: string, role: "observer" | "compactor", duration_ms: number, result_kind: "inject" | "retain" | "compactor_result" | "null" | "error" }[];
  pluginCost: { provider: string, model: string, input_tokens: number, output_tokens: number, cost_usd: number } | null;
  fallbackOccurred: boolean;        // true if a delegate/summarize fell back to drop
  fallbackReason: string | null;    // human-readable reason if fallback
  operatorFlag: "good" | "bad" | "neutral" | null;   // operator feedback (per ADR-0024)
  operatorNotes: string | null;
}

The schema migration is in @kaged/storage; see also session-manager.md § CompactionRecord.

Dry-run mode

The harness exposes a dry-run path: given a session and an agent, compute what compaction would do against the current message list without committing. The dry-run:

  1. Runs the trigger check (always considered "triggered" in dry-run).
  2. Fires observer hooks (with a dry_run: true flag in _context so plugins can skip side effects).
  3. Executes the strategy.
  4. Returns the proposed CompactionRecord (without persisting) and the proposed messages list.

The Compactor UI uses this to preview compaction before the operator commits. The dry-run path is the strategy-preview implementation; manual-compact uses the same code path but commits.

The dry-run is exposed via POST /api/v1/sessions/:id/compactions/dry-run (per http-api.md).

Manual compaction

Operators can trigger compaction at any time via the Compactor UI or directly via POST /api/v1/sessions/:id/compact. The endpoint accepts an optional strategy override and an optional agent_path (defaults to primary). The harness runs the same pipeline as automatic compaction; the trigger field on the resulting CompactionRecord is operator_manual.

Per-agent semantics

Each agent in the recursive tree has its own context window and its own compaction config. The harness tracks per-agent estimates and fires compaction independently for each agent. A subagent that has not declared the memory plugin (or any compaction-relevant plugin) still has its window managed — the drop strategy is the default fallback.

Cost surfacing

When the summarize strategy invokes a model, its cost is tracked separately in the session's stats:

  • A new field stats.compactor_cost: CostBreakdown | null on RunPrimaryResult (parallel to stats.cost for the primary call).
  • MessageRecord.metadata.compactor_cost for the summary message (if summarize or delegate produced one).
  • The session's aggregate cost view (per http-api.md session-detail response) sums both: primary: $X, compactor: $Y, total: $Z.

The Compactor UI surfaces compactor cost prominently in the per-session stats panel.

Audit events

The harness emits these audit events for compaction (in addition to the plugin host's hook-firing events):

| Event | When | Data |
|---|---|---|
| compaction.triggered | A compaction event begins | session, run, agent, trigger, threshold_estimate, strategy |
| compaction.completed | Compaction succeeded | session, run, agent, after_estimate, superseded_count, summary_message_id, plugins_fired, duration_ms |
| compaction.noop | Strategy executed but superseded zero messages | session, run, agent, reason, strategy |
| compaction.failed | A strategy failed and fallback ran | session, run, agent, attempted_strategy, fallback_strategy, reason |
| compaction.flagged | Operator attached a flag to a compaction event | session, compaction_id, flag, notes_length |

Reactive fallback retry

The reactive path enforces a single retry cap via a reactiveRetryAttempted flag scoped to the dispatch cycle. Once set, no further reactive compaction is attempted regardless of subsequent errors.

The no-op guard applies to the reactive path: if compaction superseded zero messages, the retry is skipped and the original context-length error propagates. This prevents infinite loops where compaction "succeeds" but context is unchanged.

Context-overflow error messages from failed provider calls are not persisted to the message history — they would consume context on subsequent runs.

If the single retry also fails on context length, the run is marked failed and the session transitions to a degraded state (context_overflow). The operator sees this in the UI as a clear error with the option to manually compact and retry.

Tool output pruning

Before the compaction pipeline runs, the harness applies a lightweight pruning pre-pass to reduce context pressure from stale tool-result messages. Pruning is not compaction: it does not mark messages superseded, does not produce a CompactionRecord, and does not fire plugin hooks. It operates on the CompactableMessage[] list and returns the pruned list in a PruneResult.

The pruning algorithm:

  1. Walk tool-result messages from newest to oldest.
  2. Accumulate token counts. The most recent protectTokens (default 40,000) worth of tool output is left intact.
  3. Beyond the protection window, replace large tool-result content with a short [Pruned — N tokens] notice.
  4. Skip tool results whose toolName is in the protectedTools set (default: read, skill).
  5. Skip results where the pruned notice would be as large as or larger than the original content (no net savings).
  6. Only apply pruning if the total savings across all candidates exceeds minimumSavings (default 20,000 tokens). If not, return the message list unchanged.

Pruning runs as a pre-pass in dispatchPrimary (daemon side), before runCompactionPipeline(). The pruned message list feeds into the compaction pipeline as its input. If pruning reduces context below the upper threshold, compaction does not fire — pruning alone was sufficient.

interface PruneConfig {
  readonly protectTokens: number;       // default 40_000
  readonly minimumSavings: number;      // default 20_000
  readonly protectedTools: readonly string[];  // default ["read", "skill"]
}

interface PruneResult {
  readonly messages: readonly CompactableMessage[];
  readonly prunedCount: number;
  readonly tokensSaved: number;
}

function pruneToolOutputs(
  messages: readonly CompactableMessage[],
  systemPrompt: string,
  modelMeta: ModelMeta | null,
  config?: PruneConfig,
): PruneResult;

Implementation: @kaged/harness pruning.ts.
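A minimal, self-contained sketch of the six pruning steps follows. It is simplified on purpose and is not pruning.ts: the message shape is a hypothetical reduction of CompactableMessage, token counts use an assumed chars/4 heuristic instead of @kaged/llm's estimator, and systemPrompt/modelMeta are omitted because this sketch does not need them. A result that straddles the protection boundary is treated as outside the window.

```typescript
// Hypothetical, reduced message shape for this sketch.
interface ToolResultMsg {
  role: "user" | "assistant" | "tool";
  toolName?: string;
  content: string;
}

interface PruneOptions {
  protectTokens: number;             // default 40_000
  minimumSavings: number;            // default 20_000
  protectedTools: readonly string[]; // default ["read", "skill"]
}

const DEFAULT_PRUNE: PruneOptions = { protectTokens: 40_000, minimumSavings: 20_000, protectedTools: ["read", "skill"] };

// Assumption: rough chars/4 heuristic standing in for the real estimator.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function pruneToolOutputsSketch(
  messages: readonly ToolResultMsg[],
  config: PruneOptions = DEFAULT_PRUNE,
): { messages: ToolResultMsg[]; prunedCount: number; tokensSaved: number } {
  const out = messages.map((m) => ({ ...m })); // never mutate the caller's list
  const candidates: { index: number; notice: string; saved: number }[] = [];
  let seen = 0;

  // Steps 1-2: walk tool results newest to oldest, protecting the most recent window.
  for (let i = out.length - 1; i >= 0; i--) {
    const m = out[i];
    if (m.role !== "tool") continue;
    const tokens = estimateTokens(m.content);
    seen += tokens;
    if (seen <= config.protectTokens) continue;                              // inside protection window
    if (m.toolName && config.protectedTools.includes(m.toolName)) continue;  // step 4
    const notice = `[Pruned — ${tokens} tokens]`;                            // step 3 notice format
    const saved = tokens - estimateTokens(notice);
    if (saved <= 0) continue;                                                // step 5: no net savings
    candidates.push({ index: i, notice, saved });
  }

  // Step 6: apply only if total savings exceed the minimum.
  const tokensSaved = candidates.reduce((sum, c) => sum + c.saved, 0);
  if (tokensSaved <= config.minimumSavings) {
    return { messages: [...messages], prunedCount: 0, tokensSaved: 0 };
  }
  for (const c of candidates) out[c.index].content = c.notice;
  return { messages: out, prunedCount: candidates.length, tokensSaved };
}
```

The all-or-nothing step 6 keeps the pre-pass from churning the message list for marginal gains.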

Disabling Mastra's internal trimming

Mastra v1.x has its own context-management logic. The harness disables it through the Mastra Agent constructor config, either by passing a messageList that ignores Mastra's MessageListBudgetOptions or, when feasible, by using the bare LanguageModelV2 interface (see mastra-adapter.ts). The exact mechanism is an implementation detail; the contract is that Mastra must not trim the message list the harness hands to agent.stream(...).

If Mastra cannot be cleanly disabled in some future version, the harness's escape hatch is to bypass Agent.stream entirely and call @kaged/llm's streamModel directly for that call (per the existing per-call escape hatch in § Escape hatch).


Streaming relay & daemon integration

The harness exposes one entry point to the daemon: runPrimary. The daemon calls it from handlePostMessage (see http-api.md) as a fire-and-forget operation after the 201 response. The function builds the Mastra Agent, calls agent.stream(...), relays output over the daemon's WebSocket transport, and persists the final AssistantMessage.

Runtime entry point

interface RunPrimaryInput {
  sessionId: string;
  runId: string;
  compiledProject: CompiledProject;
  providerRoute: ProviderRoute;            // resolved from compiled.primary.model + local-config
  messages: Message[];                     // reconstructed from MessageRecord rows
  abortSignal: AbortSignal;
  publish: (event: HarnessOutputEvent) => void;   // harness-owned; daemon translates to WsFrame
  maxSteps?: number;                      // from DSL or session override; passed to Mastra
  maxOutputTokens?: number;              // from DSL or session override; passed to provider
}

interface RunPrimaryResult {
  assistantMessage: AssistantMessage;      // for storage.createMessage
  usage: Usage;
  finishReason: "stop" | "length" | "toolUse" | "error" | "aborted";
}

function runPrimary(input: RunPrimaryInput): Promise<RunPrimaryResult>;

runPrimary is fully async. The session-manager state machine guarantees one in-flight run per session; calling runPrimary twice for the same runId is undefined behavior at the harness layer (the state machine is the gate).

The publish callback receives a harness-owned event type, not a daemon WsFrame. This preserves the ADR-0011 substrate-portability commitment: if Mastra is replaced (the ADR-0012 escape hatch), the harness's outbound contract to the daemon does not change. The daemon owns the WsFrame envelope (channel, seq, transport).

Harness output event shape

type HarnessOutputEvent =
  | {
      type: "message.start";
      runId: string;
      messageId: string;
      provider: string;               // kaged provider name (e.g. "anthropic", "openai")
      model: string;                   // model ID (e.g. "claude-sonnet-4-20250514")
    }
  | { type: "message.delta"; messageId: string; delta: string; kind: "text" | "thinking" }
  | { type: "message.tool_call"; messageId: string; id: string; name: string; arguments: unknown }
  | { type: "message.tool_result"; messageId: string; toolCallId: string; content: string; isError: boolean }
  | {
      type: "message.end";
      messageId: string;
      stopReason: "stop" | "length" | "toolUse" | "error" | "aborted";
      usage?: Usage;
      errorMessage?: string;
      stats?: {
        /** Time to first token in milliseconds. */
        ttft: number | null;
        /** Total generation duration in milliseconds (from stream open to final token). */
        duration: number;
        /** Tokens per second (output tokens / duration). */
        tps: number;
        /** Cost breakdown in USD, computed via @kaged/llm calculateCost. Null when model metadata unavailable. */
        cost: { input: number; output: number; reasoning: number; cacheRead: number; cacheWrite: number; total: number } | null;
      };
    };

The type strings match kaged's WsOutputType 1:1. The daemon wraps each event into a WsFrame on the output channel with monotonically increasing seq.

message.start enrichment. The provider and model fields identify which provider and model are servicing this run. The UI uses these to render a provider:model label on the message bubble header. These values come from the resolved ProviderRoute — they are always available (the harness cannot start a stream without a resolved route).

message.end enrichment. The optional stats object carries post-completion timing and cost data. The harness computes these from its own instrumentation:

  • ttft — elapsed ms from fetch call to first text_delta or thinking_delta event. null if the stream errored before any content token (e.g. immediate 429). Stored on the AssistantMessage as ttft.
  • duration — elapsed ms from stream open to final token (not including post-stream persistence). Stored on the AssistantMessage as duration.
  • tps — output tokens divided by duration in seconds. A derived convenience; the UI displays it directly.
  • cost — dollar cost breakdown from @kaged/llm's calculateCost(usage, modelMeta). null when model metadata is unavailable (unknown model, self-hosted without config). When present, the UI renders the total; the breakdown is available on hover/expand.

stats is absent (not null) when the run terminated before any meaningful measurement (e.g. immediate abort before stream opened). The UI treats missing stats as "no data available."
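The stats fields can be derived from three timestamps plus the final usage. The following is a minimal sketch under two assumptions: timestamps are milliseconds captured by the harness, and a hypothetical per-model price table (USD per 1M tokens) stands in for the ModelMeta that @kaged/llm's calculateCost consumes. The real cost formula lives in @kaged/llm and may differ.

```typescript
// Hypothetical bucket shape; field names follow the cost object in the message.end type.
type Buckets = { input: number; output: number; reasoning: number; cacheRead: number; cacheWrite: number };
type UsageSketch = Buckets; // token counts per bucket
type PriceSketch = Buckets; // hypothetical USD per 1M tokens per bucket
type Cost = Buckets & { total: number }; // USD per bucket plus total

interface StatsSketch {
  ttft: number | null;
  duration: number;
  tps: number;
  cost: Cost | null;
}

function costFromUsage(usage: UsageSketch, price: PriceSketch): Cost {
  const keys = ["input", "output", "reasoning", "cacheRead", "cacheWrite"] as const;
  const cost = { input: 0, output: 0, reasoning: 0, cacheRead: 0, cacheWrite: 0, total: 0 };
  for (const k of keys) {
    cost[k] = (usage[k] * price[k]) / 1_000_000;
    cost.total += cost[k];
  }
  return cost;
}

// All timestamps are milliseconds from the same clock (e.g. performance.now()).
function computeStats(
  streamOpenedAt: number,
  firstContentAt: number | null, // first text/thinking delta; null if none arrived
  finalTokenAt: number,
  usage: UsageSketch,
  price: PriceSketch | null, // null when model metadata is unavailable
): StatsSketch {
  const duration = finalTokenAt - streamOpenedAt;
  const ttft = firstContentAt === null ? null : firstContentAt - streamOpenedAt;
  // tps divides by duration in seconds; a zero-length stream reports 0 rather than Infinity.
  const tps = duration > 0 ? usage.output / (duration / 1000) : 0;
  return { ttft, duration, tps, cost: price ? costFromUsage(usage, price) : null };
}
```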

Stream event mapping

Mastra emits a ReadableStream<ChunkType> (agent.stream(...).fullStream). The harness consumes it, maps each chunk to a HarnessOutputEvent, and calls publish(event). The daemon wraps each event into a WsFrame on the output channel (http-api.md).

| Mastra ChunkType.type | HarnessOutputEvent.type (same string as WsOutputType) | Event payload shape |
|---|---|---|
| (start of stream) | message.start | { runId, messageId, provider, model } (kaged-generated; one per stream; provider+model from resolved route) |
| text-delta | message.delta | { messageId, delta, kind: "text" } |
| reasoning-delta | message.delta | { messageId, delta, kind: "thinking" } (v0 surfaces as a delta with a kind discriminator) |
| tool-call | message.tool_call | { messageId, id, name, arguments } |
| tool-result | message.tool_result | { messageId, toolCallId, content, isError } |
| step-start | (suppressed) | Not surfaced in v0; logged via observability pipeline only |
| step-complete | (suppressed) | Same as step-start |
| source | (suppressed) | v0 does not surface citations; deferred |
| file | (suppressed) | v0 does not surface file outputs; deferred |
| text-start / text-end | (suppressed) | Boundaries inferred from the delta stream; not transported in v0 |
| reasoning-start / reasoning-end | (suppressed) | Same as text-start / text-end |
| finish | message.end | { messageId, stopReason, usage, stats: { ttft, duration, tps, cost } } |
| error | message.end | { messageId, stopReason: "error", errorMessage } (terminal; run also transitions to failed; stats may be absent if error occurred before stream opened) |

step-start / step-complete chunks are emitted by Mastra for every loop iteration including tool-call continuations. v0 collapses these into a single message envelope because the UI renders one message per run for now. Future versions may expose step boundaries; the protocol slot is available (WsEventType already includes run.started / run.ended; new types may be added).
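A sketch of the per-chunk mapping in the table. The chunk payload fields used here (text, toolCallId, toolName, args, result, message) are simplified assumptions, not Mastra's exact ChunkType definitions. The finish chunk is left out because its message.end event needs the harness-computed stats.

```typescript
// Simplified, hypothetical chunk shapes (not Mastra's exact ChunkType).
type ChunkSketch =
  | { type: "text-delta"; text: string }
  | { type: "reasoning-delta"; text: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool-result"; toolCallId: string; result: unknown; isError?: boolean }
  | { type: "error"; message: string }
  | { type: "step-start" | "step-complete" | "source" | "file" | "text-start" | "text-end" | "reasoning-start" | "reasoning-end" };

type EventSketch =
  | { type: "message.delta"; messageId: string; delta: string; kind: "text" | "thinking" }
  | { type: "message.tool_call"; messageId: string; id: string; name: string; arguments: unknown }
  | { type: "message.tool_result"; messageId: string; toolCallId: string; content: string; isError: boolean }
  | { type: "message.end"; messageId: string; stopReason: "error"; errorMessage: string };

// Returns null for chunk types that v0 suppresses.
function mapChunk(messageId: string, chunk: ChunkSketch): EventSketch | null {
  switch (chunk.type) {
    case "text-delta":
      return { type: "message.delta", messageId, delta: chunk.text, kind: "text" };
    case "reasoning-delta":
      return { type: "message.delta", messageId, delta: chunk.text, kind: "thinking" };
    case "tool-call":
      return { type: "message.tool_call", messageId, id: chunk.toolCallId, name: chunk.toolName, arguments: chunk.args };
    case "tool-result":
      return {
        type: "message.tool_result",
        messageId,
        toolCallId: chunk.toolCallId,
        // content is a string on the wire; structured results are JSON-encoded.
        content: typeof chunk.result === "string" ? chunk.result : JSON.stringify(chunk.result),
        isError: chunk.isError ?? false,
      };
    case "error":
      return { type: "message.end", messageId, stopReason: "error", errorMessage: chunk.message };
    default:
      // step-*, source, file, and *-start / *-end boundaries are not transported in v0.
      return null;
  }
}
```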

WebSocket relay topology

The daemon maintains a session → socket registry. The registry is populated when WebSocket connections open and drained when they close.

// In packages/daemon/src/runtime/ws-registry.ts (planned)
const subscribers = new Map<string, Set<ServerWebSocket<WsSessionData>>>();

export function registerSocket(sessionId: string, ws: ServerWebSocket<WsSessionData>): void;
export function unregisterSocket(sessionId: string, ws: ServerWebSocket<WsSessionData>): void;
export function publishHarnessEvent(sessionId: string, event: HarnessOutputEvent): void;

publishHarnessEvent wraps the harness event into a WsFrame (channel output, monotonically increasing seq per subscriber socket), serializes to JSON, and sends to every subscriber. Sessions with no subscribers (operator UI not connected, or connected to a different session) are skipped — the run still completes and persists; only the live relay is dark.

runPrimary's publish callback is a closure that calls publishHarnessEvent(sessionId, event). The harness has no direct reference to ServerWebSocket and no knowledge of the WS frame envelope.
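A minimal in-memory sketch of the registry functions. It assumes a SocketLike interface exposing only send() in place of Bun's ServerWebSocket<WsSessionData>, and a hypothetical frame shape of { channel, seq, event } standing in for the real WsFrame from http-api.md. The seq counter is kept per socket, as described above.

```typescript
// Stand-in for ServerWebSocket<WsSessionData>; only send() is needed here.
interface SocketLike {
  send(data: string): void;
}
interface HarnessEventLike {
  type: string;
  [key: string]: unknown;
}

const subscribers = new Map<string, Set<SocketLike>>();
// seq is monotonically increasing per subscriber socket.
const seqBySocket = new WeakMap<SocketLike, number>();

function registerSocket(sessionId: string, ws: SocketLike): void {
  let set = subscribers.get(sessionId);
  if (!set) {
    set = new Set();
    subscribers.set(sessionId, set);
  }
  set.add(ws);
  seqBySocket.set(ws, 0);
}

function unregisterSocket(sessionId: string, ws: SocketLike): void {
  const set = subscribers.get(sessionId);
  if (!set) return;
  set.delete(ws);
  if (set.size === 0) subscribers.delete(sessionId); // drain empty sessions
}

function publishHarnessEvent(sessionId: string, event: HarnessEventLike): void {
  const set = subscribers.get(sessionId);
  // No subscribers: the run still completes and persists; only the live relay is dark.
  if (!set) return;
  for (const ws of set) {
    const seq = (seqBySocket.get(ws) ?? 0) + 1;
    seqBySocket.set(ws, seq);
    ws.send(JSON.stringify({ channel: "output", seq, event }));
  }
}
```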

Persistence: AssistantMessageMessageRecord

Mastra's AssistantMessage is rich — text content, thinking content, tool calls, usage, stop reason, latency. kaged's MessageRecord has content: string and metadata: JSON. The mapping:

  • MessageRecord.role = "primary"
  • MessageRecord.content = concatenation of all TextContent blocks (the human-readable transcript)
  • MessageRecord.metadata = JSON encoding of:
    • provider, model
    • usage (input / output / cache / cost)
    • stopReason
    • duration, ttft
    • contentBlocks — the full structured (TextContent | ThinkingContent | ToolCall)[] array, so a future version can reconstruct the message faithfully without re-running the LLM
    • errorMessage, errorStatus if the run terminated in error

The UI's chat transcript renders content (plain text). The metadata.contentBlocks field is for replay, audit-log inspection, and future UI features (collapsed thinking sections, inline tool-call cards).
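A sketch of the AssistantMessage → MessageRecord mapping listed above. AssistantMessageSketch, ContentBlock, and MessageRecordSketch are hypothetical simplifications of the real types; the sketch also assumes metadata is stored as a JSON string and that text blocks are joined with no separator (the spec says only "concatenation").

```typescript
// Hypothetical, simplified content block and message shapes.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "toolCall"; id: string; name: string; arguments: unknown };

interface AssistantMessageSketch {
  provider: string;
  model: string;
  content: ContentBlock[];
  usage: { input: number; output: number; cacheRead: number; cacheWrite: number; cost: number };
  stopReason: string;
  duration: number;
  ttft: number | null;
  errorMessage?: string;
  errorStatus?: number;
}

interface MessageRecordSketch {
  role: "primary";
  content: string;
  metadata: string; // JSON encoding
}

function toMessageRecord(msg: AssistantMessageSketch): MessageRecordSketch {
  // content is the human-readable transcript: text blocks only.
  const text = msg.content
    .filter((b): b is Extract<ContentBlock, { type: "text" }> => b.type === "text")
    .map((b) => b.text)
    .join("");
  const metadata = {
    provider: msg.provider,
    model: msg.model,
    usage: msg.usage,
    stopReason: msg.stopReason,
    duration: msg.duration,
    ttft: msg.ttft,
    // Full structured blocks, so the message can be rebuilt without re-running the LLM.
    contentBlocks: msg.content,
    // Error fields are present only when the run terminated in error.
    ...(msg.errorMessage !== undefined ? { errorMessage: msg.errorMessage, errorStatus: msg.errorStatus } : {}),
  };
  return { role: "primary", content: text, metadata: JSON.stringify(metadata) };
}
```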

When MessageRecord is read back for the next run's context (rebuilding Mastra's messages argument), the harness prefers metadata.contentBlocks when present and falls back to wrapping content in a single TextContent block.
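The read-back rule can be sketched as follows. The record and block shapes are hypothetical, and treating unparseable metadata the same as missing metadata is an assumption of this sketch rather than something the spec states.

```typescript
// Hypothetical shapes for the stored row and the rebuilt content blocks.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "toolCall"; id: string; name: string; arguments: unknown };

interface StoredRecord {
  content: string;
  metadata: string | null; // JSON encoding, possibly absent
}

function blocksFromRecord(rec: StoredRecord): ContentBlock[] {
  if (rec.metadata) {
    try {
      const meta = JSON.parse(rec.metadata) as { contentBlocks?: unknown };
      // Prefer the faithful structured copy when present.
      if (Array.isArray(meta.contentBlocks)) return meta.contentBlocks as ContentBlock[];
    } catch {
      // Malformed metadata: fall through to the plain-text fallback.
    }
  }
  // Fallback: wrap the transcript text in a single text block.
  return [{ type: "text", text: rec.content }];
}
```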

Fire-and-forget pattern

handlePostMessage returns 201 immediately after persisting the operator message and starting the run. runPrimary is launched without await. If runPrimary rejects, the harness catches the rejection, publishes a message.end event with stopReason: "error", transitions the run to failed, and transitions the session to idle. Unhandled rejection from runPrimary is a kaged bug, not a normal failure mode.
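A sketch of the launch-without-await pattern. launchRun, markRunFailed, and markSessionIdle are hypothetical names for the injected pieces; the final catch that only logs is an assumption added so the cleanup path cannot itself produce an unhandled rejection.

```typescript
// Hypothetical event and dependency shapes; names are illustrative only.
interface ErrorEndEvent {
  type: "message.end";
  messageId: string;
  stopReason: "error";
  errorMessage: string;
}

interface LaunchDeps {
  runPrimary: () => Promise<void>;
  publish: (event: ErrorEndEvent) => void;
  markRunFailed: (runId: string) => Promise<void>;
  markSessionIdle: (sessionId: string) => Promise<void>;
}

function launchRun(sessionId: string, runId: string, messageId: string, deps: LaunchDeps): void {
  // Deliberately not awaited: the HTTP handler can return 201 right after this call.
  void deps
    .runPrimary()
    .catch(async (err: unknown) => {
      // A rejection here means runPrimary leaked an error (a kaged bug); surface it as a failed run.
      const errorMessage = err instanceof Error ? err.message : String(err);
      deps.publish({ type: "message.end", messageId, stopReason: "error", errorMessage });
      await deps.markRunFailed(runId);
      await deps.markSessionIdle(sessionId);
    })
    .catch((cleanupErr: unknown) => {
      // Last resort: never leave the cleanup path as an unhandled rejection.
      console.error("failed to record run failure", cleanupErr);
    });
}
```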


Escape hatch

Per ADR-0012, the escape path if Mastra becomes unsuitable is:

  1. Pin @mastra/core to the last good version.
  2. Evaluate replacement (VoltAgent, hand-rolled harness, or direct @kaged/llm calls without an agent loop).
  3. Replace the DSL compiler behind the stable DSL contract from ADR-0011.

The DSL is the portable artifact. The harness is implementation detail. The operator's project files do not change if the substrate changes.

For specific cases where Mastra's abstractions are insufficient, the harness can call @kaged/llm directly (streamModel / completeModel) without constructing a Mastra Agent. Per ADR-0014, the provider layer is the same in either case — @kaged/llm is both Mastra's model and the per-call escape hatch. The escape hatch is used surgically, not as a pattern.


Failure modes

| Failure | Detection | Recovery | Operator impact |
|---|---|---|---|
| Mastra Agent.generate throws | Exception in harness | Run marked failed. Session → idle. | "Agent generation failed: [error]." |
| Provider rate limited | HTTP 429 from provider | Provider router tries next fallback. If all exhausted, run fails. | "All providers rate-limited. Try again later." (or transparent fallback) |
| Provider unreachable | HTTP timeout / connection refused | Provider router tries next fallback. | "Provider unreachable." (or transparent fallback) |
| Processor calls abort() | PolicyProcessor detects violation | Run marked failed with policy violation details. | "Policy violation: [detail]. Check your project config." |
| Langfuse exporter fails | Network error to Langfuse endpoint | Exporter logs warning; agent continues. Tracing is best-effort. | Warning in logs. No operator-visible impact. Agent continues. |
| Prompt file not found | File read error at compilation or hot-reload | Compilation fails with clear error. Agent not started. | "Prompt file not found: ./prompts/primary.md" |
| Prompt file change during generation | File watcher fires mid-generation | Change queued. Applied at next message boundary, not mid-generation. | Seamless. Next generation uses new prompt. |
| Model alias not found in local config | Alias resolution failure | Compilation fails with clear error. | "Model alias 'fast' not found in local config." |
| Checkpoint creation fails | Storage write error during createCheckpoint | Session stays running. Run completes normally without checkpoint. | Warning: "Checkpoint could not be saved." |
| Resume fails (checkpoint not found) | getCheckpoint returns null | Session stays paused. Operator can retry with valid checkpoint or rollback. | "Resume failed: checkpoint not found." |
| Abort during stream yields no content | abortSignal fires before any tokens received | Empty partial message persisted. Checkpoint points to empty message. | Operator sees empty assistant message. Resume starts fresh generation. |
| Sub-agent delegation fails | onDelegationStart aborts, or sub-agent cage fails to spawn | Delegation error returned to supervisor. Supervisor may retry or fail. | "Sub-agent [name] could not be started: [reason]." |
| Context window exceeded | Token count > model limit | PolicyProcessor truncates or aborts (configurable). | "Context window budget exceeded." or automatic truncation. |

Excluded Mastra surfaces

Per ADR-0012, the following Mastra features are explicitly not used:

| Surface | Reason |
|---|---|
| Mastra Cloud | Vendor-hosted. kaged is self-hosted. |
| Mastra Studio | kaged ships its own UI. |
| Mastra Workspace / Skills | Conceptually collides with kaged's [CAGED] sandbox. The SkillsProcessor (which injects <available_skills> XML into system messages) never fires because kaged does not configure a Workspace. |
| Mastra RAG primitives | Not required for v0. |

If any of these are adopted in a future version, a follow-up ADR is required.


Testing notes

DSL compilation tests

  • Minimal DSL. Compile a project with one primary agent, no subagents. Assert a valid MastraAgentConfig is produced.
  • Subagents. Compile a project with three subagents. Assert each is registered on the primary's agents config.
  • Cage policies excluded from Mastra. Compile a project with caged subagents. Assert CagePolicy is in the compiled output but not in any Mastra config object.
  • Prompt file resolution. Compile with system_prompt: ./prompts/primary.md. Assert prompt content is read from that file.
  • Model alias resolution. Compile with model: "fast". Assert resolution to the concrete model identifier from local config. Assert error for unknown aliases.
  • Re-compilation. Change the DSL. Assert new compilation produces different agents. Assert active sessions are unaffected.

Processor pipeline tests

  • AuditProcessor logs all messages. Generate a response. Assert the audit log contains the full system + user message array.
  • AuditProcessor logs every step. Generate a multi-step response (tool call + continuation). Assert processInputStep is logged for each step with an incrementing stepNumber.
  • PolicyProcessor enforces budget. Set a token budget. Send messages exceeding it. Assert abort() is called.
  • PolicyProcessor passes valid requests. Send messages within budget. Assert pass-through.
  • Pipeline order. Assert AuditProcessor fires before PolicyProcessor. Assert PolicyProcessor fires before ObservabilityProcessor.
  • No post-pipeline injection. Capture the message array exiting the pipeline. Capture what the LLM receives (via mock provider). Assert they are identical.

Supervisor tests

  • Child agent delegation. Parent delegates to a child. Assert onDelegationStart fires with the correct tree-position path. Assert the child receives the correct messages.
  • Recursive delegation. Parent delegates to child A, which delegates to grandchild B. Assert both onDelegationStart hooks fire with correct paths ("primary.subagents.A", "primary.subagents.A.subagents.B").
  • Message filter. Configure a message filter that strips messages with content outside the child's cage. Assert filtered messages do not reach the child.
  • Delegation hooks. Assert onDelegationStart and onDelegationComplete fire in order. Assert onDelegationStart can abort a delegation.
  • Per-agent tool resolution. Compile a project with root agent (no tools: override) and a child with "file.*": { enabled: true }. Assert root gets kaged.issue.* and kaged.workflow.* by default. Assert child gets only file.* tools.
  • Depth limit. Compile a project with 17 levels of nesting. Assert parse-time rejection.
  • No sibling dispatch. Assert child A cannot call child B (no agent-B tool registered on A). Only the parent has agent-A and agent-B.
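A sketch of the default-versus-override behavior checked by the per-agent tool resolution test. It assumes tool names are dotted strings, that globs support only a trailing .*, and that a tools: block replaces the role defaults (so a non-root agent without a tools: block gets nothing here). principal_scope enforcement and the dispatch-time cage filter are omitted; resolveToolNames is a hypothetical name, not the real resolveToolsForAgent().

```typescript
interface ToolToggle {
  enabled: boolean;
}
type ToolsBlock = Record<string, ToolToggle>;

// Role-based defaults for the root agent, per the test above.
const ROOT_DEFAULT_GLOBS = ["kaged.issue.*", "kaged.workflow.*"];

// "file.*" matches "file.read"; a name without a wildcard matches only itself.
function matchesGlob(glob: string, name: string): boolean {
  if (glob.endsWith(".*")) return name.startsWith(glob.slice(0, -1));
  return glob === name;
}

function resolveToolNames(registry: string[], isRoot: boolean, tools?: ToolsBlock): string[] {
  const globs = tools
    ? Object.entries(tools)
        .filter(([, toggle]) => toggle.enabled)
        .map(([glob]) => glob)
    : isRoot
      ? ROOT_DEFAULT_GLOBS
      : [];
  // Each agent resolves independently against the same built-in registry.
  return registry.filter((name) => globs.some((g) => matchesGlob(g, name)));
}
```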

Checkpoint bridge tests

  • Operator pause → checkpoint created. Trigger operator pause via abortController.abort(). Assert runPrimary returns with finishReason: "aborted". Assert daemon creates a CheckpointRecord with createdBy: "operator" and messageCursor pointing to the persisted partial message.
  • Model-initiated checkpoint. Mock a primary that calls the checkpoint tool. Assert runPrimary returns with checkpointRequested populated. Assert daemon creates a CheckpointRecord with createdBy: "model" and the model's reason.
  • Resume reconstructs messages. Create a checkpoint, then resume. Assert a new run is created. Assert runPrimary is called with messages reconstructed from non-superseded MessageRecord rows. Assert CheckpointRecord.resumedAt is updated.
  • Resume with edited prompts. Create a checkpoint, add prompt edits, resume. Assert apply_prompt_edits effect fires before the new run. Assert the resumed runPrimary call uses the updated prompt content.
  • Rollback supersedes messages. Create a checkpoint at message 5, post more messages, then rollback to that checkpoint. Assert messages after the checkpoint's messageCursor are marked superseded = true. Assert CheckpointRecord.rolledBack is set to true. Assert session transitions to idle.
  • Rollback preserves checkpoint. After rollback, assert the target checkpoint record still exists in storage (not deleted).
  • Session state transitions. Assert running → paused on checkpoint. Assert paused → running on resume with pending work. Assert paused → idle on resume without pending work. Assert paused → idle on rollback.
  • Serialization helpers round-trip. snapshotFromMessages()serializeSnapshot()deserializeSnapshot(). Assert the result matches the original snapshot. (These test the retained pure helpers, not the runtime checkpoint path.)
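A sketch of the rollback behavior these tests assert, over hypothetical in-memory rows. Array position stands in for whatever message ordering @kaged/storage actually uses, and rollback is an illustrative name.

```typescript
interface MsgRow {
  id: string;
  superseded: boolean;
}
interface CheckpointRow {
  id: string;
  messageCursor: string; // MessageRecord.id at pause time
  rolledBack: boolean;
}
type SessionState = "idle" | "running" | "paused";

function rollback(
  messages: MsgRow[],
  cp: CheckpointRow,
): { messages: MsgRow[]; checkpoint: CheckpointRow; state: SessionState } {
  const idx = messages.findIndex((m) => m.id === cp.messageCursor);
  if (idx === -1) throw new Error(`checkpoint cursor ${cp.messageCursor} not found`);
  return {
    // Messages after the cursor are marked superseded, not deleted.
    messages: messages.map((m, i) => (i > idx ? { ...m, superseded: true } : m)),
    // The checkpoint itself is kept and flagged.
    checkpoint: { ...cp, rolledBack: true },
    state: "idle",
  };
}
```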

Observability tests

  • Langfuse configured. Set credentials. Assert the daemon initializes the Langfuse client singleton and traces are exported (mock Langfuse endpoint).
  • Langfuse not configured. Omit credentials. Assert no exporter is registered. Assert structured logs are written to stdout.
  • Langfuse failure. Configure Langfuse with an unreachable endpoint. Assert the agent continues normally. Assert a warning is logged.

Provider routing tests

  • Single model. Configure fast = "anthropic:claude-sonnet-4-20250514". Assert the correct provider is called.
  • Fallback chain. Configure smart = ["anthropic:...", "openai:..."]. Mock first provider to return 429. Assert second provider is tried.
  • All providers down. Configure a fallback chain. Mock all providers to fail. Assert the run fails with a clear error.
  • Credential resolution. Configure api_key = "${KAGED_ANTHROPIC_API_KEY}". Set the env var. Assert the key is resolved correctly.
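The fallback-chain and credential tests can be pictured against a sketch like the following. ProviderError, withFallback, and resolveCredential are hypothetical names. The rule that only 429s and connection failures (no status) advance to the next route follows the failure-modes table; treating every other error as fatal is an assumption of this sketch.

```typescript
// Hypothetical error type carrying the HTTP status, when there is one.
class ProviderError extends Error {
  status: number | undefined;
  constructor(message: string, status?: number) {
    super(message);
    this.name = "ProviderError";
    this.status = status;
  }
}

// Tries each "provider:model" route in order.
async function withFallback<T>(routes: string[], call: (route: string) => Promise<T>): Promise<T> {
  const failures: string[] = [];
  for (const route of routes) {
    try {
      return await call(route);
    } catch (err) {
      // Rate limits (429) and connection failures fall through; anything else is fatal here.
      if (!(err instanceof ProviderError) || (err.status !== 429 && err.status !== undefined)) throw err;
      failures.push(`${route}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed: ${failures.join("; ")}`);
}

// Expands ${VAR} references from the given environment; a missing variable is an error.
function resolveCredential(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) throw new Error(`Environment variable ${name} is not set`);
    return resolved;
  });
}
```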

Prompt management tests

  • Hot-reload. Change a prompt file. Assert the agent uses the new content at the next message boundary.
  • No mid-generation reload. Change a prompt file during generation. Assert the current generation completes with the old prompt.
  • Prompt visibility. Fetch prompts via API. Assert the current content matches the file.
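One way to satisfy both the hot-reload and the no-mid-generation tests is to copy the prompt set at each message boundary and use that copy for the whole generation, while the API reads the latest file content. PromptStore and its methods are hypothetical, and the file watcher that calls onFileChanged is not shown.

```typescript
class PromptStore {
  // Latest content per prompt path, as reported by the file watcher.
  private latest = new Map<string, string>();

  onFileChanged(path: string, text: string): void {
    this.latest.set(path, text);
  }

  // Called once per message boundary; the copy is immune to later file changes.
  snapshotForGeneration(): ReadonlyMap<string, string> {
    return new Map(this.latest);
  }

  // What the prompts API returns: always the current file content.
  current(path: string): string | undefined {
    return this.latest.get(path);
  }
}
```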

Open questions

  1. Token budget strategy. When the context window is exceeded, should the harness truncate (drop oldest messages) or abort? Both are valid; the operator should probably configure the behavior. Provisional: abort by default, truncation opt-in via DSL config.
  2. Processor ordering extensibility. v0 has three hardcoded processors. Should operators be able to register custom processors via the DSL or plugins? Deferred to v0.x.
  3. Observational memory. Mastra has token-aware truncation and an "observational memory" feature. Should kaged use them, or implement its own context window management? Provisional: use Mastra's, but the PolicyProcessor enforces a hard ceiling.
  4. Multi-model per session. Can different runs in the same session use different models (e.g., operator changes the alias mid-session)? Provisional: yes — the provider router resolves aliases per-call, not per-session.
  5. Mastra version pinning strategy. Pin to exact patch (1.2.3) or minor range (^1.2.0)? Exact pin is safest; minor range keeps security patches flowing. Provisional: exact pin in package.json, with a monthly review cadence for bumps.

Amendments

2026-05-23 — Provider strategy via @kaged/llm shim; v0 stateless storage; daemon integration spec

Driven by ADR-0014 and the v0 wiring of handlePostMessage through the harness:

  1. Provider abstraction reversed. Previous wording said the harness "passes the resolved model identifier and credentials to Mastra's Agent config; Mastra handles the rest." Per ADR-0014, all provider calls route through @kaged/llm via its LanguageModelV2 shim. The harness uses kagedModel(route) as Agent.model and does not depend on @ai-sdk/<provider> packages.
  2. Escape hatch corrected. Previous wording named "direct calls to the Vercel AI SDK (generateText / streamText)" as the per-call escape. Per ADR-0014 the escape hatch is direct @kaged/llm calls (streamModel / completeModel). The provider layer is the same in both cases.
  3. Storage strategy added (new § Storage strategy). v0 constructs Mastra Agents stateless; kaged's @kaged/storage is the source of truth. Mastra holds no cross-call state. A future amendment may add a bun:sqlite-backed MemoryStorage adapter if Mastra-side persistence becomes necessary.
  4. Streaming relay added (new § Streaming relay & daemon integration). Defines the runPrimary entry point the daemon calls, the ChunkTypeWsOutputType mapping, the WebSocket session-registry pattern, the AssistantMessageMessageRecord persistence mapping, and the fire-and-forget pattern for handlePostMessage.
  5. ADR-0014 added to the Constrained-by list and constraint table.
  6. Runtime entry-point publish callback re-typed. Earlier draft said publish: (frame: WsFrame) => void, which leaked the daemon's transport envelope into the harness contract. Per ADR-0011, the harness's outbound contract must be substrate-portable. Re-defined as publish: (event: HarnessOutputEvent) => void with a harness-owned event type. The WS-frame envelope (channel, seq, JSON serialization) is the daemon's concern and lives in packages/daemon/src/runtime/ws-registry.ts.

2026-05-24 — Enriched harness output events (provider:model, timing, cost)

Driven by streaming-first enrichment work (ADR-0016) and @kaged/llm model metadata catalog:

  1. message.start enriched with provider and model. The event now carries the resolved provider name and model ID from the ProviderRoute. The UI uses these to render a provider:model label on the message bubble header. These fields are always present — the harness cannot start a stream without a resolved route.
  2. message.end enriched with stats object. A new optional stats field carries post-completion instrumentation: ttft (time to first token, ms), duration (total generation time, ms), tps (tokens per second), and cost (USD breakdown from @kaged/llm's calculateCost). stats is absent when the run terminated before meaningful measurement. cost is null when model metadata is unavailable.
  3. Stream event mapping table updated. message.start row now shows { runId, messageId, provider, model }. finish row now shows { messageId, stopReason, usage, stats }.
  4. Harness instrumentation contract documented. The harness is responsible for capturing TTFT (first content delta timestamp minus stream-open timestamp), duration (stream-open to final token), and calling calculateCost from @kaged/llm with the completed Usage and looked-up ModelMeta. These are harness concerns — the daemon passes them through unchanged.

2026-05-24 — Subagent topology implementation (item 28)

Driven by item 28 (subagent topology in @kaged/harness). Implementation in mastra-adapter.ts, tool-adapter.ts, delegation.ts, runtime.ts.

  1. Supervisor pattern implemented. buildSubagents() in mastra-adapter.ts iterates CompiledProject.subagents, constructs a Mastra Agent per entry with the subagent's description, instructions, and resolved tools, and registers them on the primary's agents config. Matches the § Supervisor pattern / How it works steps 1–4 exactly.
  2. Tool injection via @kaged/agent-tooling. resolveToolsForAgent() calls ToolRegistry.resolve() with the agent's tool globs, then converts each ToolDefinition to a Mastra ToolAction via toolDefinitionsToRecord() (JSON Schema → Zod conversion in tool-adapter.ts). Tools are per-agent, not shared.
  3. Delegation hooks partially implemented. DelegationHooks interface in delegation.ts declares onDelegationStart and onDelegationComplete. buildDelegationConfig() wires audit-log entries (delegation start/complete with subagent key, prompt snippet, timing) and cage-policy lookup from CompiledProject.cagePolicies. The config is an execution option passed to agent.stream(), not a constructor param.
  4. messageFilter deferred. The spec's third hook (messageFilter for cage-aware content stripping) is not yet implemented. It depends on the sandbox runtime (item 30) to define what "outside the cage allowlist" means at runtime.
  5. Cage spawning deferred. § Cage integration steps 3–4 (spawning the sub-agent's process in a cage, tool mediation via ToolPermissions) depend on @kaged/sandbox (item 30). The onDelegationStart hook has a placeholder for cage-not-ready abort but does not perform actual spawning.
  6. SubagentTopologyDeps factored out. Topology concerns (ToolRegistry, delegationHooks, streamDefaults) are separated from AgentFactoryDeps into a dedicated SubagentTopologyDeps interface, keeping the single-primary path unchanged.

2026-05-25 — Checkpoint bridge: v0 kaged-native stateless model (item 32)

Driven by item 32 (checkpoint bridge implementation) and design analysis of Mastra Workflow suspend()/resume() vs kaged-native checkpoints:

  1. Mastra Workflow suspend/resume removed from v0 checkpoint bridge. Previous wording described the checkpoint bridge as a translation layer between Mastra's suspend(payload) / run.resume({ step, resumeData }) and kaged's checkpoint protocol. v0 Agents are constructed stateless (§ Storage strategy) — Mastra Workflow suspend requires Mastra-side snapshot persistence via a StorageAdapter that kaged does not provide. The bridge now operates entirely within kaged's storage layer.
  2. Checkpoint is a message-cursor pointer, not a snapshot blob. Previous wording described checkpoints as serialized MessageList states. v0 checkpoints store only a messageCursor (the MessageRecord.id at pause time). Full state is reconstructible from MessageRecord rows up to the cursor — the same reconstructMessages() path used by every run.
  3. Two trigger mechanisms defined. Operator pause uses abortController.abort() → partial message persisted → CheckpointRecord created. Model-initiated checkpoint uses a checkpoint tool → tool sets checkpointRequested flag → generation completes naturally → runPrimary returns with checkpointRequested → daemon creates CheckpointRecord.
  4. RunPrimaryResult extended with checkpointRequested. Optional field that signals the daemon to create a checkpoint and transition to paused instead of idle.
  5. Resume is a new run, not a continuation. Previous wording said resume "continues from the suspended point" via Mastra's run.resume(). v0 resume creates a new run, reconstructs messages from non-superseded rows, and calls runPrimary — the model continues naturally from full context.
  6. Checkpoint bridge test notes rewritten. Tests now verify kaged-native checkpoint creation, message cursor accuracy, resume message reconstruction, rollback message superseding, and session state transitions — not Mastra suspend/resume mechanics.
  7. Failure modes updated. Replaced Mastra-specific suspend() / resume() validation errors with kaged-native failure modes (storage write errors, checkpoint not found, abort before content).
  8. Pure serialization helpers retained. checkpoint-bridge.ts functions (suspend(), resume(), serializeSnapshot(), deserializeSnapshot(), snapshotFromMessages()) are kept as utilities for a future Mastra-side persistence path. They are not used in the v0 runtime checkpoint flow.

2026-06-03 — Agent execution limits and stop reason surface

  1. RunPrimaryInput extended. Two new optional fields: maxSteps (integer) and maxOutputTokens (integer). Sourced from the DSL's AgentSpec.max_steps / AgentSpec.max_output_tokens or from a session-level UI override.
  2. agent.stream() execution options updated. The harness passes maxSteps and maxTokens (mapped to maxOutputTokens) to Mastra's agent.stream() call. This replaces the previous default behavior where Mastra used maxSteps = 5 and the provider used its own opaque default.
  3. Stop reason already surfaced. The message.end event already carries stopReason ("stop" | "length" | "toolUse" | "error" | "aborted"). The UI uses this to display why a run ended and to offer an auto-continue action when stopReason === "length".
  4. Escape hatch preserved. If the operator does not set max_steps or max_output_tokens, the harness passes nothing to Mastra, preserving the previous default behavior. This is the migration path for existing projects.
  5. Test notes added. Tests verify: maxSteps reaches Mastra, maxOutputTokens reaches the provider request body, missing fields result in no parameter being passed, and stop reason "length" triggers the auto-continue affordance in the UI.
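A sketch of the "pass nothing when unset" rule from item 4. The keys mirror RunPrimaryInput; how they map onto Mastra's agent.stream() options (including the maxTokens / maxOutputTokens naming in item 2) is not shown and should be checked against the pinned @mastra/core version. limitOptions is a hypothetical name.

```typescript
interface LimitInput {
  maxSteps?: number;
  maxOutputTokens?: number;
}

function limitOptions(input: LimitInput): LimitInput {
  const opts: LimitInput = {};
  // Only configured keys are set, so Mastra and the provider keep their defaults otherwise.
  if (input.maxSteps !== undefined) opts.maxSteps = input.maxSteps;
  if (input.maxOutputTokens !== undefined) opts.maxOutputTokens = input.maxOutputTokens;
  return opts;
}
```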

2026-05-27 — ADR-0023 & ADR-0024: plugin hook firing + context compaction

Per ADR-0023 and ADR-0024:

  1. New § Plugin hook firing. The harness is the firing point for project-plugin lifecycle hooks. on_session_start and on_session_idle fire only on plugins declared on the primary (sessions are primary-owned). pre_compact and post_compact fire per-agent. Firing order, context population, and failure handling specified.
  2. on_session_start is once-per-session. Tracked via session.recalled_at. The harness fires the hook in reconstructMessages() on the first run of the session, appends <plugin:NAME>...</plugin:NAME>-wrapped inject content to the system prompt, and records the firing.
  3. on_session_idle fires from the session manager, not runPrimary. Idle detection runs after the last run completes and the session has been idle for the debounce window (default 30s; per-plugin configurable). Pending timers are not restored across daemon restart.
  4. New § Compaction. Context compaction is kaged-owned at the reconstructMessages() boundary. Mastra's internal trimming is neutralized — the harness never lets the message list exceed thresholds, so Mastra has nothing to trim.
  5. Compaction pipeline defined. Token estimation pre-call → observer hooks fire → strategy applies (drop/summarize/delegate/checkpoint) → mark superseded → persist CompactionRecord → hand compacted list to agent.stream().
  6. Hard threshold with hysteresis. Default upper threshold 0.85; lower 0.60. Compaction runs until below the lower threshold to avoid oscillation. Both thresholds operator-configurable per agent.
  7. Reactive fallback. If the estimate is wrong and the provider returns a context-length error, the harness compacts reactively and retries once. Second failure marks the run failed with context_overflow.
  8. compactor plugin role. Plugins declared with role: compactor receive pre_compact with role: compactor in params and return a CompactorResult that replaces the strategy step. The harness validates the result (always-keep preserved, tool pairs intact, valid superseded subset); invalid results fall back to drop.
  9. Always-keep set defined. System prompt, first operator message, messages with metadata.always_keep = true, and operator-configured predicates. The compaction pipeline filters always-keep out of the candidate set before any strategy runs.
  10. Tool-call/tool-result pair atomicity. Coupled pairs are the compaction unit. Drop/summarize/delegate strategies treat them as one logical message; splitting is a contract violation.
  11. CompactionRecord shape specified. Persisted in @kaged/storage (see session-manager.md § CompactionRecord). Includes operator-feedback fields (operatorFlag, operatorNotes) per ADR-0024.
  12. Dry-run path. The compaction pipeline runs without committing; observer hooks see dry_run: true in context and skip side effects. Used by the Compactor UI for strategy preview and by manual-compact for the commit flow.
  13. Cost surfacing. Summarizer model calls produce a compactor_cost field on RunPrimaryResult.stats and on the summary message's metadata. Session aggregate cost view surfaces primary: $X, compactor: $Y.
  14. New audit events. compaction.triggered, compaction.completed, compaction.failed, compaction.flagged. Plugin-host's hook-firing events (per plugin-host.md) capture the per-plugin half.
  15. Mastra internal trimming disabled. Configured via Mastra Agent constructor; escape hatch is the existing per-call bypass to streamModel. The contract: Mastra never trims the message list the harness handed it.
  16. Constraints table updated. Six new constraints added covering hook firing, compaction ownership, per-agent semantics, compactor failure fallback, atomic tool pairs, and between-call execution.
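The threshold, always-keep, and pair-atomicity rules above combine roughly as follows for the drop strategy. This sketch assumes per-message token estimates are already available, that drop removes the oldest eligible units first, and that a shared pairId links a tool call to its result; MsgSketch and compactByDrop are hypothetical. Observer hooks, CompactionRecord persistence, and the case where candidates run out above the lower threshold are not modeled.

```typescript
interface MsgSketch {
  id: string;
  tokens: number; // pre-call estimate
  alwaysKeep: boolean; // system prompt, first operator message, metadata.always_keep, ...
  pairId?: string; // shared by a tool call and its tool result
}

// Returns the ids to mark superseded (empty when no compaction is needed).
function compactByDrop(msgs: MsgSketch[], contextWindow: number, upper = 0.85, lower = 0.6): string[] {
  let total = msgs.reduce((sum, m) => sum + m.tokens, 0);
  if (total <= upper * contextWindow) return []; // below the upper threshold: nothing to do

  // Build compaction units in order: a tool call and its result form one unit.
  const units: MsgSketch[][] = [];
  const byPair = new Map<string, MsgSketch[]>();
  for (const m of msgs) {
    if (m.pairId === undefined) {
      units.push([m]);
      continue;
    }
    const unit = byPair.get(m.pairId);
    if (unit) {
      unit.push(m);
    } else {
      const fresh = [m];
      byPair.set(m.pairId, fresh);
      units.push(fresh);
    }
  }

  const superseded: string[] = [];
  for (const unit of units) {
    if (total <= lower * contextWindow) break; // hysteresis: stop only below the lower threshold
    if (unit.some((m) => m.alwaysKeep)) continue; // always-keep never enters the candidate set
    for (const m of unit) {
      superseded.push(m.id);
      total -= m.tokens;
    }
  }
  return superseded;
}
```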

2026-05-26 — ADR-0022: recursive agent tree, per-agent tool resolution, no can_be_called_by/interconnect

Per ADR-0022:

  1. CompiledProject restructured. Previous shape had a flat primary: MastraAgentConfig and subagents: Record<string, MastraAgentConfig> alongside cagePolicies: Record<string, CagePolicy>. New shape uses a recursive CompiledAgentNode carrying its own config, cagePolicy, tools, and children. CompiledProject.root is the tree root (primary agent). There is no separate subagents or cagePolicies map — everything is in the tree.
  2. Supervisor pattern rewritten for recursive tree walk. The compiler walks the AgentSpec tree depth-first via buildAgentNode(). Every agent with children is a Mastra supervisor over its direct children. The tree structure is the call graph — no can_be_called_by checks, no event-routed dispatch (interconnect). Depth bounded at 16 levels.
  3. Per-agent tool resolution documented. Each agent's tool set is resolved independently via resolveToolsForAgent() at compile time. Resolution chain: built-in registry → role-based defaults (root gets kaged.issue.*/kaged.workflow.*) → agent's tools: block → principal_scope enforcement → cage filter at dispatch. Cross-references agent-tooling.md § Per-agent tool resolution.
  4. DSL compilation example updated. Example now shows per-agent tools: and cage: on both root and child agents instead of project-level tool declarations.
  5. Cage integration updated. Root agent has cage: disabled (interim state). Children with cage: disabled run in daemon process context; children with full cage block are spawned in cages. Previous wording said "the supervisor (primary) runs uncaged; only sub-agents are caged" — replaced with cage-per-agent semantics.
  6. Supervisor tests expanded. Added recursive delegation test, per-agent tool resolution test, depth limit test, and no-sibling-dispatch test.
  7. ADR-0022 added to constrained-by list.
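The depth-first walk with its 16-level bound can be sketched as below. AgentSpecSketch and CompiledNodeSketch are simplified, hypothetical stand-ins for AgentSpec and CompiledAgentNode that carry only the fields needed here, and counting the root as level 1 is an assumption. Paths use the "primary.subagents.A" form from the supervisor tests.

```typescript
const MAX_DEPTH = 16;

interface AgentSpecSketch {
  subagents?: Record<string, AgentSpecSketch>;
}
interface CompiledNodeSketch {
  path: string;
  children: CompiledNodeSketch[];
}

function buildNode(spec: AgentSpecSketch, path: string, depth: number): CompiledNodeSketch {
  if (depth > MAX_DEPTH) throw new Error(`Agent tree exceeds ${MAX_DEPTH} levels at ${path}`);
  // Each node supervises only its direct children: the tree is the call graph.
  const children = Object.entries(spec.subagents ?? {}).map(([key, child]) =>
    buildNode(child, `${path}.subagents.${key}`, depth + 1),
  );
  return { path, children };
}
```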

References