@kaged/llm

Pure-fetch LLM provider interface supporting Anthropic, OpenAI, Google, and Antigravity API shapes with SSE streaming, cost calculation, and model discovery

source files

test files

lines	~5.2k
tests	✓ 110 pass
typecheck	pass
lint	clean

Test results (110)

✓ estimateTokens — algorithm selection > uses tiktoken for anthropic models [170.46ms]

✓ estimateTokens — algorithm selection > uses tiktoken for openai models [0.160ms]

✓ estimateTokens — algorithm selection > uses fallback for google/gemini models [0.130ms]

✓ estimateTokens — algorithm selection > uses fallback for groq models [0.030ms]

✓ estimateTokens — algorithm selection > uses fallback when modelMeta is null [0.020ms]

✓ estimateTokens — reservedOutputTokens > defaults to 4096 [0.130ms]

✓ estimateTokens — reservedOutputTokens > echoes custom value [0.100ms]

✓ estimateTokens — reservedOutputTokens > totalTokens = inputTokens + reservedOutputTokens [0.160ms]

✓ estimateTokens — context window and fraction > contextWindow comes from modelMeta.maxInputTokens [0.120ms]

✓ estimateTokens — context window and fraction > contextWindow is null when modelMeta is null [0.030ms]

✓ estimateTokens — context window and fraction > fraction is totalTokens / contextWindow [0.090ms]

✓ estimateTokens — context window and fraction > fraction uses FALLBACK_CONTEXT_WINDOW when modelMeta is null [0.040ms]

✓ estimateTokens — token counting > inputTokens increases with more messages [0.200ms]

✓ estimateTokens — token counting > inputTokens increases with longer messages [0.140ms]

✓ estimateTokens — token counting > inputTokens increases with longer system prompt [0.260ms]

✓ estimateTokens — token counting > system prompt as array is joined and counted [0.150ms]

✓ estimateTokens — token counting > empty messages and empty system prompt produces minimal tokens [0.050ms]

✓ estimateTokens — message types > counts system messages [0.080ms]

✓ estimateTokens — message types > counts tool result messages [0.140ms]

✓ estimateTokens — message types > counts assistant tool call messages [0.110ms]

✓ estimateTokens — message types > counts thinking content in assistant messages [0.130ms]

✓ estimateTokens — message types > counts multimodal user messages with images [0.140ms]

✓ estimateTokens — conservative estimation > estimate is non-zero for any non-empty input [0.110ms]

✓ estimateTokens — conservative estimation > tiktoken and fallback both produce positive counts for same input [0.150ms]

✓ ModelMeta — tokenizer field > anthropic models have tiktoken tokenizer [0.050ms]

✓ ModelMeta — tokenizer field > openai models have tiktoken tokenizer [0.030ms]

✓ ModelMeta — tokenizer field > ollama models have unknown tokenizer [0.020ms]

✓ estimateTokens — large message lists > handles 100 messages without error [2.90ms]

✓ estimateTokens — plugin memory in system prompt > counts plugin-wrapped content in system prompt [0.580ms]

✓ lookupModelMeta — catalog key format > resolves anthropic models via provider/modelId key [0.040ms]

✓ lookupModelMeta — catalog key format > resolves openai models via provider/modelId key [0.030ms]

✓ lookupModelMeta — catalog key format > resolves ollama models via provider/modelId key [0.030ms]

✓ lookupModelMeta — catalog key format > returns null for unknown model [0.020ms]

✓ lookupModelMeta — catalog key format > returns null for unknown provider [0.010ms]

✓ lookupModelMeta — capabilities > claude-sonnet-4 reports reasoning and vision [0.030ms]

✓ lookupModelMeta — capabilities > claude-haiku-3.5 does not report reasoning [0.020ms]

✓ lookupModelMeta — capabilities > gpt-5 reports reasoning [0.020ms]

✓ lookupModelMeta — pricing > extracts non-zero pricing for paid models [0.030ms]

✓ lookupModelMeta — pricing > reasoning pricing is non-null for models that have it [0.030ms]

✓ lookupModelMeta — pricing > reasoning pricing is null for models without it [0.020ms]

✓ lookupModelMeta — pricing > ollama models have zero pricing [0.030ms]

✓ lookupModelMeta — pricing > extracts token limits [0.020ms]

✓ calculateCost > zero usage produces zero cost [0.110ms]

✓ calculateCost > calculates input and output cost [0.060ms]

✓ calculateCost > uses output rate for reasoning when reasoning rate is null [0.050ms]

✓ calculateCost > uses dedicated reasoning rate when present [0.040ms]

✓ calculateCost > returns all zeros when meta is null [0.030ms]

✓ resolveModelMeta > no overrides returns default meta with all sources as default [0.290ms]

✓ resolveModelMeta > override takes precedence over catalog default [0.150ms]

✓ resolveModelMeta > override on context window [0.060ms]

✓ resolveModelMeta > builds meta from overrides only when model not in catalog [0.110ms]

✓ classifyRetry > overloaded provider message is transient provider_error, retryable [0.280ms]

✓ classifyRetry > 429 rate limit is retryable [0.040ms]

✓ classifyRetry > 503 is retryable provider_error [0.020ms]

✓ classifyRetry > network errors are retryable [0.050ms]

✓ classifyRetry > context overflow flag forces context_too_long, not retryable [0.040ms]

✓ classifyRetry > auth failures are not retryable [0.030ms]

✓ classifyRetry > spend limit is not retryable [0.020ms]

✓ classifyRetry > unknown errors default to run_failed, not retryable [0.020ms]

✓ classifyRetry > Retry-After ms yields absolute retryAfterUntil [0.030ms]

✓ classifyRetry > Retry-After seconds converted to ms [0.020ms]

✓ classifyRetry > rate limit with long minutes cooldown produces far-future retryAfterUntil [0.030ms]

✓ loadCatalog > loads a CatalogSnapshot from a JSON object [0.070ms]

✓ loadCatalog > parses providers keyed by canonical name [0.050ms]

✓ loadCatalog > parses models keyed by canonical provider/modelId [0.030ms]

✓ loadCatalog > preserves the provider npm package name [0.030ms]

✓ loadCatalog > preserves per-model npm override when present [0.030ms]

✓ loadCatalog > caches after first call (same reference returned for same input) [0.020ms]

✓ loadCatalog > rejects a snapshot with missing schemaVersion [0.090ms]

✓ loadCatalog > rejects a snapshot with no providers [0.050ms]

✓ listProviders > returns all providers in the snapshot [0.070ms]

✓ listProviders > each provider entry carries npm package and api baseURL [0.050ms]

✓ listModels > returns all models when no provider filter [0.050ms]

✓ listModels > filters by provider when providerName is given [0.060ms]

✓ listModels > returns empty array for unknown provider filter [0.020ms]

✓ lookupProvider > returns the provider entry by name [0.030ms]

✓ lookupProvider > returns undefined for unknown provider [0.020ms]

✓ lookupModel > returns the model entry by provider + modelId [0.030ms]

✓ lookupModel > returns undefined for unknown model [0.020ms]

✓ lookupModel > returns undefined when provider is unknown [0.020ms]

✓ resolvePackageName > returns the provider's npm package when no overrides and no per-model npm [0.060ms]

✓ resolvePackageName > returns the catalog's per-model npm override when present [0.030ms]

✓ resolvePackageName > operator packageOverride wins over catalog per-model npm [0.050ms]

✓ resolvePackageName > operator packageOverride wins over catalog provider npm [0.030ms]

✓ resolvePackageName > unrelated overrides do not affect the resolved package [0.030ms]

✓ resolveModel > throws driver_not_bundled when the resolved package is not a bundled driver [0.440ms]

✓ resolveModel > aborts when signal is already triggered [0.140ms]

✓ resolveModel > looks the resolved package up in the bundled-driver registry and instantiates it [0.240ms]

✓ resolveModel > honors operator packageOverride over catalog npm when selecting the driver [0.150ms]

✓ resolveModel > custom provider rides a bundled driver via route.npmPackage + baseUrl [0.120ms]

✓ resolveModel > wraps the resolved model with middleware before returning [0.100ms]

✓ wrapWithRetry > returns the result on first success without retry [0.960ms]

✓ wrapWithRetry > retries on 429 and eventually succeeds [5.00ms]

✓ wrapWithRetry > does not retry on 400 (client error) [0.180ms]

✓ wrapWithRetry > retries on 500 (server error) [1.19ms]

✓ wrapWithRetry > retries on network errors (ECONNREFUSED) [1.19ms]

✓ wrapWithRetry > stops retrying after maxAttempts and surfaces the final error [2.25ms]

✓ wrapWithRetry > abort signal prevents further retries [0.150ms]

✓ wrapWithRetry > honors Retry-After header when present (delays longer than base) [2.29ms]

✓ wrapWithRetry > respects maxDelayMs cap [101.61ms]

✓ wrapWithRetry: prototype-chain preservation > preserves provider and modelId when provider is a prototype getter [0.350ms]

✓ wrapWithRetry: prototype-chain preservation > retry still routes through the original prototype doStream [3.60ms]

✓ wrapWithSpendGate > proceeds when no limits are configured [0.470ms]

✓ wrapWithSpendGate > proceeds when spend is under configured limits [0.180ms]

✓ wrapWithSpendGate > blocks when 5h spend limit is exceeded [0.360ms]

✓ wrapWithSpendGate > blocks when 7d spend limit is exceeded [0.180ms]

✓ wrapWithSpendGate > blocks at the limit boundary (currentSpend === limit) [0.250ms]

✓ wrapWithSpendGate > checks spend before each call, not just once [0.310ms]

✓ wrapWithSpendGate > error message identifies which limit was exceeded and includes current spend [0.290ms]

✓ wrapWithSpendGate: prototype-chain preservation > preserves provider and modelId when provider is a prototype getter [0.290ms]
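The calculateCost results above describe three behaviors: separate input and output pricing, reasoning tokens billed at the output rate when no dedicated reasoning rate exists, and an all-zero result when model metadata is null. The following is a minimal sketch of those rules only, not the package's implementation. The type shapes, field names, the per-million-token unit, the separate reasoning-token count, and the name `sketchCalculateCost` are all assumptions made for illustration.

```typescript
// Hypothetical shapes: field names and per-million-token units are assumptions,
// not the actual @kaged/llm types.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
  reasoningPerMTok: number | null; // null when the model has no dedicated reasoning rate
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
  reasoningTokens: number; // assumed to be counted separately from outputTokens
}

interface Cost {
  input: number;
  output: number;
  reasoning: number;
  total: number;
}

function sketchCalculateCost(usage: Usage, pricing: Pricing | null): Cost {
  // No metadata means there is nothing to price against, so report all zeros.
  if (pricing === null) {
    return { input: 0, output: 0, reasoning: 0, total: 0 };
  }
  // Reasoning tokens fall back to the output rate when no dedicated rate exists.
  const reasoningRate = pricing.reasoningPerMTok ?? pricing.outputPerMTok;
  const input = (usage.inputTokens / 1_000_000) * pricing.inputPerMTok;
  const output = (usage.outputTokens / 1_000_000) * pricing.outputPerMTok;
  const reasoning = (usage.reasoningTokens / 1_000_000) * reasoningRate;
  return { input, output, reasoning, total: input + output + reasoning };
}
```

Under these assumptions, 250,000 reasoning tokens against a model priced at 15 per million output tokens and no reasoning rate cost 3.75. With a dedicated reasoning rate of 20, the same usage costs 5.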

Mentioned in

Type	Document
adr	ADR-0013: Observability substrate is Langfuse, self-hosted, optional
adr	ADR-0014: All LLM providers route through @kaged/llm; Mastra integrates via a LanguageModelV2 shim
adr	ADR-0024: Context compaction is kaged-owned, layered, observable, and operator-tunable
adr	ADR-0026: Cost management, model metadata overrides, and provider usage tracking
adr	ADR-0028: 3rd-party OAuth provider auth — token lifecycle and credential management
adr	ADR-0049: Providers are dynamically-loaded modules from an operator-local store; catalog from a self-hosted models.dev mirror; @kaged/llm becomes resolver + loader + middleware
adr	ADR-0052: Message regeneration and two-tier, operator-cancellable retry
spec	Spec: Agent Harness
spec	Spec: HTTP + WebSocket API
spec	Spec: LLM Provider Interface
spec	Spec: Local config