Live in production · 220+ models supported

Stop AI cost explosions before they happen

TokenCapAI enforces hard spend caps on every LLM call. Set a budget per agent. Block calls when the limit is hit. Get alerted in real time.

[Screenshot: TokenCapAI overview dashboard showing today's spend, cap utilisation, and per-agent breakdown across multiple LLM-powered agents]

Every AI team eventually gets hit by one of these

The runaway loop

An agent calls GPT-4o 10,000 times because of a bug. $800 gone in 2 minutes.

The silent spike

A new feature ships, usage spikes overnight. You find out when the invoice arrives.

No visibility

Multiple agents running, no idea which one is eating the budget.

Every call, every decision, logged

Full audit trail: timestamp, agent, model, tokens, cost, status, and stop reason for every event.

[Screenshot: TokenCapAI event log showing a mix of allowed and blocked LLM calls across Support_RAG_Agent, Code_Review_Bot, Data_Pipeline_Agent, and Onboarding_Assistant, with per-event cost and stop reasons]

Two ways to integrate

Monitoring mode — any language, no lock-in. Proxy mode — one line of config.

Option 1 — Monitoring: check, call, report

1. Check before calling

Ask TokenCapAI if the agent is allowed to spend.

// Ask TokenCapAI whether this agent still has budget left.
const status = await fetch(
  `${TOKENCAP_API}/v1/status?agent_id=${encodeURIComponent(AGENT_ID)}`,
  { headers: { Authorization: `Bearer ${API_KEY}` } }
).then(r => r.json());

if (!status.allowed) throw new Error('Cap exceeded');

2. Make the LLM call

Nothing changes. Call OpenAI, Anthropic, or any provider as normal.

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
});

3. Report what was spent

Pass the raw response — TokenCapAI extracts tokens and calculates cost.

await fetch(`${TOKENCAP_API}/v1/events`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    'Content-Type': 'application/json', // the body below is JSON
  },
  body: JSON.stringify({ agent_id: AGENT_ID, model: 'gpt-4o', response }),
});
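
The three steps above can be combined into one wrapper. The Python sketch below is illustrative only, not TokenCapAI's client library. It assumes the status response carries an `allowed` field, as in step 1. The `check`, `call`, and `report` callables and the `CapExceeded` exception are hypothetical names: they stand in for the two HTTP requests and the provider SDK call, and are injected so the flow can be exercised without a network.

```python
from typing import Any, Callable, Dict


class CapExceeded(Exception):
    """Raised when the status check says the agent may not spend."""


def guarded_call(
    check: Callable[[], Dict[str, Any]],
    call: Callable[[], Any],
    report: Callable[[Any], None],
) -> Any:
    """Run check -> call -> report, skipping the LLM call if the cap is hit."""
    status = check()                      # step 1: would be GET /v1/status
    if not status.get("allowed", False):  # assumption: missing field means blocked
        raise CapExceeded("Cap exceeded")
    response = call()                     # step 2: provider call, unchanged
    report(response)                      # step 3: would be POST /v1/events
    return response


# Usage with in-memory stand-ins (hypothetical; real code would make HTTP requests).
reported = []
result = guarded_call(
    check=lambda: {"allowed": True},
    call=lambda: {"usage": {"total_tokens": 42}},
    report=reported.append,
)
```

Treating a missing `allowed` field as blocked is a fail-closed choice made for this sketch. Whether to fail open or closed when the status check itself errors is a decision the page does not specify.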

Option 2 — Proxy: one config change

Store your provider key in TokenCapAI (encrypted). Change your SDK's base URL. Caps enforced automatically — no check or report calls needed.

# OpenAI
import openai

client = openai.OpenAI(
  base_url="https://api.tokencapai.com/proxy/openai/v1",
  api_key=TOKENCAP_KEY,
  default_headers={"X-TokenCapAI-Agent-Id": AGENT_ID},
)
# Call client.chat.completions.create() as normal — done

# Anthropic
import anthropic

client = anthropic.Anthropic(
  base_url="https://api.tokencapai.com/proxy/anthropic/v1",
  api_key="not-used",
  default_headers={
    "Authorization": "Bearer " + TOKENCAP_KEY,
    "X-TokenCapAI-Agent-Id": AGENT_ID,
  },
)

Everything you need to control AI spend

Hard enforcement

Calls are blocked before they happen — not just flagged after the invoice arrives.

Transparent proxy

Point any OpenAI or Anthropic SDK at our proxy URL. Caps enforced with one line of config. Prompts never stored.

Per-agent budgets

Set different limits for different agents. Your chatbot gets $10/day. Your pipeline gets $50/month.
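
As a rough illustration of what a per-agent budget check involves, here is a minimal Python sketch. It is not TokenCapAI's implementation: the `Budget` class, its `allows` and `record` methods, and the agent names are all hypothetical, and resetting spend at the end of each day or month is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Budget:
    limit: float        # cap for the period, in the account currency
    spent: float = 0.0  # spend recorded so far in the current period

    def allows(self, estimated_cost: float = 0.0) -> bool:
        """True while recorded spend plus the estimate stays within the cap."""
        return self.spent + estimated_cost <= self.limit

    def record(self, cost: float) -> None:
        """Add the cost of a completed call to the period's spend."""
        self.spent += cost


# Hypothetical agent names; the limits mirror the examples in the text.
budgets: Dict[str, Budget] = {
    "chatbot": Budget(limit=10.0),   # $10 per day
    "pipeline": Budget(limit=50.0),  # $50 per month
}

budgets["chatbot"].record(9.5)
chatbot_ok = budgets["chatbot"].allows(estimated_cost=1.0)    # 10.5 exceeds 10
pipeline_ok = budgets["pipeline"].allows(estimated_cost=1.0)  # 1.0 is within 50
```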

Loop protection

Velocity caps catch infinite loops by rate-limiting calls per minute before they cost you.
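
One common way to rate-limit calls per minute is a sliding-window counter. The Python sketch below shows that general technique and is not necessarily how TokenCapAI implements velocity caps. The `VelocityCap` class and its parameters are hypothetical, and the current time is passed in explicitly so the behaviour is deterministic (real code would likely use `time.monotonic()`).

```python
from collections import deque
from typing import Deque


class VelocityCap:
    """Allow at most `max_calls` within any sliding `window_seconds` window."""

    def __init__(self, max_calls: int, window_seconds: float = 60.0) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_calls:
            return False  # too many recent calls: possibly a runaway loop
        self._timestamps.append(now)
        return True


cap = VelocityCap(max_calls=3)
# Five calls in quick succession: the first three pass, the rest are blocked.
decisions = [cap.allow(now=t) for t in (0.0, 0.1, 0.2, 0.3, 0.4)]
# Once the earlier calls have aged out of the window, calls are allowed again.
later = cap.allow(now=61.0)
```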

220+ models

Built-in pricing for OpenAI, Anthropic, Gemini, Mistral, Cohere, and DeepSeek. Updated weekly.
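
Turning token counts into a cost is simple arithmetic once a price table exists. The Python sketch below shows the calculation under stated assumptions: the model names and prices are placeholders invented for illustration, not real provider prices, and `call_cost` is a hypothetical helper rather than part of TokenCapAI.

```python
from typing import Dict, Tuple

# Placeholder prices in USD per 1M tokens: (input, output).
# These are NOT real provider prices; they only illustrate the arithmetic.
PRICES_PER_MILLION: Dict[str, Tuple[float, float]] = {
    "example-small": (1.0, 2.0),
    "example-large": (10.0, 30.0),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: tokens times the per-token price, per direction."""
    try:
        input_price, output_price = PRICES_PER_MILLION[model]
    except KeyError:
        raise ValueError(f"no pricing for model {model!r}") from None
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 2,000 input tokens at $10/1M plus 500 output tokens at $30/1M.
cost = call_cost("example-large", input_tokens=2_000, output_tokens=500)
```

With these placeholder prices the call costs $0.02 for input plus $0.015 for output, $0.035 in total.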

Instant alerts

Slack, webhook, or email alerts when spend reaches 80% of a cap or the cap is hit. Fires in real time.
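
The threshold logic behind such alerts can be sketched in a few lines of Python. This is an assumption-laden illustration: `alert_level`, its return values, and the `warn_ratio` parameter are hypothetical names, and a real system would also need to avoid sending the same alert repeatedly.

```python
from typing import Optional


def alert_level(spent: float, cap: float, warn_ratio: float = 0.8) -> Optional[str]:
    """Return "cap_hit", "approaching", or None for the current spend."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    if spent >= cap:
        return "cap_hit"
    if spent >= warn_ratio * cap:
        return "approaching"  # at or above 80% of the cap by default
    return None


levels = [alert_level(s, cap=10.0) for s in (5.0, 9.0, 10.0)]
```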

Full audit log

Every allowed and blocked call recorded. Filter by agent, model, or date.
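
To make the shape of such a log concrete, the Python sketch below models one event with the fields named in the audit trail description (timestamp, agent, model, tokens, cost, status, stop reason) and filters by agent, model, or date. The `Event` class, `filter_events`, the sample rows, and the `daily_cap` stop reason are hypothetical and made up for illustration; they are not TokenCapAI's schema or data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    agent: str
    model: str
    tokens: int
    cost: float
    status: str                 # "allowed" or "blocked"
    stop_reason: Optional[str]  # why a call was blocked; None if allowed


def filter_events(
    events: Iterable[Event],
    agent: Optional[str] = None,
    model: Optional[str] = None,
    since: Optional[datetime] = None,
) -> List[Event]:
    """Keep events matching every filter that is not None."""
    return [
        e for e in events
        if (agent is None or e.agent == agent)
        and (model is None or e.model == model)
        and (since is None or e.timestamp >= since)
    ]


# Made-up sample rows, for illustration only.
log = [
    Event(datetime(2025, 1, 1, 9, 0), "Support_RAG_Agent", "gpt-4o", 1200, 0.02, "allowed", None),
    Event(datetime(2025, 1, 2, 9, 0), "Code_Review_Bot", "gpt-4o", 0, 0.0, "blocked", "daily_cap"),
]
blocked_bot = filter_events(log, agent="Code_Review_Bot", since=datetime(2025, 1, 2))
```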

Simple pricing

Start free. Upgrade when you need more.

Free

£0 forever
  • 1 agent
  • 1 seat
  • 7-day event log
  • Daily & monthly caps
Request access

Starter

£25 per month
  • 10 agents
  • 5 seats
  • 30-day event log
  • All cap types
  • Slack alerts
Request access

Growth

£52 per month
  • 50 agents
  • 20 seats
  • 90-day event log
  • All cap types
  • All alert types
  • Loop protection
Request access

Enterprise

From £300 per month
  • Custom agents & seats
  • Unlimited event log
  • All cap types
  • All alert types
  • Loop protection
  • Priority support
Request access

Apply for early access

Free during the beta. No credit card. Works with OpenAI, Anthropic, Gemini, and any other LLM provider.

Request access