AI agent cost calculator
Stop guessing at surprise bills from Claude Code, Codex, Cursor, LangGraph, CrewAI and background agents. Model your real workflow — steps, context growth, retries and failures — and get cost per task, cost per successful task, monthly spend, a waste estimate, and a cheaper-routing suggestion.
Switching from Claude Opus 4.8 to DeepSeek-V4 Pro (reasoner) (DeepSeek, same frontier tier) cuts monthly spend by $6.6K — 92% — for this workload.
Where each task's cost goes
| Model | $/task | $/successful | Monthly | vs current |
|---|---|---|---|---|
CHEAPEST OpenAI · fast | $0.0508 | $0.0591 | $76.18 | cheapest |
| Groq · fast | $0.054 | $0.0627 | $80.94 | 1.1× |
| Gemini · fast | $0.0928 | $0.1079 | $139 | 1.8× |
| OpenRouter · mid | $0.1114 | $0.1296 | $167 | 2.2× |
| DeepSeek · mid | $0.1213 | $0.141 | $182 | 2.4× |
| OpenRouter · mid | $0.2264 | $0.2632 | $340 | 4.5× |
| OpenAI · mid | $0.2539 | $0.2953 | $381 | 5.0× |
| Gemini · mid | $0.3069 | $0.3569 | $460 | 6.0× |
| Together · mid | $0.3622 | $0.4211 | $543 | 7.1× |
| DeepSeek · frontier | $0.3756 | $0.4368 | $563 | 7.4× |
| Groq · mid | $0.6334 | $0.7365 | $950 | 12.5× |
| Anthropic · mid | $0.95 | $1.10 | $1.4K | 18.7× |
| Together · mid | $1.11 | $1.29 | $1.7K | 21.8× |
| OpenAI · frontier | $1.27 | $1.48 | $1.9K | 25.0× |
| Gemini · frontier | $1.27 | $1.48 | $1.9K | 25.0× |
| OpenAI · frontier | $1.92 | $2.24 | $2.9K | 37.9× |
| Anthropic · frontier | $2.85 | $3.31 | $4.3K | 56.1× |
CURRENT Anthropic · frontier | $4.75 | $5.52 | $7.1K | 93.5× |
Per-1M-token list prices; figures are estimates and exclude embeddings, fine-tuning, image/audio, and infrastructure. Rows marked verifystill need a final check against the provider's live pricing. LLM prices change often.
How the estimate works
The agent loop
One task = one end-to-end agent run. The model is called model callstimes; on each turn the context is the stable prefix (system prompt + tool schemas) plus everything accumulated from prior turns (outputs + tool results) plus any retrieved context. We sum the input and output cost across all turns at the selected model's per-token list price.
Prompt caching
With caching on, the stable prefix bills at the model's cache-read rate from the second turn onward (≈0.1x input on Claude). Variable context always bills at full price. Models with no prompt cache (e.g. Groq) ignore the toggle.
Waste
Retries repeat calls at current context; timeout loops burn full-context calls; hallucinated tool calls waste a tool result plus a correction call; handoffs re-send the whole context to another agent. We add these to the attempt cost and report the share as "waste".
Success rate & routing
Failed tasks re-run, so cost per successful task = attempt cost ÷ success rate. The routing suggestion finds the cheapest model of at leastthe same capability tier — it won't tell you to run a frontier coding agent on a nano model.
Figures are estimates from public list prices and exclude embeddings, fine-tuning, image/audio tokens, and self-hosting infrastructure. LLM prices change often — verify before relying on a number. Providers covered: Anthropic, OpenAI, Gemini, DeepSeek, Groq, Together, OpenRouter.
Frequently asked questions
How do I estimate the cost of an AI agent?+
Cost is driven by tokens, and agents burn tokens in a loop: the model is called many times per task, and the context grows on every turn as prior outputs and tool results pile up and get re-sent. So the formula isn't 'tokens × price' once — it's roughly (number of calls) × (average context size) × input price + output tokens × output price, plus the waste from retries and failures. This calculator models that loop directly so you don't have to.
Why are AI agents so much more expensive than a single chat call?+
Context accumulation. A 30-turn agent re-sends its early context up to 30 times, so a few thousand tokens of system prompt and tool schemas can be billed dozens of times in one task. Multi-agent handoffs, retries, and stuck loops multiply that further. The biggest surprise on most agent bills is paying repeatedly for the same context.
What is 'cost per successful task' and why does it matter?+
It's your total spend divided by the tasks that actually succeeded. If some tasks fail and must re-run, you paid for those failed attempts too — so cost per successful task is higher than cost per attempt. It's the honest number for budgeting, and the calculator surfaces it alongside a monthly waste estimate.
Does prompt caching reduce agent costs?+
Significantly, for the right shape. Caching bills your stable prefix (system prompt + tool schemas) at the cache-read rate — about 0.1x input on Claude — instead of full price on every turn. The catch: any byte change in the prefix invalidates the cache, so timestamps or varying tool lists silently kill your hit rate. Toggle caching in the calculator to see your savings.
How can I make my agent cheaper without losing quality?+
Four levers, in rough order of impact: cut context growth (trim system prompts, prune tool results, summarize old turns); enable and protect prompt caching; reduce waste (lower retries, cap loops, validate tool calls); and route appropriately — a frontier model for hard steps, a cheaper same-tier or lower-tier model for the rest. The comparison table and routing suggestion quantify the last one for your workload.
Which provider is cheapest for AI agents?+
It depends entirely on the workload shape. Frontier reasoning favors different models than high-volume tool loops. The calculator prices your exact profile across OpenAI, Anthropic, Google, DeepSeek, Groq, Together and OpenRouter at once, so you can compare like-for-like instead of comparing headline per-token prices that ignore context growth.