What actually drives your LLM bill
For any sufficiently busy product, three things dominate cost. The calculator above shows you the first; the rest of this site explains the others (a worked cost sketch follows the list):
- Per-token price × tokens. Obvious, but Claude Opus 4.7 costs 15× as much as Claude Haiku 4.5 — for many tasks Haiku is good enough and the savings are real. Full price table →
- Prompt caching. The biggest lever you're probably ignoring. Re-sending the same system prompt or document context drops to ~10% of normal price after the first call (Anthropic), or 50% off automatically (OpenAI). Get this right and your bill can drop 60-80%. Caching guide →
- Output tokens cost 3-5× as much as input tokens. Forcing short, structured outputs (strict-schema JSON, single-line summaries) saves real money. Reasoning models (o3, R1) also hide extra output tokens you still pay for.
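To see how the three levers interact, here's a minimal TypeScript sketch of per-request cost. The rates are illustrative placeholders, not any provider's real prices (check the full pricing table); the ~10%-of-input cached rate follows the Anthropic model described above.

```ts
// Sketch: where the money goes on one request, under illustrative rates.
interface Rates {
  inputPerM: number;   // $/M uncached input tokens
  cachedPerM: number;  // $/M cached input tokens (~10% of input)
  outputPerM: number;  // $/M output tokens (typically 3-5× input)
}

function requestCost(
  rates: Rates,
  inputTokens: number,
  cachedTokens: number, // portion of the input served from cache
  outputTokens: number,
): number {
  const uncached = ((inputTokens - cachedTokens) * rates.inputPerM) / 1e6;
  const cached = (cachedTokens * rates.cachedPerM) / 1e6;
  const output = (outputTokens * rates.outputPerM) / 1e6;
  return uncached + cached + output;
}

// Example: a 20K-token prompt, 18K of it a cacheable prefix, 500-token reply.
const rates: Rates = { inputPerM: 3, cachedPerM: 0.3, outputPerM: 15 };
console.log(requestCost(rates, 20_000, 0, 500));      // ≈ $0.0675, cold cache
console.log(requestCost(rates, 20_000, 18_000, 500)); // ≈ $0.0189, warm cache
```

The warm-cache call costs about 72% less here, which is the kind of drop behind the 60-80% figure above.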
Topics
- Claude vs GPT (April 2026) — tokenizer-adjusted cost comparison, intelligence/speed/context breakdown, verdict by use case.
- 8 hidden costs of LLM apps — reasoning tokens, retries, tokenizer overhead, cache write surcharge, structured output inflation, and 4 others.
- What real system prompts cost — measured cost of the leaked Claude.ai (24K tokens) and ChatGPT (4.5K tokens) system prompts, with caching math.
- Cheapest LLM for chatbots — per-user-per-month cost across every model for a typical chatbot workload, sorted ascending.
- Pricing changes — May 2026 — Claude Opus down to $5/$25, GPT-5 family launched, o3 cut 87%, DeepSeek rebranded to V4, and more.
- Full pricing table — every major model's input, output, and cached prices. Sortable. Updated when providers change rates.
- Prompt caching deep dive — how caching works at Anthropic, OpenAI, and Google, when it pays off, and the traps that make caching cost more than it saves.
Coming soon
- /context-window — current window sizes, the lost-in-the-middle problem, cost per 100K of context.
- /system-prompts — token cost, caching strategies, length-vs-quality tradeoffs.
- /skills-token-cost — Claude Skills overhead and when they pay off.
- /mcp-token-cost — every MCP tool definition costs tokens, how much, how to budget.
FAQ
How accurate are the token counts?
OpenAI models use the exact tiktoken o200k_base tokenizer — counts match the real API. Other providers (Claude, Gemini, Llama, etc.) ship proprietary tokenizers that we approximate with cl100k_base; counts are typically within 5%. For exact Anthropic counts, call client.messages.countTokens() in the Anthropic SDK.
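If you want to verify counts yourself, here's a minimal sketch: js-tiktoken for OpenAI-exact counts, and the Anthropic SDK's countTokens() endpoint for Claude-exact counts. The model id is illustrative, and the Anthropic call needs ANTHROPIC_API_KEY set.

```ts
import { getEncoding } from "js-tiktoken";
import Anthropic from "@anthropic-ai/sdk";

const prompt = "Summarize the attached document in one sentence.";

// Exact for OpenAI models: o200k_base is the GPT-4o / o-series encoding.
const enc = getEncoding("o200k_base");
console.log("OpenAI tokens:", enc.encode(prompt).length);

// Exact for Claude: the API counts tokens server-side.
async function countClaude() {
  const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the env
  const res = await client.messages.countTokens({
    model: "claude-sonnet-4-5", // illustrative model id
    messages: [{ role: "user", content: prompt }],
  });
  console.log("Claude tokens:", res.input_tokens);
}
countClaude();
```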
Why aren't reasoning models (o3, R1) as cheap as they look?
Their output cost includes hidden reasoning tokens you never see — typically 4-10× the visible response. The calculator shows the visible-output cost; multiply by ~5-8 for a realistic estimate of what o3 or R1 will actually bill for output.
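A back-of-envelope correction you can fold into your own estimates; the multiplier below is the rough ~5-8× range from the answer above, not a figure any API reports.

```ts
// Adjust visible-output cost for hidden reasoning tokens.
function reasoningAdjustedOutputCost(
  visibleOutputTokens: number,
  outputRatePerM: number,  // $/M output tokens
  reasoningMultiplier = 6, // rough midpoint of the ~5-8× estimate above
): number {
  return (visibleOutputTokens * reasoningMultiplier * outputRatePerM) / 1e6;
}

// A 400-token visible reply at $8/M looks like $0.0032 in the calculator,
// but with hidden reasoning the realistic bill is closer to $0.019.
console.log(reasoningAdjustedOutputCost(400, 8)); // ≈ 0.0192
```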
What's the difference between input cost and cached input?
When you re-send the same context (system prompt, document, tool definitions) within minutes/hours, providers charge ~10% of normal input price for the cached portion. For high-traffic apps, prompt caching is the biggest single cost lever — see the caching guide.
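A quick break-even sketch, assuming Anthropic-style ratios: cache writes carry a surcharge (about 1.25× the base input price for the 5-minute cache) and reads cost about 0.1×. See the caching guide for each provider's exact terms.

```ts
// Costs expressed as multiples of the base input price for one prefix.
function cachingSavesMoney(
  reuses: number,   // how many times the prefix is re-sent while cached
  writeMult = 1.25, // assumed cache-write surcharge (Anthropic-style)
  readMult = 0.1,   // assumed cache-read discount (Anthropic-style)
): boolean {
  const withoutCache = 1 + reuses;                 // full price every call
  const withCache = writeMult + reuses * readMult; // one write, cheap reads
  return withCache < withoutCache;
}

console.log(cachingSavesMoney(0)); // false: a never-reused prefix costs extra
console.log(cachingSavesMoney(1)); // true: a single reuse already pays off
```

That first line is the trap from the hidden-costs article: a prefix written to cache but never re-read costs more than not caching at all.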
Why does Gemini's price double above 200K tokens?
Google charges a premium for prompts longer than 200K tokens — $2.50/M input and $10/M output instead of $1.25/$5. The calculator uses the under-200K tier; if your prompt is longer, double those numbers.
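In code, the tier works like this; per the note above, the higher rate applies to the whole request once the prompt crosses 200K.

```ts
// Gemini-style tiered input pricing, using the per-million rates quoted above.
function geminiInputCost(promptTokens: number): number {
  const ratePerM = promptTokens > 200_000 ? 2.5 : 1.25;
  return (promptTokens * ratePerM) / 1e6;
}

console.log(geminiInputCost(150_000)); // $0.1875 at the $1.25/M tier
console.log(geminiInputCost(300_000)); // $0.75 at the $2.50/M tier
```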
Are these prices current?
Each model has a last_verified date in /pricing. Provider pricing changes every few months — always confirm against the provider's pricing page before signing a long-term commitment.