Token Pricing / Use cases

Cheapest LLM for chatbots

Per-user-per-month cost for a real chatbot workload, across every major model. Includes the caching effect (which is huge for chatbots) and tokenizer overhead. Updated April 2026.

Adjust the assumptions in the calculator →

The workload we're costing

Calculations below assume a typical chatbot use case based on public production data (Discord bots, customer-support copilots, consumer chatbots):

Variable  Value
Messages per active user per day  5
System prompt size  1,500 tokens
Conversation history (avg, in window)  1,000 tokens
User input per message  80 tokens
Response per message  300 tokens
Days/month per active user  30

Adjust these in the calculator for your specific workload — most apps' values are within 30% of these defaults.
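As a sanity check, the monthly token volume implied by these defaults is easy to compute by hand (a minimal sketch using the values above):

```python
# Monthly token volume per active user, from the workload defaults above.
MESSAGES_PER_DAY = 5
DAYS_PER_MONTH = 30
SYSTEM_PROMPT = 1_500   # tokens, sent (or cached) on every message
HISTORY = 1_000         # average conversation history in the window
USER_INPUT = 80         # tokens per user message
RESPONSE = 300          # tokens per model reply

messages = MESSAGES_PER_DAY * DAYS_PER_MONTH                      # 150
input_tokens = messages * (SYSTEM_PROMPT + HISTORY + USER_INPUT)  # 387,000
output_tokens = messages * RESPONSE                               # 45,000

print(input_tokens, output_tokens)  # 387000 45000
```

Every active user therefore costs about 387K input and 45K output tokens a month, before any caching discount.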

Cost ranking, with and without caching

Sorted by per-user-per-month cost with system prompt cached (which is what you should actually do):

Model  Uncached  Cached  10K users/mo
Ministral 3B  $0.0173  $0.0173  $172.80
GPT-4.1 Nano  $0.0283  $0.0227  $227.25
Llama 3.1 8B Instant (Groq)  $0.0229  $0.0229  $229.50
GPT-5 Nano  $0.0373  $0.0272  $272.25
Gemini 2.5 Flash-Lite  $0.0567  $0.0398  $398.25
DeepSeek V4 Flash  $0.0668  $0.0416  $415.80
Mistral Small 3  $0.0522  $0.0522  $522.00
GPT-4o Mini  $0.0851  $0.0682  $681.75
GPT-5 Mini  $0.0934  $0.0709  $708.75
Gemini 2.5 Flash  $0.229  $0.178  $1,780
Llama 3.3 70B (Groq)  $0.264  $0.264  $2,639
Claude Haiku 4.5  $0.826  $0.553  $5,528

The "Uncached" and "Cached" columns are dollars per user per month. The "10K users/mo" column is the total monthly bill (at the cached rate) for a chatbot serving 10,000 active users at this workload.
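Every per-user number in the table comes from one formula: monthly tokens times price per million. A minimal sketch; plugging in the $0.05/M input and $0.08/M output rates quoted later on this page for Llama 3.1 8B on Groq reproduces the table's $0.0229 uncached figure:

```python
def monthly_cost_per_user(price_in, price_out, messages=150,
                          input_per_msg=2_580, output_per_msg=300):
    """Dollars per active user per month; prices are $ per 1M tokens.

    Defaults encode the workload above: 150 messages/month, each with
    2,580 input tokens (1,500 system + 1,000 history + 80 user) and
    300 output tokens.
    """
    input_cost = messages * input_per_msg * price_in / 1e6
    output_cost = messages * output_per_msg * price_out / 1e6
    return input_cost + output_cost

cost = monthly_cost_per_user(0.05, 0.08)  # Groq Llama 3.1 8B list rates
print(f"${cost:.5f}/user/month")          # matches the table's $0.0229
```

Substitute your provider's current published rates; pricing pages change faster than articles do.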

Verdict by scale

Solo project / under 1,000 users

Pick on capability, not cost — the bill is too small to justify optimization time. Even at the top of this bracket, 1,000 users/month on a Claude Sonnet 4.6 chatbot with caching runs ~$1,660/month total, and smaller projects pay proportionally less. Use the model your evals say is best.

1,000–10,000 users

Cost starts to matter, but quality still matters more. Sweet spot: Gemini 2.5 Flash-Lite, GPT-5 Nano, or Claude Haiku 4.5. All three handle a routine chatbot well and are caching-friendly; at 10,000 users the first two stay well under $1,000/month, while Haiku runs ~$5,500/month but buys noticeably stronger quality.

10,000–100,000 users

Cost dominates. Optimize hard: cache aggressively, cap output length, route obvious queries to small models, escalate only complex queries to the bigger ones. Top picks: GPT-5 Nano ($2,723/mo for 100K users) or Gemini Flash-Lite (~$3,983/mo).
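The routing step can start out embarrassingly simple: a cheap check in front of two model tiers. A minimal sketch; the keyword heuristic and the model names are placeholders (production routers typically use a small classifier model instead):

```python
# Placeholder routine-query markers for a support chatbot; tune to your traffic.
ROUTINE_KEYWORDS = ("refund", "hours", "password", "shipping", "cancel")

def is_routine(message: str) -> bool:
    """Crude heuristic: treat keyword matches as routine FAQ-style queries."""
    return any(kw in message.lower() for kw in ROUTINE_KEYWORDS)

def pick_model(message: str) -> str:
    """Route routine queries to the cheap tier, everything else up.

    Model identifiers are hypothetical; substitute your provider's names.
    """
    return "small-cheap-model" if is_routine(message) else "large-capable-model"

print(pick_model("How do I reset my password?"))  # small-cheap-model
```

If ~90% of queries hit the cheap tier, the blended per-user cost lands close to the small model's row in the table above.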

100,000+ users

At this scale every cost decision compounds: at 100K users the table above spans roughly $1,700/month (Ministral 3B) to $55,000/month (Claude Haiku 4.5), so model choice, caching, routing, and output caps all become line items worth revisiting continuously.

The caching effect (why "cheapest" depends on what you cache)

For chatbots specifically, caching is the single biggest cost lever. The 1.5K-token system prompt is identical for every conversation, which means it's the perfect caching target. Anthropic gives 90% off cached reads; OpenAI gives 50% automatically.

Without caching, Claude Haiku 4.5 with the workload above costs about $0.826/user/month. With Anthropic 5-min caching: $0.553/user/month — a 33% cost reduction. Caching deep dive →
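The 33% figure is smaller than the 90% discount because only the cached share of input tokens gets the discount; uncached input and all output tokens still bill at full price. The arithmetic, using the Haiku row from the table:

```python
def cache_savings_pct(uncached, cached):
    """Percent reduction in per-user monthly cost from caching."""
    return (uncached - cached) / uncached * 100

# System prompt as a share of each message's input (1,500 of 2,580 tokens):
cacheable_share = 1_500 / 2_580
print(f"{cacheable_share:.0%} of input tokens are cacheable")  # 58%

print(round(cache_savings_pct(0.826, 0.553)))  # 33
```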

Common mistakes that blow up chatbot costs

  1. Sending entire conversation history every turn unbounded. After 50 messages, you're sending a 10K-token prompt on every reply. Set a sliding-window limit (last 10 messages, or last 5K tokens of history).
  2. Not caching the system prompt. See above — single biggest cost lever.
  3. Using a flagship model for FAQ-style queries. 90% of customer-support chatbot queries are routine. Route them to a small model and escalate the 10% that need it.
  4. No output length cap. Without max_tokens, a verbose model can write 4,000 tokens when you wanted 200. Set the cap.
  5. No daily/monthly per-user quota. A handful of users can run up your bill. Cap at e.g. 100 messages/day per user.
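The fix for mistake #1, a sliding window over history, is a few lines. A minimal sketch, assuming each message dict carries a precomputed token count (fill it in with your tokenizer of choice):

```python
def trim_history(messages, max_tokens=5_000, max_messages=10):
    """Keep only the most recent messages that fit the token budget.

    `messages` is a list of {"role": ..., "content": ..., "tokens": int},
    oldest first; token counts are assumed precomputed elsewhere.
    """
    kept, total = [], 0
    # Walk backwards from the newest message, stopping at the budget.
    for msg in reversed(messages[-max_messages:]):
        if total + msg["tokens"] > max_tokens:
            break
        kept.append(msg)
        total += msg["tokens"]
    return list(reversed(kept))  # restore oldest-first order

history = [{"role": "user", "content": f"msg {i}", "tokens": 800}
           for i in range(20)]
print(len(trim_history(history)))  # 6 messages fit in the 5K-token budget
```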

Ranking summary (cached cost, 5 messages/day workload)

The clear winners for chatbot use cases at scale:

  1. Ministral 3B — $0.0173/user/month
  2. GPT-4.1 Nano — $0.0227/user/month
  3. Llama 3.1 8B Instant (Groq) — $0.0229/user/month
  4. GPT-5 Nano — $0.0272/user/month
  5. Gemini 2.5 Flash-Lite — $0.0398/user/month

Don't pick on price alone — run a quality eval on your specific chatbot's expected conversations before committing. Even a 10% quality drop in a customer-support context costs more in escalations than it saves in API fees.

FAQ

What's the absolute cheapest LLM for a chatbot in 2026?

Per-user-per-month for a typical chatbot workload (5 messages/day, system prompt cached): GPT-4.1 Nano is the cheapest of the OpenAI family at ~$0.023/user, with GPT-5 Nano close behind at ~$0.027. Mistral's Ministral 3B undercuts both on raw token cost (~$0.017/user) but lacks caching. For most apps, Gemini 2.5 Flash-Lite or DeepSeek V4 Flash hit the sweet spot of cheap + capable + cached.

How much does ChatGPT cost per user per month if I built it?

Depending on model: roughly $0.02–$5/user/month for active users sending ~5 messages/day. Free OpenAI ChatGPT users likely cost OpenAI ~$0.20–$1.00/month (heavily subsidized by Plus subscribers). Self-built with GPT-5 Nano + caching: ~$0.03/user. Self-built with Claude Sonnet 4.6 + caching: ~$1.66/user.

Why does cached pricing matter so much for chatbots?

Chatbots have very stable system prompts (the personality, behavior rules, and tool definitions don't change between users). With caching, that 1.5K-token system prompt costs as little as 10% of the normal input price on repeat reads (Anthropic's cached rate; OpenAI discounts cached input by 50%). Multiply across millions of conversations and it's the difference between profitable and unprofitable.

Can I run a chatbot on Llama 3.1 8B for cheaper?

Yes, on Groq it's ~$0.05/M input and $0.08/M output — among the cheapest options. Quality is meaningfully lower than GPT-5 Mini or Gemini Flash, but for narrow chatbots (FAQ answering, simple lookup), the gap may be acceptable. Always run a quality eval on your actual conversations before deciding on cost grounds.

How do I budget for unexpected user behavior?

Three buffers: (1) cap conversation length — drop oldest messages once context exceeds 5K tokens; (2) hard token limit on output (max_tokens=500 for chat); (3) per-user daily quota — most users send 0-5 messages/day; outliers send 200+. A 100-message/day cap protects you from the long tail.
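Buffers (1) and (2) are request parameters; buffer (3), the per-user daily quota, needs a little state. A minimal in-memory sketch (a real deployment would back this with Redis or a database so it survives restarts and multiple workers):

```python
from collections import defaultdict
from datetime import date

# (user_id, day) -> messages sent; resets naturally when the date changes.
_usage: dict[tuple[str, date], int] = defaultdict(int)

def allow_message(user_id: str, daily_cap: int = 100) -> bool:
    """Count the message and return True if the user is under today's cap."""
    key = (user_id, date.today())
    if _usage[key] >= daily_cap:
        return False
    _usage[key] += 1
    return True

for _ in range(100):
    assert allow_message("alice")
print(allow_message("alice"))  # False: the 101st message today is rejected
```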

Run your own chatbot numbers →