Token Pricing / Guides

What real system prompts cost

The leaked Claude.ai and ChatGPT system prompts are between 4,000 and 24,000 tokens. Here's what they cost per call across every flagship model, with and without caching — and the three changes that drop a typical system prompt's cost by 90%.

Run your own system prompt through the calculator →

How long are real system prompts?

Most teams underestimate how big production system prompts get. A snapshot of publicly leaked or disclosed system prompts as of April 2026:

Product                            System prompt tokens
ChatGPT (GPT-4o, with all tools)   4,523
Claude.ai (Claude Opus 4.x)        24,134
Cursor IDE (default agent)         9,847
v0 by Vercel                       6,201
Bolt.new                           3,475

Token counts above are cl100k_base approximations. Claude 4.x uses a different tokenizer, so the actual API count runs ~35% higher: the 24K-token Claude.ai system prompt becomes ~32K on the wire.
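The cl100k-to-wire adjustment above is simple enough to script. A minimal sketch, assuming the article's ~35% multiplier (an estimate, not an official Anthropic figure); for the cl100k_base counts themselves you'd use a tokenizer library such as tiktoken:

```python
# Convert cl100k_base token counts to approximate billed ("wire") counts.
# The 1.35 multiplier for Anthropic models is this article's estimate,
# not an official figure from Anthropic.
CLAUDE_TOKENIZER_MULTIPLIER = 1.35

def wire_tokens(cl100k_count: int, provider: str) -> int:
    """Approximate billed input tokens for a prompt measured in cl100k_base."""
    if provider == "anthropic":
        return round(cl100k_count * CLAUDE_TOKENIZER_MULTIPLIER)
    return cl100k_count  # OpenAI bills its own tokenizer counts directly

print(wire_tokens(24_134, "anthropic"))  # 32581 -- the ~32K figure above
```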

Per-call cost: ChatGPT-style 4.5K-token system prompt

Most agentic apps end up around 4–6K tokens once you've added a meaningful set of tools. Cost per request (system prompt only, every turn, no caching):

Model               Cost per call   100K calls/month
GPT-4o Mini         $0.00068        $67.84
GPT-4.1             $0.0090         $904.60
GPT-5               $0.0057         $565.38
Claude Haiku 4.5    $0.0061         $610.61
Claude Sonnet 4.6   $0.0183         $1,832
Claude Opus 4.7     $0.0305         $3,053

Claude Sonnet 4.6 with this system prompt at 100K calls/month: $1,832/month in system-prompt overhead alone. That's before any user input, output tokens, tool definitions beyond the system prompt, retries, or anything else.
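The per-call figures reduce to one line of arithmetic. A sketch, assuming Sonnet 4.6 input pricing of $3 per million tokens (the rate implied by the table) and the ~35% tokenizer adjustment from earlier:

```python
def system_prompt_cost(tokens: int, price_per_mtok: float, calls: int = 1) -> float:
    """Input-token cost of resending the system prompt on every call."""
    return tokens * price_per_mtok / 1_000_000 * calls

# 4.5K cl100k tokens become ~6.1K billed tokens on Anthropic's tokenizer.
billed = round(4_523 * 1.35)
print(f"per call:       ${system_prompt_cost(billed, 3.00):.4f}")
print(f"per 100K calls: ${system_prompt_cost(billed, 3.00, 100_000):,.0f}")
```

Swap in any model's input price per million tokens to reproduce the rest of the table.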

Per-call cost: Claude.ai-style 24K-token system prompt

At the high end (Claude.ai's full leaked prompt at 24K tokens, billed as ~32.5K via Anthropic's tokenizer):

Model               Cost per call   100K calls/month
GPT-4o Mini         $0.0036         $362.01
GPT-5               $0.0302         $3,017
Claude Sonnet 4.6   $0.0977         $9,774
Claude Opus 4.7     $0.163          $16,290

With Claude Opus 4.7 and the full Claude.ai system prompt: $16,290/month in system-prompt overhead alone at 100K calls. This is why Anthropic's prompt caching exists.

The caching difference

Cache the system prompt and the math changes dramatically. With Anthropic's 5-minute cache, cached reads bill at 10% of the base input price (cache writes cost 25% extra, which is negligible once the prompt is read many times):

Scenario                                Per call   100K calls/month
Sonnet 4.6, 4.5K prompt, no caching     $0.0183    $1,832
Sonnet 4.6, 4.5K prompt, cached (5m)    $0.0018    $183
Savings                                            $1,649/month
Sonnet 4.6, 24K prompt, no caching      $0.0977    $9,774
Sonnet 4.6, 24K prompt, cached (5m)     $0.0098    $977
Savings                                            $8,797/month

For a Claude.ai-scale system prompt at 100K calls/month, caching saves roughly $8,800/month. The real Claude.ai serves many millions of calls per day; at that scale, the savings from caching run into the millions per month.
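The caching math above, as a function. This sketch mirrors the table's simplification: it ignores Anthropic's one-time 25% cache-write premium, which washes out across 100K reads:

```python
def caching_savings(tokens: int, price_per_mtok: float, calls: int,
                    read_discount: float = 0.10) -> tuple[float, float]:
    """Return (uncached monthly cost, cached monthly cost).

    Ignores the cache-write premium on the first call, which is
    negligible at high call volumes.
    """
    base = tokens * price_per_mtok / 1_000_000 * calls
    return base, base * read_discount

# 24K cl100k tokens -> ~32.6K billed, at an assumed $3/M Sonnet input rate
uncached, cached = caching_savings(32_581, 3.00, 100_000)
print(f"uncached ${uncached:,.0f}, cached ${cached:,.0f}, "
      f"saved ${uncached - cached:,.0f}")
```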

Where the tokens go in a real system prompt

Breakdown of a typical 5K-token agentic system prompt:

Section                                     Tokens    % of total
Identity / role / persona                   ~150      3%
Behavior rules / safety guidelines          ~600      12%
Tool definitions (schemas + descriptions)   ~2,800    56%
Few-shot examples                           ~1,000    20%
Output formatting rules                     ~400      8%
Date / context injection                    ~50       1%

Tools dominate. A 10-tool agent setup easily adds 3–5K tokens just describing what each tool does and what parameters it accepts. This is also why removing unused tools is one of the higher-leverage optimizations.
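To see why tools dominate, it helps to measure one. A sketch using a hypothetical tool definition (the search_orders schema below is invented for illustration) and a rough 4-characters-per-token heuristic:

```python
import json

def approx_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text and JSON."""
    return len(text) // 4

# A hypothetical tool definition in the OpenAI function-calling shape.
tool = {
    "name": "search_orders",
    "description": "Search the order database by customer, date range, or status.",
    "parameters": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string", "description": "Internal customer ID"},
            "status": {"type": "string", "enum": ["open", "shipped", "refunded"]},
        },
        "required": ["customer_id"],
    },
}

# Even a modest two-parameter tool lands in the tens of tokens;
# ten richly documented tools easily reach the 2,800-token figure above.
print(approx_tokens(json.dumps(tool)))
```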

Three changes that drop system-prompt cost ~90%

  1. Cache it. Anthropic's 5-minute cache takes 90% off all reads after the first; OpenAI's automatic prefix caching takes 50% off. For the Sonnet 4.6 example above, this alone is the difference between a $1,832/month system-prompt bill and a $183/month one.
  2. Move dynamic content out. Date stamps, request IDs, user-specific context — all of these invalidate the cache if they're in the system prompt. Move them into the user message instead.
  3. Drop unused tools. Each tool definition is 50–500 tokens. If you have 15 tools but most calls only use 3, consider routing to a smaller tool subset based on the user's intent before invoking the model.
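Changes 1 and 2 combine naturally on the Anthropic Messages API, which accepts a cache_control breakpoint on system blocks. A payload sketch (built but not sent; the model name and prompt text are placeholders):

```python
# Mark the static system prompt as a cache breakpoint and keep all
# dynamic context in the user turn, so it never invalidates the cache.
SYSTEM_PROMPT = "You are a support agent. (static rules and tool guidance)"

payload = {
    "model": "claude-sonnet-4-6",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # Everything up to and including this block is cached.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        # Dates, request IDs, and user context live here, not in the
        # system prompt, so the cached prefix stays byte-identical.
        {"role": "user", "content": "Today is 2026-04-02. Where is order #1234?"}
    ],
}

print(payload["system"][0]["cache_control"]["type"])
```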

Real leaked system prompts to study

These are publicly accessible via the jujumilk3/leaked-system-prompts and asgeirtj/system_prompts_leaks GitHub repositories — useful for understanding what production systems actually look like:

  • ChatGPT (OpenAI) — 4.5K tokens; tool definitions for DALL-E, code interpreter, web browsing
  • Claude.ai (Anthropic) — 24K tokens; very detailed behavior + tool descriptions
  • Cursor — 9.8K tokens; code-editing agent with file/terminal/search tools
  • v0 (Vercel) — 6.2K tokens; React component generation with strict output formatting
  • Bolt.new — 3.5K tokens; lightest of the major coding agents
  • Perplexity, Microsoft Copilot, Devin, GitHub Copilot — all leaked, all worth reading for prompt-engineering reference

Reading these is the fastest education in production prompt engineering. The patterns repeat across products — once you've seen 3 of them you've seen the template.

FAQ

How is a system prompt billed?

Same as user input — every token in the system prompt is counted as input tokens on every API call. There's no special 'system prompt' price; the system role just tells the model how to behave. For an agent making 100 calls per session, a 5,000-token system prompt costs you 500,000 input tokens of overhead.

Does caching the system prompt help?

Massively. On Anthropic, caching a 5,000-token system prompt drops the cost from $0.015/call to $0.0015/call after the first call (90% off cached reads, 5-min TTL). On OpenAI, automatic prefix caching gives 50% off after the first call within ~5-10 minutes. The system prompt is the single best caching target in any production app — see /prompt-caching.

Why are real system prompts so long?

Three reasons: (1) tool definitions — every tool the model can call must be described in the prompt, with name, description, and parameter schema; (2) safety / behavior rules accumulate over time as edge cases emerge; (3) examples — many production prompts include 5-20 few-shot examples to constrain output style. Cursor, v0, and Bolt's prompts are mostly tool definitions and behavior rules.

Can I just trim the system prompt to save money?

Sometimes. Common wins: remove unused tools (each saves 50-500 tokens), consolidate behavior rules, drop few-shot examples that the model handles fine without. But before trimming, set up caching first — it's typically a 90% cost reduction, while trimming is usually 10-20%. Cache first, optimize second.

Is the leaked Claude.ai system prompt really 24K tokens?

Yes. The April 2025 leak puts Claude Opus 4's full Claude.ai system prompt at ~24,000 tokens, a substantial portion of which is detailed tool descriptions for web search, computer use, code execution, and artifacts. Anthropic itself now publishes versions of its system prompts in its release notes, so the general shape of the leak is easy to verify.

Calculate your system prompt's cost →