DeepSeek API pricing (2026)

Two models, V4 Flash and V4 Pro, at price points that repeatedly land near the bottom of LLM price rankings. Below: the rates, the catch, and a head-to-head comparison with Claude Haiku, GPT-5 Mini, and Gemini Flash-Lite.

The rates

Model	Input $/MTok	Output $/MTok	Cached read $/MTok	Context (tokens)
DeepSeek V4 Flash	$0.140	$0.280	$0.0280	1000K
DeepSeek V4 Pro (reasoning)	$1.740	$3.480	$0.145	1000K

Cached reads (input tokens whose prefix has already been processed) are billed at 80% or more below the list input rate: $0.028 instead of $0.140 for V4 Flash, and $0.145 instead of $1.740 for V4 Pro.
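To make the table concrete, here is a minimal sketch of per-request cost arithmetic. The rates are copied from the table above; the keys `v4-flash` and `v4-pro` and the function name `request_cost` are illustrative labels, not official API model identifiers.

```python
# Rates in USD per million tokens, copied from the table above.
# The dictionary keys are illustrative labels, not official model IDs.
RATES = {
    "v4-flash": {"input": 0.140, "cached": 0.0280, "output": 0.280},
    "v4-pro": {"input": 1.740, "cached": 0.145, "output": 3.480},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Return the USD cost of one request.

    cached_tokens is the portion of input_tokens served from the prefix
    cache; the remainder is billed at the list input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    rates = RATES[model]
    uncached_tokens = input_tokens - cached_tokens
    total_micro_usd = (
        uncached_tokens * rates["input"]
        + cached_tokens * rates["cached"]
        + output_tokens * rates["output"]
    )
    return total_micro_usd / 1_000_000


# A 10,000-token prompt with an 8,000-token cached prefix and a 500-token reply.
print(f"${request_cost('v4-flash', 10_000, 500, cached_tokens=8_000):.6f}")
```

For that example the three terms are 2,000 uncached tokens, 8,000 cached tokens, and 500 output tokens, which sums to roughly $0.00064 on V4 Flash.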

Off-peak discount

DeepSeek operates a daily off-peak window (UTC 16:30 – 00:30, which is 00:30 – 08:30 China Standard Time) with a ~50% discount on most rates. Batch jobs and async workloads are the obvious fit; live chat usually isn't.
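A batch scheduler needs to know whether a given moment falls inside that window, and the window wraps past midnight UTC, which is easy to get wrong. The sketch below is one way to check it; the function name `is_off_peak` is hypothetical, and treating the start as inclusive and the end as exclusive is an assumption, since the text does not say how the boundaries are handled.

```python
from datetime import datetime, time, timezone

OFF_PEAK_START = time(16, 30)  # 16:30 UTC
OFF_PEAK_END = time(0, 30)     # 00:30 UTC on the following day


def is_off_peak(moment):
    """Return True if a timezone-aware datetime is inside the off-peak window.

    Assumption: start is inclusive, end is exclusive.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc_time = moment.astimezone(timezone.utc).time()
    # The window crosses midnight, so it is the union of
    # [16:30, 24:00) and [00:00, 00:30) rather than a single range.
    return utc_time >= OFF_PEAK_START or utc_time < OFF_PEAK_END


print(is_off_peak(datetime.now(timezone.utc)))
```

Converting to UTC first means the caller can pass a local timestamp (for example China Standard Time) without doing the offset arithmetic by hand.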

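Stacked with prompt caching, off-peak inference can drop the effective V4 Flash input cost below $0.05 per million tokens: a cached read at $0.028 with ~50% off comes to about $0.014, roughly an order of magnitude below the $0.100 to $0.125 input rates of the cheapest US-hosted models in the table below.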
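How far the effective rate falls depends on the share of input tokens that hit the cache. The sketch below works through that arithmetic for V4 Flash under two assumptions that the text does not confirm: the "~50%" discount is treated as exactly 50%, and it is assumed to apply multiplicatively to cached reads as well as uncached input. The function name is illustrative.

```python
FLASH_INPUT = 0.140       # USD per million tokens, list input rate
FLASH_CACHED = 0.0280     # USD per million tokens, cached read rate
OFF_PEAK_DISCOUNT = 0.50  # assumption: "~50%" treated as exactly 50%


def effective_input_rate(cache_hit_ratio, off_peak=True):
    """Return the effective V4 Flash input rate in USD per million tokens.

    cache_hit_ratio is the fraction of input tokens served from cache (0..1).
    Assumption: the off-peak discount also applies to cached reads.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    rate = cache_hit_ratio * FLASH_CACHED + (1.0 - cache_hit_ratio) * FLASH_INPUT
    if off_peak:
        rate *= 1.0 - OFF_PEAK_DISCOUNT
    return rate


for ratio in (0.0, 0.5, 1.0):
    print(f"hit ratio {ratio:.0%}: ${effective_input_rate(ratio):.4f}/MTok")
```

Under these assumptions the off-peak rate runs from about $0.07 with no cache hits down to about $0.014 with every token cached, and it crosses below $0.05 once a little over a third of input tokens are cache hits.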

Head-to-head: V4 Flash vs cheap competitors

Model	Input $/MTok	Output $/MTok	Blended $/MTok	Hosted in
DeepSeek V4 Flash	$0.140	$0.280	$0.210	China (or Together / DeepInfra mirrors)
Claude Haiku 4.5	$1.000	$5.000	$3.000	US
GPT-5 Mini	$0.125	$1.000	$0.563	US
Gemini 2.5 Flash-Lite	$0.100	$0.400	$0.250	Google global
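The Blended column is consistent with a simple 1:1 average of the input and output rates. Real workloads are rarely 1:1, so the sketch below recomputes the blended figure for an arbitrary input share. The prices are copied from the table; the `input_share` parameter and the function name are illustrative.

```python
# (input, output) rates in USD per million tokens, copied from the table above.
PRICES = {
    "DeepSeek V4 Flash": (0.140, 0.280),
    "Claude Haiku 4.5": (1.000, 5.000),
    "GPT-5 Mini": (0.125, 1.000),
    "Gemini 2.5 Flash-Lite": (0.100, 0.400),
}


def blended_rate(input_rate, output_rate, input_share=0.5):
    """Return the token-weighted average rate in USD per million tokens.

    input_share is the fraction of all billed tokens that are input tokens;
    0.5 reproduces the table's Blended column.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_rate + (1.0 - input_share) * output_rate


for name, (input_rate, output_rate) in PRICES.items():
    even = blended_rate(input_rate, output_rate)
    input_heavy = blended_rate(input_rate, output_rate, input_share=0.75)
    print(f"{name}: 1:1 ${even:.4f}, 3:1 ${input_heavy:.4f}")
```

The mix matters for the ranking: at a 3:1 input-to-output ratio, the DeepSeek V4 Flash and Gemini 2.5 Flash-Lite figures both come to about $0.175, so the gap seen at 1:1 closes for input-heavy workloads.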

Where DeepSeek wins

Bulk text classification, summarization, and translation — high token volumes where 5–10x cost savings compound.
Async batch jobs: nightly enrichment, data cleaning, document processing.
Mixed Chinese / English workloads — the tokenizer's Chinese efficiency is a real second-order saving.
Cost-sensitive prototyping before committing to a provider.

Where it's the wrong call

Strict data residency (HIPAA, GDPR + EU-only, regulated finance). Use the open-weight V4 models on a US/EU-hosted provider, or pick a different vendor.
Function calling / structured output for production agents. The first-party API supports it; quality is a tier behind Claude / GPT for complex tool chains.
Latency-sensitive UX. China-hosted endpoints add round-trip latency for users in the Americas / Europe.
Long-form reasoning over very long context. The listed 1000K window is, on paper, in line with Claude's 1M and Gemini's 1M+, but verify quality at that length on your own code-base or research-paper workloads before relying on it.

Open-weight alternative paths

DeepSeek V4 weights are open. The same architecture is available from:

Together AI — US-hosted, slightly higher per-token cost, full OpenAI-compatible API.
DeepInfra — similar.
Fireworks AI — pricing competitive with Together for V4.
Self-hosted on H100s — only economical above ~2B tokens/day.

These trade some of the cost advantage for residency and latency guarantees that the first-party API doesn't provide.
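The self-hosting threshold above can be sanity-checked with a rough break-even calculation: the daily token volume at which a fixed GPU bill equals what the same tokens would cost on the first-party API. Every input in the sketch below is a hypothetical placeholder (8 GPUs at $2.00 per GPU-hour) except the $0.210 blended V4 Flash rate, which comes from the comparison table. It also ignores whether the cluster could actually serve that throughput, as well as engineering time and idle capacity.

```python
def break_even_tokens_per_day(gpu_count, usd_per_gpu_hour, api_usd_per_mtok):
    """Return the daily token volume at which self-hosting matches API spend.

    Below this volume the API is cheaper; above it, the fixed GPU bill is.
    Throughput limits and operational overhead are deliberately ignored.
    """
    if gpu_count <= 0 or usd_per_gpu_hour <= 0 or api_usd_per_mtok <= 0:
        raise ValueError("all inputs must be positive")
    daily_gpu_cost = gpu_count * usd_per_gpu_hour * 24
    million_tokens = daily_gpu_cost / api_usd_per_mtok
    return million_tokens * 1_000_000


# Hypothetical cluster: 8 H100s at $2.00 per GPU-hour (placeholder figures),
# compared against the $0.210 blended V4 Flash rate from the table.
tokens = break_even_tokens_per_day(8, 2.00, 0.210)
print(f"break-even: {tokens / 1e9:.2f}B tokens/day")
```

With those placeholder numbers the break-even lands a little under 2B tokens per day, the same order as the threshold quoted in the list; different rental prices or cluster sizes move it proportionally.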

FAQ

Why is DeepSeek cheaper than US-hosted models?

Three reasons: a Mixture-of-Experts architecture that activates only ~37B of its ~670B parameters per token (so per-token compute matches a much smaller model), aggressive Chinese inference economics, and a focus on raw cost-per-token rather than the polished SDK / tooling overhead of US providers.

Are there off-peak or batch discounts?

Yes. DeepSeek runs an off-peak window (UTC 16:30–00:30) with ~50% discount on most rates. Batch jobs (24-hour SLA) are also discounted. Both stack with prompt caching.

How does the tokenizer compare?

DeepSeek uses its own BPE tokenizer optimized for Chinese + English. For English text it averages roughly the same token count as GPT's o200k_base; for Chinese it is significantly more efficient. Plug your own text into the /tokens-per-word logic for an estimate.

What are the data and residency considerations?

DeepSeek's API is hosted in China by default, which has data-residency, export-control, and compliance implications for many companies. Some customers instead use Together AI, Fireworks AI, or DeepInfra, which host the open-weight V4 models: same architecture, US/EU residency, slightly different price.

Is there a free tier?

DeepSeek runs a free tier for testing, typically limited to 10K tokens/min with a low daily cap. The paid tier requires payment in CNY or via supported intermediaries.
