DeepSeek API pricing (2026)

Two models, V4 Flash and V4 Pro, priced near the cheapest end of most LLM price rankings. Below: the rates, the catch, and a head-to-head comparison with Claude Haiku 4.5, GPT-5 Mini, and Gemini 2.5 Flash-Lite.

The rates

Model                        Input $/MTok  Output $/MTok  Cached read $/MTok  Context
DeepSeek V4 Flash            $0.140        $0.280         $0.0280             1M tokens
DeepSeek V4 Pro (reasoning)  $1.740        $3.480         $0.145              1M tokens

Cached reads (input tokens whose prefix has already been processed) are typically billed at 80% or more below the list input rate.
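
To make the rate table concrete, here is a minimal sketch of a per-request cost estimate. The rates are copied from the table above; the model keys `v4-flash` and `v4-pro` and the function name `request_cost` are illustrative labels, not official API identifiers, and the sketch assumes cached tokens are billed at the cached-read rate instead of the input rate.

```python
# Rates in USD per million tokens, copied from the table above.
# The dictionary keys are hypothetical labels, not official model IDs.
RATES = {
    "v4-flash": {"input": 0.140, "output": 0.280, "cached": 0.0280},
    "v4-pro": {"input": 1.740, "output": 3.480, "cached": 0.145},
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Return the list-price cost in USD for one request.

    cached_tokens is the part of input_tokens served from the prompt cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    rates = RATES[model]
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * rates["input"]
        + cached_tokens * rates["cached"]
        + output_tokens * rates["output"]
    )
    return total / 1_000_000


# Example: 8,000-token prompt, 6,000 of it cached, 1,000-token reply.
print(f"{request_cost('v4-flash', 8_000, 1_000, cached_tokens=6_000):.6f}")
# prints 0.000728
```

Multiplying the per-request figure by the expected request volume gives a rough monthly estimate at list price, before any off-peak or batch discount.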

Off-peak discount

DeepSeek operates a daily off-peak window (UTC 16:30 – 00:30, which is 00:30 – 08:30 China Standard Time) with a ~50% discount on most rates. Batch jobs and async workloads are the obvious fit; live chat usually isn't.
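
Because the window wraps past midnight UTC, a scheduler cannot test it with a single range comparison. The sketch below shows one way a batch job might check whether a timestamp falls inside the window. The function name `is_off_peak` is illustrative, and treating the start as inclusive and the end as exclusive is an assumption; the text does not say how the boundaries are handled.

```python
from datetime import datetime, time, timedelta, timezone

OFF_PEAK_START = time(16, 30)  # UTC
OFF_PEAK_END = time(0, 30)     # UTC, on the following day


def is_off_peak(moment):
    """True if a timezone-aware datetime falls in the 16:30-00:30 UTC window.

    Assumption: start is inclusive, end is exclusive.
    """
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    utc_time = moment.astimezone(timezone.utc).time()
    # The window wraps past midnight, so it is the union of two ranges:
    # [16:30, 24:00) and [00:00, 00:30).
    return utc_time >= OFF_PEAK_START or utc_time < OFF_PEAK_END


# 03:00 China Standard Time (UTC+8) is 19:00 UTC on the previous day.
cst = timezone(timedelta(hours=8))
print(is_off_peak(datetime(2026, 1, 15, 3, 0, tzinfo=cst)))  # prints True
```

Requiring a timezone-aware datetime avoids silently interpreting local server time as UTC.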

Stacked with prompt caching, off-peak inference can drop the effective input cost for V4 Flash below $0.05 per million tokens, roughly an order of magnitude below the list input rates of comparable Western providers.
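
The stacking arithmetic can be sketched as follows. This assumes the off-peak discount is exactly 50% and applies to both the list input rate and the cached-read rate; the text only says "~50%" on "most rates", so treat the output as an estimate. The name `effective_input_rate` is illustrative.

```python
FLASH_INPUT = 0.140    # USD per MTok, list input rate
FLASH_CACHED = 0.0280  # USD per MTok, cached-read rate
OFF_PEAK_FACTOR = 0.5  # assumption: exactly 50% off, applied to both rates


def effective_input_rate(cache_hit_ratio, off_peak=True):
    """Blend cached and uncached input rates, then apply the off-peak factor."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be in [0, 1]")
    blended = (
        cache_hit_ratio * FLASH_CACHED
        + (1 - cache_hit_ratio) * FLASH_INPUT
    )
    return blended * (OFF_PEAK_FACTOR if off_peak else 1.0)


for ratio in (0.0, 0.5, 1.0):
    print(f"hit ratio {ratio:.0%}: ${effective_input_rate(ratio):.3f}/MTok")
```

Under these assumptions the effective rate falls below $0.05 once roughly 36% of input tokens are cache hits, and bottoms out at $0.014 per million tokens when every input token is cached.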

Head-to-head: V4 Flash vs cheap competitors

Model                  Input $/MTok  Output $/MTok  Blended $/MTok  Hosted in
DeepSeek V4 Flash      $0.140        $0.280         $0.210          China (or Together / DeepInfra mirrors)
Claude Haiku 4.5       $1.000        $5.000         $3.000          US
GPT-5 Mini             $0.125        $1.000         $0.563          US
Gemini 2.5 Flash-Lite  $0.100        $0.400         $0.250          Google global
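
The Blended column is consistent with a simple average of the input and output rates in all four rows. The sketch below reproduces it and adds an `output_share` parameter (an illustrative name, not something from the source) so the blend can be weighted toward a workload's actual input/output mix.

```python
# (input, output) in USD per MTok, copied from the table above.
PRICES = {
    "DeepSeek V4 Flash": (0.140, 0.280),
    "Claude Haiku 4.5": (1.000, 5.000),
    "GPT-5 Mini": (0.125, 1.000),
    "Gemini 2.5 Flash-Lite": (0.100, 0.400),
}


def blended(input_rate, output_rate, output_share=0.5):
    """Weighted average of the two rates; 0.5 gives the simple mean."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be in [0, 1]")
    return (1 - output_share) * input_rate + output_share * output_rate


for model, (input_rate, output_rate) in PRICES.items():
    print(f"{model}: ${blended(input_rate, output_rate):.4f}/MTok")
```

A retrieval-heavy workload that reads far more tokens than it writes might use `output_share=0.1`, which narrows the gap between models whose input rates are close but whose output rates differ widely.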

Where DeepSeek wins

Where it's the wrong call

Open-weight alternative paths

DeepSeek V4 weights are open. The same architecture is available from third-party hosts such as Together AI, Fireworks, and DeepInfra.

These trade some of the cost advantage for residency and latency guarantees that the first-party API doesn't provide.

FAQ

Why is DeepSeek cheaper than US-hosted models?

Three reasons: a Mixture-of-Experts architecture that activates only ~37B of its ~670B parameters per token (so per-token compute matches a much smaller model), aggressive Chinese inference economics, and a focus on raw cost-per-token rather than the polished SDK / tooling overhead of US providers.
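
As a rough illustration of the first point, the active share of parameters follows directly from the two approximate figures quoted above. This is back-of-the-envelope arithmetic, not a compute model.

```python
ACTIVE_PARAMS_B = 37   # ~37B parameters active per token (figure from the text)
TOTAL_PARAMS_B = 670   # ~670B total parameters (figure from the text)


def active_fraction(active_b, total_b):
    """Share of parameters that participate in computing one token."""
    return active_b / total_b


print(f"{active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%}")  # prints 5.5%
```

Per-token compute scales roughly with the active parameters, while the full set of weights still has to be stored and served, so the saving is in compute more than in memory.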

Are there off-peak or batch discounts?

Yes. DeepSeek runs an off-peak window (UTC 16:30–00:30) with a ~50% discount on most rates. Batch jobs (24-hour SLA) are also discounted. Both stack with prompt caching.

How does the tokenizer compare?

DeepSeek uses its own BPE tokenizer optimized for Chinese and English. For English text it averages roughly the same token count as OpenAI's o200k_base encoding. For Chinese it is significantly more efficient. Plug your own text into the /tokens-per-word logic for an estimate.

What are the data and residency considerations?

DeepSeek's API is hosted in China by default, which has data-residency, export-control, and compliance implications for many companies. Some customers instead use Together AI, Fireworks, or DeepInfra, which host the open-weight V4 models: same architecture, US/EU residency, slightly different price.

Is there a free tier?

DeepSeek runs a free tier for testing, typically limited to 10K tokens/min with a low daily cap. The paid tier requires payment in CNY or via supported intermediaries.
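
To gauge whether the free tier is enough for a test run, a quick lower bound on wall-clock time can be computed from the per-minute limit. The sketch assumes the "typical" 10K tokens/min figure quoted above and ignores the daily cap, whose value is not given; the function name is illustrative.

```python
import math

FREE_TIER_TOKENS_PER_MIN = 10_000  # "typically 10K tokens/min" from the text


def min_minutes_to_send(total_tokens, tokens_per_minute=FREE_TIER_TOKENS_PER_MIN):
    """Lower bound on the minutes needed to push total_tokens through the limit."""
    if total_tokens < 0 or tokens_per_minute <= 0:
        raise ValueError("total_tokens must be >= 0 and tokens_per_minute > 0")
    return math.ceil(total_tokens / tokens_per_minute)


print(min_minutes_to_send(250_000))  # prints 25
```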
