DeepSeek V4 API: Flash and Pro Models for Ultra-Fast AI Integration
DeepSeek V4 Flash and Pro model features, pricing, 384K context window, and OpenAI/Anthropic compatible API integration. Agent tools and practical usage examples.

DeepSeek introduced two new models with the V4 series: deepseek-v4-flash and deepseek-v4-pro. In this article, I explain the features, pricing, and usage of these models.

1. DeepSeek V4 Models
The DeepSeek V4 series will retire the old model names (deepseek-chat and deepseek-reasoner) on July 24, 2026. The new model family is as follows:
| Model | Description | Use Case |
|---|---|---|
| deepseek-v4-flash | Fast, cost-effective model | General chat, code generation, quick responses |
| deepseek-v4-pro | Premium high-quality model | Complex reasoning, analysis, professional use |
The old model names (deepseek-chat and deepseek-reasoner) will continue working for backward compatibility for now — deepseek-chat corresponds to deepseek-v4-flash’s non-thinking mode; deepseek-reasoner corresponds to thinking mode.
⚠️ deepseek-chat and deepseek-reasoner will be deprecated on July 24, 2026. It’s recommended to use deepseek-v4-flash or deepseek-v4-pro directly in new projects.
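For codebases that still pass the legacy names, the mapping described above can be expressed as a small lookup. This is only a sketch: `migrate_model` is a hypothetical helper, and the `"disabled"` value for non-thinking mode is an assumption that mirrors the `"enabled"` value used in the API examples in this article.

```python
# Legacy name -> (V4 model, thinking enabled?), following the mapping above.
LEGACY_MODELS = {
    "deepseek-chat": ("deepseek-v4-flash", False),
    "deepseek-reasoner": ("deepseek-v4-flash", True),
}


def migrate_model(name: str) -> dict:
    """Return request fields for a legacy or current model name."""
    if name not in LEGACY_MODELS:
        # Already a V4 name (or unknown): pass it through unchanged.
        return {"model": name}
    model, thinking = LEGACY_MODELS[name]
    # Assumption: "disabled" is the counterpart of the "enabled" value.
    mode = "enabled" if thinking else "disabled"
    return {"model": model, "thinking": {"type": mode}}
```

The returned dictionary can be merged into the request body, so the old names disappear from the code before the deprecation date.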
2. Pricing
The DeepSeek V4 series is priced in Chinese yuan (¥). Approximate USD equivalents are below (1 USD ≈ 7.2 ¥):
deepseek-v4-flash
| Metric | ¥ (CNY) | ~$ (USD) |
|---|---|---|
| 1M Input Token (Cache Hit) | ¥1 | ~$0.14 |
| 1M Input Token (Cache Miss) | ¥2 | ~$0.28 |
| 1M Output Token | ¥12 | ~$1.74 |
deepseek-v4-pro
| Metric | ¥ (CNY) | ~$ (USD) |
|---|---|---|
| 1M Input Token (Cache Hit) | ¥2 | ~$0.28 |
| 1M Input Token (Cache Miss) | ¥20 | ~$2.80 |
| 1M Output Token | ¥25 | ~$3.48 |
🎉 deepseek-v4-pro currently has a 75% discount, valid until May 31, 2026. With the discount, the Pro model’s output price is currently ~$0.87 per 1M tokens.
Cost Comparison
| 1 call (500 input + 500 output tokens) | Flash | Pro (discounted) |
|---|---|---|
| Cost | ~$0.001 | ~$0.002 |
| 1M output token | Flash | Pro (discounted) | GPT-4o |
|---|---|---|---|
| Cost | ~$1.74 | ~$0.87 | ~$10.00 |
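The arithmetic behind these comparisons can be reproduced with a few lines of Python. The prices are copied from the tables above; `call_cost` is a hypothetical helper, and the sketch assumes cache-miss input pricing and that the Pro discount is applied to the output price only.

```python
# USD per 1M tokens, taken from the pricing tables (cache-miss input).
PRICES = {
    "deepseek-v4-flash": {"input": 0.28, "output": 1.74},
    # Assumption: only the output price is discounted for Pro.
    "deepseek-v4-pro-discounted": {"input": 2.80, "output": 0.87},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request with the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


for model in PRICES:
    per_call = call_cost(model, 500, 500)
    print(f"{model}: ${per_call:.5f} per call, ${per_call * 1000:.2f} per 1000 calls")
```

Under these assumptions, the figures of roughly $0.001 (Flash) and $0.002 (Pro) are the cost of a single 500 + 500 token call; a thousand such calls cost about a thousand times that.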
3. Context Window
Both models offer a 384K-token context window, enough to hand the model a book of several hundred pages in one go. For comparison:
| Model | Context Window |
|---|---|
| DeepSeek V4 Flash/Pro | 384K |
| GPT-4o | 128K |
| Claude Opus 4 | 200K |
| Gemini 1.5 Pro | 2M |
With 384K context, you can send an entire codebase, long documents, or comprehensive analyses to the model in one go.
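Before sending a very large input, it can help to check roughly whether it fits. The sketch below uses a crude heuristic of about 4 characters per token for English text, which is an assumption and not DeepSeek's tokenizer; it also assumes "384K" means 384,000 tokens. `estimate_tokens` and `fits_in_context` are hypothetical helpers.

```python
CONTEXT_LIMIT = 384_000  # assumes "384K" means 384,000 tokens
CHARS_PER_TOKEN = 4      # rough heuristic for English text, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(text: str, reserve_for_output: int = 8_000) -> bool:
    """True if the text plus room for the reply stays under the limit."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_LIMIT
```

For exact counts, rely on the `usage` field returned by the API rather than this estimate.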
Context Caching
DeepSeek supports context caching (KV Cache). By caching frequently used prompts, you can cut input costs by 50% on Flash and by up to 10x on Pro:
- Flash, normal input: $0.28/M tokens; cache hit: $0.14/M tokens (50% savings)
- Pro, normal input: $2.80/M tokens; cache hit: $0.28/M tokens (10x cheaper)
As of April 28, 2026, the cache hit price was reduced to 1/10 of the launch price.
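How much caching saves in practice depends on the share of input tokens that hit the cache. A minimal sketch of that calculation, using the prices from the tables above; `blended_input_price` is a hypothetical helper and the 80% hit ratio is just an illustrative value.

```python
def blended_input_price(miss_price: float, hit_price: float, hit_ratio: float) -> float:
    """Effective USD price per 1M input tokens for a given cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_price + (1.0 - hit_ratio) * miss_price


# Illustrative: 80% of input tokens are served from the cache.
flash = blended_input_price(miss_price=0.28, hit_price=0.14, hit_ratio=0.8)
pro = blended_input_price(miss_price=2.80, hit_price=0.28, hit_ratio=0.8)
print(f"Flash: ${flash:.3f}/M input tokens, Pro: ${pro:.3f}/M input tokens")
```

The higher the hit ratio, the closer the effective price gets to the cache-hit rate, which is why the gain is much larger on Pro than on Flash.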
4. API Usage
The DeepSeek API is compatible with the OpenAI and Anthropic formats. You can keep your existing OpenAI SDK and switch to DeepSeek by changing only the base_url and api_key.
Basic Parameters
| Parameter | Value |
|---|---|
| base_url (OpenAI) | https://api.deepseek.com |
| base_url (Anthropic) | https://api.deepseek.com/anthropic |
| api_key | Obtained from platform.deepseek.com |
| model | deepseek-v4-flash or deepseek-v4-pro |
Test with curl
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me about DeepSeek V4."}
    ],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high",
    "stream": false
  }'
Usage with Python
from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the difference between DeepSeek V4 Flash and Pro?"}
    ],
    stream=False,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}}
)

print(response.choices[0].message.content)
Usage with Node.js
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://api.deepseek.com',
  apiKey: process.env.DEEPSEEK_API_KEY,
});

async function main() {
  const completion = await openai.chat.completions.create({
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Describe DeepSeek V4 features.' },
    ],
    model: 'deepseek-v4-pro',
    thinking: { type: 'enabled' },
    reasoning_effort: 'high',
    stream: false,
  });
  console.log(completion.choices[0].message.content);
}

main();
5. Thinking Mode
One of DeepSeek V4’s most powerful features is Thinking Mode. The model thinks step by step before responding and makes this thinking process visible.
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Analyze an AI agent ecosystem."}],
    extra_body={"thinking": {"type": "enabled"}},
    reasoning_effort="high"  # low, medium, high
)
You can control the thinking depth with the reasoning_effort parameter:
- low: for quick answers
- medium: balanced
- high: deep analysis
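To avoid typos in the effort level, the request arguments can be built and validated in one place. This is a sketch only: `thinking_request` is a hypothetical helper, and the field names are the ones used in the examples in this article.

```python
VALID_EFFORTS = ("low", "medium", "high")


def thinking_request(prompt: str, effort: str = "medium",
                     model: str = "deepseek-v4-pro") -> dict:
    """Build keyword arguments for a thinking-mode chat completion."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
        # Passed through extra_body, as in the Python example in this article.
        "extra_body": {"thinking": {"type": "enabled"}},
    }
```

With the `client` from the Python example, the call becomes `client.chat.completions.create(**thinking_request("Analyze an AI agent ecosystem.", effort="high"))`.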
6. Integration with Agent Tools
DeepSeek V4 works with popular AI agent and coding tools. Since the API is compatible with both the OpenAI and Anthropic formats, you can use it as a backend model in tools like Claude Code, GitHub Copilot, and OpenCode.
DeepSeek with Claude Code
# Set DeepSeek as backend in Claude Code
# Settings → Provider → DeepSeek V4
# or via CLI:
DeepSeek with OpenCode
OpenCode users can similarly switch to DeepSeek V4 by changing baseURL and apiKey settings.
OpenAI Compatible SDK
No need to change your SDK. Just update the base_url and api_key parameters:
# Switch from OpenAI to DeepSeek: only 2 lines change
from openai import OpenAI

client = OpenAI(
    api_key="sk-deepseek-...",           # DeepSeek API key
    base_url="https://api.deepseek.com"  # New base URL
)
7. Flash or Pro?
| Criteria | deepseek-v4-flash | deepseek-v4-pro |
|---|---|---|
| Speed | ⚡ Very fast | 🚀 Fast |
| Quality | High | Highest |
| Price | 💰 Economical | 💎 Premium |
| Context | 384K | 384K |
| Thinking | Yes | Yes |
| Usage | Daily chat, code, simple analysis | Complex reasoning, research, professional reports |
When to use which model?
- Use Flash: Daily development, code completion, fast prototyping, simple queries
- Use Pro: Deep analysis, complex reasoning, long context tasks, production-quality output
💡 Tip: You can use both models in the same project. Optimize costs by using Flash for simple tasks and Pro for critical work.
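One way to apply this tip is a small routing function that picks the model per task. The sketch below is illustrative: the `Task` fields, the `pick_model` helper, and the length threshold are all hypothetical and should be tuned to your own workload.

```python
from dataclasses import dataclass

LONG_PROMPT_CHARS = 200_000  # hypothetical threshold for "long context" work


@dataclass
class Task:
    prompt: str
    needs_deep_reasoning: bool = False
    critical: bool = False


def pick_model(task: Task) -> str:
    """Route demanding tasks to Pro and everything else to Flash."""
    is_long = len(task.prompt) > LONG_PROMPT_CHARS
    if task.needs_deep_reasoning or task.critical or is_long:
        return "deepseek-v4-pro"
    return "deepseek-v4-flash"
```

Keeping the decision in one function makes it easy to adjust the rules later, for example once the Pro discount ends.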
8. Rate Limit and Error Codes
DeepSeek API’s rate limit policy varies based on your usage level. Visit the DeepSeek API Docs for detailed information.
Common error codes:
| Code | Meaning |
|---|---|
| 401 | Invalid API key |
| 429 | Rate limit exceeded |
| 500 | Server error |
| 503 | Service temporarily unavailable |
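Of these codes, 429, 500, and 503 are usually worth retrying, while 401 is not, since a bad key will not fix itself. A minimal retry sketch with exponential backoff follows. `call_with_retry` is a hypothetical helper; it assumes the raised exception exposes a `status_code` attribute (the OpenAI Python SDK's status errors do), and the delays are illustrative rather than values recommended by DeepSeek.

```python
import random
import time

RETRYABLE = {429, 500, 503}


def call_with_retry(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on retryable HTTP status codes with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            # Give up on non-retryable errors (e.g. 401) or the last attempt.
            if status not in RETRYABLE or attempt == max_attempts:
                raise
            # Exponential backoff with a little jitter: ~1s, ~2s, ~4s, ...
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            sleep(delay)
```

Wrap the API call in a lambda or function, for example `call_with_retry(lambda: client.chat.completions.create(...))`, using the `client` from the Python example.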
9. Conclusion
DeepSeek V4 is a model family that stands out especially for its price/performance ratio:
- ✅ 384K context, larger than GPT-4o (128K) and Claude Opus 4 (200K)
- ✅ OpenAI/Anthropic compatible API with zero integration cost
- ✅ Thinking Mode for in-depth analysis
- ✅ Context Caching for 50% input cost savings on Flash and up to 10x on Pro
- ✅ Full compatibility with agent tools (Claude Code, Copilot, OpenCode)
- ✅ 75% discount makes Pro model very affordable right now
Old model names will be deprecated on July 24, 2026. I recommend using deepseek-v4-flash and deepseek-v4-pro directly in your new projects.
10. Resources
- DeepSeek API Docs
- DeepSeek Platform
- Getting API Key
- Models & Pricing
- Thinking Mode Guide
- Agent Integrations Guide
- GitHub: DeepSeek



