DeepSeek V4 API: Flash and Pro Models for Ultra-Fast AI Integration

DeepSeek introduced two new models with the V4 series: deepseek-v4-flash and deepseek-v4-pro. In this article, I explain the features, pricing, and usage of these models.

1. DeepSeek V4 Models

The DeepSeek V4 series retires the old model names (deepseek-chat and deepseek-reasoner) as of July 24, 2026. The new model family is as follows:

Model	Description	Use Case
`deepseek-v4-flash`	Fast, cost-effective model	General chat, code generation, quick responses
`deepseek-v4-pro`	Premium high-quality model	Complex reasoning, analysis, professional use

Until then, the old model names (deepseek-chat and deepseek-reasoner) keep working for backward compatibility: deepseek-chat corresponds to deepseek-v4-flash's non-thinking mode, and deepseek-reasoner to its thinking mode.

⚠️ deepseek-chat and deepseek-reasoner will be deprecated on July 24, 2026. It’s recommended to use deepseek-v4-flash or deepseek-v4-pro directly in new projects.
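If existing code still uses the old names, the mapping above can live in one place during migration. The sketch below is hypothetical: `LEGACY_MODELS` and `migrate_model` are illustrative names, not part of any SDK, and the `"disabled"` value for non-thinking mode is an assumption, since this article only shows the `"enabled"` form of the `thinking` field.

```python
# Hypothetical mapping from legacy names to (V4 model, thinking enabled?).
LEGACY_MODELS = {
    "deepseek-chat": ("deepseek-v4-flash", False),      # non-thinking mode
    "deepseek-reasoner": ("deepseek-v4-flash", True),   # thinking mode
}


def migrate_model(name: str) -> dict:
    """Return request fields replacing a legacy model name.

    Names that are not legacy aliases pass through unchanged.
    """
    if name not in LEGACY_MODELS:
        return {"model": name}
    model, thinking = LEGACY_MODELS[name]
    # "disabled" is an assumed value; the article only shows "enabled".
    return {"model": model, "thinking": {"type": "enabled" if thinking else "disabled"}}


print(migrate_model("deepseek-reasoner"))
# {'model': 'deepseek-v4-flash', 'thinking': {'type': 'enabled'}}
```

With the OpenAI Python SDK, `model` is a normal argument while the `thinking` field would go into `extra_body`, as in the Python examples in this article.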

2. Pricing

The DeepSeek V4 series is priced in Chinese yuan (¥). Approximate USD equivalents are below (1 USD ≈ 7.2 ¥):

deepseek-v4-flash

Metric	¥ (CNY)	~$ (USD)
1M Input Tokens (Cache Hit)	¥1	~$0.14
1M Input Tokens (Cache Miss)	¥2	~$0.28
1M Output Tokens	¥12	~$1.67

deepseek-v4-pro

Metric	¥ (CNY)	~$ (USD)
1M Input Tokens (Cache Hit)	¥2	~$0.28
1M Input Tokens (Cache Miss)	¥20	~$2.78
1M Output Tokens	¥25	~$3.47

🎉 deepseek-v4-pro currently has a 75% discount, valid until May 31, 2026. At the discounted rate, the Pro model's output price is ¥6.25 (~$0.87) per 1M tokens.

Cost Comparison

Per call (500 input + 500 output tokens, cache miss)	Flash	Pro (discounted)
Cost	~$0.001	~$0.002

1M output tokens	Flash	Pro (discounted)	GPT-4o
Cost	~$1.67	~$0.87	~$10.00
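The arithmetic behind these figures can be sketched in a few lines. This is a minimal estimator, assuming the ¥ prices from the tables above, the 1 USD ≈ 7.2 ¥ rate, and that the 75% Pro discount applies to output tokens only (the article states it only for the output price). `estimate_cost_cny` is a hypothetical helper, not an SDK function.

```python
# CNY per 1M tokens, taken from the pricing tables above.
PRICES_CNY = {
    "deepseek-v4-flash": {"hit": 1.0, "miss": 2.0, "output": 12.0},
    "deepseek-v4-pro": {"hit": 2.0, "miss": 20.0, "output": 25.0},
}
CNY_PER_USD = 7.2  # exchange rate assumed in this article


def estimate_cost_cny(model: str, input_tokens: int, output_tokens: int,
                      cached_tokens: int = 0, output_discount: float = 0.0) -> float:
    """Estimate one request's cost in CNY.

    cached_tokens: how many of the input tokens were cache hits.
    output_discount: fraction taken off the output price (assumed to be
    0.75 for the current Pro promotion).
    """
    p = PRICES_CNY[model]
    missed = input_tokens - cached_tokens
    total = (cached_tokens * p["hit"]
             + missed * p["miss"]
             + output_tokens * p["output"] * (1 - output_discount))
    return total / 1_000_000


for model, discount in [("deepseek-v4-flash", 0.0), ("deepseek-v4-pro", 0.75)]:
    cny = estimate_cost_cny(model, 500, 500, output_discount=discount)
    print(f"{model}: ¥{cny:.6f} (~${cny / CNY_PER_USD:.4f}) per call")
```

For a single request with 500 input and 500 output tokens (all cache misses), this gives ¥0.007 (~$0.001) for Flash and about ¥0.013 (~$0.002) for discounted Pro. If the discount also covers input, lower the Pro input prices accordingly.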

3. Context Window

Both models offer a 384K-token context window, enough to pass a ~300-page book to the model in a single request. For comparison:

Model	Context Window
DeepSeek V4 Flash/Pro	384K
GPT-4o	128K
Claude Opus 4	200K
Gemini 1.5 Pro	2M

With a 384K context, you can send long documents, large parts of a codebase, or extensive reference material to the model in a single request.
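Before sending a large document, a rough pre-check helps avoid requests that overflow the window. The sketch below assumes about 0.3 tokens per English character and 0.6 per Chinese character (rule-of-thumb figures, not tokenizer output) and reserves a hypothetical 8,000 tokens for the reply; use a real tokenizer when the estimate is close to the limit.

```python
import math

CONTEXT_TOKENS = 384_000  # context window stated in this article

# Assumed rule-of-thumb ratios, not exact tokenizer behavior.
TOKENS_PER_CJK_CHAR = 0.6
TOKENS_PER_OTHER_CHAR = 0.3


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character counts."""
    # Count characters in the basic CJK Unified Ideographs block.
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other = len(text) - cjk
    return math.ceil(cjk * TOKENS_PER_CJK_CHAR + other * TOKENS_PER_OTHER_CHAR)


def fits_in_context(text: str, reserved_output: int = 8_000) -> bool:
    """True if the estimated prompt plus room for the reply fits the window."""
    return estimate_tokens(text) + reserved_output <= CONTEXT_TOKENS


sample = "All work and no play. " * 20_000  # stand-in for a long document
print(estimate_tokens(sample), fits_in_context(sample))
```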

Context Caching

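DeepSeek supports context caching (KV Cache). When a request starts with content that was already sent, such as the same system prompt or document, the repeated part is billed at the lower cache-hit price. Based on the prices above, this cuts input costs by up to 10x:

Flash: $0.28/M tokens on a cache miss, $0.14/M on a cache hit (50% savings)
Pro: ¥20/M tokens on a cache miss, ¥2/M on a cache hit (90% savings)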

As of April 28, 2026, the cache hit price was reduced to 1/10 of the launch price.
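To check how much caching actually saves, you can read the cache counters from each response's usage data. This sketch assumes the usage object contains `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens`, the field names DeepSeek's API has used for cache accounting; confirm they are unchanged for V4. With the OpenAI Python SDK, `response.usage.model_dump()` should yield such a dict. The helper names are hypothetical, and prices are passed in so the same code works for Flash and Pro.

```python
def _cache_counts(usage: dict) -> tuple:
    """Return (hit_tokens, miss_tokens) from a usage dict."""
    hits = usage.get("prompt_cache_hit_tokens", 0)
    # If the miss counter is absent, derive it from prompt_tokens.
    misses = usage.get("prompt_cache_miss_tokens",
                       usage.get("prompt_tokens", 0) - hits)
    return hits, misses


def input_cost_cny(usage: dict, hit_price: float, miss_price: float) -> float:
    """Input cost in CNY; prices are CNY per 1M tokens."""
    hits, misses = _cache_counts(usage)
    return (hits * hit_price + misses * miss_price) / 1_000_000


def cache_savings(usage: dict, hit_price: float, miss_price: float) -> float:
    """Fraction of input cost saved compared with no cache hits at all."""
    hits, misses = _cache_counts(usage)
    baseline = (hits + misses) * miss_price / 1_000_000
    if baseline == 0:
        return 0.0
    return 1 - input_cost_cny(usage, hit_price, miss_price) / baseline


# Example usage data for a Pro request (¥2 hit / ¥20 miss per 1M tokens).
usage = {"prompt_tokens": 10_000,
         "prompt_cache_hit_tokens": 8_000,
         "prompt_cache_miss_tokens": 2_000}
print(input_cost_cny(usage, 2, 20), cache_savings(usage, 2, 20))
```

With 8,000 of 10,000 prompt tokens served from cache, the input cost drops from ¥0.20 to ¥0.056, a 72% saving.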

4. API Usage

The DeepSeek API is compatible with both the OpenAI and Anthropic API formats. You can keep using your existing OpenAI SDK; only the base URL, API key, and model name change.

Basic Parameters

Parameter	Value
base_url (OpenAI)	`https://api.deepseek.com`
base_url (Anthropic)	`https://api.deepseek.com/anthropic`
api_key	Obtained from platform.deepseek.com
model	`deepseek-v4-flash` or `deepseek-v4-pro`

Test with curl

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
        "model": "deepseek-v4-pro",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Tell me about DeepSeek V4."}
        ],
        "thinking": {"type": "enabled"},
        "reasoning_effort": "high",
        "stream": false
      }'
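If you would rather not install an SDK, the same kind of request can be built with Python's standard library. This is a sketch: `build_request` is a hypothetical helper, and the Anthropic branch assumes the compatible endpoint accepts Anthropic's standard Messages path (`/v1/messages`) and headers, which is what Anthropic's SDKs send when pointed at a custom base URL. The `thinking` and `reasoning_effort` fields from the curl example can be added to the body the same way.

```python
import json
import os
import urllib.request
from typing import Optional

OPENAI_BASE = "https://api.deepseek.com"
ANTHROPIC_BASE = "https://api.deepseek.com/anthropic"


def build_request(prompt: str, model: str = "deepseek-v4-flash",
                  fmt: str = "openai",
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a chat request in OpenAI or Anthropic format."""
    key = api_key or os.environ.get("DEEPSEEK_API_KEY", "")
    messages = [{"role": "user", "content": prompt}]
    if fmt == "openai":
        url = f"{OPENAI_BASE}/chat/completions"
        headers = {"Authorization": f"Bearer {key}"}
        body = {"model": model, "messages": messages, "stream": False}
    elif fmt == "anthropic":
        # Assumed: standard Anthropic Messages path, headers, and max_tokens.
        url = f"{ANTHROPIC_BASE}/v1/messages"
        headers = {"x-api-key": key, "anthropic-version": "2023-06-01"}
        body = {"model": model, "max_tokens": 1024, "messages": messages}
    else:
        raise ValueError(f"unknown format: {fmt}")
    headers["Content-Type"] = "application/json"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = build_request("Tell me about DeepSeek V4.")
# Sending it needs a valid key and network access:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```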

Usage with Python

from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the difference between DeepSeek V4 Flash and Pro?"}
    ],
    stream=False,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}}
)

print(response.choices[0].message.content)

Usage with Node.js

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://api.deepseek.com',
  apiKey: process.env.DEEPSEEK_API_KEY,
});

async function main() {
  const completion = await openai.chat.completions.create({
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Describe DeepSeek V4 features.' },
    ],
    model: 'deepseek-v4-pro',
    thinking: { type: 'enabled' },
    reasoning_effort: 'high',
    stream: false,
  });

  console.log(completion.choices[0].message.content);
}

main();

5. Thinking Mode

One of DeepSeek V4’s most powerful features is Thinking Mode. The model thinks step by step before responding and makes this thinking process visible.

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Analyze an AI agent ecosystem."}],
    extra_body={"thinking": {"type": "enabled"}},
    reasoning_effort="high"  # low, medium, high
)

You can control the thinking depth with the reasoning_effort parameter:

low — for quick answers
medium — balanced
high — deep analysis
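When thinking is enabled, the reasoning and the final answer need to be handled separately. The sketch below assumes the reasoning arrives in a `reasoning_content` field on the message, as it does for DeepSeek's existing deepseek-reasoner model; check that V4 uses the same field. It also keeps reasoning out of the conversation history, following DeepSeek's earlier guidance for deepseek-reasoner (verify this for V4, especially with tool calls). `split_reasoning` and `next_turn` are hypothetical helpers working on plain dicts; with the OpenAI Python SDK, `response.choices[0].message.model_dump()` should produce one.

```python
def split_reasoning(message: dict) -> tuple:
    """Return (reasoning, answer) from an assistant message dict."""
    reasoning = message.get("reasoning_content") or ""  # assumed field name
    answer = message.get("content") or ""
    return reasoning, answer


def next_turn(history: list, message: dict, user_text: str) -> list:
    """Append the assistant's answer (without its reasoning) and a new user turn."""
    _, answer = split_reasoning(message)
    return history + [
        {"role": "assistant", "content": answer},
        {"role": "user", "content": user_text},
    ]


reply = {"role": "assistant", "content": "Use Pro.", "reasoning_content": "The task is complex..."}
thoughts, answer = split_reasoning(reply)
history = next_turn([{"role": "user", "content": "Flash or Pro?"}], reply, "Why?")
```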

6. Integration with Agent Tools

DeepSeek V4 works with popular AI agent and coding tools. Because the API accepts both OpenAI- and Anthropic-format requests, you can use it as a backend model in tools like Claude Code, GitHub Copilot, and OpenCode.

DeepSeek with Claude Code

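# Point Claude Code at DeepSeek's Anthropic-compatible endpoint
# through environment variables, then start the CLI:
export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_AUTH_TOKEN=${DEEPSEEK_API_KEY}
export ANTHROPIC_MODEL=deepseek-v4-pro
claude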

DeepSeek with OpenCode

OpenCode users can similarly switch to DeepSeek V4 by changing baseURL and apiKey settings.

OpenAI Compatible SDK

No need to change your SDK. Just update the base_url and api_key parameters, plus the model name in your requests:

from openai import OpenAI

# Switch from OpenAI to DeepSeek: only 2 lines change
client = OpenAI(
    api_key="sk-deepseek-...",         # DeepSeek API key
    base_url="https://api.deepseek.com"  # New base URL
)
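To make the two-line switch configurable rather than hard-coded, the provider settings can be kept in a small table and selected at runtime. This is a hypothetical sketch: `Provider`, `PROVIDERS`, and `client_kwargs` are illustrative names; only the DeepSeek base URL comes from this article, and the OpenAI entry uses OpenAI's public API URL.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    base_url: str
    key_env: str        # environment variable holding the API key
    default_model: str


# Hypothetical provider table.
PROVIDERS = {
    "openai": Provider("https://api.openai.com/v1", "OPENAI_API_KEY", "gpt-4o"),
    "deepseek": Provider("https://api.deepseek.com", "DEEPSEEK_API_KEY", "deepseek-v4-flash"),
}


def client_kwargs(name: str) -> dict:
    """Keyword arguments for OpenAI(...) for the chosen provider."""
    p = PROVIDERS[name]
    return {"base_url": p.base_url, "api_key": os.environ.get(p.key_env, "")}


# Usage with the OpenAI SDK:
# client = OpenAI(**client_kwargs("deepseek"))
# model = PROVIDERS["deepseek"].default_model
```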

7. Flash or Pro?

Criteria	deepseek-v4-flash	deepseek-v4-pro
Speed	⚡ Very fast	🚀 Fast
Quality	High	Highest
Price	💰 Economical	💎 Premium
Context	384K	384K
Thinking	Yes	Yes
Usage	Daily chat, code, simple analysis	Complex reasoning, research, professional reports

When to use which model?

Use Flash: Daily development, code completion, fast prototyping, simple queries
Use Pro: Deep analysis, complex reasoning, long context tasks, production-quality output

💡 Tip: You can use both models in the same project. Optimize costs by using Flash for simple tasks and Pro for critical work.
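The Flash/Pro split above can be encoded as a simple routing function. The task labels here are hypothetical placeholders for your own categories; the sketch just routes critical or reasoning-heavy work to Pro and everything else to Flash.

```python
# Hypothetical task labels that warrant the Pro model.
PRO_TASKS = {"analysis", "complex_reasoning", "research", "report"}


def pick_model(task: str, critical: bool = False) -> str:
    """Choose a DeepSeek V4 model name for a task label."""
    if critical or task in PRO_TASKS:
        return "deepseek-v4-pro"
    return "deepseek-v4-flash"


print(pick_model("code_completion"))      # deepseek-v4-flash
print(pick_model("research"))             # deepseek-v4-pro
print(pick_model("chat", critical=True))  # deepseek-v4-pro
```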

8. Rate Limit and Error Codes

DeepSeek API’s rate limit policy varies based on your usage level. Visit the DeepSeek API Docs for detailed information.

Common error codes:

Code	Meaning
401	Invalid API key
429	Rate limit exceeded
500	Server error
503	Service temporarily unavailable
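Of these, 429, 500, and 503 are usually temporary, so retrying with exponential backoff makes sense, while a 401 will not fix itself. The sketch below uses a hypothetical `APIError` type carrying the HTTP status; with the OpenAI Python SDK you would catch `openai.APIStatusError` and read its `status_code` instead. The OpenAI SDK already retries some of these errors itself (configurable with the client's `max_retries` option), so a wrapper like this matters most for raw HTTP calls or when you want longer waits.

```python
import random
import time

RETRYABLE = {429, 500, 503}


class APIError(Exception):
    """Hypothetical error type carrying the HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def with_retries(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); retry 429/500/503 with exponential backoff and jitter.

    Other errors, such as 401 (invalid API key), are raised immediately.
    The last retryable error is re-raised once attempts run out.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except APIError as err:
            if err.status not in RETRYABLE or attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))
```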

9. Conclusion

DeepSeek V4 is a model family that stands out especially for its price/performance ratio:

✅ 384K context, larger than GPT-4o (128K) and Claude Opus 4 (200K)
✅ OpenAI/Anthropic-compatible API with minimal integration effort
✅ Thinking Mode for in-depth analysis
✅ Context Caching for up to 90% savings on repeated input (Pro; 50% on Flash)
✅ Works with agent tools (Claude Code, Copilot, OpenCode)
✅ 75% discount makes the Pro model very affordable right now

Old model names will be deprecated on July 24, 2026. I recommend using deepseek-v4-flash and deepseek-v4-pro directly in your new projects.
