Engineering · 5 min read

DeepSeek V4 API: Flash and Pro Models for Ultra-Fast AI Integration

DeepSeek V4 Flash and Pro model features, pricing, 384K context window, and OpenAI/Anthropic compatible API integration. Agent tools and practical usage examples.


DeepSeek introduced two new models with the V4 series: deepseek-v4-flash and deepseek-v4-pro. In this article, I explain the features, pricing, and usage of these models.



1. DeepSeek V4 Models

The DeepSeek V4 series replaces the old model names (deepseek-chat and deepseek-reasoner), which are scheduled for deprecation on July 24, 2026. The new model family is as follows:

| Model | Description | Use Case |
|---|---|---|
| deepseek-v4-flash | Fast, cost-effective model | General chat, code generation, quick responses |
| deepseek-v4-pro | Premium high-quality model | Complex reasoning, analysis, professional use |

For now, the old model names still work for backward compatibility: deepseek-chat maps to deepseek-v4-flash in non-thinking mode, and deepseek-reasoner maps to deepseek-v4-flash in thinking mode.

⚠️ deepseek-chat and deepseek-reasoner will be deprecated on July 24, 2026. It’s recommended to use deepseek-v4-flash or deepseek-v4-pro directly in new projects.
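Migrating existing code mostly means swapping the model name. The helper below is a hypothetical sketch (not part of any SDK) that rewrites keyword arguments for the OpenAI Python SDK according to the mapping above. It assumes the thinking field accepts "disabled" for non-thinking mode; this article only shows "enabled".

```python
# Hypothetical migration helper (not part of any SDK): map legacy model names
# to deepseek-v4-flash plus a thinking setting, following the mapping above.
# Assumption: {"thinking": {"type": "disabled"}} turns thinking off; only
# "enabled" appears in this article's examples.
LEGACY_MODELS = {
    "deepseek-chat": ("deepseek-v4-flash", "disabled"),
    "deepseek-reasoner": ("deepseek-v4-flash", "enabled"),
}


def migrate_request(params: dict) -> dict:
    """Return a copy of chat.completions.create() kwargs with a V4 model name."""
    if params.get("model") not in LEGACY_MODELS:
        return dict(params)  # already a V4 (or unknown) model: leave untouched
    new_model, thinking = LEGACY_MODELS[params["model"]]
    migrated = dict(params, model=new_model)
    # The OpenAI SDK sends extra_body fields as-is in the JSON request body.
    extra_body = dict(migrated.get("extra_body") or {})
    extra_body["thinking"] = {"type": thinking}
    migrated["extra_body"] = extra_body
    return migrated


old = {"model": "deepseek-reasoner", "messages": [{"role": "user", "content": "Hi"}]}
print(migrate_request(old))
```

The result can be passed straight to the SDK with client.chat.completions.create(**migrate_request(old)).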


2. Pricing

The DeepSeek V4 series is priced in Chinese yuan (¥). Approximate USD equivalents are below (1 USD ≈ 7.2 ¥):

deepseek-v4-flash

| Metric (per 1M tokens) | ¥ (CNY) | ~$ (USD) |
|---|---|---|
| Input (cache hit) | ¥1 | ~$0.14 |
| Input (cache miss) | ¥2 | ~$0.28 |
| Output | ¥12 | ~$1.67 |

deepseek-v4-pro

| Metric (per 1M tokens) | ¥ (CNY) | ~$ (USD) |
|---|---|---|
| Input (cache hit) | ¥2 | ~$0.28 |
| Input (cache miss) | ¥20 | ~$2.78 |
| Output | ¥25 | ~$3.47 |

🎉 deepseek-v4-pro currently has a 75% discount, valid until May 31, 2026, which brings the Pro model's output price down to about $0.87 per 1M tokens.

Cost Comparison

| Per call (500 input + 500 output tokens) | Flash | Pro (discounted) |
|---|---|---|
| Cost | ~$0.001 | ~$0.002 |

| Per 1M output tokens | Flash | Pro (discounted) | GPT-4o |
|---|---|---|---|
| Cost | ~$1.67 | ~$0.87 | ~$10.00 |
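To estimate costs for your own workload, you can compute them directly from the CNY list prices in the tables above. This is a minimal sketch. It assumes the article's rate of 1 USD ≈ 7.2 CNY and applies the discount uniformly to every price component; the article does not say whether the Pro discount also covers input.

```python
# Sketch of a cost estimator built from the CNY list prices in the tables above.
# Assumptions: 1 USD ≈ 7.2 CNY (the article's rate), prices are per 1M tokens,
# and `discount` is applied uniformly to every price component.
CNY_PER_USD = 7.2

PRICES_CNY_PER_M = {
    "deepseek-v4-flash": {"input_hit": 1.0, "input_miss": 2.0, "output": 12.0},
    "deepseek-v4-pro": {"input_hit": 2.0, "input_miss": 20.0, "output": 25.0},
}


def estimate_cost_usd(
    model: str,
    input_tokens: int,
    output_tokens: int,
    cache_hit_ratio: float = 0.0,
    discount: float = 0.0,
) -> float:
    """Estimate USD cost; cache_hit_ratio is the share of input served from cache."""
    p = PRICES_CNY_PER_M[model]
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    cny = (
        hit_tokens * p["input_hit"]
        + miss_tokens * p["input_miss"]
        + output_tokens * p["output"]
    ) / 1_000_000
    return cny * (1 - discount) / CNY_PER_USD


# One Flash call: 500 input tokens (all cache misses) and 500 output tokens.
print(f"Flash, one call: ${estimate_cost_usd('deepseek-v4-flash', 500, 500):.4f}")
# 1M output tokens on Pro with the 75% discount.
pro = estimate_cost_usd("deepseek-v4-pro", 0, 1_000_000, discount=0.75)
print(f"Pro, 1M output tokens (discounted): ${pro:.2f}")
```

The two printed values reproduce the ~$0.001 per-call Flash figure and the ~$0.87 discounted Pro output price.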

3. Context Window

Both models offer a 384K-token context window, enough to fit roughly a 300-page book in a single request. For comparison:

| Model | Context Window |
|---|---|
| DeepSeek V4 Flash/Pro | 384K |
| GPT-4o | 128K |
| Claude Opus 4 | 200K |
| Gemini 1.5 Pro | 2M |

With 384K tokens of context, you can send a mid-sized codebase, long documents, or extensive analysis material to the model in a single request.
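Before sending a large prompt, a rough size check can save a failed request. The sketch below makes three assumptions. It reads "384K" as 384,000 tokens. It uses the common rule of thumb of about 4 characters per token for English text, not DeepSeek's tokenizer. It also assumes output tokens share the same window. The `collect_sources` helper and its file suffixes are illustrative.

```python
import math
from pathlib import Path

# Rough pre-flight check before sending a large prompt.
# Assumptions: "384K" is read as 384,000 tokens; ~4 characters per token is a
# rule of thumb for English text, not DeepSeek's tokenizer; output tokens are
# assumed to share the same window, so some room is reserved for the answer.
CONTEXT_WINDOW_TOKENS = 384_000
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_output_tokens: int = 8_000) -> bool:
    """True if the estimated prompt size plus reserved output room fits."""
    return estimate_tokens(text) + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


def collect_sources(root: str, suffixes=(".py", ".md")) -> str:
    """Hypothetical helper: concatenate matching files under root, each with its path."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            body = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"### {path}\n{body}")
    return "\n\n".join(parts)
```

For example, `text = collect_sources("./my_project")` followed by `fits_in_context(text)` tells you whether a (hypothetical) project directory is likely to fit in one request. Use an actual tokenizer when you need an exact count.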

Context Caching

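DeepSeek supports context caching (KV cache). Input tokens served from the cache are billed at the lower cache-hit price, so reusing long prompts cuts input costs by 2x on Flash and up to 10x on Pro:

  • Flash: ¥2 (~$0.28) per 1M input tokens normally, ¥1 (~$0.14) on a cache hit, a 50% saving
  • Pro: ¥20 (~$2.8) per 1M input tokens normally, ¥2 (~$0.28) on a cache hit, a 90% saving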

As of April 28, 2026, the cache hit price was reduced to 1/10 of the launch price.
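DeepSeek's cache works on repeated prompt prefixes. Keeping long, unchanging content such as the system prompt or reference documents at the start of your messages therefore gives later requests the best chance of a hit. The sketch below prices a single Flash request from its token usage and compares it with the uncached cost. It assumes the usage data reports prompt_cache_hit_tokens and prompt_cache_miss_tokens, as DeepSeek's context-caching docs describe for the earlier models. With the OpenAI Python SDK, you could obtain such a dict with response.usage.model_dump(). The usage numbers in the example are made up.

```python
# Sketch: price one Flash request from its reported token usage and show how
# much the cache saved. Assumption: usage includes prompt_cache_hit_tokens and
# prompt_cache_miss_tokens, as DeepSeek documents for deepseek-chat and
# deepseek-reasoner; verify this for V4. Prices are CNY per 1M tokens.
CNY_PER_USD = 7.2
FLASH_CNY_PER_M = {"hit": 1.0, "miss": 2.0, "output": 12.0}


def request_cost_usd(usage: dict, prices: dict = FLASH_CNY_PER_M) -> float:
    cny = (
        usage["prompt_cache_hit_tokens"] * prices["hit"]
        + usage["prompt_cache_miss_tokens"] * prices["miss"]
        + usage["completion_tokens"] * prices["output"]
    ) / 1_000_000
    return cny / CNY_PER_USD


def uncached_cost_usd(usage: dict, prices: dict = FLASH_CNY_PER_M) -> float:
    """Price the same request as if every input token had been a cache miss."""
    total_input = usage["prompt_cache_hit_tokens"] + usage["prompt_cache_miss_tokens"]
    all_miss = dict(usage, prompt_cache_hit_tokens=0, prompt_cache_miss_tokens=total_input)
    return request_cost_usd(all_miss, prices)


# Hypothetical usage: a 100K-token prompt whose first 90K tokens were cached.
usage = {"prompt_cache_hit_tokens": 90_000, "prompt_cache_miss_tokens": 10_000, "completion_tokens": 2_000}
print(f"with cache: ${request_cost_usd(usage):.4f}, without: ${uncached_cost_usd(usage):.4f}")
```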


4. API Usage

The DeepSeek API is compatible with both the OpenAI and Anthropic API formats. You can keep your existing OpenAI SDK and switch to DeepSeek by changing only the base URL, the API key, and the model name.

Basic Parameters

| Parameter | Value |
|---|---|
| base_url (OpenAI format) | https://api.deepseek.com |
| base_url (Anthropic format) | https://api.deepseek.com/anthropic |
| api_key | Obtained from platform.deepseek.com |
| model | deepseek-v4-flash or deepseek-v4-pro |
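If you would rather not add an SDK dependency, the same request can be built with Python's standard library. This sketch uses the base URL from the table and the /chat/completions path and JSON fields from the curl example. The function name is illustrative, and the sketch only builds the request without sending it.

```python
import json
import urllib.request

# Sketch: build a chat-completions request with only the standard library.
# The base URL comes from the table above; the path and body fields mirror the
# curl example. The request is constructed here, not sent.
BASE_URL = "https://api.deepseek.com"


def build_chat_request(api_key: str, model: str, messages: list, thinking: bool = False):
    payload = {"model": model, "messages": messages, "stream": False}
    if thinking:
        payload["thinking"] = {"type": "enabled"}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_chat_request("your-deepseek-api-key", "deepseek-v4-flash",
                         [{"role": "user", "content": "Hello"}])
```

To send it, call `urllib.request.urlopen(req)`, parse the response with `json.load`, and read `["choices"][0]["message"]["content"]`, as in the SDK examples.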

Test with curl

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
        "model": "deepseek-v4-pro",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Tell me about DeepSeek V4."}
        ],
        "thinking": {"type": "enabled"},
        "reasoning_effort": "high",
        "stream": false
      }'

Usage with Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # key from platform.deepseek.com
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the difference between DeepSeek V4 Flash and Pro?"}
    ],
    stream=False,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}}
)

print(response.choices[0].message.content)

Usage with Node.js

import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://api.deepseek.com',
  apiKey: process.env.DEEPSEEK_API_KEY,
});

async function main() {
  const completion = await openai.chat.completions.create({
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Describe DeepSeek V4 features.' },
    ],
    model: 'deepseek-v4-pro',
    thinking: { type: 'enabled' },
    reasoning_effort: 'high',
    stream: false,
  });

  console.log(completion.choices[0].message.content);
}

main();

5. Thinking Mode

Thinking Mode is one of DeepSeek V4's most powerful features. The model works through a problem step by step before giving its final answer and returns that reasoning alongside the response, so you can inspect it.

# Reuses the client created in the Python example in section 4
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Analyze an AI agent ecosystem."}],
    extra_body={"thinking": {"type": "enabled"}},
    reasoning_effort="high"  # low, medium, high
)

You can control the thinking depth with the reasoning_effort parameter:

  • low — for quick answers
  • medium — balanced
  • high — deep analysis
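The reasoning trace is returned separately from the final answer. The sketch below assumes V4's thinking mode uses the same reasoning_content message field as deepseek-reasoner. It also follows DeepSeek's guidance for deepseek-reasoner not to send that field back in later turns, assuming the same rule applies to V4. A SimpleNamespace stands in for response.choices[0].message so the sketch runs offline.

```python
from types import SimpleNamespace

# Sketch: separate the reasoning trace from the final answer, and build the
# next turn without resending the trace.
# Assumption: V4 thinking mode exposes the trace as message.reasoning_content,
# the field deepseek-reasoner uses; getattr() returns None if it is missing.


def split_reasoning(message):
    """Return (reasoning_trace_or_None, final_answer)."""
    return getattr(message, "reasoning_content", None), message.content


def append_assistant_turn(history: list, message) -> list:
    """Add only the final answer to the history used for the next request."""
    return history + [{"role": "assistant", "content": message.content}]


# Stand-in for response.choices[0].message so the sketch runs offline.
msg = SimpleNamespace(
    reasoning_content="First, list the agent types...",
    content="The ecosystem has three layers.",
)
trace, answer = split_reasoning(msg)
history = append_assistant_turn(
    [{"role": "user", "content": "Analyze an AI agent ecosystem."}], msg
)
```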

6. Integration with Agent Tools

DeepSeek V4 works with popular AI agent and coding tools. Because the API accepts both OpenAI-format and Anthropic-format requests, you can use it as the backend model in tools like Claude Code (which speaks the Anthropic format), GitHub Copilot, and OpenCode.

DeepSeek with Claude Code

# Set DeepSeek as backend in Claude Code
# Settings → Provider → DeepSeek V4
# or via CLI:

DeepSeek with OpenCode

OpenCode users can switch to DeepSeek V4 the same way, by changing the baseURL and apiKey settings.

OpenAI Compatible SDK

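There is no need to change your SDK. Update the base_url and api_key parameters, then pass a DeepSeek model name (deepseek-v4-flash or deepseek-v4-pro) in your requests:

from openai import OpenAI

# Switching from OpenAI to DeepSeek: only these two arguments change
client = OpenAI(
    api_key="sk-deepseek-...",           # DeepSeek API key
    base_url="https://api.deepseek.com"  # DeepSeek base URL
)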

7. Flash or Pro?

| Criteria | deepseek-v4-flash | deepseek-v4-pro |
|---|---|---|
| Speed | ⚡ Very fast | 🚀 Fast |
| Quality | High | Highest |
| Price | 💰 Economical | 💎 Premium |
| Context | 384K | 384K |
| Thinking | Yes | Yes |
| Usage | Daily chat, code, simple analysis | Complex reasoning, research, professional reports |

When to use which model?

  • Use Flash: Daily development, code completion, fast prototyping, simple queries
  • Use Pro: Deep analysis, complex reasoning, long context tasks, production-quality output

💡 Tip: You can use both models in the same project. Optimize costs by using Flash for simple tasks and Pro for critical work.
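A simple routing rule is enough to apply this tip in code. The sketch below defaults to Flash and escalates to Pro only for task types you treat as critical. The task names and the critical flag are illustrative choices, not part of the DeepSeek API.

```python
# Hypothetical routing rule: default to Flash, escalate to Pro for task types
# you consider critical. Task names and the `critical` flag are illustrative.
FLASH = "deepseek-v4-flash"
PRO = "deepseek-v4-pro"
PRO_TASK_TYPES = {"deep_analysis", "complex_reasoning", "research_report"}


def pick_model(task_type: str, critical: bool = False) -> str:
    return PRO if critical or task_type in PRO_TASK_TYPES else FLASH


jobs = [("code_completion", False), ("research_report", False), ("chat", True)]
for task_type, critical in jobs:
    print(task_type, "->", pick_model(task_type, critical))
```

Because both models share the same API and context window, the selected name can be passed as model= without any other change to the request.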


8. Rate Limits and Error Codes

DeepSeek API’s rate limit policy varies based on your usage level. Visit the DeepSeek API Docs for detailed information.

Common error codes:

| Code | Meaning |
|---|---|
| 401 | Invalid API key |
| 429 | Rate limit exceeded |
| 500 | Server error |
| 503 | Service temporarily unavailable |
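Of these codes, 429, 500, and 503 are usually transient and worth retrying with backoff. A 401 will not fix itself. The sketch below uses a hypothetical ApiError type that you would map from your client's exceptions; with the OpenAI Python SDK, for example, from openai.APIStatusError and its status_code. That SDK also retries some failures on its own, configurable through the client's max_retries option.

```python
import random
import time

# Sketch: retry transient errors (429, 500, 503) with exponential backoff and
# jitter; other codes such as 401 are raised immediately.
# ApiError is a hypothetical exception type; map your client's errors to it.
RETRYABLE_STATUS = {429, 500, 503}


class ApiError(Exception):
    def __init__(self, status: int, message: str = ""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


def call_with_retry(fn, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(), retrying retryable ApiErrors up to max_attempts times in total."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiError as err:
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
```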

9. Conclusion

DeepSeek V4 is a model family that stands out especially for its price/performance ratio:

  • 384K context, well beyond GPT-4o (128K) and Claude Opus 4 (200K), though below Gemini 1.5 Pro (2M)
  • OpenAI/Anthropic-compatible API with minimal integration effort
  • Thinking Mode for in-depth analysis
  • Context caching that cuts input costs by 50% (Flash) to 90% (Pro)
  • Works as a backend for agent tools (Claude Code, Copilot, OpenCode)
  • 75% discount makes Pro model very affordable right now

Old model names will be deprecated on July 24, 2026. I recommend using deepseek-v4-flash and deepseek-v4-pro directly in your new projects.




Hero image: generated with fal.ai + FLUX.1 Dev
