The large-model field has seen another major update. DeepSeek, a leading Chinese AI company, officially released its new flagship model, DeepSeek V4, today. The main highlight of the release is its two-tier lineup: the Flash and Pro versions cover needs ranging from high-frequency lightweight applications to complex reasoning tasks, at prices that once again set a competitive benchmark for commercial AI pricing.

Model Matrix: Differentiated Positioning of Flash and Pro

DeepSeek V4 consolidates and upgrades the original deepseek-chat and deepseek-reasoner models into two versions:

  • DeepSeek-V4-Flash: Prioritizes cost-effectiveness and high throughput; suited to fast-response general conversation and basic text tasks.

  • DeepSeek-V4-Pro: Optimized for complex logic, deep thinking tasks, and high-performance computing scenarios, offering stronger reasoning and processing capabilities.

Both models support thinking mode (except in specific scenarios), JSON output, Tool Calls, and Chat Prefix Completion (Beta). Both also offer a context window of up to 1M tokens and a maximum output length of 384K tokens, providing a solid foundation for complex engineering work.


Pricing System: Extremely Transparent, Tiered Billing

DeepSeek's pricing is straightforward this time, and its caching mechanism significantly reduces the marginal cost of sustained API usage for enterprises. The billing rates for each model are listed below (unit: RMB per million tokens):

Model Name           Input (Cache Hit)   Input (Cache Miss)   Output
DeepSeek-V4-Flash    0.2 RMB             1 RMB                2 RMB
DeepSeek-V4-Pro      1 RMB               12 RMB               24 RMB

Note: Fees will be deducted first from the free balance, then from the paid balance.
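To make the billing rules concrete, here is a minimal Python sketch that estimates the cost of a workload from the rates in the table and then applies the "free balance first" deduction order from the note. The function names `estimate_cost` and `deduct` are hypothetical helpers written for this article, not part of any DeepSeek SDK, and raising an error when the combined balance is insufficient is an assumption, since the article does not say what happens in that case.

```python
from decimal import Decimal

# Prices in RMB per million tokens, copied from the pricing table in this article.
PRICES = {
    "deepseek-v4-flash": {"hit": Decimal("0.2"), "miss": Decimal("1"), "output": Decimal("2")},
    "deepseek-v4-pro": {"hit": Decimal("1"), "miss": Decimal("12"), "output": Decimal("24")},
}
MILLION = Decimal(1_000_000)


def estimate_cost(model, hit_tokens, miss_tokens, output_tokens):
    """Estimate the fee in RMB for the given token counts (hypothetical helper)."""
    p = PRICES[model]
    total = p["hit"] * hit_tokens + p["miss"] * miss_tokens + p["output"] * output_tokens
    return total / MILLION


def deduct(fee, free_balance, paid_balance):
    """Deduct a fee from the free balance first, then from the paid balance.

    Returns the new (free_balance, paid_balance). Raising on insufficient
    funds is an assumption made for this sketch.
    """
    from_free = min(fee, free_balance)
    from_paid = fee - from_free
    if from_paid > paid_balance:
        raise ValueError("insufficient balance")
    return free_balance - from_free, paid_balance - from_paid


if __name__ == "__main__":
    fee = estimate_cost("deepseek-v4-pro", 800_000, 200_000, 100_000)
    print("fee (RMB):", fee)
    print("balances after deduction:", deduct(fee, Decimal("2"), Decimal("10")))
```

Decimal arithmetic is used instead of floats so that small per-request fees add up without rounding drift.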

Industry Analysis: Why Is This Price a Milestone?

The pricing logic shows that DeepSeek is encouraging developers to reduce wasted compute by optimizing for cache hits: the large price gap between "cache hit" and "cache miss" input gives teams a lever for fine-grained control over business costs.
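The effect of that price gap can be illustrated with a short calculation. The sketch below computes the effective input price per million tokens as a weighted average of the hit and miss rates from the pricing table; `blended_input_price` is a hypothetical helper, and the cache hit rates used are illustrative values rather than measured figures.

```python
def blended_input_price(hit_price, miss_price, hit_rate):
    """Effective input price (RMB per million tokens) at a given cache hit rate.

    hit_rate is the fraction of input tokens served from cache, in [0, 1].
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


if __name__ == "__main__":
    # Input prices (cache hit, cache miss) taken from the article's pricing table.
    models = {"DeepSeek-V4-Flash": (0.2, 1.0), "DeepSeek-V4-Pro": (1.0, 12.0)}
    for name, (hit, miss) in models.items():
        for rate in (0.0, 0.5, 0.9):  # illustrative hit rates
            price = blended_input_price(hit, miss, rate)
            print(f"{name} at {rate:.0%} cache hits: {price:.2f} RMB per million input tokens")
```

Because a cache miss costs 5 times a hit on Flash and 12 times a hit on Pro, raising the hit rate pays off most on Pro workloads.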

For developers, the Flash version's price of 1 RMB per million tokens (cache miss) greatly lowers the barrier to using a top-tier MoE model. Meanwhile, the Pro version's pricing for complex reasoning tasks gives enterprises a domestic option that balances performance and cost when building knowledge-base Q&A systems, automated agents, and other advanced applications.

Important Note on Compatibility

DeepSeek notes that the original model names deepseek-chat and deepseek-reasoner will be formally deprecated in the future. To ensure a smooth transition, developers can already call deepseek-v4-flash (corresponding to non-thinking mode) and deepseek-v4-pro (corresponding to thinking mode) directly.
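For codebases that still pass the old model names around, one way to prepare is a small translation layer. The sketch below is a hypothetical helper, not an official migration tool: the mapping assumes that deepseek-chat (the earlier non-thinking model) corresponds to deepseek-v4-flash and deepseek-reasoner (the earlier thinking model) to deepseek-v4-pro, which is inferred from the note above and should be checked against the official migration guide.

```python
import warnings

# Assumed mapping, inferred from the article's compatibility note;
# verify against the official migration guide before relying on it.
LEGACY_TO_V4 = {
    "deepseek-chat": "deepseek-v4-flash",
    "deepseek-reasoner": "deepseek-v4-pro",
}


def resolve_model(name: str) -> str:
    """Return the V4 model name for a legacy name; pass other names through."""
    new_name = LEGACY_TO_V4.get(name)
    if new_name is None:
        return name
    warnings.warn(
        f"{name} is slated for deprecation; using {new_name} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name


if __name__ == "__main__":
    for old in ("deepseek-chat", "deepseek-reasoner", "deepseek-v4-pro"):
        print(old, "->", resolve_model(old))
```

Centralizing the rename in one function means call sites can be migrated gradually, with the deprecation warning pointing at any remaining legacy names.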

For detailed interface information and migration guides, see the official DeepSeek API documentation.