Cambricon announced that it has completed Day 0 adaptation for DeepSeek-V4, the latest open-source AI model from DeepSeek. This means the model could run stably on Cambricon hardware on the day of its release. Cambricon used its self-developed high-performance integrated operator library, Torch-MLU-Ops, to accelerate modules in the model such as Compressor and mHC, which significantly improved inference efficiency.

For the inference framework, Cambricon adopted vLLM, an open-source inference and serving engine for large language models, with full support for multiple parallelism strategies: tensor (TP), pipeline (PP), sequence (SP), data (DP), and expert (EP) parallelism. Cambricon also implemented communication-computation overlap, low-precision quantization, and prefill-decode (PD) disaggregated deployment. These measures significantly improved processing throughput while meeting latency constraints.
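To make the "low-precision quantization" idea concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a generic illustration only: the function names `quantize_int8` and `dequantize_int8` are hypothetical, and the announcement does not say which quantization scheme or bit width Cambricon actually uses.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (illustrative, not Cambricon's scheme).

    Returns the integer codes and the scale needed to map them back to floats.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Fall back to a scale of 1.0 for an all-zero (or empty) tensor
    # to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy traded for smaller weights and cheaper arithmetic.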

Cambricon also exploited hardware characteristics in depth, optimizing memory access and sorting on the MLU to accelerate structures such as sparse Attention and the Indexer. High interconnect bandwidth and low communication latency keep the share of time spent on communication to a minimum across different workload scenarios, which improves hardware utilization in distributed inference.
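The link between sorting and sparse attention can be illustrated with a generic top-k selection sketch. This is an assumption-laden toy, not Cambricon's kernel and not a description of DeepSeek-V4's actual Indexer: it assumes an indexer produces one relevance score per key, the top `k` keys are selected by sorting, and ordinary softmax attention runs over only those keys. The names `topk_indices` and `sparse_attention` are hypothetical.

```python
import math


def topk_indices(scores, k):
    """Return the positions of the k largest scores, in ascending position order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, index_scores, k):
    """Softmax attention restricted to the k keys with the highest index scores."""
    selected = topk_indices(index_scores, k)
    scale = 1.0 / math.sqrt(len(query))
    # Scaled dot-product logits, computed only for the selected keys.
    logits = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in selected]
    # Numerically stable softmax over the selected logits.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected))
        for d in range(dim)
    ]
    return output, selected


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [9.0, 9.0]]
index_scores = [0.9, 0.1, 0.8, 0.2]  # hypothetical indexer output
output, selected = sparse_attention(query, keys, values, index_scores, k=2)
```

Because only `k` keys are touched per query, the cost of the attention step scales with `k` rather than with the full context length, which is why the selection (sorting) step and its memory access pattern become performance-critical.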

Notably, DeepSeek-V4 offers an ultra-long context of millions of characters and ranks among the leading open-source models, both in China and internationally, in Agent capabilities, world knowledge, and reasoning performance. Users can interact with DeepSeek-V4 through the official website or the official app and try the ultra-long context memory for themselves. The API service has also been updated so that developers can call the new model.

This series of optimization and adaptation work not only improved the model's performance on Cambricon hardware but also laid a solid foundation for subsequent AI applications, demonstrating Cambricon's technical strength in artificial intelligence.

Key Points:   

🌟 Cambricon completed Day 0 adaptation for DeepSeek-V4, enabling the model to run stably on the day of its release.

🚀 A self-developed high-performance operator library and inference framework optimizations significantly improve inference efficiency.

📈 DeepSeek-V4 supports an ultra-long context of millions of characters, offering a leading AI experience.