In the field of Artificial Intelligence Generated Content (AIGC), the Kimi Linear model developed by the Moonshot AI team marks a significant technological advance. The model speeds up long-context processing by 2.9 times and decoding by 6 times, breaking through the performance bottleneck of the traditional full-attention mechanism. Kimi Linear adopts a hybrid linear attention architecture that outperforms the widely used softmax attention in scenarios such as long-context processing and reinforcement learning.

Traditional Transformer models use the softmax attention mechanism, whose computational complexity is O(n²). As a result, compute and memory consumption grow quadratically when processing long texts, seriously limiting the model's practical applications. Linear attention reduces this complexity to O(n), significantly improving processing efficiency. However, early linear attention variants performed unsatisfactorily, particularly in their ability to retain and recall information over long sequences.
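
To make the contrast concrete, the following is a minimal sketch comparing standard softmax attention, which materializes an n×n score matrix, with kernelized linear attention, which summarizes keys and values in a fixed-size state. The feature map `phi` and the dimensions are illustrative assumptions, not Kimi Linear's actual implementation.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the n x n score matrix makes cost and memory O(n^2).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                      # (n, d_v)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Kernelized attention: associativity lets us build a d x d_v summary
    # once, so cost grows linearly in the sequence length n.
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                           # (d, d_v), independent of n
    Z = Kp.sum(axis=0)                      # (d,) normalizer
    return (Qp @ KV) / (Qp @ Z)[:, None]

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

Both functions return an (n, d_v) output, but only the softmax version pays for an n×n intermediate, which is the bottleneck the article refers to.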

The core innovation of the Kimi Linear model is Kimi Delta Attention (KDA), which improves the model's memory management by introducing a fine-grained gating mechanism. KDA dynamically adjusts its memory state based on the input, controlling what to forget and what to retain, and can therefore handle information in long interactions more effectively.
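
The published KDA update rule is not reproduced here; the sketch below only illustrates the general idea the article describes: a per-channel ("fine-grained") forget gate combined with a delta-rule write into a recurrent memory state. All names and shapes are assumptions for illustration.

```python
import numpy as np

def gated_delta_recurrence(Q, K, V, G):
    """Illustrative recurrent view of fine-grained gated linear attention.

    Q, K: (n, d_k) queries/keys; V: (n, d_v) values;
    G: (n, d_k) per-channel forget gates in (0, 1). The "fine-grained"
    part is that each key dimension decays at its own rate rather than
    sharing one scalar gate. This is a sketch of the idea behind KDA,
    not the published update rule.
    """
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))                # recurrent memory state
    outputs = []
    for q, k, v, g in zip(Q, K, V, G):
        S = g[:, None] * S                  # channel-wise forgetting
        delta = v - k @ S                   # delta rule: write a correction, not a raw add
        S = S + np.outer(k, delta)          # update memory with the correction
        outputs.append(q @ S)               # read out with the query
    return np.stack(outputs)                # (n, d_v)

n, d_k, d_v = 16, 8, 8
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, n, d_k))
G = 1 / (1 + np.exp(-rng.standard_normal((n, d_k))))   # gates in (0, 1)
print(gated_delta_recurrence(Q, K, V, G).shape)         # (16, 8)
```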

Kimi Linear also builds on the Moonlight architecture, interleaving KDA with full-attention layers in a 3:1 ratio to balance efficiency and model capability. This design allows Kimi Linear to deliver excellent long-context performance while substantially reducing computational cost.
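
As a rough illustration of what a 3:1 interleaving can look like, the hypothetical schedule below places one full-attention layer after every three KDA layers; the layer names and the exact placement are assumptions, not Moonshot's published configuration.

```python
def layer_schedule(num_layers: int, ratio: int = 3) -> list[str]:
    """Hypothetical 3:1 layer schedule: `ratio` KDA blocks per full-attention block."""
    layers = []
    for i in range(num_layers):
        # Every (ratio + 1)-th layer keeps full attention; the rest use KDA.
        layers.append("full_attention" if (i + 1) % (ratio + 1) == 0 else "kda")
    return layers

print(layer_schedule(8))
# ['kda', 'kda', 'kda', 'full_attention', 'kda', 'kda', 'kda', 'full_attention']
```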

Across a series of experiments, Kimi Linear performed strongly on multiple tasks, especially palindrome and multi-query associative recall (MQAR) tasks that demand long-context memory, where its accuracy far exceeded that of previous models and demonstrated the advantages of fine-grained control.

Key Points:  

🌟 The Kimi Linear model has improved the speed of long-context processing by 2.9 times and decoding speed by 6 times.  

🔍 It uses the innovative Kimi Delta Attention (KDA) mechanism to optimize memory management and information forgetting.  

📈 Through a 3:1 hybrid architecture design, it balances computational efficiency and model performance, showing outstanding capabilities in experimental results.