Say Goodbye to Long-Text Anxiety: Xiaohongshu Open-Sources Its RedKnot Inference Engine, Doubling Long-Context Processing Efficiency

Published in Latest AI News

Time: Jun 30, 2026

In generative AI applications, getting models to process long texts quickly and efficiently has long been a challenge for engineers. Recently, the technical team at Xiaohongshu open-sourced its self-developed RedKnot inference engine, offering a new "cost-effective and efficient" solution for long-context tasks.

The core innovation of RedKnot is a departure from the conventional handling of the KV Cache (key-value cache). Large models have typically stored the cache at token granularity, so memory consumption grows linearly with text length, which significantly reduces inference speed and concurrency on long inputs. RedKnot instead splits the KV Cache along the attention-head dimension and introduces three mechanisms, "head-agnostic sparsity," "sparse FFN," and "SegPagedAttention," so that the granularity of the algorithm matches the granularity of storage.
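To make the memory argument concrete, here is a minimal back-of-the-envelope sketch in Python. It is not RedKnot's code or algorithm: the model shape (32 layers, 8 KV heads, head dimension 128, 2-byte values) and the per-head retention pattern are hypothetical numbers chosen only for illustration. It compares a dense cache that stores every token for every head with a cache stored per head, where each head may keep a different number of tokens.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelShape:
    """Hypothetical model shape; not taken from any specific model."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int = 2  # e.g. fp16 or bf16


def dense_kv_bytes(shape: ModelShape, num_tokens: int) -> int:
    """Token-granular cache: every head stores a key and a value per token."""
    per_token = (2 * shape.num_layers * shape.num_kv_heads
                 * shape.head_dim * shape.bytes_per_value)
    return per_token * num_tokens  # grows linearly with num_tokens


def per_head_kv_bytes(shape: ModelShape, tokens_kept_per_head: List[int]) -> int:
    """Head-granular cache: each head stores only the tokens it keeps.

    Simplifying assumption: the same retention pattern applies to every layer.
    """
    if len(tokens_kept_per_head) != shape.num_kv_heads:
        raise ValueError("need one token count per KV head")
    per_layer = sum(2 * shape.head_dim * shape.bytes_per_value * n
                    for n in tokens_kept_per_head)
    return shape.num_layers * per_layer


shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128)
context = 128 * 1024  # a 128K-token context

dense = dense_kv_bytes(shape, context)
# Hypothetical pattern: 2 heads keep the full context, 6 keep one eighth of it.
sparse = per_head_kv_bytes(shape, [context] * 2 + [context // 8] * 6)

print(f"dense:    {dense / 2**30:.2f} GiB")
print(f"per-head: {sparse / 2**30:.2f} GiB")
```

Under these assumed numbers the dense cache is 16 GiB and the per-head cache is 5.5 GiB. The point of the sketch is only that a token-granular layout cannot take advantage of heads that need fewer tokens, whereas a head-granular layout can.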

This architectural change brings significant performance gains. Test data shows that on a machine with 8 H800 cards, RedKnot speeds up the time to first token (TTFT) by 1.6 to 3.54 times and raises single-card concurrency by 4.7 to 7.8 times. During the prefill phase, compute (FLOPs) falls by 67% to 79.5%. For example, with the DeepSeek-V4-Flash model on a 128K long-context task, TTFT improves by 5.16 times and KV data transfer efficiency improves by 6.3 times, while inference accuracy stays above 95% of the dense model's performance.
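As a reading aid for the FLOPs figures above, the short Python sketch below converts a percentage reduction into the equivalent "times fewer FLOPs" factor. The function name is hypothetical, and the conversion is plain arithmetic: it says nothing about latency, since measured speedups also depend on memory bandwidth, communication, and scheduling.

```python
def reduction_to_factor(reduction_pct: float) -> float:
    """Convert 'X% fewer FLOPs' into 'N times fewer FLOPs'."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction must be in [0, 100)")
    remaining_fraction = 1.0 - reduction_pct / 100.0
    return 1.0 / remaining_fraction


# The two ends of the reported prefill FLOPs reduction range.
for pct in (67.0, 79.5):
    print(f"{pct}% fewer FLOPs = {reduction_to_factor(pct):.2f}x fewer FLOPs")
```

A 67% reduction leaves 33% of the original compute (about 3.03 times fewer FLOPs), and a 79.5% reduction leaves 20.5% (about 4.88 times fewer).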

Industry experts believe that the open-sourcing of RedKnot provides an important reference for the engineering optimization of inference engines. With computing resources increasingly scarce, fine-grained decomposition at the level of the underlying architecture, used here to ease the burden of long-text inference, opens up a new technical path toward lighter and more efficient AI inference systems. The code has now been officially open-sourced, with the aim of promoting the adoption and deployment of long-text AI applications.
