In large language model (LLM) inference, memory has long been the primary bottleneck limiting performance. Whenever an AI processes long texts or generates complex answers, a "working memory" called the KV cache (Key-Value Cache) expands rapidly, slowing the system down or even crashing it. To address this challenge, Google Research officially released a new AI memory-compression technology called TurboQuant on March 26, 2026.
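To see why the KV cache grows so quickly, a back-of-the-envelope calculation helps. The sketch below uses illustrative model parameters (32 layers, 32 heads, head dimension 128, fp16 storage) that are assumptions for the example, not TurboQuant or any specific model's configuration:

```python
# Rough KV cache sizing for a hypothetical transformer.
# All parameters here are illustrative assumptions.
layers, heads, head_dim, bytes_per_val = 32, 32, 128, 2  # fp16 = 2 bytes

def kv_cache_bytes(seq_len: int, batch: int = 1) -> int:
    """Bytes needed to cache keys and values for seq_len tokens."""
    per_token = 2 * layers * heads * head_dim * bytes_per_val  # 2 = K and V
    return batch * seq_len * per_token

# Each token costs 512 KiB; a single 128k-token context already
# needs ~62.5 GiB of cache, before weights and activations.
print(kv_cache_bytes(128_000) / 2**30)
```

At these settings the cache alone can exceed the memory of a single accelerator, which is why compressing it matters.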


The core breakthrough of this technology is that it can cut KV cache memory usage to one-sixth of the original without sacrificing model accuracy, while delivering an impressive eightfold increase in inference speed.

Overcoming the KV Cache Bottleneck: Let AI Remember More and Run Faster

The arrival of TurboQuant takes AI operational efficiency to a new level. It adopts an advanced vector quantization scheme, built mainly on the PolarQuant quantization method and the QJL optimization approach. In rigorous tests on mainstream open-source models such as Gemma and Mistral, TurboQuant showed strong adaptability: it compresses key-value caches down to 3 bits without any pre-training or fine-tuning. In the "needle in a haystack" long-context test, which simulates realistic, complex scenarios, the technology achieved zero accuracy loss, meaning that even after its memory footprint shrinks dramatically, the model retains its original intelligence and recall accuracy.
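To give a feel for what low-bit KV compression means in practice, here is a minimal, generic per-vector uniform quantizer. This is a stand-in illustration only, not TurboQuant's PolarQuant or QJL scheme (which use more sophisticated vector quantization); the point is simply that each float is replaced by a 3-bit code plus a small amount of shared metadata:

```python
import numpy as np

def quantize(v: np.ndarray, bits: int = 3):
    """Map each value to one of 2**bits uniform levels; keep scale/offset in float."""
    lo, hi = float(v.min()), float(v.max())
    levels = 2**bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((v - lo) / scale).astype(np.uint8)  # 3-bit codes, stored per value
    return codes, scale, lo

def dequantize(codes: np.ndarray, scale: float, lo: float) -> np.ndarray:
    """Reconstruct approximate floats from the low-bit codes."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
key = rng.standard_normal(128).astype(np.float32)  # a fake 128-dim key vector
codes, scale, lo = quantize(key, bits=3)
err = float(np.abs(dequantize(codes, scale, lo) - key).max())
# Uniform quantization bounds the error by half a quantization step.
assert err <= scale / 2 + 1e-6
```

Naive uniform quantization like this loses noticeable accuracy at 3 bits; the article's claim is that TurboQuant's vector quantization avoids that loss without any retraining.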


Peak Hardware Efficiency: An 8-Fold Jump on H100 Accelerators

Beyond reducing memory usage, TurboQuant also impresses in hardware utilization. On high-performance NVIDIA H100 GPU accelerators, TurboQuant's 4-bit configuration runs inference 8 times faster than the unquantized 32-bit baseline.
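The 4-bit format also explains the 8x storage ratio against a 32-bit baseline: two 4-bit codes fit in one byte, so each value occupies one-eighth of a 32-bit float. A minimal packing sketch (an illustration of 4-bit storage in general, not TurboQuant's actual kernel layout):

```python
import numpy as np

def pack_nibbles(codes: np.ndarray) -> np.ndarray:
    """Pack an even-length uint8 array of 4-bit codes (0..15), two per byte."""
    assert codes.max() < 16 and len(codes) % 2 == 0
    return (codes[0::2] << 4) | codes[1::2]

def unpack_nibbles(packed: np.ndarray) -> np.ndarray:
    """Recover the original 4-bit codes from the packed bytes."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed >> 4
    out[1::2] = packed & 0x0F
    return out

codes = np.arange(16, dtype=np.uint8)
packed = pack_nibbles(codes)
assert packed.nbytes * 8 == codes.size * 4             # exactly 4 bits per value
assert np.array_equal(unpack_nibbles(packed), codes)   # lossless round-trip
```

Smaller values also mean less data moved between GPU memory and compute units per generated token, which is where much of the measured speedup on memory-bound inference workloads comes from.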
