In modern industrial recommendation systems, Generative Retrieval (GR) based on large language models (LLMs) is gradually replacing traditional embedding-based retrieval. In practice, however, the approach faces a thorny problem: the model tends to hallucinate, generating product IDs that do not exist or that violate inventory logic.

To address this pain point, research teams from Google DeepMind and YouTube recently released a new framework called STATIC (Sparse Transition Matrix Accelerated Trie Index for Constrained Decoding). Through an innovative mathematical reformulation, the technique speeds up constrained decoding for LLMs by as much as 948 times.


Key Technological Breakthroughs:

  • Turning the "tree" into a "matrix": Traditional constraint checking walks a prefix tree (trie), whose pointer-chasing, branch-heavy access pattern runs poorly on hardware like GPUs/TPUs. STATIC flattens the tree into a static compressed sparse row (CSR) matrix, turning each verification step into the kind of vectorized lookup this hardware excels at.

  • Exceptional response speed: In tests with a 3-billion-parameter model, STATIC's single-step latency is as low as 0.033 milliseconds: nearly a thousand times faster than traditional CPU-based retrieval, and more than 40 times faster than existing hardware-accelerated approaches.

  • YouTube's successful trial: The technology has been deployed in YouTube video recommendations, where it enforces business constraints such as "uploaded within the past 7 days." In live testing, plays of fresh videos rose by 5.1%, and click-through rate (CTR) also grew significantly.
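The tree-to-matrix idea above can be sketched in a few lines. The snippet below is a minimal illustration, not the STATIC implementation: the token IDs, vocabulary size, and helper names are invented for the example. It builds an ordinary trie over valid ID token sequences, flattens it into CSR-style arrays, and then answers "which tokens are valid next?" with a single slice plus scatter, the kind of operation that vectorizes well.

```python
# Sketch: flatten a trie of valid token-ID sequences into CSR arrays,
# so next-token validity becomes one array slice + scatter.
# Toy vocabulary and IDs; hypothetical, not the STATIC codebase.
import numpy as np

VOCAB = 10  # toy vocabulary of token ids 0..9

# Valid catalog IDs expressed as token sequences (toy data).
valid_ids = [(1, 4, 7), (1, 4, 8), (1, 5, 7), (2, 4, 9)]

# 1) Build an ordinary trie: node -> {token: child_node}.
trie = [{}]  # node 0 is the root
for seq in valid_ids:
    node = 0
    for tok in seq:
        if tok not in trie[node]:
            trie.append({})
            trie[node][tok] = len(trie) - 1
        node = trie[node][tok]

# 2) Flatten into CSR form: indptr[s]:indptr[s+1] slices the edges of state s.
indptr = np.zeros(len(trie) + 1, dtype=np.int64)
tokens, children = [], []
for s, edges in enumerate(trie):
    for tok in sorted(edges):
        tokens.append(tok)
        children.append(edges[tok])
    indptr[s + 1] = len(tokens)
tokens = np.array(tokens, dtype=np.int64)
children = np.array(children, dtype=np.int64)

def allowed_mask(state: int) -> np.ndarray:
    """Boolean vocab mask of tokens valid from `state` (slice + scatter)."""
    mask = np.zeros(VOCAB, dtype=bool)
    mask[tokens[indptr[state]:indptr[state + 1]]] = True
    return mask

def step(state: int, tok: int) -> int:
    """Advance the decoder state after accepting token `tok`."""
    sl = slice(indptr[state], indptr[state + 1])
    return int(children[sl][np.searchsorted(tokens[sl], tok)])

# From the root, only tokens 1 and 2 can start a valid ID.
print(np.flatnonzero(allowed_mask(0)))   # → [1 2]
s = step(0, 1)                           # consume token 1
print(np.flatnonzero(allowed_mask(s)))   # → [4 5]
```

The key property is that decoding no longer touches a pointer-based tree: the per-step work is an index slice into flat arrays, which batches and ports to accelerators far more naturally than trie traversal.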

In addition, STATIC shores up a known weakness of generative retrieval in the "cold start" phase. With exact decoding constraints, the model achieves a breakthrough in accuracy when recommending entirely new products it has never seen before.
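The general mechanism behind such decoding constraints can be shown in miniature. In this hedged sketch (all names and values are invented for illustration), a boolean mask of currently valid tokens, derived from whatever constraint index is in use, is applied to the model's logits before the next token is chosen, so the decoder can only ever emit tokens that keep the output inside the valid ID set.

```python
# Sketch of one constrained-decoding step: mask the logits with the set of
# currently valid tokens, then pick among the survivors. Toy values throughout.
import numpy as np

def constrained_argmax(logits: np.ndarray, allowed: np.ndarray) -> int:
    """Greedy pick restricted to allowed tokens: disallowed ones get -inf."""
    masked = np.where(allowed, logits, -np.inf)
    return int(np.argmax(masked))

rng = np.random.default_rng(0)
logits = rng.normal(size=10)        # stand-in for model output scores
allowed = np.zeros(10, dtype=bool)  # suppose only tokens 3 and 6 are valid here
allowed[[3, 6]] = True

tok = constrained_argmax(logits, allowed)
print(tok)  # always one of the allowed tokens, regardless of raw scores
```

Because invalid tokens are forced to negative infinity before selection, hallucinated IDs are structurally impossible, which is what makes the approach reliable even for brand-new items the model has never observed.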