Galileo Launches New Tool to Explain Hallucinations in Large AI Models


Google Search is facing a crisis of trust in its quality because of frequent factual errors and contradictions in its AI Overview feature. In response, Google is urgently hiring AI answer quality engineers to improve the accuracy and reliability of generated answers.
In the SuperCLUE-VLM multimodal evaluation, Google's Gemini-3-pro took first place with a score of 83.64, leading in all three key dimensions: basic cognition, visual reasoning, and application. The performance of Chinese domestic models also drew attention.
Six departments in Beijing jointly issued measures to upgrade the medical device industry through data circulation and large AI models. Key initiatives include building high-quality medical datasets and improving data circulation policies, so that data can be used safely and compliantly to meet the needs of enterprises and research institutions.
Tsinghua University published a study in "Nature Machine Intelligence" that introduces the concept of "ability density" and challenges traditional AI evaluation standards. The research argues that attention should go not only to a model's parameter count but also to how much intelligence each parameter carries, and it questions the scaling-law assumption that larger models are necessarily more capable.
The latest AI programming model rankings from LMArena show Anthropic's Claude, OpenAI's GPT-5, and Zhipu's GLM-4.6 tied for first place globally. These models, designed specifically for programming, can significantly improve the efficiency of writing, debugging, and optimizing code, driving advances in software development.