Ma Yi's Team Finds That Fine-Tuning Multimodal Large Models Leads to Catastrophic Forgetting


According to the latest SuperCLUE-VLM rankings, Google's Gemini-3-Pro scored 83.64 points, a significant lead that is most pronounced in visual understanding and reasoning. Chinese models also performed strongly: SenseTime's SenseNova V6.5Pro and ByteDance's Doubao ranked second and third, reflecting China's rapid progress in the multimodal field. The evaluation covers three core capability dimensions.
OpenAI and Microsoft are being sued by news agencies for copyright infringement. As multimodal large models become the mainstream direction for large models, copyright issues around high-quality datasets are drawing attention, and OpenAI is reportedly in active negotiations with publishers over AI policies. CITIC Publishing is collaborating with large model companies on language training, and Visual China holds a core advantage in the AIGC era.
According to TMR Research, the global artificial intelligence chipset market is expected to exceed $700 billion, with a compound annual growth rate of 31.8% from 2022 to 2031. The article discusses development trends, application areas, and key players in the artificial intelligence chipset market.
IBM's report provides evidence that artificial intelligence, automation, and threat intelligence can help address data breaches throughout their lifecycle and reduce costs. The research found that integrating artificial intelligence and automation into security operations can shorten the data breach lifecycle by 33% and cut costs by 33.6%. However, only 28% of enterprises currently apply artificial intelligence and automation widely. Many enterprises still rely on legacy systems, which attackers can easily bypass. The article emphasizes the effectiveness of artificial intelligence and automation in improving cybersecurity and calls on enterprises to adopt these technologies widely to protect their data.
The robotics research team at Google DeepMind recently released a robotics project called RT-2. The project took 7 months to develop and uses a large model for training. RT-2 has capabilities such as symbol understanding, reasoning, and human recognition, and it can reason about human instructions and complete the corresponding tasks. By combining the large model with the robot's operational capabilities, RT-2 can accomplish tasks that involve logical leaps, such as going from 'extinct animals' to 'plastic dinosaurs'. The project performed well in various sub-category tests, with performance up to three times that of the previous generation of robot models. The result demonstrates the potential of large models in robotics research and is expected to drive the future development of robots.