Zhejiang University Alumni Collaborate with Microsoft to Launch Multimodal Model LLaVA, Challenging GPT-4V


MiniMax (Xiyu Technology) has launched the '10x Team' global talent collaboration program, which aims to bring together top experts from various industries and combine their domain expertise with cutting-edge AI technology. The goal is to promote the application of large models in vertical fields, extending productivity gains from general to specialized scenarios and achieving a tenfold increase in industry efficiency. The company is also opening up its core multimodal resources so that participants can validate the value of their industry insights.
NVIDIA announced a significant expansion of its open-source model family at the 2026 GTC conference, headlined by the release of the Nemotron 3 series of multimodal models. Among them, Nemotron 3 Ultra is optimized for the Blackwell architecture, achieves a fivefold improvement in throughput efficiency, and is designed specifically for complex code assistance and enterprise workflows. The company also showcased its latest achievements in multimodal interaction, aiming to accelerate innovation in intelligent agents, physical AI, and healthcare.
Apple has launched the multimodal model Manzano, which uses an innovative dual-structure architecture to solve a long-standing problem in the AI field: the inability to balance visual understanding and image generation.
Moonshot plans to launch the multimodal model K2.1/K2.5 in the first quarter of 2026. The model is an upgraded version of its trillion-parameter open-source model Kimi K2, aiming to enhance multimodal processing and agent capabilities. Since its release in July 2025, Kimi K2 has performed exceptionally well in areas such as code generation, thanks to its mixture-of-experts architecture.
The Zhipu team has open-sourced four core technologies for video generation: the GLM-4.6V visual understanding, AutoGLM device control, GLM-ASR speech recognition, and GLM-TTS speech synthesis models. The release showcases the team's latest progress in the multimodal field and lays the foundation for the development of video generation technology.