The Douyin SAIL team has partnered with LV-NUS Lab to release SAIL-VL2, a multimodal large model. Despite its small parameter count, the new model outperforms many models of similar size on complex reasoning tasks and is competitive with larger closed-source models. The result pushes the limits of what small models are expected to do.
SAIL-VL2 comes in 2B and 8B parameter sizes and achieves leading results across 106 datasets, performing especially well on complex reasoning benchmarks such as MMMU and MathVista. The model supports the idea that small models can also have strong capabilities. To get there, SAIL-VL2 introduces three major innovations, spanning data, training, and architecture design.

In terms of architecture design, SAIL-VL2 introduces a sparse mixture of experts (MoE) to balance performance and computational efficiency. Because only a subset of the experts is activated for each input, the model uses only part of its parameters during inference, which significantly reduces compute cost. Its visual encoder, SAIL-ViT, is trained with a progressive optimization approach that gradually improves the alignment between vision and language.
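To make the idea of sparse activation concrete, here is a minimal sketch of top-k expert routing in plain Python. It is not SAIL-VL2's implementation: the function names, the toy experts, and the choice of k are all assumptions made for illustration, and the gate scores are passed in directly rather than computed by a learned router.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(token, experts, gate_logits, k=2):
    """Run only the k selected experts and mix their outputs by gate weight."""
    routes = top_k_route(gate_logits, k)
    out = [0.0] * len(token)
    for idx, weight in routes:
        expert_out = experts[idx](token)  # the other experts are never called
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, [idx for idx, _ in routes]

# Hypothetical toy experts: each one just scales its input by a fixed factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]

# Assumed gate scores; a real model would compute these from the token itself.
output, active = moe_forward([1.0, -1.0], experts, gate_logits=[0.1, 2.0, 0.3, 2.0], k=2)
print(active, output)
```

With four experts and k=2, only half of the expert parameters take part in each forward pass, which is the source of the efficiency gain described above.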
On the data side, the team built a high-quality multimodal corpus, using score-based filtering and synthetic augmentation to keep the data accurate and diverse. The team also designed a progressive training framework that moves from basic perception to complex reasoning, improving the model's performance across different tasks.
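The combination of score-based filtering and staged training can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Sample` fields, the 0.7 threshold, and the two stage labels are hypothetical and are not taken from the SAIL-VL2 release.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    text: str
    quality: float  # hypothetical quality score in [0, 1]
    stage: str      # hypothetical label: "perception" or "reasoning"

# Assumed stage order: basic perception first, complex reasoning last.
STAGE_ORDER = ["perception", "reasoning"]

def filter_by_score(samples, threshold):
    """Keep only samples whose quality score meets the threshold."""
    return [s for s in samples if s.quality >= threshold]

def build_curriculum(samples, threshold=0.7):
    """Filter the corpus, then group what remains into ordered training stages."""
    kept = filter_by_score(samples, threshold)
    return [[s for s in kept if s.stage == stage] for stage in STAGE_ORDER]

corpus = [
    Sample("a cat sitting on a sofa", 0.9, "perception"),
    Sample("blurry image, unclear caption", 0.3, "perception"),
    Sample("solve for x given the chart values", 0.8, "reasoning"),
]

curriculum = build_curriculum(corpus)
for stage, batch in zip(STAGE_ORDER, curriculum):
    print(stage, [s.text for s in batch])
```

A training loop would then consume the stages in order, so the model sees the simpler perception data before the reasoning data.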
By optimizing the full pipeline, from data through training to architecture, SAIL-VL2 significantly improves base-model performance. According to the reported results, the model stands out on multiple benchmarks, with the 8B version already matching the latest GPT-4o in reasoning capability. This progress is encouraging for the research community and opens up new paths for applying multimodal models.
The code and model weights of SAIL-VL2 have been open-sourced on GitHub and Hugging Face, making it easy for researchers and developers to use the model and build on it. SAIL-VL2 shows strong potential for both academic research and industrial applications.
