Baidu has officially released PaddleOCR-VL-1.6, a model derived from the ERNIE Large Model. In the authoritative OmniDocBench v1.6 evaluation, it achieved an accuracy of 96.33%, surpassing mainstream large models such as Gemini-3-Pro, GPT-5.2, and GLM-OCR, setting a new industry state of the art (SOTA) and ranking first globally in overall performance. The release marks a significant advance in the ability of multi-modal large models to understand complex documents and analyze real-world scenarios.

As a core component of the ERNIE Large Model's multi-modal capabilities, PaddleOCR is trained on the ERNIE Large Model and currently supports more than 100 languages, with users in more than 170 countries and regions. The upgraded PaddleOCR-VL-1.6 keeps its lightweight 0.9B-parameter architecture while significantly improving core recognition in complex scenarios such as tables, ancient texts, rare characters, seals, and charts. These gains come from a model-driven data construction mechanism and progressive training optimization.

In the Real5-OmniDocBench evaluation, which targets complex real-world scenarios, the model also held its leading position with an overall score of 93.19%, handling parsing challenges widely recognized in the industry, such as scanned documents, bent pages, photos of screens, varying lighting, and tilted documents.

Because the new model retains the previous architecture, enterprises and developers can migrate smoothly without additional adaptation. PaddleOCR has now surpassed 79.2K stars on GitHub, exceeding Google's Tesseract OCR and becoming the most popular open-source OCR project globally. The new model is available on the official website, with its code and weights open-sourced. As large models evolve toward deeper multi-modal capabilities, PaddleOCR-VL-1.6 provides a more efficient industrial-grade solution for document digitization and is expected to further accelerate the deployment of AI in complex multi-modal scenarios.