The Allen Institute for AI (AI2) recently released MolmoWeb, a fully open-source web agent. Unlike traditional agents that rely on a page's underlying code (the DOM), MolmoWeb makes decisions by reading screenshots, marking a significant step forward for visually driven web navigation.

Core Technology: Seeing Web Pages Like Humans

MolmoWeb's operating loop is intuitive: it captures a screenshot of the current browser window, analyzes the image to decide the next action (such as clicking, scrolling, or moving to the next page), executes that action, and repeats. This "what you see is what you get" approach makes it more robust than traditional agents, because the visual layout of a webpage usually changes less often than its underlying code. It also makes the decision process easier for human users to follow, since the agent acts on the same view they see.
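
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MolmoWeb's actual code: the names `Action`, `Browser`, `run_agent`, `FakeBrowser`, and `scripted_policy` are hypothetical, and the "policy" is a hard-coded stand-in for the vision model.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass(frozen=True)
class Action:
    """One step chosen by the agent (action names here are hypothetical)."""
    kind: str  # e.g. "click", "scroll", "done"
    x: int = 0
    y: int = 0


class Browser(Protocol):
    """Minimal interface the loop needs; a real browser driver would implement it."""
    def screenshot(self) -> bytes: ...
    def execute(self, action: Action) -> None: ...


# A policy maps (task, screenshot, history so far) to the next action.
Policy = Callable[[str, bytes, List[Action]], Action]


def run_agent(task: str, browser: Browser, policy: Policy,
              max_steps: int = 20) -> List[Action]:
    """Screenshot -> decide -> execute, repeated until 'done' or max_steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        image = browser.screenshot()            # look at the page, not the DOM
        action = policy(task, image, history)   # the model would decide here
        history.append(action)
        if action.kind == "done":
            break
        browser.execute(action)
    return history


class FakeBrowser:
    """Stand-in browser so the sketch runs without any dependencies."""
    def __init__(self) -> None:
        self.scrolls = 0
        self.executed: List[Action] = []

    def screenshot(self) -> bytes:
        return f"scrolled={self.scrolls}".encode()

    def execute(self, action: Action) -> None:
        self.executed.append(action)
        if action.kind == "scroll":
            self.scrolls += 1


def scripted_policy(task: str, image: bytes, history: List[Action]) -> Action:
    """Placeholder for the model: scroll twice, click once, then stop."""
    if image == b"scrolled=2":
        if history and history[-1].kind == "click":
            return Action("done")
        return Action("click", x=100, y=200)
    return Action("scroll")


if __name__ == "__main__":
    trace = run_agent("find the button", FakeBrowser(), scripted_policy)
    print([a.kind for a in trace])
```

Because every decision is recorded alongside the screenshot that prompted it, a trace like this is straightforward for a person to review step by step.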

Performance Leap: Small Models Beat Big Ones

Although MolmoWeb comes only in 4B- and 8B-parameter versions, it performs well above its size:

  • Leading the Rankings: On the WebVoyager benchmark, the 8B version scored 78.2%, ranking first among open-source models and coming close to OpenAI's proprietary model o3 (79.3%).

  • Great Potential: The research found that running each task multiple times and keeping the best result raises the success rate further, to 94.7%.

  • Precise Positioning: In benchmarks that test locating UI elements on screen, it even surpassed Anthropic's Claude 3.7.
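
The "run several times, keep the best" idea in the list above can be made concrete with a small calculation. The sketch below assumes that "best" means a task counts as solved if at least one of its attempts succeeds; the article does not say how results were selected. The function names and the toy outcome table are invented for illustration and are not MolmoWeb results.

```python
from typing import List


def single_run_rate(outcomes: List[List[bool]]) -> float:
    """Success rate when only the first attempt at each task counts."""
    return sum(attempts[0] for attempts in outcomes) / len(outcomes)


def best_of_k_rate(outcomes: List[List[bool]], k: int) -> float:
    """Success rate when a task is solved if any of its first k attempts succeeds."""
    return sum(any(attempts[:k]) for attempts in outcomes) / len(outcomes)


# Toy data (made up): one row per task, one column per attempt.
toy_outcomes = [
    [True, True, False],
    [False, True, True],
    [False, False, False],
    [True, False, True],
]

if __name__ == "__main__":
    print(single_run_rate(toy_outcomes))    # first attempt only
    print(best_of_k_rate(toy_outcomes, 2))  # best of two attempts
```

Extra attempts help only on tasks the agent can sometimes solve. A task that fails every time, like the third row of the toy data, stays unsolved however many runs are allowed.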

Data Support: The Largest Open Dataset in History

In addition to open-sourcing the model weights, AI2 released a large dataset named MolmoWebMix. The dataset includes:

  • 36,000 real browsing tasks completed by human volunteers.

  • More than 2.2 million screenshot-question pairs.

  • Automated synthetic data verified by GPT-4o. Experiments have shown that synthetic data can even outperform human trajectories in guiding the agent to find the "optimal path."

Open Source Spirit and Future Challenges

MolmoWeb is now fully open-sourced on Hugging Face and GitHub under the Apache 2.0 license. It still faces challenges in handling complex instructions, login verification, and legal compliance (such as website terms of service), but AI2 firmly believes that only complete transparency and community collaboration can truly counter the data monopoly of big tech companies.