The Allen Institute for AI (AI2) recently released MolmoWeb, a breakthrough fully open-source web agent.
Core Technology: Seeing Web Pages Like Humans
MolmoWeb's operating logic is intuitive: it captures a screenshot of the current browser window, uses visual analysis to decide the next action (such as clicking, scrolling, or paginating), executes it, and repeats. This "what you see is what you get" approach makes it more robust than traditional agents, because a page's visual layout is usually more stable than its underlying code, and its decision-making process is fully transparent and explainable to human users.
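The screenshot-decide-execute loop can be sketched in a few lines. This is a minimal illustration, not MolmoWeb's actual API: every name here (`capture_screenshot`, `choose_action`, `execute`, the `Action` type) is a hypothetical placeholder, and the "model" is a stub standing in for the vision-language model call.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "scroll", "type", "done"
    target: str = ""   # element description or text to type

def capture_screenshot() -> bytes:
    # Hypothetical browser hook: would return the rendered window's pixels.
    return b"...rendered page pixels..."

def execute(action: Action) -> None:
    # Hypothetical browser hook: would click, scroll, or type.
    pass

def choose_action(screenshot: bytes, goal: str) -> Action:
    # Stub for the vision-language model: the key point is that it looks
    # only at the rendered pixels, never at the page's underlying HTML.
    if b"checkout" in screenshot:
        return Action("click", "checkout button")
    return Action("done")

def run_agent(goal: str, max_steps: int = 20) -> list[Action]:
    """Repeat screenshot -> decide -> execute until done or out of steps."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()
        action = choose_action(screenshot, goal)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)
    return history
```

The `max_steps` cap reflects a common safeguard in agent loops: without it, a model that never emits a terminal action would browse forever.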

Performance Leap: Small Models Beat Big Ones
Although MolmoWeb comes in only 4B and 8B parameter sizes, its performance shows it is "small but powerful":
Leading the Rankings: On the WebVoyager benchmark, the 8B version scored 78.2%, not only ranking first among open-source models but also approaching OpenAI's proprietary o3 model (79.3%).
Great Potential: Research found that running a task multiple times and selecting the best result raises the success rate further, to 94.7%.
Precise Positioning: In UI element grounding benchmarks, it even surpassed Anthropic's Claude 3.7.
Data Support: The Largest Open Dataset in History
Beyond the model weights, AI2 also released a large dataset named MolmoWebMix. This dataset includes:
36,000 real browsing tasks completed by human volunteers.
More than 2.2 million screenshot-question pairs.
Automated synthetic data verified by GPT-4o. Experiments showed that synthetic data can even outperform human trajectories in guiding the agent toward the optimal path.

Open Source Spirit and Future Challenges
Currently, MolmoWeb is fully open-sourced on
