Alongside the pursuit of ever-higher "IQ" in large models, the ability to work continuously on long tasks has become a new dimension for measuring AI's progress. According to the latest benchmark released by the AI research organization METR, Anthropic's top model, Claude Opus 4.5, has demonstrated dominant strength on long-duration tasks.


The results show that Claude Opus 4.5 can handle complex tasks for about 4 hours and 49 minutes while maintaining a 50% success rate, setting a new industry record. The "time horizon" metric reveals a model's endurance limits at different reliability thresholds: at a stricter 80% success-rate threshold, Opus 4.5's horizon shrinks to only about 27 minutes, but once tasks enter the deep waters of high difficulty and long duration, its advantage over other models is greatly amplified.
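To make the metric concrete: a time horizon at a given success rate is typically estimated by fitting a curve of success probability against task length and reading off where it crosses that rate. The sketch below, using entirely made-up task results (not METR's data), fits a simple logistic model of success versus log task length and derives hypothetical 50% and 80% horizons:

```python
import math

# Hypothetical (length in minutes, success 0/1) task results.
# Illustrative only -- not METR's actual benchmark data.
results = [
    (1, 1), (2, 1), (4, 1), (8, 1), (15, 1), (15, 0),
    (30, 1), (30, 1), (60, 1), (60, 0), (120, 1), (120, 0),
    (240, 0), (240, 1), (480, 0), (480, 0), (960, 0),
]

def fit_logistic(data, steps=20000, lr=0.1):
    """Fit P(success) = sigmoid(a + b * log2(minutes)) by gradient ascent
    on the log-likelihood. Longer tasks should yield b < 0."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for minutes, y in data:
            x = math.log2(minutes)
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / len(data)
        b += lr * grad_b / len(data)
    return a, b

def horizon(a, b, success_rate):
    """Task length (minutes) at which the fitted success
    probability equals success_rate."""
    logit = math.log(success_rate / (1 - success_rate))
    return 2 ** ((logit - a) / b)

a, b = fit_logistic(results)
print(f"50% time horizon: {horizon(a, b, 0.5):.0f} min")
print(f"80% time horizon: {horizon(a, b, 0.8):.0f} min")
```

As in the reported figures, the 80% horizon comes out shorter than the 50% horizon: demanding higher reliability always shrinks the length of task a model can be trusted with.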

AIbase noted that although one data point suggested the model could theoretically work continuously for more than 20 hours, METR acknowledged this may be an artifact of the small sample size. Even so, the breakthrough marks AI's transition from "short instruction responder" to "long-term project executor."

However, some experts have questioned the test's limitations: METR's evaluation covers only 14 task samples, and some argue that models may be tuned specifically to score well on such benchmarks. Still, there is no doubt that Claude Opus 4.5 opens new possibilities for AGI-oriented tasks that demand sustained, high-intensity logical reasoning.