AI-generated code has taken a notable step forward in reliability. Anthropic has officially released Claude Opus 4.8, a refined upgrade of its flagship model. The release focuses on stronger agentic coding, multi-domain reasoning, and knowledge-work capabilities. The new model not only surpasses GPT-5.5 on multiple core benchmarks, but also makes significant progress on a persistent industry problem: AI models "lying with a straight face," that is, confidently asserting things that are not true.


Far Fewer Coding Flaws and Sharper Judgment

According to feedback from early testers, the upgraded Opus 4.8 is more stable when handling complex multi-step tasks. Official evaluation data shows that the likelihood of the new model letting defects in its own code pass without flagging them has dropped by three-quarters. It now tends to state its own uncertainty up front, proactively pointing out errors and pushing back when it finds the user's initial plan unreasonable.

Speed Boost and Development Costs Cut by About 70%

Alongside the gains in logical rigor, Anthropic has also substantially optimized the model's operating efficiency. In fast mode, Opus 4.8 runs 2.5 times as fast as the previous version, while the cost of using the model has fallen to just one-third of the previous version's. On the widely used SWE-Bench Pro coding benchmark, Opus 4.8 scored 69.2%, ahead of strong competitors such as Gemini 3.1 Pro on multiple core dimensions.
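
For readers who want to compare the ratios quoted in this article on a common scale, the short sketch below converts each into a percentage reduction. It is only a back-of-the-envelope check: the helper name `percent_reduction` is made up for illustration, and the time-per-task figure assumes that "2.5 times as fast" means each task takes 1/2.5 of the previous time, which the article does not state explicitly.

```python
def percent_reduction(new_over_old: float) -> float:
    """Percentage saved when the new value is `new_over_old` times the old one."""
    return (1.0 - new_over_old) * 100.0


# Cost is quoted as one-third of the previous version.
cost_cut = percent_reduction(1 / 3)

# The unflagged-defect rate is quoted as having dropped by three-quarters,
# i.e. it is one-quarter of what it was.
defect_cut = percent_reduction(1 / 4)

# Assumption: 2.5x speed means each task takes 1/2.5 of the previous time.
time_cut = percent_reduction(1 / 2.5)

print(f"cost reduction:        {cost_cut:.1f}%")
print(f"defect-rate reduction: {defect_cut:.1f}%")
print(f"time-per-task saving:  {time_cut:.1f}%")
```

Under these assumptions, a cost of one-third works out to a reduction of roughly two-thirds, which is close to the 70% in the heading once rounded.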