Anthropic officially released its latest flagship large model, Claude Opus 4.8, on May 29th. Positioned as a targeted upgrade addressing core user pain points, the new model improves agentic coding, complex logical reasoning, and multi-domain knowledge work while keeping the existing pricing structure.

The new version makes notable gains in core AI coding and agent performance. Early testers report that Opus 4.8 behaves more consistently in daily use and exercises better judgment. On complex multi-step tasks it is more reliable, and it proactively pushes back and flags uncertainty when a plan is unreasonable. According to evaluation data, the model is about one-quarter as likely as the previous generation to let defects in its own code pass without flagging them, and it draws unsupported conclusions markedly less often.

The new model also performs strongly on closely watched benchmarks. According to official figures, Opus 4.8 scored 69.2% on the programming benchmark SWE-Bench Pro and outperformed GPT-5.5 and Gemini 3.1 Pro on several major benchmarks, reinforcing its standing as a top-tier model.

Beyond the gains in capability, the update also brings changes to user experience and compute cost. The Claude platform now offers an "effort level" control that lets users trade off between maximum output quality and the fastest possible responses. More striking, the new model runs 2.5 times faster in fast mode, while the actual model cost is reported to be only one-third that of the previous model. Higher output at lower cost should give developers a meaningful productivity boost.
