A recent joint study by Carnegie Mellon University and Stanford University finds that the development of artificial intelligence agents (AI agents) faces a serious "path dependence": existing AI evaluation benchmarks are heavily concentrated on programming tasks, while neglecting the non-programming fields that account for 92% of the U.S. labor market.
Researchers systematically analyzed 72,000 tasks from 43 mainstream AI benchmarks and compared them against 1,016 real occupations in the U.S. government's O*NET occupational database.
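The core of such a comparison can be sketched in a few lines: label each benchmark task with a skill category, then compare each category's share of benchmark tasks with its share of real occupations. The categories, counts, and threshold below are illustrative stand-ins, not the study's actual data or method.

```python
from collections import Counter

# Toy benchmark tasks labeled by skill category
# (an illustrative stand-in for the 72,000 analyzed tasks)
benchmark_tasks = (
    ["computer operation"] * 60
    + ["information retrieval"] * 30
    + ["interpersonal interaction"] * 1
    + ["management"] * 2
)

# Toy occupation counts per dominant skill category
# (an illustrative stand-in for O*NET's 1,016 occupations)
occupations = {
    "computer operation": 40,
    "information retrieval": 10,
    "interpersonal interaction": 500,
    "management": 150,
}

def shares(counts):
    """Normalize raw counts into fractional shares."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

bench_share = shares(Counter(benchmark_tasks))
job_share = shares(occupations)

# A category is a "blind spot" when its benchmark share falls far
# below its share of real occupations.
for cat in job_share:
    gap = job_share[cat] - bench_share.get(cat, 0.0)
    flag = "  <- blind spot" if gap > 0.25 else ""
    print(f"{cat}: benchmark {bench_share.get(cat, 0.0):.1%}, "
          f"jobs {job_share[cat]:.1%}, gap {gap:+.1%}{flag}")
```

With these toy numbers, "interpersonal interaction" dominates the occupation side but is nearly absent from the benchmark side, mirroring the kind of mismatch the study reports.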
Key imbalances identified by the study:
"Benchmark blind spots" in the digital industry: Despite the high level of digitization in managerial jobs, which reaches 88%, they account for only 1.4% in existing AI benchmark tests; legal jobs have a digitization level of 70%, but their share in the benchmark tests is as low as 0.3%.
Serious skills mismatch: current AI evaluations focus mainly on "information retrieval" and "computer operation" skills, which cover less than 5% of U.S. jobs, while the "interpersonal interaction" skills that are crucial in real work are almost entirely absent from existing AI tests.
"Ability drop" caused by increasing complexity: The study found that AI agents perform very poorly when dealing with complex tasks. Even in their most skilled area, software development, the success rate of AI drops sharply when the number of steps increases or the logic becomes more complex.
Researchers call for future AI benchmarks to give more weight to high-value, highly digitized fields such as management, law, construction, and engineering. They also argue that evaluations should score not only final results but also the intermediate steps of execution, to address practical challenges such as vague goals and long verification cycles.
This conclusion is also supported by market data. A recent analysis by Anthropic showed that nearly 50% of its API calls are still concentrated in software development. Experts warn that if AI development keeps chasing programming tasks simply because they are easy to score automatically, it may miss the best window for AI to demonstrate productivity value across the broader economy.
