Andon Labs, an AI lab, recently evaluated how well robotic vacuum cleaners driven by top-tier large language models handle everyday household tasks. The robots were given instructions that sound simple but break down into many steps. For example, "hand the butter to the person" requires navigating between rooms, identifying the right package, locating a person who may be moving, completing the delivery, and returning to the charging dock.
The results were sobering. The robots' success rates were far lower than those of humans: Gemini 2.5 Pro succeeded only 40% of the time, Claude Opus 4.1 37%, and GPT-5 just 30%. These figures suggest that although the models are strong at text generation, they still struggle significantly with spatial reasoning, environmental understanding, and long-horizon task planning.

The research team pointed out that the low success rate is not only a matter of technical shortcomings; it also points to potential safety hazards. For example, some robots might leak confidential documents during operation, or fail to recognize the risk posed by stairs and fall. These findings highlight the safety and security weaknesses that arise when current large language models (LLMs) are embedded in physical machines.
As tech giants show growing interest in robotics, this study is a reminder that strong text generation does not guarantee that a robot can perform tasks reliably and safely in the real world. Many engineering and safety problems remain to be solved before AI robots can become part of family life.
Although expectations for these smart devices in the home are high, the current results call for more caution in deploying them. As the technology continues to advance, we hope that future robotic vacuum cleaners will overcome these challenges and bring real convenience to daily life.
