A notable advance in real-world data collection for embodied intelligence was announced recently. AoE (Always-On Egocentric), a continuous first-person video collection framework developed by the Tianji Lab team at Ant Data, offers a lightweight, low-cost approach to embodied data collection: with just a smartphone and a neck-mounted bracket costing less than $20, it can stand in for professional rigs that often cost tens of thousands of dollars while still delivering high-quality data. The approach directly addresses the high cost and poor scalability that have held back embodied data collection. A technical paper describing the system has been published on arXiv.

As foundation models continue to evolve, their generalization and cross-scenario adaptability increasingly depend on the scale, quality, and coverage of real-world interaction data. The core idea of AoE is to turn "person + phone" into a sustainable data node. The hardware is a neck-mounted bracket that holds the phone securely in front of the chest with a mechanical clamp and magnetic attachment, continuously capturing first-person footage close to the wearer's own viewpoint and recording natural interactions in full.

The system maintains millimeter-level trajectory accuracy and hand-keypoint recognition accuracy above 90%, while supporting concurrent collection from thousands of devices and automated processing in the cloud. In field tests on a computer shutdown task with the Unitree G1 robot, a policy trained on only 50 teleoperation samples succeeded 45% of the time; after 200 AoE samples were added, the success rate jumped to 95%. When data is scarce, AoE thus plays a critical role in bootstrapping learning.

Low-cost data collection is only the first step. According to the paper, Ant Data has also tackled the harder problem of converting long videos into training data. A lightweight visual model running on the device automatically detects hand-object interactions and triggers recording; a large vision-language model then splits the continuous video into semantically labeled atomic action segments; finally, automatic annotation, filtering, and cleaning in the cloud turn the phone footage into high-quality, standardized training data.
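
The paper does not ship reference code; the Python sketch below only illustrates the two-stage idea described above under stated assumptions. Here `is_hand_object_interaction` stands in for the lightweight on-device detector, `vlm_label` for the vision-language model that names each segment, and the `Clip` structure and `max_gap_s` threshold are illustrative choices, not details from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class Clip:
    """One atomic action segment cut from the continuous egocentric stream."""
    start_s: float   # segment start time, in seconds
    end_s: float     # segment end time, in seconds
    label: str       # short semantic label proposed by the VLM, e.g. "pick up mug"

def record_when_interacting(
    frames: Iterable[Tuple[float, object]],
    is_hand_object_interaction: Callable[[object], bool],
) -> List[Tuple[float, object]]:
    """On-device trigger: keep only (timestamp, frame) pairs in which the
    lightweight detector sees a hand-object interaction, so idle footage is
    never stored or uploaded."""
    return [(t, f) for t, f in frames if is_hand_object_interaction(f)]

def segment_into_atomic_actions(
    kept: List[Tuple[float, object]],
    vlm_label: Callable[[List[object]], str],
    max_gap_s: float = 1.0,
) -> List[Clip]:
    """Cloud-side step: group triggered frames whose timestamps are less than
    max_gap_s apart into one segment, then ask the vision-language model for
    a semantic label per segment."""
    clips: List[Clip] = []
    if not kept:
        return clips
    start, prev = kept[0][0], kept[0][0]
    buf = [kept[0][1]]
    for t, f in kept[1:]:
        if t - prev > max_gap_s:              # a gap closes the current segment
            clips.append(Clip(start, prev, vlm_label(buf)))
            start, buf = t, []
        buf.append(f)
        prev = t
    clips.append(Clip(start, prev, vlm_label(buf)))
    return clips
```

Gating recording on interaction detection is what makes an always-on setup practical: only the minutes that matter ever leave the phone.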

In addition, AoE provides a device-cloud collaborative pipeline that automates data collection, preprocessing, cleaning, selection, and scheduling, reducing the need for manual intervention and improving overall throughput.
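
As a rough illustration of what such an automated pipeline can look like, the Python sketch below chains stage functions over simple episode records. The dict-based record format, the thresholds, and the stage implementations are assumptions for illustration only, not the system described in the paper.

```python
from typing import Callable, Dict, Iterable, List

# A stage takes a batch of episode records and returns the records that survive it.
Stage = Callable[[List[Dict]], List[Dict]]

def preprocess(episodes: List[Dict]) -> List[Dict]:
    # e.g. normalize timestamps and attach device metadata
    return [dict(ep, preprocessed=True) for ep in episodes]

def clean(episodes: List[Dict]) -> List[Dict]:
    # drop clips too short to contain a meaningful action (0.5 s is an arbitrary cutoff)
    return [ep for ep in episodes if ep.get("duration_s", 0.0) >= 0.5]

def select(episodes: List[Dict]) -> List[Dict]:
    # keep only clips whose automatic annotation confidence is high enough
    return [ep for ep in episodes if ep.get("label_confidence", 0.0) >= 0.8]

def run_pipeline(episodes: List[Dict], stages: Iterable[Stage]) -> List[Dict]:
    """Run each stage in order. In a cloud deployment each stage would be a
    separate service fed by a queue, but the data flow is the same."""
    for stage in stages:
        episodes = stage(episodes)
    return episodes

if __name__ == "__main__":
    raw = [
        {"device": "phone-001", "duration_s": 3.2, "label_confidence": 0.93},
        {"device": "phone-002", "duration_s": 0.2, "label_confidence": 0.95},
    ]
    print(run_pipeline(raw, [preprocess, clean, select]))  # only phone-001 survives
```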

Reporters note that Ant Data is investing heavily in enterprise-facing AI. Focused on applying AI in industry, its Tianji Lab works on areas such as AI+data, AI+security, AI+finance, and AI+embodied intelligence, and is accelerating the transfer of its research into products. Since the start of 2026, Ant Data has made a series of AI moves, including announcing the establishment of a "Large Model Technology Innovation Department" and plans to launch enterprise-grade large model products.