Making Agents Stronger with Use: AReaL 2.0 Open Source - Building a RL Infrastructure for Self-Evolving Intelligent Agents

On July 2, the open-source reinforcement learning infrastructure project AReaL officially released version 2.0. AReaL aims to bridge the gap between foundational model training and modern agent applications, providing efficient reinforcement learning training support for agent scenarios.

The newly released AReaL 2.0 is designed for agents that have entered real-world business scenarios, offering a system infrastructure that allows agents to continuously learn while in use. Through AReaL 2.0, the interaction processes generated by agents when completing real tasks can be recorded, organized, and integrated into subsequent training processes, used to continuously optimize the underlying models, enabling agents to become stronger while remaining safe and controllable.

Currently, agents are entering real production environments, writing code, searching for information, and calling tools to complete increasingly complex tasks within enterprise systems. However, a problem has emerged: agents work every day but rarely truly grow from their experiences.

In real business scenarios, agents generate a large amount of valuable experience: which tasks were completed well, where tool calls failed, why users were dissatisfied, whether a certain decision was on the wrong track. However, this information is mostly stored in log form, making it difficult to stably and safely transform into the next step of capability improvement.

AReaL 2.0 aims to solve the issue of how agents continue to grow after deployment. Developers do not need to redevelop agents; they just need to let the requests originally sent to large models by agents pass through AReaL 2.0's unified inference entrance to integrate into the online reinforcement learning process.

Figure: AReaL 2.0 Online Reinforcement Learning (Online RL) Architecture Diagram

Take Hermes Agent as an example. Hermes continues to receive tasks, plan steps, and call models as usual. AReaL 2.0 records the key interaction processes when Hermes completes tasks in the background and uses these real trajectories combined with feedback or reward signals after task completion for subsequent training. Developers can also replace Hermes with their own agent and task environment, using the same method to build an online reinforcement learning process for agents.

This means that an agent's ability improvement no longer relies solely on manually constructed data, offline training, and redeployment. Multi-turn conversations, tool calls, execution results, and feedback signals from real tasks can all become materials for model learning.

This point is especially important in enterprise scenarios. Agents in enterprise workflows face real, complex, and constantly changing tasks: code repositories may update, business processes may change, user needs may shift, and tools and systems may also change. If an agent's capabilities are fixed once deployed, it will struggle to adapt to the real environment over time. AReaL 2.0 aims to fill the missing link between "being able to use tools" and "being able to learn from usage."

At the same time, continuous learning in real business scenarios cannot simply be about "collecting data and retraining." Agents may access code, customer information, enterprise knowledge bases, and internal systems, so the training pipeline must consider requirements such as access control, data anonymization, isolation, and audit. AReaL 2.0 introduces a data proxy mechanism for agent traces in its system design, allowing real task data to be managed and used in a more secure and controllable way when entering the training process.

The AReaL team pointed out in their technical report that the key bottleneck for self-evolving agents is not just how strong the model is or whether the reinforcement learning algorithm is advanced, but the lack of an online reinforcement learning infrastructure that serves real agents. AReaL 2.0 is an architectural upgrade aimed at the next generation of agent applications: connecting agent services, real task traces, data governance, and online reinforcement learning training, giving agents the practical engineering foundation to continue learning after deployment.

From a longer-term perspective, AReaL 2.0 points to the evolution paradigm of the next generation of agent applications: agents are no longer one-time trained and deployed tools, but rather continuously gain feedback in real environments, transforming both success and failure into experience, and continuously improving their capabilities within a safe boundary.

The AReaL project was initiated by teams including Ant Group, Tsinghua University, and the Hong Kong University of Science and Technology in 2024. In May 2026, AReaL officially exited the Ant InclusionAI incubation program and became an independent open-source community, joining the PyTorch Foundation Ecosystem project, further integrating into the mainstream reinforcement learning infrastructure ecosystem.

As the community develops independently, AReaL has also continued to receive participation and support from industry and open-source ecological partners, including the Huawei Cloud team and MindLab. In the future, AReaL will continue to iterate around areas such as online reinforcement learning, automated evaluation, and multimodal agent training, working with the community to advance the development of self-evolving agent ecosystems.

Currently, the AReaL 2.0 technical report and code have been open-sourced.

· GitHub Repository: https://github.com/areal-project/AReaL

· Technical Report: https://arxiv.org/abs/2607.01120

Making Agents Stronger with Use: AReaL 2.0 Open Source - Building a RL Infrastructure for Self-Evolving Intelligent Agents

Related Recommendations

The Next Puzzle Piece in AI Evolution: GPT-5.6 May Launch Next Week, Focusing on Agent-Level Operational Capabilities

WeChat Pay Officially Launches AI Dedicated Card: Supports Agent Closed-Loop Consumption, Main Account Fully Isolated

The Game of Capabilities and Security! OpenAI Launches ChatGPT Block Mode, Willing to Cut Off the Internet to Prevent Data Leaks

Tencent Meeting Upgrades Multiple AI Features, Baobao Minutes Monthly Usage Time Increases Nearly 5 Times

Cozy 3.0 Officially Launched, Supporting Multi-Person and Multi-Agent Collaboration