Recently, the Tencent Hunyuan team, together with the Gaoling School of Artificial Intelligence at Renmin University of China and other institutions, released and open-sourced PlanningBench, an extensible and verifiable data generation framework for evaluating and training the planning capabilities of large language models.

PlanningBench starts from real planning scenarios. It systematically abstracts factors such as tasks, constraints, and difficulty levels, and builds a data generation and validation system covering more than 30 types of planning tasks. The framework can not only evaluate whether a model is able to plan, but also provide stable and transferable reward signals for training that ability.
PlanningBench's tasks fall into six categories: scheduling, resource allocation, staff scheduling, route scheduling, production operations, and emergency services. This breadth guards against the "question drilling" effect, in which a model performs well only in a single narrow domain, and helps models handle more diverse real-world applications.
In addition, PlanningBench's difficulty control system breaks difficulty down into factors such as task structure, constraint hierarchy, and resource tightness, so that generated data becomes harder in ways that reflect real planning challenges rather than simply through longer prompts. Each data instance also comes with a checklist used to verify whether the model's output satisfies the input conditions, respects the resource constraints, and achieves an optimal result for the goal.
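To make the per-instance checklist idea concrete, here is a minimal sketch in Python. It is not PlanningBench's actual code or data format: the toy single-machine scheduling instance, the `ChecklistItem` class, and the `build_checklist` and `run_checklist` functions are all hypothetical names chosen for illustration, and the deadline check only stands in for a real optimality check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Plan = Dict[str, int]  # task name -> start time (hypothetical plan format)


@dataclass
class ChecklistItem:
    name: str
    check: Callable[[Plan], bool]


def build_checklist(
    durations: Dict[str, int],
    precedence: List[Tuple[str, str]],
    deadline: int,
) -> List[ChecklistItem]:
    """Build checks for a toy single-machine scheduling instance."""

    def end(plan: Plan, task: str) -> int:
        return plan[task] + durations[task]

    def covers_all_tasks(plan: Plan) -> bool:
        # Input condition: exactly the requested tasks are scheduled.
        return set(plan) == set(durations)

    def no_overlap(plan: Plan) -> bool:
        # Resource constraint: the single machine runs one task at a time.
        spans = sorted((plan[t], end(plan, t)) for t in plan if t in durations)
        return all(
            a_end <= b_start
            for (_, a_end), (b_start, _) in zip(spans, spans[1:])
        )

    def respects_precedence(plan: Plan) -> bool:
        # Constraint: task `a` must finish before task `b` starts.
        return all(
            a in plan and b in plan and end(plan, a) <= plan[b]
            for a, b in precedence
        )

    def meets_deadline(plan: Plan) -> bool:
        # Goal condition (a simple stand-in for an optimality check).
        return all(end(plan, t) <= deadline for t in plan if t in durations)

    return [
        ChecklistItem("covers_all_tasks", covers_all_tasks),
        ChecklistItem("no_overlap", no_overlap),
        ChecklistItem("respects_precedence", respects_precedence),
        ChecklistItem("meets_deadline", meets_deadline),
    ]


def run_checklist(plan: Plan, checklist: List[ChecklistItem]) -> Dict[str, bool]:
    """Evaluate every checklist item against a candidate plan."""
    return {item.name: item.check(plan) for item in checklist}


if __name__ == "__main__":
    checklist = build_checklist({"a": 2, "b": 3, "c": 1}, [("a", "b")], deadline=6)
    print(run_checklist({"a": 0, "b": 2, "c": 5}, checklist))  # feasible plan
    print(run_checklist({"a": 0, "b": 1, "c": 5}, checklist))  # "b" starts too early
```

Because every item is a deterministic function of the plan, the same checklist could in principle serve both as an evaluation metric and as a verifiable training signal, which is the role the article attributes to PlanningBench's checklists.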
Notably, PlanningBench evaluates both local compliance (whether individual requirements are met) and global success (whether the plan as a whole works), which lets it identify plans that "appear mostly correct but are overall unexecutable." This matters for diagnosing the true planning capabilities of large language models under complex constraints.
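The gap between the two views can be shown with a small Python sketch. The scoring rules here are assumptions made for illustration, not PlanningBench's published metrics: local compliance is taken to be the fraction of checklist items passed, and global success requires every item to pass. The function names and the sample check names are hypothetical.

```python
from typing import Dict


def local_compliance(results: Dict[str, bool]) -> float:
    """Fraction of checklist items the plan satisfies (assumed definition)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def global_success(results: Dict[str, bool]) -> bool:
    """A plan counts as successful only if every item passes (assumed definition)."""
    return bool(results) and all(results.values())


if __name__ == "__main__":
    # A plan that satisfies four of five checks but double-books a resource.
    results = {
        "covers_all_tasks": True,
        "respects_precedence": True,
        "meets_deadline": True,
        "within_budget": True,
        "no_overlap": False,
    }
    print(local_compliance(results))  # 0.8
    print(global_success(results))    # False
```

A plan like this scores 0.8 on local compliance yet fails globally, which is the "mostly correct but unexecutable" case that a purely local score would hide.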
According to the team, training on PlanningBench's verifiable data significantly improves model performance on unseen planning benchmarks as well as on general tasks, which suggests that the learning signal generalizes. Overall, PlanningBench forms a closed loop, driven by real scenarios, that runs from data generation through training to transfer, and it offers new tools and directions for future research on AI planning.
Key Points:
🌟 PlanningBench is an open-source framework jointly developed by Tencent Hunyuan, the Gaoling School of Artificial Intelligence at Renmin University of China, and other institutions, aimed at evaluating and training the planning capabilities of large language models.
📅 The framework covers more than 30 types of planning tasks across six practical application categories, including scheduling and resource allocation.
✅ Training on its verifiable data significantly improves model performance on unseen planning benchmarks and general tasks, indicating that the learning signal transfers broadly.
