In today's large language model (LLM) landscape, deep search has become the "trump card" of top-tier intelligent agents. This competition, however, has long been dominated by industry giants with deep resources: conventional development pipelines are resource-intensive, spanning pre-training, continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL).
A research team from academia recently released its latest achievement in this space.

The team proposed three core optimization strategies for data synthesis: first, scaling up the knowledge graph to provide a richer exploration space; second, substantially expanding the toolkit to extend the agent's functional boundaries; and finally, applying strict low-step filtering so that the training data consists of concise, efficient trajectories (a minimal sketch of this filtering step follows).
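To make the third strategy concrete, the sketch below shows one plausible form of low-step filtering: keep only synthesized trajectories that reach a correct answer within a small number of steps. The `Trajectory` schema, the `low_step_filter` helper, and the `max_steps` threshold are hypothetical illustrations under assumed data shapes, not the paper's actual pipeline.

```python
# A minimal sketch of "low-step filtering" for synthesized agent trajectories.
# All names and thresholds here are assumptions for illustration.
from dataclasses import dataclass
from typing import List

@dataclass
class Trajectory:
    question: str
    steps: List[str]        # tool calls / reasoning steps the agent took
    answer_correct: bool    # whether the final answer matched the reference

def low_step_filter(trajectories: List[Trajectory], max_steps: int = 6) -> List[Trajectory]:
    """Keep only correct trajectories solved in few steps, so the SFT data
    teaches concise, efficient tool use rather than meandering searches."""
    return [
        t for t in trajectories
        if t.answer_correct and 0 < len(t.steps) <= max_steps
    ]

if __name__ == "__main__":
    data = [
        Trajectory("Who wrote X?", ["search('X')", "read(doc1)"], True),
        Trajectory("Who wrote Y?", ["search('Y')"] * 15, True),   # too many steps
        Trajectory("Who wrote Z?", ["search('Z')"], False),       # wrong answer
    ]
    kept = low_step_filter(data)
    print(f"kept {len(kept)} of {len(data)} trajectories")  # kept 1 of 3
```

The filtering criterion trades coverage for quality: discarding long or incorrect trajectories shrinks the dataset but biases it toward efficient search behavior.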
Experimental data shows that

Notably, this is the first search agent from a purely academic team to reach state-of-the-art (SOTA) performance using SFT alone, at a comparable model scale and architecture. The team has officially open-sourced the model weights.
Paper URL: https://arxiv.org/pdf/2605.04036
