Recently, StepFun officially launched a new deep research agent model, Step-DeepResearch. This 32B-parameter model aims to transform traditional web search into a more professional, in-depth research workflow, handling complex tasks such as long-horizon reasoning, tool calling, and structured report writing.


Unlike common web agents on the market, which are mainly optimized for short questions, Step-DeepResearch targets real research and analysis scenarios. It can identify latent search intent, perform multi-source verification when facing uncertainty, and ultimately produce professional reports with citations. The StepFun team stated that the model is built on Qwen2.5-32B-Base and effectively reduces inference cost by internalizing the research process into a single agent's decision-making loop.

To reach the research level of human experts, Step-DeepResearch focuses on refining four "atomic capabilities": planning and task decomposition, deep information acquisition, reflection and verification, and professional report generation. During training, the team built a large synthetic data pipeline from high-quality technical reports, financial documents, and knowledge graph data, giving the model high stability on long-horizon tasks.
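The four atomic capabilities above can be pictured as stages of one agent's decision loop. The following is a minimal, purely illustrative Python sketch; all function names and logic are placeholder stubs of my own, not StepFun's actual implementation:

```python
# Hypothetical sketch: the four "atomic capabilities" (plan, search, verify,
# write) live inside a single agent's loop rather than separate agents.

def plan(question):
    # Planning & task decomposition (stubbed): break the question into sub-tasks.
    return [f"find sources for: {question}", f"cross-check claims about: {question}"]

def search(task):
    # Deep information acquisition (stubbed): would call retrieval tools here.
    return [{"source": "paper-index", "claim": task}]

def verify(evidence):
    # Reflection & verification (stubbed): a real agent would cross-check
    # uncertain claims against multiple sources; this stub accepts everything.
    return evidence

def write_report(question, evidence):
    # Professional report generation with citations (stubbed as plain text).
    cites = ", ".join(e["source"] for e in evidence)
    return f"Report on '{question}' (sources: {cites})"

def research_agent(question):
    # One model drives the whole loop: no external multi-agent orchestration.
    evidence = []
    for task in plan(question):            # planning & task decomposition
        found = search(task)               # deep information acquisition
        evidence.extend(verify(found))     # reflection & verification
    return write_report(question, evidence)  # report generation

print(research_agent("long-context agent training"))
```

The point of the single-loop design, as the article describes it, is that keeping all four stages inside one model's decision process avoids the coordination overhead and cost of calling multiple external agents.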

In Scale AI's research evaluation, the model achieved a compliance rate of 61.42%, performing well enough to rival the deep research systems of OpenAI and Google. On StepFun's own Chinese benchmark, ADR-Bench, this 32B model even surpassed some larger open-source models, demonstrating strong practical value and cost advantages.

Paper: https://arxiv.org/pdf/2512.20491

Key Points:

  • 🧠 Single-Agent Architecture: Step-DeepResearch internalizes planning, searching, verifying, and writing as atomic capabilities of a single model, eliminating the need to orchestrate multiple external agents and thereby improving efficiency and reducing cost.

  • 📚 Deep Research Orientation: Unlike simple question-and-answer retrieval, this model supports a context length of up to 128k, enabling it to retrieve information from over 20 million papers and authoritative indexes, generating rigorous structured reports.

  • 🏆 Strong Performance: It performs well across multiple deep research evaluations, reaching professional research standards comparable to large closed-source models at its 32B parameter scale.