Tongyi DeepResearch Technical Report
- URL: http://arxiv.org/abs/2510.24701v2
- Date: Tue, 04 Nov 2025 10:23:45 GMT
- Title: Tongyi DeepResearch Technical Report
- Authors: Tongyi DeepResearch Team, Baixuan Li, Bo Zhang, Dingchu Zhang, Fei Huang, Guangyu Li, Guoxin Chen, Huifeng Yin, Jialong Wu, Jingren Zhou, Kuan Li, Liangcai Su, Litu Ou, Liwen Zhang, Pengjun Xie, Rui Ye, Wenbiao Yin, Xinmiao Yu, Xinyu Wang, Xixi Wu, Xuanzhong Chen, Yida Zhao, Zhen Zhang, Zhengwei Tao, Zhongwang Zhang, Zile Qiao, Chenxi Wang, Donglei Yu, Gang Fu, Haiyang Shen, Jiayin Yang, Jun Lin, Junkai Zhang, Kui Zeng, Li Yang, Hailong Yin, Maojia Song, Ming Yan, Minpeng Liao, Peng Xia, Qian Xiao, Rui Min, Ruixue Ding, Runnan Fang, Shaowei Chen, Shen Huang, Shihang Wang, Shihao Cai, Weizhou Shen, Xiaobin Wang, Xin Guan, Xinyu Geng, Yingcheng Shi, Yuning Wu, Zhuo Chen, Zijian Li, Yong Jiang,
- Abstract summary: To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework. The model comprises 30.5 billion total parameters, with only 3.3 billion activated per token. We open-source the model, framework, and complete solutions to empower the community.
- Score: 111.78446943571782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized environments for each stage, our system enables stable and consistent interactions throughout. Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.
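The abstract describes an agent that interleaves reasoning with information seeking over many turns. A minimal ReAct-style rollout loop can sketch what such an agentic trajectory looks like; the `llm` callable, tool names, and stopping rule below are illustrative assumptions, not the authors' actual interface.

```python
# Minimal sketch of an agentic deep-research rollout: the model alternates
# between a thought, a tool call, and an observation until it emits an answer.
# The `llm` and `tools` interfaces here are hypothetical stand-ins.
from dataclasses import dataclass, field

@dataclass
class Step:
    thought: str
    action: str
    observation: str

@dataclass
class Trajectory:
    question: str
    steps: list = field(default_factory=list)

def rollout(question, llm, tools, max_turns=8):
    """Alternate reasoning and tool calls until the agent answers or the turn budget runs out."""
    traj = Trajectory(question)
    for _ in range(max_turns):
        thought, action, arg = llm(traj)       # model proposes the next step
        if action == "answer":                 # terminal action: return the answer
            return arg, traj
        obs = tools[action](arg)               # execute the chosen tool (e.g. web search)
        traj.steps.append(Step(thought, action, obs))
    return None, traj                          # turn budget exhausted

# Toy usage with a stubbed model and a stubbed search tool:
def toy_llm(traj):
    if not traj.steps:
        return "need facts", "search", "total parameters Tongyi DeepResearch"
    return "done", "answer", "30.5B total, 3.3B activated"

answer, traj = rollout("How large is the model?", toy_llm, {"search": lambda q: "30.5B/3.3B"})
```

The key design point this sketch captures is that the environment (the tool set) is swappable per training stage, which is what the abstract's "customized environments for each stage" refers to.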
Related papers
- DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent [63.52637950356965]
DeepResearch-9K is a large-scale, challenging dataset for deep-research scenarios. DeepResearch-R1 is an open-source training framework for deep-research agents.
arXiv Detail & Related papers (2026-03-01T15:36:10Z)
- Step-DeepResearch Technical Report [90.50586290399683]
We introduce Step-DeepResearch, a cost-effective, end-to-end agent. We propose a Data Synthesis Strategy Based on Atomic Capabilities to reinforce planning and report writing. To bridge the evaluation gap in the Chinese domain, we establish ADR-Bench for realistic deep research scenarios.
arXiv Detail & Related papers (2025-12-23T16:32:27Z)
- LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild [86.6586720134927]
LiveResearchBench is a benchmark of 100 expert-curated tasks spanning daily life, enterprise, and academia. DeepEval is a comprehensive suite covering both content- and report-level quality. Our analysis reveals current strengths, recurring failure modes, and key system components needed to advance reliable, insightful deep research.
arXiv Detail & Related papers (2025-10-16T02:49:16Z)
- Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs [7.3517692707289415]
We introduce Fathom-DeepResearch, an agentic system composed of two specialized models. The first is Fathom-Search-4B, a DeepSearch model optimized for evidence-based investigation through live web search and targeted webpage querying. The second is Fathom-Synthesizer-4B, trained from Qwen3-4B, which converts multi-turn DeepSearch traces into structured, citation-dense DeepResearch Reports.
arXiv Detail & Related papers (2025-09-28T22:58:11Z)
- Benchmarking Deep Search over Heterogeneous Enterprise Data [73.55304268238474]
We present a new benchmark for evaluating a form of retrieval-augmented generation (RAG) that requires source-aware, multi-hop reasoning over diverse, sparse, but related sources. We build it using a synthetic data pipeline that simulates business workflows across product planning, development, and support stages.
arXiv Detail & Related papers (2025-06-29T08:34:59Z)
- Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge [34.672897171399775]
Agentic search systems autonomously browse the web, synthesize information, and return comprehensive citation-backed answers. Mind2Web 2 is a benchmark of 130 realistic, high-quality, and long-horizon tasks constructed with over 1000 hours of human labor. Our method constructs task-specific judge agents based on a tree-structured design to automatically assess both answer correctness and source attribution.
arXiv Detail & Related papers (2025-06-26T17:32:50Z)
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z)
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents [30.768405850755602]
DeepResearch Bench is a benchmark consisting of 100 PhD-level research tasks. Because evaluating Deep Research Agents is inherently complex and labor-intensive, we propose two novel methodologies that achieve strong alignment with human judgment.
arXiv Detail & Related papers (2025-06-13T13:17:32Z)
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [20.498100965239818]
We introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents. Unlike RAG-based approaches that assume all necessary information exists within a fixed corpus, our method trains agents to navigate the noisy, unstructured, and dynamic nature of the open web. Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines.
arXiv Detail & Related papers (2025-04-04T04:41:28Z)
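The tree-structured Agent-as-a-Judge design mentioned for Mind2Web 2 can be illustrated with a toy rubric tree: leaf nodes run concrete checks (a fact is stated, a source is cited), and internal nodes aggregate child verdicts with all/any semantics. The node schema and checks below are hypothetical, not the benchmark's actual implementation.

```python
# Hedged sketch of a tree-structured judge. Leaves hold a concrete check;
# internal nodes combine children with "all" (every child must pass) or
# "any" (at least one child must pass). Rubric contents are illustrative.
def evaluate(node, answer):
    if "check" in node:                                # leaf: run the concrete check
        return bool(node["check"](answer))
    results = [evaluate(child, answer) for child in node["children"]]
    return all(results) if node["agg"] == "all" else any(results)

rubric = {
    "agg": "all",
    "children": [
        {"check": lambda a: "http" in a},              # answer cites a source
        {"agg": "any", "children": [
            {"check": lambda a: "30.5" in a},          # states total parameter count
            {"check": lambda a: "3.3" in a},           # or the activated parameter count
        ]},
    ],
}

verdict = evaluate(rubric, "30.5B parameters, see http://arxiv.org/abs/2510.24701")
```

Because the root uses "all", an answer with a correct fact but no citation fails, which mirrors the stated goal of assessing both answer correctness and source attribution.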
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.