DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking
- URL: http://arxiv.org/abs/2510.20168v1
- Date: Thu, 23 Oct 2025 03:28:45 GMT
- Title: DeepWideSearch: Benchmarking Depth and Width in Agentic Information Seeking
- Authors: Tian Lan, Bin Zhu, Qianghuai Jia, Junyang Ren, Haijun Li, Longyue Wang, Zhao Xu, Weihua Luo, Kaifu Zhang
- Abstract summary: DeepWideSearch is the first benchmark designed to evaluate agents' ability to integrate depth and width in information seeking. In DeepWideSearch, agents must process a large volume of data, each item requiring deep reasoning over multi-hop retrieval paths. Experiments demonstrate that even state-of-the-art agents achieve only a 2.39% average success rate.
- Score: 42.413184411326164
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current search agents fundamentally lack the ability to simultaneously perform deep reasoning over multi-hop retrieval and wide-scale information collection, a critical deficiency for real-world applications such as comprehensive market analysis and business development. To bridge this gap, we introduce DeepWideSearch, the first benchmark explicitly designed to evaluate agents' ability to integrate depth and width in information seeking. In DeepWideSearch, agents must process a large volume of data, each item requiring deep reasoning over multi-hop retrieval paths. Specifically, we propose two methods to convert established datasets, resulting in a curated collection of 220 questions spanning 15 diverse domains. Extensive experiments demonstrate that even state-of-the-art agents achieve only a 2.39% average success rate on DeepWideSearch, highlighting the substantial challenge of integrating depth and width search in information-seeking tasks. Furthermore, our error analysis reveals four failure modes (lack of reflection, overreliance on internal knowledge, insufficient retrieval, and context overflow) that expose key limitations in current agent architectures. We publicly release DeepWideSearch to catalyze future research on more capable and robust information-seeking agents.
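The depth/width distinction in the abstract can be made concrete with a toy sketch. Everything in it is hypothetical: the in-memory knowledge base, the entity names, and the all-or-nothing table-match rule used for "success" are assumptions for illustration, not the DeepWideSearch data format or its official metric. Depth is modeled as a chain of dependent lookups (each hop needs the previous answer); width is modeled as the same lookup repeated over many entities to fill a table.

```python
from operator import itemgetter
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical knowledge base: entity -> {relation: value}. Not real benchmark data.
KB: Dict[str, Dict[str, str]] = {
    "PaperX": {"first_author": "Alice"},
    "Alice": {"affiliation": "LabY"},
    "LabY": {"members": "Bob,Carol,Dan"},
    "Bob": {"country": "FR"},
    "Carol": {"country": "JP"},
    "Dan": {"country": "BR"},
}


def deep_hop(start: str, relations: List[str]) -> str:
    """Depth: follow a chain of relations; each hop depends on the previous answer."""
    entity = start
    for rel in relations:
        entity = KB[entity][rel]
    return entity


def wide_collect(entities: List[str], attribute: str) -> List[Row]:
    """Width: repeat one lookup across many entities to fill table rows."""
    return [{"name": e, attribute: KB[e][attribute]} for e in entities]


def run_task() -> List[Row]:
    # Depth: PaperX -> first author -> affiliation (two hops).
    lab = deep_hop("PaperX", ["first_author", "affiliation"])
    # Width: list the lab's members, then look up each member's country.
    members = KB[lab]["members"].split(",")
    return wide_collect(members, "country")


def success(predicted: List[Row], gold: List[Row]) -> bool:
    """Assumed all-or-nothing criterion: the full table must match, ignoring row order."""
    by_name = itemgetter("name")
    return sorted(predicted, key=by_name) == sorted(gold, key=by_name)


def average_success_rate(outcomes: List[bool]) -> float:
    """Percentage of questions judged successful (0.0 for an empty list)."""
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0


if __name__ == "__main__":
    gold = [
        {"name": "Bob", "country": "FR"},
        {"name": "Carol", "country": "JP"},
        {"name": "Dan", "country": "BR"},
    ]
    print(success(run_task(), gold))  # True
    print(average_success_rate([True, False, False, False]))  # 25.0
```

Under this assumed criterion, a single missing or wrong row makes the whole question fail, so an agent can retrieve many correct cells and still score zero on that question.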
Related papers
- Revisiting Text Ranking in Deep Research [24.324221566628125]
Black-box web search APIs hinder systematic analysis of search components.
We reproduce a selection of key findings and best practices for IR text ranking methods in the deep research setting.
Date: 2026-02-25T00:18:07Z
- DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents [10.197402632091551]
DeepSearchQA is a 900-prompt benchmark for evaluating agents on difficult multi-step information-seeking tasks.
This dataset is designed to evaluate an agent's ability to execute complex search plans to generate exhaustive answer lists.
Date: 2026-01-28T19:20:47Z
- Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs [7.3517692707289415]
We introduce Fathom-DeepResearch, an agentic system composed of two specialized models.
The first is Fathom-Search-4B, a DeepSearch model optimized for evidence-based investigation through live web search and targeted webpage querying.
The second is Fathom-Synthesizer-4B, trained from Qwen3-4B, which converts multi-turn DeepSearch traces into structured, citation-dense DeepResearch Reports.
Date: 2025-09-28T22:58:11Z
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL [60.47878242100153]
We present DeepDive to advance deep search agents.
We propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs.
We apply end-to-end multi-turn reinforcement learning to enhance LLMs' long-horizon reasoning with deep search.
Date: 2025-09-12T17:52:35Z
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches [54.65565885083031]
We propose a hierarchical agentic deep search framework, HierSearch, trained with hierarchical RL.
At the low level, a local deep search agent and a Web deep search agent are trained to retrieve evidence from their corresponding domains.
At the high level, a planner agent coordinates the low-level agents and provides the final answer.
Date: 2025-08-11T15:31:47Z
- Benchmarking Deep Search over Heterogeneous Enterprise Data [73.55304268238474]
We present a new benchmark for evaluating a form of retrieval-augmented generation (RAG) that requires source-aware, multi-hop reasoning over diverse, sparse, but related sources.
We build it using a synthetic data pipeline that simulates business workflows across product planning, development, and support stages.
Date: 2025-06-29T08:34:59Z
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research.
We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn.
We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
Date: 2025-06-23T17:27:19Z
- Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
Date: 2021-10-14T20:19:27Z