Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning
- URL: http://arxiv.org/abs/2505.24332v1
- Date: Fri, 30 May 2025 08:15:39 GMT
- Title: Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning
- Authors: Wenxuan Shi, Haochen Tan, Chuqiao Kuang, Xiaoguang Li, Xiaozhe Ren, Chen Zhang, Hanting Chen, Yasheng Wang, Lifeng Shang, Fisher Yu, Yunhe Wang,
- Abstract summary: Information seeking demands iterative evidence gathering and reflective reasoning, yet large language models (LLMs) still struggle with it in open-web question answering.<n>Existing methods rely on static prompting rules or training with Wikipedia-based corpora and retrieval environments.<n>We introduce WebPuzzle, the first dataset designed to foster information-seeking behavior in open-world internet environments.
- Score: 79.26661332815465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information seeking demands iterative evidence gathering and reflective reasoning, yet large language models (LLMs) still struggle with it in open-web question answering. Existing methods rely on static prompting rules or training with Wikipedia-based corpora and retrieval environments, limiting adaptability to the real-world web environment where ambiguity, conflicting evidence, and noise are prevalent. These constrained training settings hinder LLMs from learning to dynamically decide when and where to search, and how to adjust search depth and frequency based on informational demands. We define this missing capacity as Search Intensity Scaling (SIS)--the emergent skill to intensify search efforts under ambiguous or conflicting conditions, rather than settling on overconfident, under-verification answers. To study SIS, we introduce WebPuzzle, the first dataset designed to foster information-seeking behavior in open-world internet environments. WebPuzzle consists of 24K training instances and 275 test questions spanning both wiki-based and open-web queries. Building on this dataset, we propose DeepDiver, a Reinforcement Learning (RL) framework that promotes SIS by encouraging adaptive search policies through exploration under a real-world open-web environment. Experimental results show that Pangu-7B-Reasoner empowered by DeepDiver achieve performance on real-web tasks comparable to the 671B-parameter DeepSeek-R1. We detail DeepDiver's training curriculum from cold-start supervised fine-tuning to a carefully designed RL phase, and present that its capability of SIS generalizes from closed-form QA to open-ended tasks such as long-form writing. Our contributions advance adaptive information seeking in LLMs and provide a valuable benchmark and dataset for future research.
Related papers
- WebSailor: Navigating Super-human Reasoning for Web Agent [72.5231321118689]
WebSailor is a complete post-training methodology designed to instill this crucial capability.<n>Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation.<n>WebSailor significantly outperforms all opensource agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-07-03T12:59:07Z) - SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis [89.99161034065614]
Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep search scenarios.<n>Existing approaches face critical limitations that lack high-quality training trajectories and suffer from distributional mismatches.<n>This paper introduces SimpleDeepSearcher, a framework that bridges the gap through strategic data engineering rather than complex training paradigms.
arXiv Detail & Related papers (2025-05-22T16:05:02Z) - InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation [63.55258191625131]
InfoDeepSeek is a new benchmark for assessing agentic information seeking in real-world, dynamic web environments.<n>We propose a systematic methodology for constructing challenging queries satisfying the criteria of determinacy, difficulty, and diversity.<n>We develop the first evaluation framework tailored to dynamic agentic information seeking, including fine-grained metrics about the accuracy, utility, and compactness of information seeking outcomes.
arXiv Detail & Related papers (2025-05-21T14:44:40Z) - Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging [7.047640531842663]
InForage is a reinforcement learning framework that formalizes retrieval-augmented reasoning as a dynamic information-seeking process.<n>We construct a human-guided dataset capturing iterative search and reasoning trajectories for complex, real-world web tasks.<n>These results highlight InForage's effectiveness in building robust, adaptive, and efficient reasoning agents.
arXiv Detail & Related papers (2025-05-14T12:13:38Z) - WebThinker: Empowering Large Reasoning Models with Deep Research Capability [60.81964498221952]
WebThinker is a deep research agent that empowers large reasoning models to autonomously search the web, navigate web pages, and draft research reports during the reasoning process.<n>It also employs an textbfAutonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time.<n>Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems.
arXiv Detail & Related papers (2025-04-30T16:25:25Z) - DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [20.498100965239818]
We introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents.<n>Unlike RAG-based approaches that assume all necessary information exists within a fixed corpus, our method trains agents to navigate the noisy, unstructured, and dynamic nature of the open web.<n>Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines.
arXiv Detail & Related papers (2025-04-04T04:41:28Z) - Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale (Extended Abstract) [73.57710917145212]
Learning to rank is widely employed in web searches to prioritize pertinent webpages based on input queries.
We propose a emphulineGenerative ulineSemi-ulineSupervised ulinePre-trained (GS2P) model to address these challenges.
We conduct extensive offline experiments on both a publicly available dataset and a real-world dataset collected from a large-scale search engine.
arXiv Detail & Related papers (2024-09-25T03:39:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.