DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
- URL: http://arxiv.org/abs/2504.03160v4
- Date: Thu, 17 Apr 2025 04:46:08 GMT
- Title: DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
- Authors: Yuxiang Zheng, Dayuan Fu, Xiangkun Hu, Xiaojie Cai, Lyumanshan Ye, Pengrui Lu, Pengfei Liu,
- Abstract summary: We introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents.<n>Unlike RAG-based approaches that assume all necessary information exists within a fixed corpus, our method trains agents to navigate the noisy, unstructured, and dynamic nature of the open web.<n>Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines.
- Score: 20.498100965239818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) equipped with web search capabilities have demonstrated impressive potential for deep research tasks. However, current approaches predominantly rely on either manually engineered prompts (prompt engineering-based) with brittle performance or reinforcement learning within controlled Retrieval-Augmented Generation (RAG) environments (RAG-based) that fail to capture the complexities of real-world interaction. In this paper, we introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Unlike RAG-based approaches that assume all necessary information exists within a fixed corpus, our method trains agents to navigate the noisy, unstructured, and dynamic nature of the open web. We implement a specialized multi-agent architecture where browsing agents extract relevant information from various webpage structures and overcoming significant technical challenges. Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines and up to 7.2 points over RAG-based RL agents. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers. Our results highlight that end-to-end training in real-world web environments is not merely an implementation detail but a fundamental requirement for developing robust research capabilities aligned with real-world applications. We release DeepResearcher at https://github.com/GAIR-NLP/DeepResearcher.
Related papers
- WebSailor: Navigating Super-human Reasoning for Web Agent [72.5231321118689]
WebSailor is a complete post-training methodology designed to instill this crucial capability.<n>Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation.<n>WebSailor significantly outperforms all opensource agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-07-03T12:59:07Z) - From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research.<n>We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn.<n>We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z) - DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents [30.768405850755602]
DeepResearch Bench is a benchmark consisting of 100 PhD-level research tasks.<n> evaluating Deep Research Agents is inherently complex and labor-intensive.<n>We propose two novel methodologies that achieve strong alignment with human judgment.
arXiv Detail & Related papers (2025-06-13T13:17:32Z) - Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning [79.26661332815465]
Information seeking demands iterative evidence gathering and reflective reasoning, yet large language models (LLMs) still struggle with it in open-web question answering.<n>Existing methods rely on static prompting rules or training with Wikipedia-based corpora and retrieval environments.<n>We introduce WebPuzzle, the first dataset designed to foster information-seeking behavior in open-world internet environments.
arXiv Detail & Related papers (2025-05-30T08:15:39Z) - WebDancer: Towards Autonomous Information Seeking Agency [69.33360019344083]
Recent progress in agentic systems underscores the potential for autonomous multi-step research.<n>We present a cohesive paradigm for building end-to-end agentic information seeking agents from a data-centric and training-stage perspective.<n>We instantiate this framework in a web agent based on the ReAct, WebDancer.
arXiv Detail & Related papers (2025-05-28T17:57:07Z) - SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis [89.99161034065614]
Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep search scenarios.<n>Existing approaches face critical limitations that lack high-quality training trajectories and suffer from distributional mismatches.<n>This paper introduces SimpleDeepSearcher, a framework that bridges the gap through strategic data engineering rather than complex training paradigms.
arXiv Detail & Related papers (2025-05-22T16:05:02Z) - WebThinker: Empowering Large Reasoning Models with Deep Research Capability [60.81964498221952]
WebThinker is a deep research agent that empowers large reasoning models to autonomously search the web, navigate web pages, and draft research reports during the reasoning process.
It also employs an textbfAutonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time.
Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems.
arXiv Detail & Related papers (2025-04-30T16:25:25Z) - Deep Reinforcement Learning for Automated Web GUI Testing [13.62121897768763]
WebRLED is an effective approach for automated GUI testing of complex web applications.
WebRLED achieves higher code/state coverage and failure detection rate compared to existing state-of-the-art (SOTA) techniques.
arXiv Detail & Related papers (2025-04-27T13:42:30Z) - Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs)<n>We investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education.<n>We utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications.
arXiv Detail & Related papers (2025-03-31T08:22:49Z) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.<n>Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.<n>Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z) - Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence [2.890656584329591]
Online Decision MetaMorphFormer (ODM) aims to achieve self-awareness, environment recognition, and action planning.
ODM can be applied to any arbitrary agent with a multi-joint body, located in different environments, and trained with different types of tasks using large-scale pre-trained datasets.
arXiv Detail & Related papers (2024-09-11T15:22:43Z) - WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs [10.380692079063467]
We propose WeKnow-RAG, which integrates Web search and Knowledge Graphs into a "Retrieval-Augmented Generation (RAG)" system.
First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval.
Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process.
arXiv Detail & Related papers (2024-08-14T15:19:16Z) - Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning [53.3760591018817]
We propose a new benchmarking environment for aquatic navigation using recent advances in the integration between game engines and Deep Reinforcement Learning.
Specifically, we focus on PPO, one of the most widely accepted algorithms, and we propose advanced training techniques.
Our empirical evaluation shows that a well-designed combination of these ingredients can achieve promising results.
arXiv Detail & Related papers (2024-05-30T23:20:23Z) - On the Importance of Exploration for Generalization in Reinforcement
Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
arXiv Detail & Related papers (2023-06-08T18:07:02Z) - A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open
Problems [0.0]
Reinforcement learning (RL) has experienced a dramatic increase in popularity.
There is still a wide range of domains inaccessible to RL due to the high cost and danger of interacting with the environment.
offline RL is a paradigm that learns exclusively from static datasets of previously collected interactions.
arXiv Detail & Related papers (2022-03-02T20:05:11Z) - How to Train Your Robot with Deep Reinforcement Learning; Lessons We've
Learned [111.06812202454364]
We present a number of case studies involving robotic deep RL.
We discuss commonly perceived challenges in deep RL and how they have been addressed in these works.
We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting.
arXiv Detail & Related papers (2021-02-04T22:09:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.