DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
- URL: http://arxiv.org/abs/2509.10446v1
- Date: Fri, 12 Sep 2025 17:52:35 GMT
- Title: DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
- Authors: Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen Zhang, Xiao Liu, Yujiang Li, Shi Feng, Jie Tang, Yuxiao Dong
- Abstract summary: We present DeepDive to advance deep search agents. We propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp.
- Score: 60.47878242100153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and significantly contributes to the performance improvements across multiple benchmarks. We observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.
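The abstract does not spell out how questions are synthesized from knowledge graphs. As a purely illustrative sketch, assuming a random walk over a graph is one way to obtain a multi-hop question, the idea might look like the following. The graph `TOY_KG`, the helper names `random_walk` and `path_to_question`, and the question template are all hypothetical and are not taken from the DeepDive code.

```python
import random

# Hypothetical toy knowledge graph (not from the paper):
# each entity maps to a list of (relation, neighbor) edges.
TOY_KG = {
    "entity_a": [("rel_1", "entity_b"), ("rel_2", "entity_c")],
    "entity_b": [("rel_3", "entity_d")],
    "entity_c": [("rel_4", "entity_d")],
    "entity_d": [],
}


def random_walk(kg, start, hops, rng):
    """Follow up to `hops` random outgoing edges from `start`.

    Returns the visited (head, relation, tail) triples; the walk stops
    early if it reaches an entity with no outgoing edges.
    """
    path = []
    node = start
    for _ in range(hops):
        edges = kg.get(node, [])
        if not edges:
            break
        relation, neighbor = rng.choice(edges)
        path.append((node, relation, neighbor))
        node = neighbor
    return path


def path_to_question(path):
    """Nest the relations of a walk into one question.

    The answer is the final entity of the walk, so answering requires
    resolving every intermediate hop.
    """
    if not path:
        raise ValueError("path must contain at least one edge")
    clue = path[0][0]
    for _, relation, _ in path:
        clue = f"the entity reached via '{relation}' from ({clue})"
    return f"What is {clue}?", path[-1][2]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the walk is reproducible
    walk = random_walk(TOY_KG, "entity_a", hops=3, rng=rng)
    question, answer = path_to_question(walk)
    print(question)
    print(answer)
```

Longer walks give questions that need more hops to resolve, which is one plausible reading of "hard-to-find"; the actual DeepDive pipeline may differ substantially from this sketch.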
Related papers
- IntentRL: Training Proactive User-intent Agents for Open-ended Deep Research via Reinforcement Learning [54.21689544323704]
Deep Research (DR) agents extend Large Language Models (LLMs) beyond parametric knowledge. Unlike real-time conversational assistants, DR is computationally expensive and time-consuming. We propose IntentRL, a framework that trains proactive agents to clarify latent user intents before starting long-horizon research.
(2026-02-03T12:43:09Z)
- Search Self-play: Pushing the Frontier of Agent Capability without Supervision [14.889394507446477]
Self-play training for deep search agents is proposed in this paper. In this search self-play (SSP) game, the proposer and the solver co-evolve their agent capabilities through both competition and cooperation. SSP can significantly improve search agents' performance uniformly on various benchmarks without any supervision.
(2025-10-21T17:19:35Z)
- DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search [61.77858432092777]
We present DeepMMSearch-R1, the first multimodal large language model capable of performing on-demand, multi-turn web searches. DeepMMSearch-R1 can initiate web searches based on relevant crops of the input image, making the image search more effective. We conduct extensive experiments across a range of knowledge-intensive benchmarks to demonstrate the superiority of our approach.
(2025-10-14T17:59:58Z)
- Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs [7.3517692707289415]
We introduce Fathom-DeepResearch, an agentic system composed of two specialized models. The first is Fathom-Search-4B, a DeepSearch model optimized for evidence-based investigation through live web search and targeted webpage querying. The second is Fathom-Synthesizer-4B, trained from Qwen3-4B, which converts multi-turn DeepSearch traces into structured, citation-dense DeepResearch Reports.
(2025-09-28T22:58:11Z)
- Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL [22.8456317506762]
ASearcher is an open-source project for large-scale RL training of search agents. ASearcher-Web-QwQ achieves Avg@4 scores of 42.1 on xBench and 52.8 on GAIA, surpassing existing open-source 32B agents.
(2025-08-11T13:36:57Z)
- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent [68.3311163530321]
Web agents such as Deep Research have demonstrated cognitive abilities, capable of solving highly challenging information-seeking problems. This makes multimodal Deep Research highly challenging, as such agents require much stronger reasoning abilities in perception, logic, and knowledge. We introduce WebWatcher, a multimodal agent for Deep Research equipped with enhanced visual-language reasoning capabilities.
(2025-08-07T18:03:50Z)
- DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router [57.28685457991806]
DeepSieve is an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router. Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design.
(2025-07-29T17:55:23Z)
- DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning [5.280613615397194]
DynaSearcher is an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL). We employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality. Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets.
(2025-07-23T09:58:31Z)
- DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning [73.68685269970844]
We introduce WebPuzzle, a training and 275-sample test benchmark that evaluates information seeking on the live internet. We develop DeepDiver, a reinforcement-learning framework that cultivates Search Intensity Scaling (SIS), an emergent ability to escalate search frequency and depth. We detail DeepDiver's curriculum from cold-start SFT to a well-designed RL procedure, and show that its seeking policy generalized from closed-ended queries to open-ended generation such as long-form writing.
(2025-05-30T08:15:39Z)
- StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization [14.931231544839687]
StepSearch is a framework for search LLMs trained with a step-wise proximal policy optimization method. It consists of richer and more detailed intermediate search rewards and token-level process supervision based on information gain and redundancy penalties. On standard multi-hop QA benchmarks, it significantly outperforms global-reward baselines, achieving 11.2% and 4.2% absolute improvements for 3B and 7B models.
(2025-05-21T05:01:31Z)
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
(2025-03-07T17:14:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.