Related papers: DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL

URL: http://arxiv.org/abs/2509.10446v1
Date: Fri, 12 Sep 2025 17:52:35 GMT
Title: DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL
Authors: Rui Lu, Zhenyu Hou, Zihan Wang, Hanchen Zhang, Xiao Liu, Yujiang Li, Shi Feng, Jie Tang, Yuxiao Dong,
Abstract summary: We present DeepDive to advance deep search agents.<n>We propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs.<n>Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp.
Score: 60.47878242100153
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and significantly contributes to the performance improvements across multiple benchmarks. We observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.

Related papers

IntentRL: Training Proactive User-intent Agents for Open-ended Deep Research via Reinforcement Learning [54.21689544323704]
Deep Research (DR) agents extend Large Language Models (LLMs) beyond parametric knowledge.<n>Unlike real-time conversational assistants, DR is computationally expensive and time-consuming.<n>We propose IntentRL, a framework that trains proactive agents to clarify latent user intents before starting long-horizon research.
arXiv Detail & Related papers (2026-02-03T12:43:09Z)
Search Self-play: Pushing the Frontier of Agent Capability without Supervision [14.889394507446477]
Self-play training for deep search agents is proposed in this paper.<n>In this search self-play (SSP) game, the proposer and the solver co-evolve their agent capabilities through both competition and cooperation.<n>SSP can significantly improve search agents' performance uniformly on various benchmarks without any supervision.
arXiv Detail & Related papers (2025-10-21T17:19:35Z)
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search [61.77858432092777]
We present DeepMMSearch-R1, the first multimodal large language model capable of performing on-demand, multi-turn web searches.<n>DeepMMSearch-R1 can initiate web searches based on relevant crops of the input image making the image search more effective.<n>We conduct extensive experiments across a range of knowledge-intensive benchmarks to demonstrate the superiority of our approach.
arXiv Detail & Related papers (2025-10-14T17:59:58Z)
Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs [7.3517692707289415]
We introduce Fathom-DeepResearch, an agentic system composed of two specialized models.<n>The first is Fathom-Search-4B, a DeepSearch model optimized for evidence-based investigation through live web search and targeted webpage querying.<n>The second is Fathom- Synthesizer-4B, trained from Qwen3-4B, which converts multi-turn DeepSearch traces into structured, citation-dense DeepResearch Reports.
arXiv Detail & Related papers (2025-09-28T22:58:11Z)
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL [22.8456317506762]
ASearcher is an open-source project for large-scale RL training of search agents.<n>ASearcher-Web-QwQ achieves Avg@4 scores of 42.1 on xBench and 52.8 on GAIA, surpassing existing open-source 32B agents.
arXiv Detail & Related papers (2025-08-11T13:36:57Z)
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent [68.3311163530321]
Web agents such as Deep Research have demonstrated cognitive abilities, capable of solving highly challenging information-seeking problems.<n>This makes multimodal Deep Research highly challenging, as such agents require much stronger reasoning abilities in perception, logic, knowledge.<n>We introduce WebWatcher, a multi-modal Agent for Deep Research equipped with enhanced visual-language reasoning capabilities.
arXiv Detail & Related papers (2025-08-07T18:03:50Z)
DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router [57.28685457991806]
DeepSieve is an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router.<n>Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design.
arXiv Detail & Related papers (2025-07-29T17:55:23Z)
DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning [5.280613615397194]
DynaSearcher is an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL)<n>We employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality.<n> Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets.
arXiv Detail & Related papers (2025-07-23T09:58:31Z)
DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning [73.68685269970844]
We introduce WebPuzzle, a training and 275-sample test benchmark that evaluates information seeking on the live internet.<n>We develop DeepDiver, a reinforcement-learning framework that cultivates Search Intensity Scaling (SIS)-an emergent ability to escalate search frequency and depth.<n>We detail DeepDiver's curriculum from cold-start SFT to a well designed RL procedure, and show that its seeking policy generalized from closed-ended queries to open-ended generation such as long-form writing.
arXiv Detail & Related papers (2025-05-30T08:15:39Z)
StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization [14.931231544839687]
StepSearch is a framework for search LLMs that trained with step-wise proximal policy optimization method.<n>It consists of richer and more detailed intermediate search rewards and token-level process supervision based on information gain and redundancy penalties.<n>On standard multi-hop QA benchmarks, it significantly outperforms global-reward baselines, achieving 11.2% and 4.2% absolute improvements for 3B and 7B models.
arXiv Detail & Related papers (2025-05-21T05:01:31Z)
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.<n>Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.<n>Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.