Related papers: HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

URL: http://arxiv.org/abs/2508.08088v1
Date: Mon, 11 Aug 2025 15:31:47 GMT
Title: HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
Authors: Jiejun Tan, Zhicheng Dou, Yan Yu, Jiehan Cheng, Qiang Ju, Jian Xie, Ji-Rong Wen
Abstract summary: We propose a hierarchical agentic deep search framework, HierSearch, trained with hierarchical RL. At the low level, a local deep search agent and a Web deep search agent are trained to retrieve evidence from their corresponding domains. At the high level, a planner agent coordinates low-level agents and provides the final answer.
Score: 54.65565885083031
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, large reasoning models have demonstrated strong mathematical and coding abilities, and deep search leverages their reasoning capabilities in challenging information retrieval tasks. Existing deep search works are generally limited to a single knowledge source, either local or the Web. However, enterprises often require private deep search systems that can leverage search tools over both local and Web corpora. Simply training an agent equipped with multiple search tools using flat reinforcement learning (RL) is a straightforward idea, but it suffers from problems such as low training data efficiency and poor mastery of complex tools. To address these issues, we propose a hierarchical agentic deep search framework, HierSearch, trained with hierarchical RL. At the low level, a local deep search agent and a Web deep search agent are trained to retrieve evidence from their corresponding domains. At the high level, a planner agent coordinates low-level agents and provides the final answer. Moreover, to prevent direct answer copying and error propagation, we design a knowledge refiner that filters out hallucinations and irrelevant evidence returned by low-level agents. Experiments show that HierSearch performs better than flat RL and outperforms various deep search and multi-source retrieval-augmented generation baselines on six benchmarks across general, finance, and medical domains.

Related papers

DeepResearch-9K: A Challenging Benchmark Dataset of Deep-Research Agent [63.52637950356965]
DeepResearch-9K is a large-scale, challenging dataset for deep-research scenarios. DeepResearch-R1 is an open-source training framework for deep-research agents.
arXiv Detail & Related papers (2026-03-01T15:36:10Z)
MM-DeepResearch: A Simple and Effective Multimodal Agentic Search Baseline [26.19213349415094]
We aim to develop a multimodal research agent capable of explicit reasoning and planning, multi-tool invocation, and cross-modal information synthesis. We observe three main challenges in developing such agents: (1) scarcity of search-intensive multimodal QA data, (2) lack of effective search trajectories, and (3) prohibitive cost of training with online search APIs. With the three designs, we develop MM-DeepResearch, a powerful multimodal deep research agent, and extensive results show its superiority across benchmarks.
arXiv Detail & Related papers (2026-03-01T11:13:22Z)
Revisiting Text Ranking in Deep Research [24.324221566628125]
Black-box web search APIs hinder systematic analysis of search components. We reproduce a selection of key findings and best practices for IR text ranking methods in the deep research setting.
arXiv Detail & Related papers (2026-02-25T00:18:07Z)
Search Self-play: Pushing the Frontier of Agent Capability without Supervision [14.889394507446477]
Self-play training for deep search agents is proposed in this paper. In this search self-play (SSP) game, the proposer and the solver co-evolve their agent capabilities through both competition and cooperation. SSP can significantly improve search agents' performance uniformly on various benchmarks without any supervision.
arXiv Detail & Related papers (2025-10-21T17:19:35Z)
DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL [60.47878242100153]
We present DeepDive to advance deep search agents. We propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. We apply end-to-end multi-turn reinforcement learning to enhance LLMs' long-horizon reasoning with deep search.
arXiv Detail & Related papers (2025-09-12T17:52:35Z)
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z)
ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework [73.91207117772291]
ManuSearch is a transparent and modular multi-agent framework designed to democratize deep search for large language models (LLMs). ManuSearch decomposes the search and reasoning process into three collaborative agents: (1) a solution planning agent that iteratively formulates sub-queries, (2) an Internet search agent that retrieves relevant documents via real-time web search, and (3) a structured webpage reading agent that extracts key evidence from raw web content.
arXiv Detail & Related papers (2025-05-23T17:02:02Z)
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis [89.99161034065614]
Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep search scenarios. Existing approaches face critical limitations: they lack high-quality training trajectories and suffer from distributional mismatches. This paper introduces SimpleDeepSearcher, a framework that bridges the gap through strategic data engineering rather than complex training paradigms.
arXiv Detail & Related papers (2025-05-22T16:05:02Z)
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [20.498100965239818]
We introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents. Unlike RAG-based approaches that assume all necessary information exists within a fixed corpus, our method trains agents to navigate the noisy, unstructured, and dynamic nature of the open web. Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines.
arXiv Detail & Related papers (2025-04-04T04:41:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.