DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling
- URL: http://arxiv.org/abs/2510.21712v1
- Date: Sun, 07 Sep 2025 13:45:09 GMT
- Title: DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling
- Authors: Hao Sun, Zile Qiao, Bo Wang, Guoxin Chen, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang
- Abstract summary: We propose DecoupleSearch, a framework that decouples planning and search processes using dual value models. Our approach constructs a reasoning tree, where each node represents planning and search steps. During inference, Hierarchical Beam Search iteratively refines planning and search candidates with dual value models.
- Score: 56.45844907505722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-Augmented Generation (RAG) systems have emerged as a pivotal methodology for enhancing Large Language Models (LLMs) through the dynamic integration of external knowledge. To further improve RAG's flexibility, Agentic RAG introduces autonomous agents into the workflow. However, Agentic RAG faces several challenges: (1) the success of each step depends on both high-quality planning and accurate search, (2) the lack of supervision for intermediate reasoning steps, and (3) the exponentially large candidate space for planning and searching. To address these challenges, we propose DecoupleSearch, a novel framework that decouples planning and search processes using dual value models, enabling independent optimization of plan reasoning and search grounding. Our approach constructs a reasoning tree, where each node represents planning and search steps. We leverage Monte Carlo Tree Search to assess the quality of each step. During inference, Hierarchical Beam Search iteratively refines planning and search candidates with dual value models. Extensive experiments across policy models of varying parameter sizes demonstrate the effectiveness of our method.
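To make the inference procedure concrete, here is a minimal sketch of the Hierarchical Beam Search idea the abstract describes: at each step, a plan value model ranks candidate plans, then a search value model ranks retrieval results for the surviving plans. All function names and the random scorers below are illustrative stand-ins, not the paper's actual models or API.

```python
import heapq
import random

def plan_value(state, plan):
    """Stand-in plan value model: scores how promising a plan is."""
    return random.random()

def search_value(state, plan, result):
    """Stand-in search value model: scores retrieval grounding quality."""
    return random.random()

def propose_plans(state, k):
    """Stand-in planner: generate k candidate plans from a state."""
    return [f"{state}/plan{i}" for i in range(k)]

def retrieve(plan, k):
    """Stand-in retriever: fetch k candidate results for a plan."""
    return [f"{plan}/doc{i}" for i in range(k)]

def hierarchical_beam_search(question, steps=2, beam=2, expand=3):
    states = [question]
    for _ in range(steps):
        # Stage 1: score all candidate plans, keep the top `beam`.
        plans = [(plan_value(s, p), s, p)
                 for s in states for p in propose_plans(s, expand)]
        plans = heapq.nlargest(beam, plans)
        # Stage 2: score search results for kept plans, keep the top `beam`.
        results = [(search_value(s, p, r), p, r)
                   for _, s, p in plans for r in retrieve(p, expand)]
        states = [r for _, _, r in heapq.nlargest(beam, results)]
    return states
```

The two-stage pruning is what "decoupling" buys here: plan candidates are filtered before any retrieval is scored, so the two value models never have to trade off against each other within a single ranking.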
Related papers
- WideSeek: Advancing Wide Research via Multi-Agent Scaling [29.02742625120584]
Wide Research is a paradigm essential for synthesizing comprehensive information under complex constraints in parallel. We take a deep dive into Wide Research from two perspectives: Data Pipeline and Agent Optimization. First, we produce WideSeekBench, a benchmark constructed via a rigorous multi-phase data pipeline to ensure diversity across the target information volume. Second, we introduce WideSeek, a dynamic hierarchical multi-agent architecture that can autonomously fork parallel sub-agents based on task requirements.
arXiv Detail & Related papers (2026-02-02T18:32:48Z)
- AI Founding Fathers: A Case Study of GIS Search in Multi-Agent Pipelines [0.0]
Large Language Models (LLMs) show exceptional fluency, but efforts persist to extract stronger reasoning capabilities from them. This paper advances a systematic framework for understanding LLM reasoning and optimization.
arXiv Detail & Related papers (2025-11-12T05:52:55Z)
- Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey [92.71325249013535]
Deliberative tree search is a cornerstone of Large Language Model (LLM) research. This paper introduces a unified framework that deconstructs search algorithms into three core components.
arXiv Detail & Related papers (2025-10-11T03:29:18Z)
- AI-SearchPlanner: Modular Agentic Search via Pareto-Optimal Multi-Objective Reinforcement Learning [7.913125061214038]
We propose AI-SearchPlanner, a novel reinforcement learning framework designed to enhance the performance of frozen QA models by focusing on search planning. Experiments on real-world datasets demonstrate that AI-SearchPlanner outperforms existing RL-based search agents in both effectiveness and efficiency.
arXiv Detail & Related papers (2025-08-28T02:31:17Z)
- Efficient Agent: Optimizing Planning Capability for Multimodal Retrieval Augmented Generation [17.115587821286223]
Multimodal Retrieval-Augmented Generation (mRAG) has emerged as a promising solution to address the temporal limitations of Multimodal Large Language Models (MLLMs) in real-world scenarios. We propose E-Agent, an agent framework featuring two key innovations: an mRAG planner trained to dynamically orchestrate multimodal tools based on contextual reasoning, and a task executor employing tool-aware execution sequencing.
arXiv Detail & Related papers (2025-08-12T10:17:12Z)
- HiRA: A Hierarchical Reasoning Framework for Decoupled Planning and Execution in Deep Search [85.12447821237045]
HiRA is a hierarchical framework that separates strategic planning from specialized execution. Our approach decomposes complex search tasks into focused subtasks and assigns each subtask to domain-specific agents equipped with external tools and reasoning capabilities. Experiments on four complex, cross-modal deep search benchmarks demonstrate that HiRA significantly outperforms state-of-the-art RAG and agent-based systems.
arXiv Detail & Related papers (2025-07-03T14:18:08Z)
- MMSearch-R1: Incentivizing LMMs to Search [49.889749277236376]
We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables on-demand, multi-turn search in real-world Internet environments. Our framework integrates both image and text search tools, allowing the model to reason about when and how to invoke them, guided by an outcome-based reward with a search penalty.
arXiv Detail & Related papers (2025-06-25T17:59:42Z)
- HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking [109.09735490692202]
We propose HyperTree Planning (HTP), a novel reasoning paradigm that constructs hypertree-structured planning outlines for effective planning. Experiments demonstrate the effectiveness of HTP, achieving state-of-the-art accuracy on the TravelPlanner benchmark with Gemini-1.5-Pro, resulting in a 3.6 times performance improvement over o1-preview.
arXiv Detail & Related papers (2025-05-05T02:38:58Z)
- Enhancing LLM Reasoning with Reward-guided Tree Search [95.06503095273395]
Implementing an o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research. We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
arXiv Detail & Related papers (2024-11-18T16:15:17Z)
- AMEIR: Automatic Behavior Modeling, Interaction Exploration and MLP Investigation in the Recommender System [32.288429300824454]
We introduce AMEIR for Automatic behavior Modeling, interaction Exploration and multi-layer perceptron (MLP) Investigation in the Recommender system.
Specifically, AMEIR divides the complete recommendation models into three stages (behavior modeling, interaction exploration, and aggregation) and introduces a novel search space containing three subspaces.
To find the ideal architecture efficiently and effectively, AMEIR realizes the one-shot random search in recommendation progressively on the three stages and assembles the search results as the final outcome.
arXiv Detail & Related papers (2020-06-10T16:41:58Z)
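The AMEIR entry above describes a progressive one-shot random search: each stage's subspace is searched in turn, fixing earlier choices, and the per-stage winners are assembled into the final architecture. Here is a toy sketch of that control flow; the subspaces and the `evaluate` scorer are illustrative stand-ins, not the paper's actual search space or supernet evaluation.

```python
import random

# Toy stand-in subspaces for the three stages named in the entry.
SUBSPACES = {
    "behavior": ["gru", "attention", "conv"],
    "interaction": ["inner_product", "cross", "none"],
    "mlp": ["64-32", "128-64", "256-128-64"],
}

def evaluate(arch):
    """Toy proxy for validating a sampled architecture (random score)."""
    return random.random()

def one_shot_random_search(trials=8):
    arch = {}
    # Progressive search: settle each stage before moving to the next,
    # carrying earlier choices forward, then return the assembled result.
    for stage, choices in SUBSPACES.items():
        best, best_score = None, float("-inf")
        for _ in range(trials):
            candidate = dict(arch, **{stage: random.choice(choices)})
            score = evaluate(candidate)
            if score > best_score:
                best_score, best = score, candidate
        arch = best
    return arch
```

Searching stage by stage keeps the number of evaluations linear in the number of stages rather than exponential in the joint space, which is the efficiency argument the entry alludes to.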
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.