Related papers: R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

URL: http://arxiv.org/abs/2506.04185v1
Date: Wed, 04 Jun 2025 17:29:22 GMT
Title: R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning
Authors: Qingfei Zhao, Ruobing Wang, Dingling Xu, Daren Zha, Limin Liu,
Abstract summary: R-Search is a reinforcement learning framework for Reasoning-Search integration.<n>It guides large language models to autonomously execute multi-step reasoning with deep search interaction.<n>R-Search learns optimal reasoning search interaction trajectories via multi-reward signals.
Score: 0.8388591755871735
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have notably progressed in multi-step and long-chain reasoning. However, extending their reasoning capabilities to encompass deep interactions with search remains a non-trivial challenge, as models often fail to identify optimal reasoning-search interaction trajectories, resulting in suboptimal responses. We propose R-Search, a novel reinforcement learning framework for Reasoning-Search integration, designed to enable LLMs to autonomously execute multi-step reasoning with deep search interaction, and learn optimal reasoning search interaction trajectories via multi-reward signals, improving response quality in complex logic- and knowledge-intensive tasks. R-Search guides the LLM to dynamically decide when to retrieve or reason, while globally integrating key evidence to enhance deep knowledge interaction between reasoning and search. During RL training, R-Search provides multi-stage, multi-type rewards to jointly optimize the reasoning-search trajectory. Experiments on seven datasets show that R-Search outperforms advanced RAG baselines by up to 32.2% (in-domain) and 25.1% (out-of-domain). The code and data are available at https://github.com/QingFei1/R-Search.

Related papers

Beneficial Reasoning Behaviors in Agentic Search and Effective Post-training to Obtain Them [23.986035712600657]
We propose a reasoning-driven pipeline to study effective reasoning behavior patterns in agentic search.<n>We identify four beneficial reasoning behaviors: Information Verification, Authority Evaluation, Adaptive Search, and Error Recovery.<n>We show that behavior priming yields over 35% gains in Llama3.2-3B and Qwen3-1.7B compared to directly training agentic search models with RL.
arXiv Detail & Related papers (2025-10-08T00:20:35Z)
RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection [55.125987985864896]
We present a systematic analysis that quantifies how environmental complexity induces fragile search behaviors.<n>We propose a simple yet effective approach to instantiate a search agent, RE-Searcher.<n>This combination of goal-oriented planning and self-reflection enables RE-Searcher to resist spurious cues in complex search environments.
arXiv Detail & Related papers (2025-09-30T10:25:27Z)
DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning [4.817888539036794]
DynaSearcher is an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL)<n>We employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality.<n> Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets.
arXiv Detail & Related papers (2025-07-23T09:58:31Z)
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.10441885629787]
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge.<n>It falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts.<n>This survey synthesizes both strands under a unified reasoning-retrieval perspective.
arXiv Detail & Related papers (2025-07-13T03:29:41Z)
MMSearch-R1: Incentivizing LMMs to Search [49.889749277236376]
We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables on-demand, multi-turn search in real-world Internet environments.<n>Our framework integrates both image and text search tools, allowing the model to reason about when and how to invoke them guided by an outcome-based reward with a search penalty.
arXiv Detail & Related papers (2025-06-25T17:59:42Z)
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques.<n>We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds.<n>Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z)
SEM: Reinforcement Learning for Search-Efficient Large Language Models [26.075903427834838]
Large Language Models (LLMs) have demonstrated their capabilities not only in reasoning but also in invoking external tools.<n>Existing reinforcement learning approaches often lead to redundant search behaviors, resulting in inefficiencies and over-cost.<n>We propose SEM, a novel post-training reinforcement learning framework that explicitly trains LLMs to optimize search usage.
arXiv Detail & Related papers (2025-05-12T09:45:40Z)
ZeroSearch: Incentivize the Search Capability of LLMs without Searching [69.55482019211597]
We introduce ZeroSearch, a framework that incentivizes the capabilities of large language models to use a real search engine with simulated searches during training.<n>Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both useful and noisy documents.
arXiv Detail & Related papers (2025-05-07T17:30:22Z)
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning [37.183397387416065]
We propose ReSearch, a framework that trains LLMs to Reason with Search via reinforcement learning.<n>Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking.<n>Analysis reveals that ReSearch naturally elicits advanced reasoning capabilities such as reflection and self-correction.
arXiv Detail & Related papers (2025-03-25T09:00:58Z)
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [50.419872452397684]
Search-R1 is an extension of reinforcement learning for reasoning frameworks.<n>It generates search queries during step-by-step reasoning with real-time retrieval.<n>It improves performance by 41% (Qwen2.5-7B) and 20% (Qwen2.5-3B) over various RAG baselines.
arXiv Detail & Related papers (2025-03-12T16:26:39Z)
Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for large language models (MLLMs)<n>We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs.<n>We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z)
PathFinder: Guided Search over Multi-Step Reasoning Paths [80.56102301441899]
We propose PathFinder, a tree-search-based reasoning path generation approach. It enhances diverse branching and multi-hop reasoning through the integration of dynamic decoding. Our model generalizes well to longer, unseen reasoning chains, reflecting similar complexities to beam search with large branching factors.
arXiv Detail & Related papers (2023-12-08T17:05:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.