Related papers: Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents

URL: http://arxiv.org/abs/2510.04695v1
Date: Mon, 06 Oct 2025 11:09:45 GMT
Title: Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents
Authors: Yiding Wang, Zhepei Wei, Xinyu Zhu, Yu Meng
Abstract summary: We introduce DeSA (Decoupling Search-and-Answering), a simple two-stage training framework that explicitly separates search optimization from answer generation. Across seven QA benchmarks, DeSA-trained agents consistently improve search behaviors, delivering substantially higher search recall and answer accuracy than outcome-only baselines.
Score: 19.31471304268234
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Enabling large language models (LLMs) to utilize search tools offers a promising path to overcoming fundamental limitations such as knowledge cutoffs and hallucinations. Recent work has explored reinforcement learning (RL) for training search-augmented agents that interleave reasoning and retrieval before answering. These approaches usually rely on outcome-based rewards (e.g., exact match), implicitly assuming that optimizing for final answers will also yield effective intermediate search behaviors. Our analysis challenges this assumption: we uncover multiple systematic deficiencies in search that arise under outcome-only training and ultimately degrade final answer quality, including failure to invoke tools, invalid queries, and redundant searches. To address these shortcomings, we introduce DeSA (Decoupling Search-and-Answering), a simple two-stage training framework that explicitly separates search optimization from answer generation. In Stage 1, agents are trained to improve search effectiveness with retrieval recall-based rewards. In Stage 2, outcome rewards are employed to optimize final answer generation. Across seven QA benchmarks, DeSA-trained agents consistently improve search behaviors, delivering substantially higher search recall and answer accuracy than outcome-only baselines. Notably, DeSA outperforms single-stage training approaches that simultaneously optimize recall and outcome rewards, underscoring the necessity of explicitly decoupling the two objectives.
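Illustrative sketch: the abstract describes a recall-based reward in Stage 1 and an outcome reward (e.g., exact match) in Stage 2, but does not give their exact definitions. The Python below is one plausible reading, not the authors' implementation. All function names, the substring-based recall check, and the answer normalization are assumptions made for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def recall_reward(retrieved_docs, gold_answers) -> float:
    """Stage 1 (assumed form): 1.0 if any retrieved document contains a gold answer.

    Substring matching is a crude stand-in for whatever recall measure the
    paper actually uses.
    """
    docs = [normalize(d) for d in retrieved_docs]
    golds = [normalize(g) for g in gold_answers]
    return 1.0 if any(g and g in d for d in docs for g in golds) else 0.0


def exact_match_reward(prediction: str, gold_answers) -> float:
    """Stage 2 (assumed form): 1.0 if the normalized prediction equals a gold answer."""
    pred = normalize(prediction)
    return 1.0 if any(pred == normalize(g) for g in gold_answers) else 0.0


def stage_reward(stage: int, retrieved_docs, prediction: str, gold_answers) -> float:
    """Pick the reward by training stage, so the two objectives stay decoupled."""
    if stage == 1:
        return recall_reward(retrieved_docs, gold_answers)
    if stage == 2:
        return exact_match_reward(prediction, gold_answers)
    raise ValueError(f"unknown stage: {stage}")


if __name__ == "__main__":
    docs = ["Paris is the capital of France."]
    print(stage_reward(1, docs, "", ["Paris"]))
    print(stage_reward(2, docs, "The Paris", ["Paris"]))
```

In this sketch only one reward is active per stage. The single-stage baselines mentioned in the abstract instead optimize recall and outcome rewards simultaneously.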

Related papers

Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration [49.9937230730202]
We propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention. Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories. We show that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales.
arXiv Detail & Related papers (2026-02-03T15:32:09Z)
Over-Searching in Search-Augmented Large Language Models [22.821710825732563]
Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval. Over-searching leads to computational inefficiency and hallucinations by incorporating irrelevant context. Our findings show: (i) search generally improves answer accuracy on answerable queries but harms abstention on unanswerable ones; (ii) over-searching is more pronounced in complex reasoning models and deep research systems; and (iii) the composition of retrieved evidence is crucial, as the presence of negative evidence improves abstention.
arXiv Detail & Related papers (2026-01-09T03:24:46Z)
SmartSearch: Process Reward-Guided Query Refinement for Search Agents [63.46067892354375]
Large language model (LLM)-based search agents have proven promising for addressing knowledge-intensive problems. Existing works largely focus on optimizing the reasoning paradigms of search agents, yet the quality of intermediate search queries during reasoning remains overlooked. We introduce SmartSearch, a framework built upon two key mechanisms to mitigate this issue.
arXiv Detail & Related papers (2026-01-08T12:39:05Z)
AdaSearch: Balancing Parametric Knowledge and Search in Large Language Models via Reinforcement Learning [61.974530499621274]
Overreliance on search introduces unnecessary cost and risks exposure to noisy or malicious content. We propose a two-stage, outcome-driven RL framework that disentangles problem solving from the decision of whether to invoke search. AdaSearch substantially improves knowledge-boundary awareness, reduces unnecessary search calls, preserves strong task performance, and offers more transparent, interpretable decision behaviors.
arXiv Detail & Related papers (2025-12-18T18:50:01Z)
Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning [137.33138614095435]
Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions.
arXiv Detail & Related papers (2025-11-12T08:29:39Z)
Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation [21.72639961371058]
We introduce a comprehensive evaluation framework for evaluating RL-based search agents. To foster faithful reasoning, we introduce VERITAS, a novel framework that integrates fine-grained faithfulness rewards into the reinforcement learning process.
arXiv Detail & Related papers (2025-10-15T08:17:52Z)
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation [21.08814504507274]
Suboptimal search behaviors exist widely, such as over-search and under-search. Current training methods, which typically rely on outcome-based rewards in an RL framework, lack the fine-grained control needed to address these inefficiencies. We introduce HiPRAG, a training methodology that incorporates a fine-grained, knowledge-grounded process reward into the RL training.
arXiv Detail & Related papers (2025-10-09T05:13:10Z)
RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection [55.125987985864896]
We present a systematic analysis that quantifies how environmental complexity induces fragile search behaviors. We propose a simple yet effective approach to instantiate a search agent, RE-Searcher. This combination of goal-oriented planning and self-reflection enables RE-Searcher to resist spurious cues in complex search environments.
arXiv Detail & Related papers (2025-09-30T10:25:27Z)
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play [45.02121903138421]
AceSearcher trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts for answer generation. Experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact match improvement of 7.6%.
arXiv Detail & Related papers (2025-09-29T02:14:30Z)
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z)
Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents [9.862334188345791]
Large Language Model (LLM)-based search agents have shown remarkable capabilities in solving complex tasks. We introduce SearchAgent-X, a high-efficiency inference framework for LLM-based search agents. SearchAgent-X consistently outperforms state-of-the-art systems such as vLLM and HNSW-based retrieval.
arXiv Detail & Related papers (2025-05-17T16:07:01Z)
SEM: Reinforcement Learning for Search-Efficient Large Language Models [26.075903427834838]
Large Language Models (LLMs) have demonstrated their capabilities not only in reasoning but also in invoking external tools. Existing reinforcement learning approaches often lead to redundant search behaviors, resulting in inefficiencies and over-cost. We propose SEM, a novel post-training reinforcement learning framework that explicitly trains LLMs to optimize search usage.
arXiv Detail & Related papers (2025-05-12T09:45:40Z)
ZeroSearch: Incentivize the Search Capability of LLMs without Searching [69.55482019211597]
We introduce ZeroSearch, a framework that incentivizes the capabilities of large language models to use a real search engine with simulated searches during training. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both useful and noisy documents.
arXiv Detail & Related papers (2025-05-07T17:30:22Z)
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning [74.65632662894086]
We propose ReSearch, a framework that trains LLMs to Reason with Search via reinforcement learning. Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking. Analysis reveals that ReSearch naturally elicits advanced reasoning capabilities such as reflection and self-correction.
arXiv Detail & Related papers (2025-03-25T09:00:58Z)
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search [50.45155830888697]
We develop a reinforced self-training approach, called ReST-MCTS*, based on integrating process reward guidance with tree search MCTS* for collecting higher-quality reasoning traces as well as per-step value to train policy and reward models. We first show that the tree-search policy in ReST-MCTS* achieves higher accuracy compared with prior LLM reasoning baselines such as Best-of-N and Tree-of-Thought, within the same search budget.
arXiv Detail & Related papers (2024-06-06T07:40:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.