s3: You Don't Need That Much Data to Train a Search Agent via RL
- URL: http://arxiv.org/abs/2505.14146v1
- Date: Tue, 20 May 2025 09:53:56 GMT
- Title: s3: You Don't Need That Much Data to Train a Search Agent via RL
- Authors: Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Jinfeng Xiao, Zifeng Wang, Jimeng Sun, Jiawei Han
- Abstract summary: Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. We propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward.
- Score: 41.21029905607559
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility, or fine-tune the entire LLM to jointly reason and retrieve, entangling retrieval with generation and limiting real search utility and compatibility with frozen or proprietary models. In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naive RAG. s3 requires only 2.4k training samples to outperform baselines trained on over 70x more data, consistently delivering stronger downstream performance across six general QA and five medical QA benchmarks.
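A minimal sketch of how the Gain Beyond RAG (GBR) reward described in the abstract could be computed, assuming a frozen generator and an answer-accuracy metric passed in as callables (all names here are illustrative, not taken from the paper's code):

```python
from typing import Callable, List

def gain_beyond_rag(
    question: str,
    gold_answer: str,
    searcher_docs: List[str],      # context selected by the trained searcher
    naive_rag_docs: List[str],     # context from a single naive RAG retrieval
    generate: Callable[[str, List[str]], str],   # frozen generator (hypothetical interface)
    accuracy: Callable[[str, str], float],       # answer-accuracy metric (hypothetical interface)
) -> float:
    """Reward = generation accuracy with searcher-selected docs minus accuracy with naive-RAG docs."""
    answer_with_searcher = generate(question, searcher_docs)
    answer_with_naive = generate(question, naive_rag_docs)
    return accuracy(answer_with_searcher, gold_answer) - accuracy(answer_with_naive, gold_answer)
```

Because only the searcher is optimized against this gain while the generator stays frozen, the framework remains compatible with frozen or proprietary generation models, which is what makes it model-agnostic.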
Related papers
- MMSearch-R1: Incentivizing LMMs to Search [49.889749277236376]
We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables on-demand, multi-turn search in real-world Internet environments. Our framework integrates both image and text search tools, allowing the model to reason about when and how to invoke them, guided by an outcome-based reward with a search penalty.
arXiv Detail & Related papers (2025-06-25T17:59:42Z) - ImpRAG: Retrieval-Augmented Generation with Implicit Queries [49.510101132093396]
ImpRAG is a query-free RAG system that integrates retrieval and generation into a unified model. We show that ImpRAG achieves 3.6-11.5 point improvements in exact match scores on unseen tasks with diverse formats.
arXiv Detail & Related papers (2025-06-02T21:38:21Z) - Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework in which the LLM learns to retrieve useful information as its reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z) - StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization [14.931231544839687]
StepSearch is a framework for search LLMs trained with a step-wise proximal policy optimization method. It provides richer and more detailed intermediate search rewards and token-level process supervision based on information gain and redundancy penalties (a toy sketch of such a step-wise reward appears after this list). On standard multi-hop QA benchmarks, it significantly outperforms global-reward baselines, achieving 11.2% and 4.2% absolute improvements for 3B and 7B models.
arXiv Detail & Related papers (2025-05-21T05:01:31Z) - ZeroSearch: Incentivize the Search Capability of LLMs without Searching [69.55482019211597]
We introduce ZeroSearch, a framework that incentivizes large language models' capability to use a real search engine by relying on simulated searches during training. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both useful and noisy documents.
arXiv Detail & Related papers (2025-05-07T17:30:22Z) - Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [50.419872452397684]
Search-R1 extends reinforcement learning for reasoning frameworks: the model generates search queries during step-by-step reasoning with real-time retrieval. It improves performance by 41% (Qwen2.5-7B) and 20% (Qwen2.5-3B) over various RAG baselines.
arXiv Detail & Related papers (2025-03-12T16:26:39Z) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z) - C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation [13.120930059424975]
C-3PO is a proxy-centric framework that facilitates communication between retrievers and large language models. Our framework implements three specialized agents that collaboratively optimize the entire RAG pipeline.
arXiv Detail & Related papers (2025-02-10T07:04:32Z)
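As referenced in the StepSearch entry above, a toy sketch of a step-wise search reward that combines an information-gain term with a redundancy penalty might look like the following. The decomposition into `gold_keys`, `seen_keys`, and the weighting factor are assumptions made for illustration, not the paper's exact formulation:

```python
from typing import Set

def step_reward(
    retrieved_keys: Set[str],      # key information retrieved at this search step
    gold_keys: Set[str],           # key information needed to answer the question
    seen_keys: Set[str],           # key information already retrieved in earlier steps
    redundancy_weight: float = 0.5,  # illustrative weighting, not from the paper
) -> float:
    """Reward newly covered gold information and penalize re-retrieving old information."""
    new_hits = (retrieved_keys & gold_keys) - seen_keys
    information_gain = len(new_hits) / max(len(gold_keys), 1)
    redundancy = len(retrieved_keys & seen_keys) / max(len(retrieved_keys), 1)
    return information_gain - redundancy_weight * redundancy
```

A dense per-step signal of this kind is what distinguishes step-wise supervision from the single outcome-based (global) rewards used by several of the other frameworks listed here.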