ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
- URL: http://arxiv.org/abs/2503.19470v2
- Date: Thu, 27 Mar 2025 05:56:31 GMT
- Title: ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
- Authors: Mingyang Chen, Tianpeng Li, Haoze Sun, Yijie Zhou, Chenzheng Zhu, Haofen Wang, Jeff Z. Pan, Wen Zhang, Huajun Chen, Fan Yang, Zenan Zhou, Weipeng Chen
- Abstract summary: We propose ReSearch, a framework that trains LLMs to Reason with Search via reinforcement learning. Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking. Analysis reveals that ReSearch naturally elicits advanced reasoning capabilities such as reflection and self-correction.
- Score: 37.183397387416065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown remarkable capabilities in reasoning, exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating reasoning with external search processes remains challenging, especially for complex multi-hop questions requiring multiple retrieval steps. We propose ReSearch, a novel framework that trains LLMs to Reason with Search via reinforcement learning without using any supervised data on reasoning steps. Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking, and search results subsequently influence further reasoning. We train ReSearch on Qwen2.5-7B(-Instruct) and Qwen2.5-32B(-Instruct) models and conduct extensive experiments. Despite being trained on only one dataset, our models demonstrate strong generalizability across various benchmarks. Analysis reveals that ReSearch naturally elicits advanced reasoning capabilities such as reflection and self-correction during the reinforcement learning process.
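A minimal sketch of the rollout pattern the abstract describes, where decoding pauses whenever the model emits a search call, the query is executed, and the result is spliced back into the context before decoding resumes. The `<search>`/`<result>` tag names and the `generate`/`retrieve` callables are illustrative assumptions, not the paper's exact interface.

```python
import re
from typing import Callable

SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)

def rollout(prompt: str,
            generate: Callable[[str, list[str]], str],   # (context, stop strings) -> text
            retrieve: Callable[[str], str],              # query -> concatenated documents
            max_searches: int = 4) -> str:
    """Interleave generation with retrieval: decode until the model closes a
    <search> tag, run the query, splice the result in, and keep decoding."""
    context = prompt
    for _ in range(max_searches):
        chunk = generate(context, ["</search>"])  # assumes the stop string is kept in the chunk
        context += chunk
        match = SEARCH_RE.search(chunk)
        if match is None:                # no search issued: the reasoning chain is complete
            return context
        docs = retrieve(match.group(1).strip())
        context += f"\n<result>{docs}</result>\n"
    return context + generate(context, [])  # search budget spent: finish without searching
```

Because the whole trajectory is scored only on its final answer during RL, a policy trained over such rollouts can learn when and what to search without step-level supervision.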
Related papers
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [45.66983815273302]
This paper introduces Search-R1, an extension of the DeepSeek-R1 model that generates (multiple) search queries during step-by-step reasoning with real-time retrieval. Experiments on seven question-answering datasets show that Search-R1 improves performance by 26% (Qwen2.5-7B), 21% (Qwen2.5-3B), and 10% (LLaMA3.2-3B) over strong baselines.
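The abstract does not spell out how training handles retrieved text spliced into a rollout; one common choice in RL-with-retrieval setups (an assumption here, not a stated Search-R1 detail) is to mask retrieved spans out of the policy loss so the model is only updated on tokens it actually generated. A sketch:

```python
import re

RESULT_RE = re.compile(r"<result>.*?</result>", re.DOTALL)

def loss_mask(text: str, offsets: list[tuple[int, int]]) -> list[int]:
    """Return 1 for tokens the policy generated and 0 for tokens falling inside
    retrieved <result> spans, given each token's (start, end) character offsets."""
    spans = [m.span() for m in RESULT_RE.finditer(text)]

    def retrieved(start: int, end: int) -> bool:
        return any(start >= a and end <= b for a, b in spans)

    return [0 if retrieved(s, e) else 1 for s, e in offsets]
```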
arXiv Detail & Related papers (2025-03-12T16:26:39Z)
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
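"Outcome-based" here means the reward is computed from the finished rollout alone, with no per-step supervision. A minimal sketch of such a reward, assuming an `<answer>` tag, a small format bonus, and exact-match scoring; all three are illustrative choices rather than the paper's exact recipe:

```python
import re

def outcome_reward(completion: str, gold: str) -> float:
    """Score a finished rollout from its final output alone: a small bonus for
    a parseable answer plus a full bonus for an exact match."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m is None:
        return 0.0                                   # unparseable output: no reward
    answer = m.group(1).strip().lower()
    return 0.1 + (1.0 if answer == gold.strip().lower() else 0.0)  # weights are illustrative
```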
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
- O1 Embedder: Let Retrievers Think Before Action [28.583031173137428]
We propose O1 Embedder, which generates useful thoughts for the input query before retrieving the target documents.
We evaluate our approach with comprehensive experiments, achieving substantial improvements across 12 popular datasets.
These results highlight O1 Embedder's remarkable accuracy and generalizability, paving the way for the development of next-generation IR foundation models.
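A sketch of the think-before-retrieve idea under simple assumptions: a `think` callable produces a short elaboration of the query, and the query and thought are embedded together before cosine-similarity ranking. The function names and single-vector interface are hypothetical stand-ins for the paper's actual model.

```python
from typing import Callable
import numpy as np

def think_then_retrieve(query: str,
                        think: Callable[[str], str],        # LLM elaboration of the query
                        embed: Callable[[str], np.ndarray],
                        doc_vecs: np.ndarray,               # (num_docs, dim), one row per document
                        k: int = 5) -> list[int]:
    """Generate a free-form thought about the query, embed query and thought
    together, and rank documents by cosine similarity."""
    q = embed(f"{query}\n{think(query)}")
    q = q / np.linalg.norm(q)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(-(d @ q))[:k].tolist()               # indices of the top-k documents
```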
arXiv Detail & Related papers (2025-02-11T13:48:10Z)
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.
Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.
We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
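A skeletal sketch of that two-stage recipe; the `sft_step`/`rl_step` callables and the `rl_env` object with `sample`/`reward` methods are placeholders standing in for a concrete trainer, not Satori's actual code.

```python
def train_satori_style(model, format_data, rl_env, sft_step, rl_step,
                       sft_epochs: int = 1, rl_iters: int = 10_000):
    """Two-stage recipe: (1) small-scale supervised tuning to internalize the
    Chain-of-Action-Thought format, then (2) large-scale RL self-improvement."""
    # Stage 1: format tuning on a small set of COAT-formatted demonstrations.
    for _ in range(sft_epochs):
        for example in format_data:
            sft_step(model, example)
    # Stage 2: the model explores via autoregressive search and is updated
    # from outcome rewards on its own rollouts.
    for _ in range(rl_iters):
        trajectory = rl_env.sample(model)
        rl_step(model, trajectory, rl_env.reward(trajectory))
    return model
```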
arXiv Detail & Related papers (2025-02-04T17:26:58Z)
- Search-o1: Agentic Search-Enhanced Large Reasoning Models [24.239220558484373]
Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. We introduce Search-o1, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module.
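The Reason-in-Documents idea, as described, distills raw retrieved documents into evidence conditioned on the current reasoning state before injecting it into the chain. A minimal sketch; the prompt wording is an assumption rather than the paper's template.

```python
from typing import Callable

def reason_in_documents(question: str, reasoning_so_far: str, query: str,
                        retrieve: Callable[[str], list[str]],
                        llm: Callable[[str], str]) -> str:
    """Retrieve for an in-flight query, then distill the raw documents into a
    short note tailored to the current reasoning state before injecting it."""
    docs = "\n\n".join(retrieve(query))
    distill_prompt = (
        f"Question: {question}\n"
        f"Reasoning so far: {reasoning_so_far}\n"
        f"Retrieved documents:\n{docs}\n"
        "Extract only the facts needed to continue this reasoning:"
    )
    return llm(distill_prompt)  # condensed evidence, not raw documents
```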
arXiv Detail & Related papers (2025-01-09T16:48:17Z)
- RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement [85.08223786819532]
Existing large language models (LLMs) show exceptional problem-solving capabilities but might struggle with complex reasoning tasks.
We propose RAG-Star, a novel RAG approach that integrates retrieved information to guide the tree-based deliberative reasoning process.
Our experiments involving Llama-3.1-8B-Instruct and GPT-4o demonstrate that RAG-Star significantly outperforms previous RAG and reasoning methods.
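A reduced sketch of retrieval-guided deliberation: at each step, candidate continuations are proposed and the one a verifier scores highest against retrieved evidence is kept. For brevity this collapses the tree search to a greedy, beam-width-1 walk; `propose` and `verify` are hypothetical callables.

```python
from typing import Callable

def deliberate(question: str,
               propose: Callable[[str, str], list[str]],          # candidate next steps
               verify: Callable[[str, str, list[str]], float],    # evidence-grounded score
               retrieve: Callable[[str], list[str]],
               depth: int = 4) -> str:
    """Greedy tree-style deliberation: propose candidate next reasoning steps,
    score each against retrieved evidence, keep the best, and descend."""
    path = ""
    for _ in range(depth):
        candidates = propose(question, path)
        if not candidates:
            break
        evidence = retrieve(question + " " + path)
        best = max(candidates, key=lambda c: verify(question, path + c, evidence))
        path += best + "\n"
    return path
```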
arXiv Detail & Related papers (2024-12-17T13:05:36Z)
- Enhancing LLM Reasoning with Reward-guided Tree Search [95.06503095273395]
Implementing an o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research. We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
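A minimal sketch of reward-guided tree search as best-first search: a reward model scores partial reasoning traces, and the highest-scoring frontier node is expanded until a terminal trace is found or the node budget runs out. The callable interfaces are assumptions for illustration, not the paper's algorithm verbatim.

```python
import heapq
from typing import Callable

def reward_guided_search(root: str,
                         expand: Callable[[str], list[str]],   # child reasoning states
                         reward: Callable[[str], float],       # reward-model score
                         is_terminal: Callable[[str], bool],
                         budget: int = 64) -> str:
    """Best-first search over partial reasoning traces: repeatedly expand the
    frontier node the reward model scores highest."""
    frontier = [(-reward(root), root)]
    best = (float("-inf"), root)
    for _ in range(budget):
        if not frontier:
            break
        neg, state = heapq.heappop(frontier)
        if -neg > best[0]:
            best = (-neg, state)
        if is_terminal(state):
            return state
        for child in expand(state):
            heapq.heappush(frontier, (-reward(child), child))
    return best[1]   # best-scored trace found within the budget
```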
arXiv Detail & Related papers (2024-11-18T16:15:17Z)