Enhancing LLM Reasoning with Reward-guided Tree Search
- URL: http://arxiv.org/abs/2411.11694v4
- Date: Tue, 31 Dec 2024 01:38:12 GMT
- Title: Enhancing LLM Reasoning with Reward-guided Tree Search
- Authors: Jinhao Jiang, Zhipeng Chen, Yingqian Min, Jie Chen, Xiaoxue Cheng, Jiapeng Wang, Yiru Tang, Haoxiang Sun, Jia Deng, Wayne Xin Zhao, Zheng Liu, Dong Yan, Jian Xie, Zhongyuan Wang, Ji-Rong Wen
- Abstract summary: Developing an o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research.
We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
- Score: 95.06503095273395
- License:
- Abstract: Recently, test-time scaling has garnered significant attention from the research community, largely due to the substantial advancements of the o1 model released by OpenAI. By allocating more computational resources during the inference phase, large language models (LLMs) can extensively explore the solution space by generating more thought tokens or diverse solutions, thereby producing more accurate responses. However, developing an o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research. In this paper, we present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms. This framework is implemented by integrating the policy model, reward model, and search algorithm. It is primarily constructed around a tree search algorithm, where the policy model navigates a dynamically expanding tree guided by a specially trained reward model. The implemented framework is denoted as STILL-1. We thoroughly explore various design considerations necessary for implementing this framework and provide a detailed report of the technical aspects. To assess the effectiveness of our approach, we focus on mathematical reasoning tasks and conduct extensive evaluations on four challenging datasets, significantly enhancing the reasoning abilities of LLMs.
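To make the interplay of policy model, reward model, and search algorithm concrete, here is a minimal sketch of a generic reward-guided best-first tree search on a toy problem. It is not the STILL-1 implementation: `propose` and `score` are hypothetical stand-ins for the policy model and the reward model, and the toy task (build a sequence of steps from {1, 2, 3} that sums to a target) is an assumption chosen only so the example runs without any model.

```python
import heapq
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # a partial solution: the steps taken so far


def reward_guided_search(
    root: State,
    propose: Callable[[State], List[State]],  # hypothetical policy-model stand-in
    score: Callable[[State], float],          # hypothetical reward-model stand-in
    is_solution: Callable[[State], bool],
    max_expansions: int = 100,
) -> Optional[State]:
    """Best-first search: always expand the open node with the highest reward."""
    counter = 0  # tie-breaker so the heap never has to compare states directly
    # heapq is a min-heap, so rewards are stored negated.
    frontier = [(-score(root), counter, root)]
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, state = heapq.heappop(frontier)
        if is_solution(state):
            return state
        # The tree grows dynamically: children exist only once a node is expanded.
        for child in propose(state):
            counter += 1
            heapq.heappush(frontier, (-score(child), counter, child))
    return None  # expansion budget exhausted


TARGET = 7  # toy goal, an assumption for illustration only


def propose(state: State) -> List[State]:
    # Toy "policy": extend the path by 1, 2, or 3 unless the target is reached or passed.
    if sum(state) >= TARGET:
        return []
    return [state + (step,) for step in (1, 2, 3)]


def score(state: State) -> float:
    # Toy "reward": the closer the running sum is to the target, the higher the reward.
    return float(-abs(TARGET - sum(state)))


def is_solution(state: State) -> bool:
    return sum(state) == TARGET


result = reward_guided_search((), propose, score, is_solution)
print(result)
```

In a real system, `propose` would sample candidate reasoning steps from an LLM and `score` would query a trained reward model. The expansion budget `max_expansions` plays the role of the test-time compute limit.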
Related papers
- Policy Guided Tree Search for Enhanced LLM Reasoning [3.090041654375235]
Policy-Guided Tree Search (PGTS) is a framework that combines reinforcement learning with structured tree exploration to efficiently navigate reasoning paths.
Our key innovation is a learned policy that dynamically decides between expanding, branching, backtracking, or terminating exploration, eliminating the need for manual heuristics or exhaustive search.
arXiv Detail & Related papers (2025-02-04T22:08:20Z)
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models [33.13238566815798]
Large Language Models (LLMs) have sparked significant research interest in leveraging them to tackle complex reasoning tasks.
Recent studies demonstrate that encouraging LLMs to "think" with more tokens during test-time inference can significantly boost reasoning accuracy.
The introduction of OpenAI's o1 series marks a significant milestone in this research direction.
arXiv Detail & Related papers (2025-01-16T17:37:58Z)
- Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective [77.94874338927492]
OpenAI has claimed that the main technique behind o1 is reinforcement learning.
This paper analyzes the roadmap to achieving o1 from the perspective of reinforcement learning.
arXiv Detail & Related papers (2024-12-18T18:24:47Z)
- BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving [11.596474985695679]
We release the StructuredOR dataset, annotated with comprehensive labels that capture the complete mathematical modeling process.
We propose BPP-Search, an algorithm that integrates reinforcement learning into a tree-of-thought structure.
In tree-based reasoning, BPP-Search excels in accuracy and efficiency, enabling faster retrieval of correct solutions.
arXiv Detail & Related papers (2024-11-26T13:05:53Z)
- From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems.
The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness.
This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z)
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z)
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z)
- Large Search Model: Redefining Search Stack in the Era of LLMs [63.503320030117145]
We introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM).
All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts.
This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack.
arXiv Detail & Related papers (2023-10-23T05:52:09Z)
- Learning Optimal Tree Models Under Beam Search [27.92120639502327]
Existing tree models suffer from the training-testing discrepancy.
We develop the concept of Bayes optimality under beam search and calibration under beam search.
We propose a novel algorithm for learning optimal tree models under beam search.
arXiv Detail & Related papers (2020-06-27T17:20:04Z)