Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search
- URL: http://arxiv.org/abs/2411.11694v1
- Date: Mon, 18 Nov 2024 16:15:17 GMT
- Title: Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search
- Authors: Jinhao Jiang, Zhipeng Chen, Yingqian Min, Jie Chen, Xiaoxue Cheng, Jiapeng Wang, Yiru Tang, Haoxiang Sun, Jia Deng, Wayne Xin Zhao, Zheng Liu, Dong Yan, Jian Xie, Zhongyuan Wang, Ji-Rong Wen
- Abstract summary: Developing an o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research.
We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
- Score: 95.06503095273395
- Abstract: Recently, test-time scaling has garnered significant attention from the research community, largely due to the substantial advancements of the o1 model released by OpenAI. By allocating more computational resources during the inference phase, large language models (LLMs) can extensively explore the solution space by generating more thought tokens or diverse solutions, thereby producing more accurate responses. However, developing an o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research. In this paper, we present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms. This framework is implemented by integrating the policy model, reward model, and search algorithm. It is primarily constructed around a tree search algorithm, where the policy model navigates a dynamically expanding tree guided by a specially trained reward model. We thoroughly explore various design considerations necessary for implementing this framework and provide a detailed report of the technical aspects. To assess the effectiveness of our approach, we focus on mathematical reasoning tasks and conduct extensive evaluations on four challenging datasets, significantly enhancing the reasoning abilities of LLMs.
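The abstract describes the framework only at a high level; below is a minimal best-first sketch of such a reward-guided tree search. The callables `policy_generate`, `reward_score`, and `is_terminal` are hypothetical stand-ins for the paper's policy model, reward model, and termination check, not its actual implementation.
```python
import heapq
from typing import Callable, List, Tuple

def reward_guided_tree_search(
    question: str,
    policy_generate: Callable[[str, List[str]], List[str]],  # policy model proposes next steps
    reward_score: Callable[[str, List[str]], float],         # reward model scores a partial path
    is_terminal: Callable[[List[str]], bool],                # detects a completed solution
    max_expansions: int = 100,
) -> List[str]:
    """Best-first expansion of a dynamically growing reasoning tree."""
    tick = 0  # tie-breaker so the heap never compares step lists
    frontier: List[Tuple[float, int, List[str]]] = [(0.0, tick, [])]
    best_path, best_reward = [], float("-inf")
    for _ in range(max_expansions):
        if not frontier:
            break
        neg_r, _, path = heapq.heappop(frontier)  # most promising node first
        if is_terminal(path):
            if -neg_r > best_reward:
                best_reward, best_path = -neg_r, path
            continue
        for step in policy_generate(question, path):  # the tree expands dynamically here
            new_path = path + [step]
            tick += 1
            heapq.heappush(frontier, (-reward_score(question, new_path), tick, new_path))
    return best_path
```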
Related papers
- BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving [11.596474985695679]
We release the StructuredOR dataset, annotated with comprehensive labels that capture the complete mathematical modeling process.
We propose BPP-Search, an algorithm that integrates reinforcement learning into a tree-of-thought structure.
BPP-Search significantly outperforms state-of-the-art methods, including Chain-of-Thought, Self-Consistency, and Tree-of-Thought.
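The summary above does not specify BPP-Search's procedure in detail; one plausible reading, sketched below under that assumption, is level-wise tree-of-thought expansion pruned by a learned reward. `propose_thoughts` and `reward` are hypothetical stand-ins.
```python
from typing import Callable, List

def reward_pruned_tot(
    problem: str,
    propose_thoughts: Callable[[str, List[str]], List[str]],  # LLM proposes candidate thoughts
    reward: Callable[[str, List[str]], float],                # learned reward scores a chain
    depth: int = 4,
    beam_width: int = 3,
) -> List[List[str]]:
    """Grow a tree of thoughts level by level, keeping only the top-scoring chains."""
    chains: List[List[str]] = [[]]
    for _ in range(depth):
        candidates = [c + [t] for c in chains for t in propose_thoughts(problem, c)]
        if not candidates:
            break
        candidates.sort(key=lambda c: reward(problem, c), reverse=True)
        chains = candidates[:beam_width]  # reward-based pruning at each level
    return chains
```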
arXiv Detail & Related papers (2024-11-26T13:05:53Z) - EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
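For context, UCB1 is a standard example of the kind of optimal exploration algorithm the paper proposes integrating into LLMs; the Bernoulli arms below are a toy illustration, not the paper's benchmark.
```python
import math
import random
from typing import Callable

def ucb1(pull: Callable[[int], float], n_arms: int, horizon: int) -> float:
    """UCB1 bandit algorithm: pull each arm once, then maximize the upper confidence bound."""
    counts, sums, total = [0] * n_arms, [0.0] * n_arms, 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initial round-robin over all arms
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

# Toy usage: three Bernoulli arms with hidden success probabilities.
probs = [0.2, 0.5, 0.8]
print(ucb1(lambda a: float(random.random() < probs[a]), n_arms=3, horizon=1000))
```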
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems.
The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness.
This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z) - LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
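The abstract does not spell out LiteSearch's selection and budgeting rules; the sketch below is one guess at how a node-level exploration budget might work, with the value-proportional child allocation an explicitly assumed rule.
```python
import heapq
from typing import Callable, List, Tuple

def budgeted_guided_search(
    question: str,
    expand: Callable[[str, List[str], int], List[str]],  # generate up to k child steps
    value: Callable[[str, List[str]], float],            # guidance score in [0, 1]
    total_budget: int = 64,
    max_children: int = 5,
) -> List[str]:
    """Guided tree search under a global compute budget; better nodes get more children."""
    frontier: List[Tuple[float, int, List[str]]] = [(-1.0, 0, [])]
    spent, tick = 0, 0
    best_path, best_value = [], -1.0
    while frontier and spent < total_budget:
        neg_v, _, path = heapq.heappop(frontier)  # dynamic node selection: best value first
        if -neg_v > best_value:
            best_value, best_path = -neg_v, path
        k = max(1, round(-neg_v * max_children))  # assumed value-proportional node budget
        for step in expand(question, path, k):
            spent += 1
            tick += 1
            new_path = path + [step]
            heapq.heappush(frontier, (-value(question, new_path), tick, new_path))
    return best_path
```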
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z) - Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks, and we observe similar performance improvements on code generation tasks.
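A hedged sketch of such a PRM-guided greedy search follows; `sample_steps`, `prm_score`, and `is_final` are hypothetical stand-ins for the generator, the step-level reward model, and the stopping check.
```python
from typing import Callable, List

def prm_greedy_search(
    question: str,
    sample_steps: Callable[[str, List[str]], List[str]],  # sample candidate next steps
    prm_score: Callable[[str, List[str]], float],         # step-level score from the PRM
    is_final: Callable[[str], bool],                      # detects a concluding step
    max_steps: int = 10,
) -> List[str]:
    """Greedy decoding over reasoning steps: keep only the PRM's top-rated candidate."""
    steps: List[str] = []
    for _ in range(max_steps):
        candidates = sample_steps(question, steps)
        if not candidates:
            break
        best = max(candidates, key=lambda c: prm_score(question, steps + [c]))
        steps.append(best)  # commit greedily to the highest-scoring step
        if is_final(best):
            break
    return steps
```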
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models [17.059322033670124]
We propose a novel strategy that propels Large Language Models through algorithmic reasoning pathways.
Our results suggest that instructing an LLM using an algorithm can lead to performance surpassing that of the algorithm itself.
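As a rough illustration of this prompting style (the template below is invented for illustration, not taken from the paper): a single prompt embeds a worked algorithmic trace, here depth-first search with backtracking on a Game-of-24 instance, so the model continues that exploration within one generation.
```python
# Hypothetical Algorithm-of-Thoughts-style prompt: the in-context example walks
# through a depth-first search with backtracking, and the model is asked to
# continue the same algorithmic exploration in a single generation.
AOT_PROMPT = """Make 24 from the given numbers, exploring options depth-first and
backtracking on dead ends.

Example with 1, 2, 3, 4:
Try 1+2=3 -> left [3, 3, 4] -> try 3*3=9 -> left [9, 4] -> 9+4=13, dead end, backtrack
Try 1*2=2 -> left [2, 3, 4] -> try 2*3=6 -> left [6, 4] -> 6*4=24, solved
Answer: ((1*2)*3)*4 = 24

Now solve it with {numbers}:
"""

def build_aot_prompt(numbers: str) -> str:
    return AOT_PROMPT.format(numbers=numbers)

print(build_aot_prompt("3, 3, 8, 8"))
```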
arXiv Detail & Related papers (2023-08-20T22:36:23Z) - Learning Optimal Tree Models Under Beam Search [27.92120639502327]
Existing tree models suffer from the training-testing discrepancy.
We develop the concept of Bayes optimality under beam search and calibration under beam search.
We propose a novel algorithm for learning optimal tree models under beam search.
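For reference, plain beam search, the test-time procedure whose mismatch with training these definitions address, reduces to a few lines; the `children` and `score` callables are generic placeholders.
```python
from typing import Callable, List

def beam_search(
    root: str,
    children: Callable[[str], List[str]],  # child nodes of a tree node
    score: Callable[[str], float],         # model score used for pruning
    beam_width: int = 4,
    depth: int = 8,
) -> List[str]:
    """Keep only the beam_width highest-scoring nodes at every level.

    Training a tree model while ignoring this pruning is the source of the
    training-testing discrepancy the paper analyzes."""
    beam = [root]
    for _ in range(depth):
        expanded = [c for node in beam for c in children(node)]
        if not expanded:
            break
        beam = sorted(expanded, key=score, reverse=True)[:beam_width]
    return beam
```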
arXiv Detail & Related papers (2020-06-27T17:20:04Z)