Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
- URL: http://arxiv.org/abs/2410.01920v3
- Date: Wed, 9 Oct 2024 20:20:47 GMT
- Title: Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
- Authors: Shengyu Feng, Xiang Kong, Shuang Ma, Aonan Zhang, Dong Yin, Chong Wang, Ruoming Pang, Yiming Yang
- Abstract summary: We introduce a novel verification method based on Twisted Sequential Monte Carlo (TSMC).
We apply TSMC to Large Language Models by estimating the expected future rewards at partial solutions.
This approach results in a more straightforward training target that eliminates the need for step-wise human annotations.
- Score: 55.452453947359736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. Additionally, training an effective verifier often depends on extensive process supervision, which is costly to acquire. In this paper, we address these limitations by introducing a novel verification method based on Twisted Sequential Monte Carlo (TSMC). TSMC sequentially refines its sampling effort to focus exploration on promising candidates, resulting in more efficient generation of high-quality solutions. We apply TSMC to LLMs by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations. We empirically demonstrate the advantages of our method across multiple math benchmarks, and also validate our theoretical analysis of both our approach and existing verification methods.
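To make the mechanism concrete, below is a minimal sketch of twisted SMC applied to step-by-step generation. The callables `llm_step` (samples one reasoning step from the base model) and `twist_value` (estimates the expected future reward of a partial solution, i.e., the learned twist) are hypothetical stand-ins rather than the paper's actual interfaces; when the proposal is the base model itself, the incremental importance weight reduces to the ratio of successive twist values.

```python
import numpy as np

def twisted_smc(llm_step, twist_value, prompt, n_particles=8, n_steps=10, seed=0):
    """Sketch of Twisted Sequential Monte Carlo for step-wise LLM sampling.

    llm_step(partial)    -> one new reasoning step sampled from the base LLM.
    twist_value(partial) -> estimated expected future reward of a partial
                            solution (the learned twist); assumed positive.
    Both callables are hypothetical stand-ins for a real model and twist.
    """
    rng = np.random.default_rng(seed)
    particles = [prompt] * n_particles
    prev_twist = np.ones(n_particles)  # psi_0 taken as a constant
    for _ in range(n_steps):
        # Propose: extend each particle by one reasoning step from the LLM.
        particles = [p + llm_step(p) for p in particles]
        # Weight: with the base LLM as proposal, the incremental weight is
        # the twist ratio psi_t(x_{1:t}) / psi_{t-1}(x_{1:t-1}).
        twist = np.array([twist_value(p) for p in particles])
        weights = twist / prev_twist
        # Resample: reallocate particles toward promising partial solutions,
        # concentrating sampling effort on high-quality candidates.
        probs = weights / weights.sum()
        idx = rng.choice(n_particles, size=n_particles, p=probs)
        particles = [particles[i] for i in idx]
        prev_twist = twist[idx]
    return particles
```

In practice a final answer would then be selected from the surviving particles (e.g., by verifier reranking or voting), and resampling is typically triggered adaptively, for instance when the effective sample size drops, rather than at every step as in this simplified sketch.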
Related papers
- Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling [38.7578639980701]
Self-improvement methods enable large language models to generate solutions themselves.
We find that models tend to over-sample on easy queries and under-sample on queries they have yet to master.
We introduce Guided Self-Improvement (GSI), a strategy aimed at improving the efficiency of sampling challenging heavy-tailed data.
arXiv Detail & Related papers (2024-11-01T17:18:45Z) - Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification [76.14641982122696]
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control.
We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
arXiv Detail & Related papers (2024-10-07T23:38:58Z) - Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z) - On Speeding Up Language Model Evaluation [48.51924035873411]
Development of prompt-based methods with Large Language Models (LLMs) requires making numerous decisions.
We propose a novel method to address this challenge.
We show that it can identify the top-performing method using only 5-15% of the typically needed resources.
arXiv Detail & Related papers (2024-07-08T17:48:42Z) - Improve Mathematical Reasoning in Language Models by Automated Process Supervision [22.72856086318912]
We propose a novel Monte Carlo Tree Search (MCTS) algorithm named OmegaPRM for the efficient collection of high-quality process supervision data (a simplified, rollout-based sketch of automated step labeling appears after this list).
We are able to collect over 1.5 million process supervision annotations to train a Process Reward Model (PRM).
We have enhanced the instruction tuned Gemini Pro model's math reasoning performance, achieving a 69.4% success rate on the MATH benchmark.
arXiv Detail & Related papers (2024-06-05T19:25:40Z) - Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
A Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from the PRM to optimize the reasoning pathways explored by LLMs (a minimal sketch of this idea appears after this list).
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation.
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Incremental Verification of Fixed-Point Implementations of Neural Networks [0.19573380763700707]
We develop and evaluate a novel symbolic verification framework using incremental bounded model checking (BMC), satisfiability modulo theories (SMT), and invariant inference.
Our approach verified and produced examples for 85.8% of 21 test cases across different input images, and for 100% of the properties related to covering methods.
arXiv Detail & Related papers (2020-12-21T10:03:44Z)
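As referenced above, here are two hedged sketches of ideas from the related-papers list. First, a simplified, rollout-based way to collect process supervision automatically, in the spirit of (but far simpler than) the MCTS-based OmegaPRM: label each reasoning step by the fraction of Monte Carlo completions from its prefix that reach the known final answer. The callables `complete` and `final_answer` are hypothetical stand-ins.

```python
def mc_step_labels(complete, final_answer, steps, gold, n_rollouts=8):
    """Label each reasoning step via Monte Carlo rollouts from its prefix.

    complete(prefix)  -> a full solution sampled from the LLM given a prefix.
    final_answer(sol) -> the final answer extracted from a solution string.
    Hypothetical stand-ins; OmegaPRM itself uses a more elaborate MCTS.
    """
    labels, prefix = [], ""
    for step in steps:
        prefix += step
        # Estimated probability that this prefix still leads to the gold answer.
        hits = sum(final_answer(complete(prefix)) == gold for _ in range(n_rollouts))
        labels.append(hits / n_rollouts)
    return labels
```

Second, a minimal sketch of the greedy, PRM-guided search described in the "Let's reward step by step" entry. Here `propose_steps` (draws k candidate next steps from the LLM) and `prm_score` (scores a candidate step with a process reward model) are again hypothetical stand-ins, not that paper's actual interfaces.

```python
def prm_greedy_search(propose_steps, prm_score, prompt, n_steps=10, k=4):
    """Sketch of greedy reasoning search guided by a step-level reward model.

    propose_steps(partial, k) -> k candidate next reasoning steps from the LLM.
    prm_score(partial, step)  -> step-level score from a process reward model.
    Both callables are hypothetical stand-ins for real model interfaces.
    """
    solution = prompt
    for _ in range(n_steps):
        candidates = propose_steps(solution, k)
        # Commit to the candidate step the PRM rates highest (greedy choice).
        best = max(candidates, key=lambda step: prm_score(solution, step))
        solution += best
    return solution
```

Compared with the TSMC sketch above, this greedy search keeps a single trajectory and commits to the locally best step, so it is cheaper per sample but cannot recover from an early step that the PRM mis-scores.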