Self-Evaluation Guided Beam Search for Reasoning
- URL: http://arxiv.org/abs/2305.00633v3
- Date: Thu, 26 Oct 2023 01:43:17 GMT
- Title: Self-Evaluation Guided Beam Search for Reasoning
- Authors: Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, Xu Zhao, Min-Yen Kan, Junxian
He, Qizhe Xie
- Abstract summary: We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Model (LLM)
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by $6.34%$, $9.56%$, and $5.46%$ on the GSM8K, AQuA, and StrategyQA.
- Score: 61.523627290397556
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Breaking down a problem into intermediate steps has demonstrated impressive
performance in Large Language Model (LLM) reasoning. However, the growth of the
reasoning chain introduces uncertainty and error accumulation, making it
challenging to elicit accurate final results. To tackle this challenge of
uncertainty in multi-step reasoning, we introduce a stepwise self-evaluation
mechanism to guide and calibrate the reasoning process of LLMs. We propose a
decoding algorithm integrating the self-evaluation guidance via stochastic beam
search. The self-evaluation guidance serves as a better-calibrated automatic
criterion, facilitating an efficient search in the reasoning space and
resulting in superior prediction quality. Stochastic beam search balances
exploitation and exploration of the search space with temperature-controlled
randomness. Our approach surpasses the corresponding Codex-backboned baselines
in few-shot accuracy by $6.34\%$, $9.56\%$, and $5.46\%$ on the GSM8K, AQuA,
and StrategyQA benchmarks, respectively. Experiment results with Llama-2 on
arithmetic reasoning demonstrate the efficiency of our method in outperforming
the baseline methods with comparable computational budgets. Further analysis in
multi-step reasoning finds our self-evaluation guidance pinpoints logic
failures and leads to higher consistency and robustness. Our code is publicly
available at https://guideddecoding.github.io/.
Related papers
- Scaling Test-Time Compute Without Verification or RL is Suboptimal [70.28430200655919]
We show that finetuning LLMs with verifier-based (VB) methods based on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed amount of compute/data budget.
We corroborate our theory empirically on both didactic and math reasoning problems with 3/8B-sized pre-trained LLMs, where we find verification is crucial for scaling test-time compute.
arXiv Detail & Related papers (2025-02-17T18:43:24Z) - EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta [2.1249213103048414]
We introduce the EQUATOR Evaluator, which combines deterministic scoring with a focus on factual accuracy and robust reasoning assessment.
Using a vector database, EQUATOR pairs open-ended questions with human-evaluated answers, enabling more precise and scalable evaluations.
Our results demonstrate that this framework significantly outperforms traditional multiple-choice evaluations while maintaining high accuracy standards.
arXiv Detail & Related papers (2024-12-31T03:56:17Z) - FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions.
We use the Large Language Models to plan a solution, soft-formalise the query into facts and predicates using a logic programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z) - Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models [21.96773736059112]
Language Language Models (LLMs) face safety concerns due to potential misuse by malicious users.
Recent red-teaming efforts have identified adversarial suffixes capable of jailbreaking LLMs using the gradient-based search algorithm Greedy Coordinate Gradient (GCG)
We propose a two-stage transfer learning framework, DeGCG, which decouples the search process into behavior-agnostic pre-searching and behavior-relevant post-searching.
arXiv Detail & Related papers (2024-08-27T08:38:48Z) - Sublinear Regret for a Class of Continuous-Time Linear--Quadratic Reinforcement Learning Problems [10.404992912881601]
We study reinforcement learning for a class of continuous-time linear-quadratic (LQ) control problems for diffusions.
We apply a model-free approach that relies neither on knowledge of model parameters nor on their estimations, and devise an actor-critic algorithm to learn the optimal policy parameter directly.
arXiv Detail & Related papers (2024-07-24T12:26:21Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - A Strong Baseline for Batch Imitation Learning [25.392006064406967]
We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm.
This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern.
arXiv Detail & Related papers (2023-02-06T14:03:33Z) - Reliable Causal Discovery with Improved Exact Search and Weaker
Assumptions [17.097192646470372]
We introduce several strategies to improve the scalability of exact score-based methods in the linear Gaussian setting.
We develop a super-structure estimation method based on the support of inverse covariance matrix which requires assumptions that are strictly weaker than faithfulness.
We also propose a local search strategy that performs exact search on the local clusters formed by each variable and its neighbors within two hops in the super-structure.
arXiv Detail & Related papers (2022-01-14T20:52:30Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL)
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of evaluations gradient independent of training set size and number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.