Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step
Reasoning
- URL: http://arxiv.org/abs/2401.10480v1
- Date: Fri, 19 Jan 2024 04:03:59 GMT
- Title: Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step
Reasoning
- Authors: Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin
Sun, Heda Wang, Kan Li
- Abstract summary: Self-consistency (SC) has been a widely used decoding strategy for chain-of-thought reasoning.
We propose a simple and scalable sampling process, Early-Stopping Self-Consistency (ESC), to greatly reduce the cost of SC without sacrificing performance.
- Score: 15.088675135566646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-consistency (SC) has been a widely used decoding strategy for
chain-of-thought reasoning. Despite bringing significant performance
improvements across a variety of multi-step reasoning tasks, it is a high-cost
method that requires multiple samplings with a preset sample size. In this
paper, we propose a simple and scalable sampling process, Early-Stopping
Self-Consistency (ESC), to greatly reduce the cost of SC without sacrificing
performance. On this basis, a control scheme for ESC is further derived to
dynamically choose the performance-cost balance for different tasks and
models. To demonstrate ESC's effectiveness, we conducted extensive experiments
on three popular categories of reasoning tasks: arithmetic, commonsense, and
symbolic reasoning, over language models of varying scales. The empirical
results show that ESC reduces the average number of samples for
chain-of-thought reasoning by a significant margin on six benchmarks,
including MATH (-33.8%), GSM8K (-80.1%), StrategyQA (-76.8%), CommonsenseQA
(-78.5%), Coin Flip (-84.2%), and Last Letters (-67.4%), while attaining
comparable performance.
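ESC's core mechanism is easy to sketch: draw chain-of-thought samples in small windows and stop as soon as one window is unanimous, falling back to ordinary majority voting otherwise. A minimal Python sketch, assuming a hypothetical `sample_answer` hook that runs one chain-of-thought sample and returns its parsed final answer; the window size and cap below are illustrative defaults, not the paper's exact settings:

```python
from collections import Counter

def early_stopping_self_consistency(sample_answer, max_samples=40, window_size=5):
    """Draw CoT samples in windows; stop early on a unanimous window."""
    answers = []
    while len(answers) < max_samples:
        window = [sample_answer() for _ in range(window_size)]
        answers.extend(window)
        if len(set(window)) == 1:   # unanimous window: answers have converged
            break
    # Otherwise this degrades gracefully to plain SC: vote over all samples.
    return Counter(answers).most_common(1)[0][0]
```

A unanimous window is a strong signal that the answer distribution has already collapsed, which is why stopping there costs little accuracy while cutting the sample count sharply on easy inputs.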
Related papers
- Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models [56.37421741507468]
Chain-of-Thought (CoT) reasoning has significantly enhanced the performance of large language models (LLMs).
We propose a method to identify critical reasoning steps using perplexity as a measure of their importance.
arXiv Detail & Related papers (2025-02-18T20:04:51Z)
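One way to operationalize the perplexity criterion above is to score each step by its token-level perplexity and drop the lowest-ranked steps. A sketch, assuming per-step token log-probs are available from the decoder; whether high or low perplexity marks a step as critical is an assumption here, not the paper's stated rule:

```python
import math

def step_perplexity(token_logprobs):
    """Perplexity of one reasoning step from its per-token log-probs."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def prune_steps(steps, logprobs_per_step, keep_ratio=0.7):
    """Keep the steps ranked most important by perplexity, in original order."""
    ppls = [step_perplexity(lp) for lp in logprobs_per_step]
    k = max(1, int(len(steps) * keep_ratio))
    # Assumption: more surprising (higher-perplexity) steps carry more content.
    keep = sorted(range(len(steps)), key=lambda i: ppls[i], reverse=True)[:k]
    return [steps[i] for i in sorted(keep)]
```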
- FastMCTS: A Simple Sampling Strategy for Data Synthesis [67.60823802317141]
We introduce FastMCTS, an innovative data synthesis strategy inspired by Monte Carlo Tree Search.
FastMCTS provides a more efficient sampling method for multi-step reasoning data, offering step-level evaluation signals.
Experiments on both English and Chinese reasoning datasets demonstrate that FastMCTS generates over 30% more correct reasoning paths.
arXiv Detail & Related papers (2025-02-17T06:27:57Z)
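A toy version of an MCTS-style sampler for reasoning data, to make the idea concrete: partial reasoning paths form a tree, selection follows a UCB score, and completions that pass a check are harvested as training paths. `propose_step` and `complete_and_check` are hypothetical hooks onto an LLM sampler and an answer verifier; FastMCTS's actual selection and reuse rules are more involved:

```python
import math

class Node:
    def __init__(self, prefix, parent=None):
        self.prefix = prefix          # partial reasoning path (list of steps)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.wins = 0                 # rollouts from here judged correct

def ucb(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.wins / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def collect_paths(root, propose_step, complete_and_check, iters=200):
    correct_paths = []
    for _ in range(iters):
        node = root
        while node.children:                          # selection
            node = max(node.children, key=lambda ch: ucb(ch, node))
        new = Node(node.prefix + [propose_step(node.prefix)], parent=node)
        node.children.append(new)                     # expansion
        win = complete_and_check(new.prefix)          # simulation
        if win:
            correct_paths.append(new.prefix)
        while new is not None:                        # backpropagation
            new.visits += 1
            new.wins += int(win)
            new = new.parent
    return correct_paths
```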
- FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing [59.12511498024836]
We present a method for pruning large language models (LLMs) that selectively removes model blocks based on an importance score.
We propose a principled metric to replace each pruned block using a weight-sharing mechanism.
Empirical evaluations demonstrate substantial performance gains over existing methods.
arXiv Detail & Related papers (2025-01-24T18:46:37Z)
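The prune-and-replace idea can be sketched in a few lines of PyTorch: score each block on a calibration batch, then stand in for a pruned block with a kept block's shared weights plus a small low-rank correction. Both the importance score and the replacement below are simplified assumptions, not FlexiGPT's exact formulation:

```python
import torch
import torch.nn as nn

def block_importance(block, calib_batch):
    """Toy importance score: how much a block transforms its input.
    Blocks that barely change activations are the cheapest to drop."""
    with torch.no_grad():
        out = block(calib_batch)
    return ((out - calib_batch).norm() / calib_batch.norm()).item()

class SharedLowRankBlock(nn.Module):
    """Stand-in for a pruned block: reuse a kept block's weights plus a
    learned low-rank correction (a simplified weight-sharing scheme)."""
    def __init__(self, shared_block, dim, rank=8):
        super().__init__()
        self.shared = shared_block            # weights shared with a kept layer
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)        # start as an exact copy

    def forward(self, x):
        return self.shared(x) + self.up(self.down(x))
```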
- Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective [90.86370957353911]
Chain-of-Reasoning (CoR) is a novel unified framework that integrates multiple reasoning paradigms.
CoR generates multiple potential answers using different reasoning paradigms and synthesizes them into a coherent final solution.
Experimental results demonstrate that CoR-Math-7B significantly outperforms current SOTA models.
arXiv Detail & Related papers (2025-01-19T16:53:26Z)
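At prompting granularity, the multi-paradigm idea looks like the sketch below: elicit drafts under several reasoning styles, then ask for a synthesis. CoR proper trains a single model to move across paradigms rather than orchestrating prompts, so treat `generate` and the templates as hypothetical stand-ins:

```python
PARADIGM_PROMPTS = {
    "natural_language": "Solve step by step in words:\n{q}",
    "algebraic": "Solve by setting up and manipulating equations:\n{q}",
    "programmatic": "Write Python pseudocode that computes the answer:\n{q}",
}

def chain_of_reasoning(question, generate):
    """Draft an answer under each paradigm, then synthesize one solution."""
    drafts = {name: generate(tpl.format(q=question))
              for name, tpl in PARADIGM_PROMPTS.items()}
    synthesis = ("Candidate solutions in different reasoning styles:\n\n"
                 + "\n\n".join(f"[{k}]\n{v}" for k, v in drafts.items())
                 + "\n\nSynthesize these into one consistent final answer.")
    return generate(synthesis)
```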
- Layer Pruning with Consensus: A Triple-Win Solution [0.0]
Layer-pruning approaches often rely on single criteria that may not fully capture the complex, underlying properties of layers.
We propose a novel approach that combines multiple similarity metrics into a single expressive measure of low-importance layers, called the Consensus criterion.
Our technique delivers a triple-win solution: low accuracy drop, high-performance improvement, and increased robustness to adversarial attacks.
arXiv Detail & Related papers (2024-11-21T17:41:27Z)
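The Consensus criterion reduces to a simple aggregation: normalize each per-layer similarity metric, average them, and prune the layers the combined score marks as least important. A minimal sketch; the normalize-and-average rule is the simplest reasonable instantiation, not necessarily the paper's exact one:

```python
import numpy as np

def consensus_scores(metric_scores):
    """Normalize each per-layer similarity metric to [0, 1] and average."""
    combined = []
    for scores in metric_scores:                  # one sequence per metric
        s = np.asarray(scores, dtype=float)
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)
        combined.append(s)
    return np.mean(combined, axis=0)

def layers_to_prune(metric_scores, n_prune):
    """High input/output similarity means the layer changes little: prune it."""
    consensus = consensus_scores(metric_scores)
    return np.argsort(consensus)[-n_prune:]
```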
- Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation [16.350747493026432]
The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs).
We propose Strategic Chain-of-Thought (SCoT) to refine LLM performance by integrating strategic knowledge prior to generating intermediate reasoning steps.
SCoT employs a two-stage approach within a single prompt: first eliciting an effective problem-solving strategy, which is then used to guide the generation of high-quality CoT paths and final answers.
arXiv Detail & Related papers (2024-09-05T06:28:05Z)
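Because SCoT's two stages live in one prompt, the method is essentially a template. A hypothetical rendering; the wording below is illustrative, not SCoT's published prompt:

```python
SCOT_TEMPLATE = """\
Solve the problem in two stages within this single response.

Stage 1 - Strategy: before solving, state the most effective method or
piece of domain knowledge for this kind of problem (e.g. "work backwards",
"set up a system of equations").

Stage 2 - Solution: apply the Stage 1 strategy step by step, then give the
final answer on its own line prefixed with "Answer:".

Problem: {question}
"""

def strategic_cot(question, generate):
    """One LLM call; the strategy is elicited before the CoT it guides."""
    return generate(SCOT_TEMPLATE.format(question=question))
```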
- Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning [19.408941114068444]
Self-consistency (SC) is a widely used decoding strategy for chain-of-thought reasoning.
Its variants, Adaptive self-consistency (ASC) and Early-stopping self-consistency (ESC), dynamically adjust the number of samples based on the posterior distribution of a set of pre-samples.
We propose Difficulty-Adaptive Self-Consistency (DSC), which leverages the difficulty information of batch queries to adaptively allocate inference resources.
arXiv Detail & Related papers (2024-08-24T04:03:35Z)
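A toy allocator shows the shape of the idea: estimate each query's difficulty, then spend few samples on easy queries and many on hard ones. The three-tier `estimate_difficulty` interface and the budgets are hypothetical stand-ins for the paper's scheme:

```python
from collections import Counter

BUDGETS = {"easy": 3, "medium": 10, "hard": 40}   # illustrative tiers

def difficulty_adaptive_sc(queries, estimate_difficulty, sample_answer):
    """Spend few samples on easy queries, many on hard ones, then vote."""
    answers = {}
    for q in queries:
        n = BUDGETS[estimate_difficulty(q)]       # "easy" | "medium" | "hard"
        votes = Counter(sample_answer(q) for _ in range(n))
        answers[q] = votes.most_common(1)[0][0]
    return answers
```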
- Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? [13.222198659253056]
We introduce a new prompting framework (called SarcasmCue) containing four sub-methods.
It elicits large language models (LLMs) to detect human sarcasm by considering sequential and non-sequential prompting methods.
Our framework consistently pushes the state-of-the-art (i.e., ToT) by 4.2%, 2.0%, 29.7%, and 58.2% in F1 scores across four datasets.
arXiv Detail & Related papers (2024-07-17T16:42:03Z)
- Soft Self-Consistency Improves Language Model Agents [57.66282463340297]
Current "sample and select" methods rely on majority voting to score answers.
Soft Self-Consistency (SOFT-SC) replaces SC's discontinuous scoring with a continuous score computed from model likelihoods.
For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping, and a 4.7% increase for an interactive household game.
arXiv Detail & Related papers (2024-02-20T18:22:38Z)
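The continuous scoring is essentially a one-line change to majority voting: accumulate a likelihood-derived score per candidate instead of a 0/1 vote. A simplified sketch; SOFT-SC itself aggregates likelihoods over an agent's action sequence, so the `(answer, mean token log-prob)` pairs here are an assumption:

```python
import math
from collections import defaultdict

def soft_self_consistency(samples):
    """`samples`: (answer, mean token log-prob) pairs from k generations.
    Accumulate a continuous likelihood score per candidate instead of a
    discrete vote, then pick the highest-scoring answer."""
    scores = defaultdict(float)
    for answer, mean_logprob in samples:
        scores[answer] += math.exp(mean_logprob)   # continuous, not 0/1
    return max(scores, key=scores.get)
```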
- Towards Simple and Accurate Human Pose Estimation with Stair Network [34.421529219040295]
We develop a small yet discriminative model called STair Network, which can be stacked to build an accurate multi-stage pose estimation system.
To reduce computational cost, STair Network is composed of novel basic feature extraction blocks.
We demonstrate the effectiveness of the STair Network on two standard datasets.
arXiv Detail & Related papers (2022-02-18T10:37:13Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer with improved beam search, achieves a WER only 3.8% absolute worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)