Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step
Reasoning
- URL: http://arxiv.org/abs/2401.10480v1
- Date: Fri, 19 Jan 2024 04:03:59 GMT
- Title: Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step
Reasoning
- Authors: Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin
Sun, Heda Wang, Kan Li
- Abstract summary: Self-consistency (SC) has been a widely used decoding strategy for chain-of-thought reasoning.
We propose a simple and scalable sampling process, Early-Stopping Self-Consistency (ESC), to greatly reduce the cost of SC without sacrificing performance.
- Score: 15.088675135566646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-consistency (SC) has been a widely used decoding strategy for
chain-of-thought reasoning. Despite bringing significant performance
improvements across a variety of multi-step reasoning tasks, it is a high-cost
method that requires multiple samplings with a preset sample size. In this paper, we
propose a simple and scalable sampling process, Early-Stopping
Self-Consistency (ESC), to greatly reduce the cost of SC
without sacrificing performance. On this basis, a control scheme for ESC is
further derived to dynamically choose the performance-cost balance for
different tasks and models. To demonstrate ESC's effectiveness, we conducted
extensive experiments on three popular categories of reasoning tasks:
arithmetic, commonsense and symbolic reasoning over language models with
varying scales. The empirical results show that ESC reduces the average number
of samples for chain-of-thought reasoning by a significant margin on six
benchmarks, including MATH (-33.8%), GSM8K (-80.1%), StrategyQA (-76.8%),
CommonsenseQA (-78.5%), Coin Flip (-84.2%) and Last Letters (-67.4%), while
attaining comparable performance.
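To make the sampling process concrete, the sketch below shows one way early-stopping self-consistency can be implemented. It assumes ESC draws chain-of-thought samples in small windows and stops as soon as a window is unanimous, otherwise continuing up to the preset SC budget and returning the usual majority vote; the window size, the unanimity stopping rule, and the `sample_answer` helper are illustrative assumptions rather than a verbatim transcription of the paper's algorithm.

```python
# Minimal sketch of early-stopping self-consistency (ESC).
# Assumptions (not taken verbatim from the paper): samples are drawn in
# windows of `window_size`, and sampling stops early once a window is
# unanimous; otherwise the preset budget `max_samples` is exhausted.
from collections import Counter
from typing import Callable, List, Optional

def early_stopping_self_consistency(
    sample_answer: Callable[[str], str],  # hypothetical helper: one CoT sample -> final answer
    question: str,
    max_samples: int = 40,   # preset SC sample size (budget)
    window_size: int = 5,    # assumed window size
) -> Optional[str]:
    answers: List[str] = []
    while len(answers) < max_samples:
        window = [sample_answer(question) for _ in range(window_size)]
        answers.extend(window)
        # Assumed stopping rule: a unanimous window suggests the vote has
        # already converged, so further sampling is unlikely to change it.
        if len(set(window)) == 1:
            break
    if not answers:
        return None
    # Standard SC majority vote over all samples drawn so far.
    return Counter(answers).most_common(1)[0][0]
```

Under this sketch, questions whose first window already agrees cost only `window_size` samples, while harder questions still receive the full budget, which is consistent with the cost reductions reported above.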
Related papers
- Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation [16.350747493026432]
The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs).
We propose the Strategic Chain-of-Thought (SCoT) to refine LLM performance by integrating strategic knowledge prior to generating intermediate reasoning steps.
SCoT employs a two-stage approach within a single prompt: first eliciting an effective problem-solving strategy, which is then used to guide the generation of high-quality CoT paths and final answers.
arXiv Detail & Related papers (2024-09-05T06:28:05Z)
- Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance.
Existing direct preference learning algorithms are originally designed for the single-turn chat task.
We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z)
- Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling [9.44858963874474]
Self-Consistency (SC) results in significant computational costs proportional to the number of samples generated.
We propose Reasoning-Aware Self-Consistency (RASC), an innovative early-stopping framework that adjusts the number of sample generations.
RASC significantly reduces sample usage by an average of 80% while maintaining or improving accuracy by up to 5% compared to the original SC.
arXiv Detail & Related papers (2024-08-30T05:14:59Z)
- Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning [19.408941114068444]
Self-consistency (SC) is a widely used decoding strategy for chain-of-thought reasoning.
Its variants, Adaptive self-consistency (ASC) and Early-stopping self-consistency (ESC), dynamically adjust the number of samples based on the posterior distribution of a set of pre-samples.
We propose Difficulty-Adaptive Self-Consistency (DSC), which leverages the difficulty information from both prior and posterior perspectives to adaptively allocate inference resources.
arXiv Detail & Related papers (2024-08-24T04:03:35Z)
- Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? [13.222198659253056]
We introduce a new prompting framework (called SarcasmCue) containing four sub-methods.
It elicits large language models (LLMs) to detect human sarcasm by considering sequential and non-sequential prompting methods.
Our framework consistently pushes the state-of-the-art (i.e., ToT) by 4.2%, 2.0%, 29.7%, and 58.2% in F1 scores across four datasets.
arXiv Detail & Related papers (2024-07-17T16:42:03Z)
- Prompt Perturbation Consistency Learning for Robust Language Models [47.021022978847036]
Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks.
We show that fine-tuning sufficiently large LLMs can produce intent classification and slot filling (IC-SF) performance comparable to discriminative models.
We propose an efficient mitigation approach, Prompt Perturbation Consistency Learning (PPCL), which works by regularizing the divergence between losses from clean and perturbed samples.
arXiv Detail & Related papers (2024-02-24T15:00:58Z)
- Soft Self-Consistency Improves Language Model Agents [57.66282463340297]
Current "sample and select" methods rely on majority voting to score answers.
Soft Self-Consistency (SOFT-SC) replaces SC's discontinuous scoring with a continuous score computed from model likelihoods (a hedged sketch of this scoring appears after this list).
For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping, and a 4.7% increase for an interactive household game.
arXiv Detail & Related papers (2024-02-20T18:22:38Z)
- ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention [48.697458429460184]
Two factors, information bottleneck sensitivity and inconsistency between different attention topologies, could affect the performance of the Sparse Transformer.
This paper proposes a well-designed model named ERNIE-Sparse.
It consists of two distinctive parts: (i) Hierarchical Sparse Transformer (HST) to sequentially unify local and global information, and (ii) Self-Attention Regularization (SAR) to minimize the distance for transformers with different attention topologies.
arXiv Detail & Related papers (2022-03-23T08:47:01Z)
- Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance [53.49803579981569]
We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point.
Existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result.
We propose a memory-efficient optimization algorithm for solving the Global Contrastive Learning of Representations, named SogCLR.
arXiv Detail & Related papers (2022-02-24T22:16:53Z)
- Towards Simple and Accurate Human Pose Estimation with Stair Network [34.421529219040295]
We develop a small yet discriminative model called STair Network, which can be stacked to build an accurate multi-stage pose estimation system.
To reduce computational cost, STair Network is composed of novel basic feature extraction blocks.
We demonstrate the effectiveness of the STair Network on two standard datasets.
arXiv Detail & Related papers (2022-02-18T10:37:13Z)
- Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model, based on RNN-Transducer together with improved beam search, is only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
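For the Soft Self-Consistency entry above, the following sketch illustrates the general idea of replacing the discrete majority vote with a continuous, likelihood-based score; the mean-log-probability aggregation and the `samples` input format are illustrative assumptions rather than the paper's exact formulation.

```python
# Hedged sketch of soft self-consistency scoring: rank candidate answers by an
# aggregated likelihood-based score instead of a raw vote count.
# The length-normalized likelihood below is an assumed choice of score.
import math
from typing import Dict, List, Tuple

def soft_self_consistency(samples: List[Tuple[str, List[float]]]) -> str:
    """samples: (final answer, per-token log-probabilities of the sampled generation)."""
    if not samples:
        raise ValueError("at least one sample is required")
    scores: Dict[str, float] = {}
    for answer, token_logprobs in samples:
        # Assumed per-sample score: exponentiated mean token log-probability.
        score = math.exp(sum(token_logprobs) / max(len(token_logprobs), 1))
        scores[answer] = scores.get(answer, 0.0) + score
    # Pick the answer with the largest accumulated soft score.
    return max(scores, key=scores.get)
```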
This list is automatically generated from the titles and abstracts of the papers in this site.