Self-Consistency Improves Chain of Thought Reasoning in Language Models
- URL: http://arxiv.org/abs/2203.11171v1
- Date: Mon, 21 Mar 2022 17:48:52 GMT
- Title: Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Authors: Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Denny Zhou
- Abstract summary: We explore a simple ensemble strategy, self-consistency, that significantly improves the reasoning accuracy of large language models.
On arithmetic and commonsense reasoning benchmarks, we find that self-consistency yields significant accuracy improvements.
- Score: 53.45015291520658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore a simple ensemble strategy, self-consistency, that significantly improves the reasoning accuracy of large language models. The idea is to sample a diverse set of outputs from a language model and return the most consistent answer in the set. This ensembling method improves reasoning accuracy when combined with chain-of-thought prompting. On arithmetic and commonsense reasoning benchmarks, we find that self-consistency yields significant accuracy improvements across a variety of datasets, such as GSM8K (+10%), SVAMP (+14%), MultiArith (+24%), CommonsenseQA (+5%), and ARC (easy +4%, challenge +5%).
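In code, the method reduces to sample-then-vote: draw several chain-of-thought completions at a nonzero sampling temperature, parse the final answer from each, and return the most frequent one. The sketch below is a minimal illustration rather than the paper's implementation; `sample_cot` and `extract_answer` are hypothetical stand-ins for the stochastic LLM call and the answer parser.

```python
from collections import Counter

def self_consistency(prompt, sample_cot, extract_answer, n_samples=40):
    """Sample diverse chain-of-thought completions, then majority-vote the answer.

    sample_cot(prompt) -> str is a hypothetical stand-in for a stochastic
    LLM sampling call (temperature > 0); extract_answer(completion) -> str
    parses the final answer, e.g. the text after "The answer is".
    """
    answers = []
    for _ in range(n_samples):
        completion = sample_cot(prompt)       # one sampled reasoning path
        answer = extract_answer(completion)   # keep only the final answer
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None
    # The "most consistent" answer is the mode over sampled reasoning paths.
    return Counter(answers).most_common(1)[0][0]
```

Because only final answers are aggregated, different reasoning paths that converge on the same answer reinforce one another, which is what distinguishes this from rescoring a single greedy decode.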
Related papers
- Integrative Decoding: Improve Factuality via Implicit Self-consistency [45.27124252002816]
Self-consistency-based approaches are remarkably effective in improving the factual accuracy of large language models.
We present Integrative Decoding (ID) to unlock the potential of self-consistency in open-ended generation tasks.
arXiv Detail & Related papers (2024-10-02T13:52:55Z)
- Large Language Models are Contrastive Reasoners [8.427805316635318]
We show how contrastive prompting significantly improves the ability of large language models to perform complex reasoning.
Experiments on various large language models show that zero-shot contrastive prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.
Our method not only surpasses zero-shot CoT and few-shot CoT on most arithmetic and commonsense reasoning tasks, but can also be seamlessly integrated with existing prompting methods.
arXiv Detail & Related papers (2024-03-13T03:15:05Z)
- Automatic Model Selection with Large Language Models for Reasoning [33.93807127935167]
Chain-of-Thought (CoT) and Program-Aided Language Models (PAL) represent two distinct reasoning methods.
We introduce a model selection method to combine the best of both worlds by employing a large language model.
Our proposed method demonstrates significant performance improvements across eight reasoning datasets.
arXiv Detail & Related papers (2023-05-23T17:57:59Z)
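Taking the description above at face value, one plausible sketch is to generate both a CoT solution and a PAL-style program for the same question and then ask the model itself which to trust. Here `llm` is a hypothetical completion callable, and the selection prompt is an assumption, not the paper's actual prompt.

```python
def select_answer(question, llm):
    """Sketch of LLM-based selection between CoT and PAL candidates.

    llm(prompt) -> str is a hypothetical text-completion callable; the
    prompts below are illustrative, not the paper's exact formulations.
    """
    cot_solution = llm(f"Q: {question}\nA: Let's think step by step.")
    pal_solution = llm(f"Q: {question}\n# Solve by writing a Python program:")
    choice = llm(
        "Two candidate solutions to the same question are given.\n"
        f"(A) Chain-of-thought:\n{cot_solution}\n"
        f"(B) Program-aided:\n{pal_solution}\n"
        "Which solution is correct? Answer (A) or (B)."
    )
    return cot_solution if "(A)" in choice else pal_solution
```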
- Progressive-Hint Prompting Improves Reasoning in Large Language Models [63.98629132836499]
This paper proposes a new prompting method named Progressive-Hint Prompting (PHP).
It enables multiple automatic interactions between users and Large Language Models (LLMs), using previously generated answers as hints to progressively guide the model toward the correct answer.
Extensive experiments on seven benchmarks show that PHP significantly improves accuracy while remaining highly efficient.
arXiv Detail & Related papers (2023-04-19T16:29:48Z)
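The interaction loop described above is straightforward to sketch: answer once, then re-ask with the accumulated answers as hints, stopping when two consecutive answers agree. `llm` and the hint wording are assumptions rather than the paper's exact prompts.

```python
def progressive_hint(question, llm, max_rounds=5):
    """Sketch of Progressive-Hint Prompting: feed prior answers back as hints.

    llm(prompt) -> str is a hypothetical completion callable; the hint
    phrasing is an illustrative assumption.
    """
    hints, answer = [], None
    for _ in range(max_rounds):
        hint = f" (Hint: the answer is near {', '.join(hints)}.)" if hints else ""
        new_answer = llm(f"Q: {question}{hint}\nA:")
        if new_answer == answer:  # stop once consecutive answers agree
            break
        answer = new_answer
        hints.append(answer)
    return answer
```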
- Faithful Chain-of-Thought Reasoning [51.21714389639417]
Chain-of-Thought (CoT) prompting boosts language model (LM) performance on a wide range of reasoning tasks.
We propose Faithful CoT, a reasoning framework involving two stages: Translation and Problem Solving.
This guarantees that the reasoning chain provides a faithful explanation of the final answer.
arXiv Detail & Related papers (2023-01-31T03:04:26Z)
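A minimal sketch of the two-stage pipeline described above, under the assumption that the Translation stage emits a Python program and the Problem Solving stage executes it deterministically; `llm` is a hypothetical callable, and any real use would require sandboxed execution.

```python
def faithful_cot(question, llm):
    """Sketch of a Translate-then-Solve pipeline.

    Stage 1 (Translation): the LM writes an executable program whose result
    is the answer. Stage 2 (Problem Solving): a deterministic interpreter
    runs it, so the reasoning chain cannot silently diverge from the answer.
    llm(prompt) -> str is a hypothetical callable.
    """
    program = llm(
        f"Q: {question}\n"
        "# Translate this question into a Python program that stores the\n"
        "# final result in a variable named `answer`:\n"
    )
    scope = {}
    exec(program, scope)  # deterministic solving stage; unsafe outside a sandbox
    return scope.get("answer")
```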
- Making Large Language Models Better Reasoners with Step-Aware Verifier [49.16750018427259]
DIVERSE (Diverse Verifier on Reasoning Step) is a novel approach that further enhances the reasoning capability of language models.
We evaluate DIVERSE on the code-davinci language model and show that it achieves new state-of-the-art results on six of eight reasoning benchmarks.
arXiv Detail & Related papers (2022-06-06T03:38:36Z)
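From the summary above, the natural reading is that DIVERSE replaces self-consistency's uniform majority vote with a verifier-weighted one. The sketch below assumes a hypothetical `verifier(question, path) -> float` scorer standing in for the paper's trained step-aware verifier.

```python
from collections import defaultdict

def verifier_weighted_vote(question, reasoning_paths, extract_answer, verifier):
    """Sketch of verifier-weighted voting over sampled reasoning paths.

    verifier(question, path) -> float is a hypothetical scorer (in DIVERSE,
    a trained step-aware model); extract_answer parses a path's final answer.
    """
    scores = defaultdict(float)
    for path in reasoning_paths:
        answer = extract_answer(path)
        if answer is not None:
            scores[answer] += verifier(question, path)  # weight each vote
    return max(scores, key=scores.get) if scores else None
```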
- Logic-Guided Data Augmentation and Regularization for Consistent Question Answering [55.05667583529711]
This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
arXiv Detail & Related papers (2020-04-21T17:03:08Z)
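As a rough sketch of the recipe described above for comparison questions: derive logically implied examples (flipping a comparison flips its label) and penalize the model when its predictions on such a linked pair are inconsistent. The string-swapping augmentation and the penalty form are illustrative assumptions, not the paper's exact rules.

```python
def augment_with_symmetry(question, entity_a, entity_b, label):
    """Sketch of symmetry-based augmentation for comparison QA.

    If "Is A larger than B?" is labeled yes, the flipped question
    "Is B larger than A?" must be labeled no. The naive string swap below
    assumes each entity appears exactly once in the question.
    """
    flipped = question.replace(entity_a, "<TMP>").replace(entity_b, entity_a)
    flipped = flipped.replace("<TMP>", entity_b)
    return flipped, ("no" if label == "yes" else "yes")

def symmetry_consistency_penalty(p_yes, p_yes_flipped):
    """Sketch of a consistency-based regularizer: for a symmetric pair,
    the two 'yes' probabilities should sum to 1; penalize the gap."""
    return abs(p_yes + p_yes_flipped - 1.0)
```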