When is the consistent prediction likely to be a correct prediction?
- URL: http://arxiv.org/abs/2407.05778v1
- Date: Mon, 8 Jul 2024 09:37:27 GMT
- Title: When is the consistent prediction likely to be a correct prediction?
- Authors: Alex Nguyen, Dheeraj Mekala, Chengyu Dong, Jingbo Shang
- Abstract summary: We show that consistent answers derived through longer reasoning texts are more likely to be correct.
This is largely because, as we demonstrate, LLMs can autonomously produce chain-of-thought (CoT) style reasoning simply by generating longer responses, without any custom prompts.
We conclude that the probability of LLMs generating a longer response is quite low, highlighting the need for decoding strategies conditioned on output length.
- Score: 34.41365254799998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-consistency (Wang et al., 2023) suggests that the most consistent answer obtained through large language models (LLMs) is more likely to be correct. In this paper, we challenge this argument and propose a nuanced correction. Our observations indicate that consistent answers derived through more computation, i.e., longer reasoning texts, rather than simply the most consistent answer across all outputs, are more likely to be correct. This is predominantly because, as we demonstrate, LLMs can autonomously produce chain-of-thought (CoT) style reasoning, with no custom prompts, simply by generating longer responses, which leads to consistent predictions that are more accurate. In the zero-shot setting, by sampling the Mixtral-8x7B model multiple times and considering longer responses, we achieve 86% of its self-consistency performance obtained through zero-shot CoT prompting on the GSM8K and MultiArith datasets. Finally, we demonstrate that the probability of LLMs generating a longer response is quite low, highlighting the need for decoding strategies conditioned on output length.
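A minimal sketch of the length-conditioned voting described in the abstract, assuming hypothetical `generate` (draws one sampled completion) and `extract_answer` (parses a final answer) helpers; the keep fraction is illustrative rather than the paper's exact selection rule:

```python
from collections import Counter

def length_conditioned_self_consistency(prompt, generate, extract_answer,
                                        n_samples=40, keep_fraction=0.5):
    """Majority vote restricted to the longer sampled responses (sketch)."""
    completions = [generate(prompt) for _ in range(n_samples)]
    # Longer responses tend to contain CoT-style reasoning even without a
    # CoT prompt, so keep only the longest fraction of the samples.
    completions.sort(key=len, reverse=True)
    kept = completions[:max(1, int(keep_fraction * n_samples))]
    answers = [a for a in (extract_answer(c) for c in kept) if a is not None]
    # Most consistent answer among the longer responses.
    return Counter(answers).most_common(1)[0][0] if answers else None
```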
Related papers
- Lachesis: Predicting LLM Inference Accuracy using Structural Properties of Reasoning Paths [12.377041655669728]
We introduce Lachesis, a predictive model for self-consistency-based LLM inferences.
We empirically evaluate it using AutoFL, a recently proposed LLM-based fault localisation technique.
Results suggest that Lachesis can predict the correctness of answers with a precision of up to 0.8136.
arXiv Detail & Related papers (2024-12-11T10:56:47Z)
- Iterative Reasoning Preference Optimization [84.15992372132507]
We develop an iterative approach to optimize the preference between generated Chain-of-Thought (CoT) candidates.
We show reasoning improves across repeated iterations of this scheme.
For example, we see a large improvement from 55.6% to 81.6% on GSM8K and an accuracy of 88.7% with majority voting out of 32 samples.
arXiv Detail & Related papers (2024-04-30T17:28:05Z)
- Get an A in Math: Progressive Rectification Prompting [42.09762345892869]
Chain-of-Thought (CoT) prompting methods have enabled large language models (LLMs) to generate reasoning paths and solve math word problems (MWPs).
We propose a novel method named Progressive Rectification Prompting (PRP) to improve average accuracy on eight MWP datasets from 77.3 to 90.5.
arXiv Detail & Related papers (2023-12-11T22:25:57Z)
- Training Chain-of-Thought via Latent-Variable Inference [30.21067593018967]
Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a "chain-of-thought" prompt.
Naively combining CoT with supervised tuning requires supervision not just of the correct answers, but also of detailed rationales that lead to those answers.
We propose a fine-tuning strategy that tries to maximize the marginal log-likelihood of generating a correct answer using CoT prompting.
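Schematically, treating the rationale as a latent variable, the objective above is the marginal log-likelihood of the correct answer, summing over rationales; the notation (x question, z rationale, y* correct answer) is ours rather than the paper's:

\log p_\theta(y^{*} \mid x) \;=\; \log \sum_{z} p_\theta(z \mid x)\, p_\theta(y^{*} \mid x, z)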
arXiv Detail & Related papers (2023-11-28T17:47:32Z)
- Conformal Language Modeling [61.94417935386489]
We propose a novel approach to conformal prediction for generative language models (LMs).
Standard conformal prediction produces prediction sets with rigorous statistical guarantees.
We demonstrate the promise of our approach on multiple tasks in open-domain question answering, text summarization, and radiology report generation.
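For reference, the standard conformal guarantee referred to above is marginal coverage of the prediction set C(X) at a user-chosen error level \alpha; the notation is ours:

\Pr\big(Y \in C(X)\big) \ge 1 - \alpha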
arXiv Detail & Related papers (2023-06-16T21:55:08Z)
- Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting [43.458726163197824]
Large Language Models (LLMs) can achieve strong performance on many tasks by producing step-by-step reasoning before giving a final output.
We find that CoT explanations can systematically misrepresent the true reason for a model's prediction.
arXiv Detail & Related papers (2023-05-07T22:44:25Z)
- SCOTT: Self-Consistent Chain-of-Thought Distillation [68.40232422158569]
Large language models (LMs) generate free-text rationales for their predictions via chain-of-thought prompting.
We propose a faithful knowledge distillation method to learn a small, self-consistent CoT model from a teacher model that is orders of magnitude larger.
To ensure faithful distillation, we use the teacher-generated rationales to learn a student LM with a counterfactual reasoning objective.
arXiv Detail & Related papers (2023-05-03T03:47:00Z)
- Large Language Models are Better Reasoners with Self-Verification [48.534270563880845]
Large language models (LLMs) have shown strong reasoning ability in several natural language processing tasks.
LLMs with chain of thought (CoT) prompting require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes.
We propose and prove that LLMs also have similar self-verification abilities.
arXiv Detail & Related papers (2022-12-19T15:51:52Z)
- Self-Consistency Improves Chain of Thought Reasoning in Language Models [53.45015291520658]
We explore a simple ensemble strategy, self-consistency, that significantly improves the reasoning accuracy of large language models.
For arithmetic and commonsense reasoning benchmarks we find that self-consistency yields significant accuracy improvements.
arXiv Detail & Related papers (2022-03-21T17:48:52Z)
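For contrast with the length-conditioned sketch earlier, vanilla self-consistency as described in the last entry majority-votes over all sampled reasoning paths with no length filtering; a minimal sketch using the same hypothetical helpers:

```python
from collections import Counter

def self_consistency(prompt, generate, extract_answer, n_samples=40):
    """Vanilla self-consistency: majority vote over the answers parsed from
    every sampled reasoning path, regardless of response length."""
    parsed = (extract_answer(generate(prompt)) for _ in range(n_samples))
    answers = [a for a in parsed if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None
```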
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.