Universal Self-Consistency for Large Language Model Generation
- URL: http://arxiv.org/abs/2311.17311v1
- Date: Wed, 29 Nov 2023 02:07:09 GMT
- Title: Universal Self-Consistency for Large Language Model Generation
- Authors: Xinyun Chen, Renat Aksitov, Uri Alon, Jie Ren, Kefan Xiao, Pengcheng Yin, Sushant Prakash, Charles Sutton, Xuezhi Wang, Denny Zhou
- Abstract summary: Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable performance gains on challenging tasks.
We propose Universal Self-Consistency (USC), which leverages large language models (LLMs) to select the most consistent answer.
- Score: 72.6761480346095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-consistency with chain-of-thought prompting (CoT) has demonstrated
remarkable performance gains on various challenging tasks, by utilizing
multiple reasoning paths sampled from large language models (LLMs). However,
self-consistency relies on the answer extraction process to aggregate multiple
solutions, which is not applicable to free-form answers. In this work, we
propose Universal Self-Consistency (USC), which leverages LLMs themselves to
select the most consistent answer among multiple candidates. We evaluate USC on
a variety of benchmarks, including mathematical reasoning, code generation,
long-context summarization, and open-ended question answering. On open-ended
generation tasks where the original self-consistency method is not applicable,
USC effectively utilizes multiple samples and improves the performance. For
mathematical reasoning, USC matches the standard self-consistency performance
without requiring the answer formats to be similar. Finally, without access to
execution results, USC also matches the execution-based voting performance on
code generation.
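The contrast the abstract draws can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `majority_vote`, `build_usc_prompt`, and `usc_select` are hypothetical names, the prompt wording is invented for illustration, and `llm` stands for any caller-supplied function that maps a prompt string to a reply string.

```python
import re
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Standard self-consistency: needs already-extracted, comparable answers."""
    return Counter(answers).most_common(1)[0][0]


def build_usc_prompt(question: str, responses: Sequence[str]) -> str:
    """Concatenate all candidates and ask the model to pick one (wording is an assumption)."""
    numbered = "\n\n".join(f"Response {i}: {r}" for i, r in enumerate(responses))
    return (
        f"Question: {question}\n\n{numbered}\n\n"
        "Select the response that is most consistent with the others. "
        "Reply in the form 'The most consistent response is Response X'."
    )


def usc_select(question: str, responses: Sequence[str], llm: Callable[[str], str]) -> str:
    """USC-style selection: the LLM itself chooses among free-form candidates."""
    reply = llm(build_usc_prompt(question, responses))
    match = re.search(r"Response\s+(\d+)", reply)
    index = int(match.group(1)) if match else 0
    if not 0 <= index < len(responses):
        index = 0  # fall back to the first sample if the reply is unusable
    return responses[index]


if __name__ == "__main__":
    samples = ["A short summary.", "A longer summary.", "Another short summary."]
    # Stub standing in for a real model call.
    fake_llm = lambda prompt: "The most consistent response is Response 2"
    print(usc_select("Summarize the article.", samples, fake_llm))
```

Majority voting only works once answers have been extracted into a comparable form, whereas the selection step above operates on the raw free-form responses, which is why it also applies to summarization or open-ended question answering.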
Related papers
- Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation [20.138831477848615]
We propose Fine-Grained Self-Consistency (FSC) to optimize output quality by effectively using fine-grained consensus knowledge from multiple samples.
The effectiveness of FSC is demonstrated through extensive experiments on various tasks, including summarization, code generation, and mathematical reasoning.
arXiv Detail & Related papers (2024-07-02T08:38:31Z)
- Nash CoT: Multi-Path Inference with Preference Equilibrium [40.50811042423615]
Chain-of-thought (CoT) prompting has emerged as a powerful technique for enhancing the reasoning capabilities of Large Language Models (LLMs).
We conceptualize language decoding as a preference consensus game, constructing a bi-player gaming system within each local path, and introduce Nash Chain-of-Thought (Nash CoT).
We achieve comparable or improved performance compared to self-consistency while using fewer inference paths on various inference tasks, including Arabic reasoning, commonsense question answering, and symbolic inference.
arXiv Detail & Related papers (2024-06-18T07:46:13Z)
- Atomic Self-Consistency for Better Long Form Generations [12.753854064540636]
Atomic Self-Consistency (ASC) is a technique for improving the recall of relevant information in a long-form response.
ASC follows recent work, Universal Self-Consistency (USC), in using multiple samples to improve the long-form response.
Through extensive experiments and ablations, we show that merging relevant subparts of multiple samples performs significantly better than picking a single sample.
arXiv Detail & Related papers (2024-05-21T18:05:44Z)
- Soft Self-Consistency Improves Language Model Agents [57.66282463340297]
Current "sample and select" methods rely on majority voting to score answers.
Soft Self-Consistency (SOFT-SC) replaces SC's discontinuous scoring with a continuous score computed from model likelihoods.
For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping, and a 4.7% increase for an interactive household game.
arXiv Detail & Related papers (2024-02-20T18:22:38Z)
- DCR: Divide-and-Conquer Reasoning for Multi-choice Question Answering with LLMs [9.561022942046279]
We propose Divide and Conquer Reasoning (DCR) to enhance the reasoning capability of large language models (LLMs).
We first categorize questions into two subsets based on confidence score ($\mathcal{CS}$), which is estimated by statistical frequency of generated answers.
arXiv Detail & Related papers (2024-01-10T14:38:46Z)
- Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency [127.97467912117652]
Large language models (LLMs) have exhibited remarkable ability in code generation.
However, generating the correct solution in a single attempt still remains a challenge.
We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency.
arXiv Detail & Related papers (2023-09-29T14:23:26Z)
- Universal Self-Adaptive Prompting [60.67460565566514]
Universal Self-Adaptive Prompting (USP) is an automatic prompt design approach specifically tailored for zero-shot learning.
USP is highly versatile: to achieve universal prompting, USP categorizes a possible NLP task into one of the three possible task types.
We evaluate USP with PaLM and PaLM 2 models and demonstrate performances that are considerably stronger than standard zero-shot baselines.
arXiv Detail & Related papers (2023-05-24T09:09:48Z)
- RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines.
arXiv Detail & Related papers (2023-05-23T20:15:56Z)
- Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs [60.58434523646137]
A popular approach for improving the correctness of output from large language models (LLMs) is Self-Consistency.
We introduce Adaptive-Consistency, a cost-efficient, model-agnostic technique that dynamically adjusts the number of samples per question.
Our experiments show that Adaptive-Consistency reduces sample budget by up to 7.9 times with an average accuracy drop of less than 0.1%.
arXiv Detail & Related papers (2023-05-19T17:49:25Z)
- Large Language Models are Better Reasoners with Self-Verification [48.534270563880845]
Large language models (LLMs) have shown strong reasoning ability in several natural language processing tasks.
LLMs with chain of thought (CoT) prompting require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes.
We propose and prove that LLMs also have self-verification abilities similar to those of humans.
arXiv Detail & Related papers (2022-12-19T15:51:52Z)