Universal Self-Consistency for Large Language Model Generation
- URL: http://arxiv.org/abs/2311.17311v1
- Date: Wed, 29 Nov 2023 02:07:09 GMT
- Title: Universal Self-Consistency for Large Language Model Generation
- Authors: Xinyun Chen, Renat Aksitov, Uri Alon, Jie Ren, Kefan Xiao, Pengcheng
Yin, Sushant Prakash, Charles Sutton, Xuezhi Wang, Denny Zhou
- Abstract summary: Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable performance gains on challenging tasks.
We propose Universal Self-Consistency (USC), which leverages large language models (LLMs) to select the most consistent answer.
- Score: 72.6761480346095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-consistency with chain-of-thought prompting (CoT) has demonstrated
remarkable performance gains on various challenging tasks, by utilizing
multiple reasoning paths sampled from large language models (LLMs). However,
self-consistency relies on the answer extraction process to aggregate multiple
solutions, which is not applicable to free-form answers. In this work, we
propose Universal Self-Consistency (USC), which leverages LLMs themselves to
select the most consistent answer among multiple candidates. We evaluate USC on
a variety of benchmarks, including mathematical reasoning, code generation,
long-context summarization, and open-ended question answering. On open-ended
generation tasks where the original self-consistency method is not applicable,
USC effectively utilizes multiple samples and improves the performance. For
mathematical reasoning, USC matches the standard self-consistency performance
without requiring the answer formats to be similar. Finally, without access to
execution results, USC also matches the execution-based voting performance on
code generation.
Related papers
- Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach.
This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets.
We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
arXiv Detail & Related papers (2024-11-07T10:31:31Z) - Integrative Decoding: Improve Factuality via Implicit Self-consistency [45.27124252002816]
Self-consistency-based approaches are remarkably effective in improving the factual accuracy of large language models.
We present Integrative Decoding (ID), to unlock the potential of self-consistency in open-ended generation tasks.
arXiv Detail & Related papers (2024-10-02T13:52:55Z) - To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs)
We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
arXiv Detail & Related papers (2024-09-18T17:55:00Z) - Dynamic Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling [9.44858963874474]
Self-Consistency (SC) results in significant computational costs proportional to the number of samples generated.
We propose Reasoning-Aware Self-Consistency (RASC), an innovative early-stopping framework that adjusts the number of sample generations.
RASC significantly reduces sample usage by an average of 80% while maintaining or improving accuracy up to 5% compared to the original SC.
arXiv Detail & Related papers (2024-08-30T05:14:59Z) - Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation [20.138831477848615]
We propose Fine-Grained Self-Consistency (FSC) to optimize output quality by effectively fine-grained consensus knowledge from multiple samples.
The effectiveness of FSC is demonstrated through extensive experiments on various tasks, including summarization, code generation, and mathematical reasoning.
arXiv Detail & Related papers (2024-07-02T08:38:31Z) - Atomic Self-Consistency for Better Long Form Generations [12.753854064540636]
Atomic Self-Consistency (ASC) is a technique for improving the recall of relevant information in a long-form response.
ASC follows recent work, Universal Self-Consistency (USC) in using multiple samples to improve the long-form response.
Through extensive experiments and ablations, we show that merging relevant subparts of multiple samples performs significantly better than picking a single sample.
arXiv Detail & Related papers (2024-05-21T18:05:44Z) - Soft Self-Consistency Improves Language Model Agents [57.66282463340297]
Current "sample and select" methods rely on majority voting to score answers.
Soft Self-Consistency (SOFT-SC) replaces SC's discontinuous scoring with a continuous score computed from model likelihoods.
For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping, and a 4.7% increase for an interactive household game.
arXiv Detail & Related papers (2024-02-20T18:22:38Z) - Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency [127.97467912117652]
Large language models (LLMs) have exhibited remarkable ability in code generation.
However, generating the correct solution in a single attempt still remains a challenge.
We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency.
arXiv Detail & Related papers (2023-09-29T14:23:26Z) - Universal Self-Adaptive Prompting [60.67460565566514]
Universal Self-Adaptive Prompting (USP) is an automatic prompt design approach specifically tailored for zero-shot learning.
USP is highly versatile: to achieve universal prompting, USP categorizes a possible NLP task into one of the three possible task types.
We evaluate USP with PaLM and PaLM 2 models and demonstrate performances that are considerably stronger than standard zero-shot baselines.
arXiv Detail & Related papers (2023-05-24T09:09:48Z) - Large Language Models are Better Reasoners with Self-Verification [48.534270563880845]
Large language models (LLMs) have shown strong reasoning ability in several natural language processing tasks.
LLMs with chain of thought (CoT) prompting require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes.
We propose and prove that LLMs also have similar self-verification abilities.
arXiv Detail & Related papers (2022-12-19T15:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.