Universal Self-Consistency for Large Language Model Generation
- URL: http://arxiv.org/abs/2311.17311v1
- Date: Wed, 29 Nov 2023 02:07:09 GMT
- Title: Universal Self-Consistency for Large Language Model Generation
- Authors: Xinyun Chen, Renat Aksitov, Uri Alon, Jie Ren, Kefan Xiao, Pengcheng
Yin, Sushant Prakash, Charles Sutton, Xuezhi Wang, Denny Zhou
- Abstract summary: Self-consistency with chain-of-thought prompting (CoT) has demonstrated remarkable performance gains on challenging tasks.
We propose Universal Self-Consistency (USC), which leverages large language models (LLMs) to select the most consistent answer.
- Score: 72.6761480346095
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-consistency with chain-of-thought prompting (CoT) has demonstrated
remarkable performance gains on various challenging tasks, by utilizing
multiple reasoning paths sampled from large language models (LLMs). However,
self-consistency relies on the answer extraction process to aggregate multiple
solutions, which is not applicable to free-form answers. In this work, we
propose Universal Self-Consistency (USC), which leverages LLMs themselves to
select the most consistent answer among multiple candidates. We evaluate USC on
a variety of benchmarks, including mathematical reasoning, code generation,
long-context summarization, and open-ended question answering. On open-ended
generation tasks where the original self-consistency method is not applicable,
USC effectively utilizes multiple samples and improves the performance. For
mathematical reasoning, USC matches the standard self-consistency performance
without requiring the answer formats to be similar. Finally, without access to
execution results, USC also matches the execution-based voting performance on
code generation.
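To make the selection step concrete, the following is a minimal Python sketch of USC under stated assumptions: the `generate` callable stands in for any LLM sampling API (not part of the paper), and the selection prompt is illustrative rather than the paper's exact template.

```python
from typing import Callable, List

# Minimal sketch of Universal Self-Consistency (USC). The `generate` callable
# stands in for any LLM sampling API (hypothetical; not part of the paper);
# the selection prompt below is illustrative, not the paper's exact template.

def universal_self_consistency(
    task_prompt: str,
    generate: Callable[..., List[str]],
    num_samples: int = 8,
) -> str:
    # 1. Sample multiple candidate responses (e.g., chain-of-thought outputs).
    candidates = generate(task_prompt, n=num_samples, temperature=0.7)

    # 2. Ask the LLM itself to pick the most consistent candidate, instead of
    #    extracting final answers and majority-voting over them.
    numbered = "\n\n".join(
        f"Response {i + 1}:\n{c}" for i, c in enumerate(candidates)
    )
    selection_prompt = (
        f"{task_prompt}\n\n{numbered}\n\n"
        "Evaluate these responses and select the one that is most consistent "
        "with the others. Reply with the response number only."
    )
    choice = generate(selection_prompt, n=1, temperature=0.0)[0]

    # 3. Parse the selected index; fall back to the first sample if parsing fails.
    digits = "".join(ch for ch in choice if ch.isdigit())
    index = int(digits) - 1 if digits else 0
    return candidates[index] if 0 <= index < len(candidates) else candidates[0]
```

Because the model itself performs the aggregation, no task-specific answer extraction or exact-match voting is required, which is what makes the procedure applicable to free-form outputs such as summaries and open-ended answers.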
Related papers
- Bag of Tricks for Inference-time Computation of LLM Reasoning [10.366475014241407]
We investigate and benchmark diverse inference-time computation strategies across reasoning tasks of varying complexity.
Our ablation studies reveal that previously overlooked strategies can significantly enhance performance.
We establish a standardized benchmark for inference-time computation by systematically evaluating six representative methods across eight reasoning tasks.
arXiv Detail & Related papers (2025-02-11T02:31:11Z)
- Revisit Self-Debugging with Self-Generated Tests for Code Generation [18.643472696246686]
Self-debugging with self-generated tests is a promising solution, but its limitations and practical potential have not been fully explored.
We propose two paradigms for the process: post-execution and in-execution self-debugging.
We find that post-execution self-debugging struggles on basic problems but shows potential for improvement on competitive ones, due to the bias introduced by self-generated tests.
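A rough sketch of the post-execution paradigm, under stated assumptions: `generate` is any LLM completion callable and `run_tests` is a hypothetical harness that executes a candidate against the tests and returns a pass flag plus an error report; the prompts are illustrative, not the paper's.

```python
from typing import Callable, Tuple

# Rough sketch of post-execution self-debugging with self-generated tests.
# `generate` is any LLM completion callable and `run_tests` a hypothetical
# harness returning (passed, error_report); prompts are illustrative only.

def post_execution_self_debug(
    problem: str,
    generate: Callable[[str], str],
    run_tests: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> str:
    code = generate(f"Write a Python solution for:\n{problem}")
    tests = generate(f"Write assert-based unit tests for:\n{problem}")

    for _ in range(max_rounds):
        passed, report = run_tests(code, tests)
        if passed:
            break
        # Execution happens after a full candidate is produced; the feedback
        # is then used to request a revision.
        code = generate(
            f"Problem:\n{problem}\n\nCurrent solution:\n{code}\n\n"
            f"The self-generated tests failed with:\n{report}\nFix the solution."
        )
    # Caveat noted above: the tests themselves are model-generated and may be
    # wrong, which biases this loop on basic problems.
    return code
```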
arXiv Detail & Related papers (2025-01-22T10:54:19Z)
- Self-Calibrated Listwise Reranking with Large Language Models [137.6557607279876]
Large language models (LLMs) have been employed in reranking tasks through a sequence-to-sequence approach.
This reranking paradigm requires a sliding window strategy to iteratively handle larger candidate sets.
We propose a novel self-calibrated listwise reranking method, which aims to leverage LLMs to produce global relevance scores for ranking.
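As a loose illustration of the global-scoring idea (a sketch under assumptions, not the paper's exact method), each candidate receives an independent relevance score from the LLM, so a single sort replaces the iterative sliding-window passes; `score_relevance` is a hypothetical LLM-backed scorer.

```python
from typing import Callable, List

# Sketch of reranking with globally comparable relevance scores: a single
# sort over per-candidate scores replaces iterative sliding-window passes.
# `score_relevance(query, passage)` is a hypothetical LLM-backed scorer.

def rerank(
    query: str,
    passages: List[str],
    score_relevance: Callable[[str, str], float],
) -> List[str]:
    scored = [(score_relevance(query, p), p) for p in passages]
    return [p for _, p in sorted(scored, key=lambda x: x[0], reverse=True)]
```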
arXiv Detail & Related papers (2024-11-07T10:31:31Z)
- Integrative Decoding: Improve Factuality via Implicit Self-consistency [45.27124252002816]
Self-consistency-based approaches are remarkably effective in improving the factual accuracy of large language models.
We present Integrative Decoding (ID) to unlock the potential of self-consistency in open-ended generation tasks.
arXiv Detail & Related papers (2024-10-02T13:52:55Z)
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).
We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
arXiv Detail & Related papers (2024-09-18T17:55:00Z)
- Integrate the Essence and Eliminate the Dross: Fine-Grained Self-Consistency for Free-Form Language Generation [20.138831477848615]
We propose Fine-Grained Self-Consistency (FSC) to optimize output quality by effectively integrating fine-grained consensus knowledge from multiple samples.
The effectiveness of FSC is demonstrated through extensive experiments on various tasks, including summarization, code generation, and mathematical reasoning.
arXiv Detail & Related papers (2024-07-02T08:38:31Z)
- Atomic Self-Consistency for Better Long Form Generations [12.753854064540636]
Atomic Self-Consistency (ASC) is a technique for improving the recall of relevant information in a long-form response.
ASC follows recent work, Universal Self-Consistency (USC), in using multiple samples to improve the long-form response.
Through extensive experiments and ablations, we show that merging relevant subparts of multiple samples performs significantly better than picking a single sample.
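A loose sketch of that merging idea, assuming hypothetical `split_into_atoms` and `are_equivalent` helpers (e.g., sentence splitting plus an LLM or NLI equivalence check); the actual ASC clustering and merging procedure is more involved.

```python
from typing import Callable, List

# Loose sketch of the "merge relevant subparts" idea: keep atomic statements
# that recur across enough samples and stitch them into one response.
# `split_into_atoms` and `are_equivalent` are hypothetical helpers (e.g.,
# sentence splitting plus an LLM/NLI equivalence check); the actual ASC
# procedure is more involved.

def merge_samples(
    samples: List[str],
    split_into_atoms: Callable[[str], List[str]],
    are_equivalent: Callable[[str, str], bool],
    min_support: int = 2,
) -> str:
    kept: List[str] = []
    for sample in samples:
        for atom in split_into_atoms(sample):
            # Count how many samples contain an equivalent statement.
            support = sum(
                any(are_equivalent(atom, other_atom)
                    for other_atom in split_into_atoms(other))
                for other in samples
            )
            already_kept = any(are_equivalent(atom, k) for k in kept)
            if support >= min_support and not already_kept:
                kept.append(atom)
    return " ".join(kept)
```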
arXiv Detail & Related papers (2024-05-21T18:05:44Z)
- Soft Self-Consistency Improves Language Model Agents [57.66282463340297]
Current "sample and select" methods rely on majority voting to score answers.
Soft Self-Consistency (SOFT-SC) replaces SC's discontinuous scoring with a continuous score computed from model likelihoods.
For a fixed number of samples, SOFT-SC leads to a 1.3% increase over SC in absolute success rate on writing bash programs, a 6.6% increase on online shopping, and a 4.7% increase for an interactive household game.
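A minimal sketch of the scoring change, assuming each sampled candidate carries per-token log-probabilities from the sampling API; the exact aggregation used in the paper may differ.

```python
from typing import Dict, List

# Minimal sketch of Soft Self-Consistency selection: rank sampled candidates
# by a continuous, likelihood-based score instead of discrete vote counts.
# Each candidate is assumed to carry per-token log-probabilities from the
# sampling API; the exact aggregation in the paper may differ.

def soft_sc_select(candidates: List[Dict]) -> str:
    def score(candidate: Dict) -> float:
        logprobs = candidate["token_logprobs"]
        # Length-normalized log-likelihood as the continuous score.
        return sum(logprobs) / max(len(logprobs), 1)

    return max(candidates, key=score)["text"]
```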
arXiv Detail & Related papers (2024-02-20T18:22:38Z)
- Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency [127.97467912117652]
Large language models (LLMs) have exhibited remarkable ability in code generation.
However, generating the correct solution in a single attempt still remains a challenge.
We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency.
arXiv Detail & Related papers (2023-09-29T14:23:26Z)
- Universal Self-Adaptive Prompting [60.67460565566514]
Universal Self-Adaptive Prompting (USP) is an automatic prompt design approach specifically tailored for zero-shot learning.
USP is highly versatile: to achieve universal prompting, USP categorizes a given NLP task into one of three possible task types.
We evaluate USP with PaLM and PaLM 2 models and demonstrate performance that is considerably stronger than standard zero-shot baselines.
arXiv Detail & Related papers (2023-05-24T09:09:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.