Contrastive Decoding: Open-ended Text Generation as Optimization
- URL: http://arxiv.org/abs/2210.15097v2
- Date: Mon, 10 Jul 2023 06:08:55 GMT
- Title: Contrastive Decoding: Open-ended Text Generation as Optimization
- Authors: Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner,
Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis
- Abstract summary: We propose contrastive decoding (CD), a reliable decoding approach.
It is inspired by the fact that the failures of larger LMs are even more prevalent in smaller LMs.
CD requires zero additional training, and produces higher quality text than decoding from the larger LM alone.
- Score: 153.35961722855686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given a language model (LM), maximum probability is a poor decoding objective
for open-ended generation, because it produces short and repetitive text. On
the other hand, sampling can often produce incoherent text that drifts from the
original topics. We propose contrastive decoding (CD), a reliable decoding
approach that optimizes a contrastive objective subject to a plausibility
constraint. The contrastive objective returns the difference between the
likelihood under a large LM (called the expert, e.g. OPT-13B) and a small LM
(called the amateur, e.g. OPT-125M), and the constraint ensures that the
outputs are plausible. CD is inspired by the fact that the failures of larger
LMs (e.g., repetition, incoherence) are even more prevalent in smaller LMs, and
that this difference signals which texts should be preferred. CD requires zero
additional training, and produces higher quality text than decoding from the
larger LM alone. It also works across model scales (OPT-13B and GPT2-1.5B) and
significantly outperforms four strong decoding algorithms (e.g., nucleus,
top-k) in automatic and human evaluations across Wikipedia, news and story
domains.
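The abstract's recipe can be sketched directly: score each candidate next token by the difference of expert and amateur log-likelihoods, restricted to tokens the expert itself finds plausible. The sketch below is a minimal illustration with toy probabilities, not the paper's reference implementation; the function name, `alpha` threshold value, and toy vocabulary are assumptions.

```python
import numpy as np

def contrastive_decoding_scores(expert_logprobs, amateur_logprobs, alpha=0.1):
    """Score next-token candidates with the contrastive decoding objective.

    expert_logprobs / amateur_logprobs: 1-D arrays of log-probabilities over
    the vocabulary from the large (expert) and small (amateur) LM.
    alpha: plausibility threshold; tokens whose expert probability falls
    below alpha * max(expert probability) are masked out.
    """
    expert_probs = np.exp(expert_logprobs)
    # Plausibility constraint: keep only tokens the expert finds likely enough.
    plausible = expert_probs >= alpha * expert_probs.max()
    # Contrastive objective: expert log-likelihood minus amateur log-likelihood;
    # implausible tokens get -inf so they can never be selected.
    return np.where(plausible, expert_logprobs - amateur_logprobs, -np.inf)

# Toy example with a 4-token vocabulary (probabilities are invented).
expert = np.log(np.array([0.45, 0.30, 0.15, 0.10]))
amateur = np.log(np.array([0.55, 0.10, 0.25, 0.10]))
scores = contrastive_decoding_scores(expert, amateur, alpha=0.1)
best = int(np.argmax(scores))
```

Here token 1 wins: the expert assigns it three times the amateur's probability, which is exactly the "failures are more prevalent in the amateur" signal CD exploits, while the plausibility mask prevents the objective from promoting tokens the expert itself considers unlikely.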
Related papers
- Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM [93.8400683020273]
Contrastive decoding (CD) improves the next-token distribution of a large expert language model (LM) using a small amateur LM.
We propose a new unsupervised decoding method called Asymptotic Probability Decoding (APD).
APD explicitly extrapolates the probability curves from the LMs of different sizes to infer the probabilities from an infinitely large LM without inducing more inference costs than CD.
arXiv Detail & Related papers (2024-11-03T15:31:44Z)
- $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding [64.00025564372095]
Large language models (LLMs) have shown remarkable capabilities in code generation.
The effects of hallucinations (e.g., output noise) make it challenging for LLMs to generate high-quality code in one pass.
We propose a simple and effective uncertainty-aware selective contrastive decoding (USCD) method.
arXiv Detail & Related papers (2024-09-09T02:07:41Z)
- Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation [32.85339480783571]
We introduce a new decoding approach named Debiasing-Diversifying Decoding (D3).
D3 disables length normalization for ghost tokens to alleviate amplification bias.
Experiments on real-world datasets demonstrate the method's effectiveness.
arXiv Detail & Related papers (2024-06-21T06:47:28Z)
- Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark.
arXiv Detail & Related papers (2023-09-17T00:29:32Z)
- Surfacing Biases in Large Language Models using Contrastive Input Decoding [12.694066526722203]
Contrastive Input Decoding (CID) is a decoding algorithm to generate text given two inputs.
We use CID to highlight context-specific biases that are hard to detect with standard decoding strategies.
arXiv Detail & Related papers (2023-05-12T11:09:49Z)
- Stealing the Decoding Algorithms of Language Models [56.369946232765656]
A key component of generating text from modern language models (LM) is the selection and tuning of decoding algorithms.
In this work, we show, for the first time, that an adversary with typical API access to an LM can steal the type and hyperparameters of its decoding algorithms.
Our attack is effective against popular LMs used in text generation APIs, including GPT-2, GPT-3 and GPT-Neo.
arXiv Detail & Related papers (2023-03-08T17:15:58Z)
- Factuality Enhanced Language Models for Open-Ended Text Generation [60.27166549575472]
We design the FactualityPrompts test set and metrics to measure the factuality of LM generations.
We find that larger LMs are more factual than smaller ones, although a previous study suggests that larger LMs can be less truthful in terms of misconceptions.
We propose a factuality-enhanced training method that uses TopicPrefix for better awareness of facts and sentence completion.
arXiv Detail & Related papers (2022-06-09T17:16:43Z)
- Is Your Language Model Ready for Dense Representation Fine-tuning? [15.238322226336232]
This paper shows that one cause lies in the readiness of the LM to expose its knowledge through dense representation in fine-tuning.
We present Condenser, a general pre-training architecture based on Transformer LMs, to improve dense optimization readiness.
arXiv Detail & Related papers (2021-04-16T17:36:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.