Contrastive Decoding: Open-ended Text Generation as Optimization
- URL: http://arxiv.org/abs/2210.15097v2
- Date: Mon, 10 Jul 2023 06:08:55 GMT
- Title: Contrastive Decoding: Open-ended Text Generation as Optimization
- Authors: Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner,
Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis
- Abstract summary: We propose contrastive decoding (CD), a reliable decoding approach.
It is inspired by the fact that the failures of larger LMs are even more prevalent in smaller LMs.
CD requires zero additional training, and produces higher quality text than decoding from the larger LM alone.
- Score: 153.35961722855686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given a language model (LM), maximum probability is a poor decoding objective
for open-ended generation, because it produces short and repetitive text. On
the other hand, sampling can often produce incoherent text that drifts from the
original topics. We propose contrastive decoding (CD), a reliable decoding
approach that optimizes a contrastive objective subject to a plausibility
constraint. The contrastive objective returns the difference between the
likelihood under a large LM (called the expert, e.g. OPT-13B) and a small LM
(called the amateur, e.g. OPT-125M), and the constraint ensures that the
outputs are plausible. CD is inspired by the fact that the failures of larger
LMs (e.g., repetition, incoherence) are even more prevalent in smaller LMs, and
that this difference signals which texts should be preferred. CD requires zero
additional training, and produces higher quality text than decoding from the
larger LM alone. It also works across model scales (OPT-13B and GPT2-1.5B) and
significantly outperforms four strong decoding algorithms (e.g., nucleus,
top-k) in automatic and human evaluations across Wikipedia, news and story
domains.
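The abstract's recipe can be sketched directly: score each candidate next token by the difference of expert and amateur log-likelihoods, restricted to tokens the expert itself finds plausible. The sketch below is a minimal illustration with toy probabilities, not the paper's reference implementation; the function name, `alpha` threshold value, and toy vocabulary are assumptions.

```python
import numpy as np

def contrastive_decoding_scores(expert_logprobs, amateur_logprobs, alpha=0.1):
    """Score next-token candidates with the contrastive decoding objective.

    expert_logprobs / amateur_logprobs: 1-D arrays of log-probabilities over
    the vocabulary from the large (expert) and small (amateur) LM.
    alpha: plausibility threshold; tokens whose expert probability falls
    below alpha * max(expert probability) are masked out.
    """
    expert_probs = np.exp(expert_logprobs)
    # Plausibility constraint: keep only tokens the expert finds likely enough.
    plausible = expert_probs >= alpha * expert_probs.max()
    # Contrastive objective: expert log-likelihood minus amateur log-likelihood;
    # implausible tokens get -inf so they can never be selected.
    return np.where(plausible, expert_logprobs - amateur_logprobs, -np.inf)

# Toy example with a 4-token vocabulary (probabilities are invented).
expert = np.log(np.array([0.45, 0.30, 0.15, 0.10]))
amateur = np.log(np.array([0.55, 0.10, 0.25, 0.10]))
scores = contrastive_decoding_scores(expert, amateur, alpha=0.1)
best = int(np.argmax(scores))
```

Here token 1 wins: the expert assigns it three times the amateur's probability, which is exactly the "failures are more prevalent in the amateur" signal CD exploits, while the plausibility mask prevents the objective from promoting tokens the expert itself considers unlikely.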
Related papers
- Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM [93.8400683020273]
Contrastive decoding (CD) improves the next-token distribution of a large expert language model (LM) using a small amateur LM.
We propose a new unsupervised decoding method called Asymptotic Probability Decoding (APD).
APD explicitly extrapolates the probability curves from the LMs of different sizes to infer the probabilities from an infinitely large LM without inducing more inference costs than CD.
arXiv Detail & Related papers (2024-11-03T15:31:44Z)
- $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding [64.00025564372095]
Large language models (LLMs) have shown remarkable capabilities in code generation.
The effects of hallucinations (e.g., output noise) make it challenging for LLMs to generate high-quality code in one pass.
We propose a simple and effective uncertainty-aware selective contrastive decoding (USCD) method.
arXiv Detail & Related papers (2024-09-09T02:07:41Z)
- Decoding Matters: Addressing Amplification Bias and Homogeneity Issue for LLM-based Recommendation [32.85339480783571]
We introduce a new decoding approach named Debiasing-Diversifying Decoding (D3).
D3 disables length normalization for ghost tokens to alleviate amplification bias.
Experiments on real-world datasets demonstrate the method's effectiveness.
arXiv Detail & Related papers (2024-06-21T06:47:28Z)
- Contrastive Decoding Improves Reasoning in Large Language Models [55.16503283583076]
We show that Contrastive Decoding achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark.
arXiv Detail & Related papers (2023-09-17T00:29:32Z)
- Surfacing Biases in Large Language Models using Contrastive Input Decoding [12.694066526722203]
Contrastive Input Decoding (CID) is a decoding algorithm to generate text given two inputs.
We use CID to highlight context-specific biases that are hard to detect with standard decoding strategies.
arXiv Detail & Related papers (2023-05-12T11:09:49Z)
- Stealing the Decoding Algorithms of Language Models [56.369946232765656]
A key component of generating text from modern language models (LM) is the selection and tuning of decoding algorithms.
In this work, we show, for the first time, that an adversary with typical API access to an LM can steal the type and hyperparameters of its decoding algorithms.
Our attack is effective against popular LMs used in text generation APIs, including GPT-2, GPT-3 and GPT-Neo.
arXiv Detail & Related papers (2023-03-08T17:15:58Z)
- Factuality Enhanced Language Models for Open-Ended Text Generation [60.27166549575472]
We design the FactualityPrompts test set and metrics to measure the factuality of LM generations.
We find that larger LMs are more factual than smaller ones, although a previous study suggests that larger LMs can be less truthful in terms of misconceptions.
We propose a factuality-enhanced training method that uses TopicPrefix for better awareness of facts and sentence completion.
arXiv Detail & Related papers (2022-06-09T17:16:43Z)
- Is Your Language Model Ready for Dense Representation Fine-tuning? [15.238322226336232]
This paper shows that one cause lies in the readiness of the LM to expose its knowledge through dense representation in fine-tuning.
We present Condenser, a general pre-training architecture based on Transformer LMs, to improve dense optimization readiness.
arXiv Detail & Related papers (2021-04-16T17:36:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.