Consistency of a Recurrent Language Model With Respect to Incomplete
Decoding
- URL: http://arxiv.org/abs/2002.02492v2
- Date: Fri, 2 Oct 2020 22:36:49 GMT
- Title: Consistency of a Recurrent Language Model With Respect to Incomplete
Decoding
- Authors: Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang,
Kyunghyun Cho
- Abstract summary: We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
- Score: 67.54760086239514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite strong performance on a variety of tasks, neural sequence models
trained with maximum likelihood have been shown to exhibit issues such as
length bias and degenerate repetition. We study the related issue of receiving
infinite-length sequences from a recurrent language model when using common
decoding algorithms. To analyze this issue, we first define inconsistency of a
decoding algorithm, meaning that the algorithm can yield an infinite-length
sequence that has zero probability under the model. We prove that commonly used
incomplete decoding algorithms - greedy search, beam search, top-k sampling,
and nucleus sampling - are inconsistent, despite the fact that recurrent
language models are trained to produce sequences of finite length. Based on
these insights, we propose two remedies which address inconsistency: consistent
variants of top-k and nucleus sampling, and a self-terminating recurrent
language model. Empirical results show that inconsistency occurs in practice,
and that the proposed methods prevent inconsistency.
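As a concrete illustration of the first remedy, the sketch below shows one way to implement the consistent top-k sampling variant described above: the eos token is forced into the candidate set at every step, so the probability of terminating never drops to zero. This is a minimal sketch, not the authors' released implementation; the `next_token_logits` callable, `eos_id`, and `k` are assumed interfaces for the example.

```python
import torch

def consistent_top_k_sample(next_token_logits, prefix, eos_id, k, max_len=None):
    """Top-k sampling that always keeps eos among the candidates.

    next_token_logits(tokens) -> 1D tensor of vocabulary logits is an
    assumed model interface, not part of the paper's code release.
    """
    tokens = list(prefix)
    while True:
        logits = next_token_logits(tokens)              # [vocab_size]
        topk_vals, topk_idx = torch.topk(logits, k)     # ordinary top-k candidates
        if eos_id not in topk_idx:
            # Consistent variant: force eos into the candidate set.
            topk_idx = torch.cat([topk_idx, torch.tensor([eos_id])])
            topk_vals = torch.cat([topk_vals, logits[eos_id].unsqueeze(0)])
        probs = torch.softmax(topk_vals, dim=-1)        # renormalize over candidates
        next_id = topk_idx[torch.multinomial(probs, 1)].item()
        tokens.append(next_id)
        if next_id == eos_id or (max_len is not None and len(tokens) >= max_len):
            return tokens
```

The same one-line change, adding eos to the nucleus before renormalizing, gives a consistent variant of nucleus sampling; the optional max_len cap here is only a safety guard and is not needed for the consistency argument.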
Related papers
- Self-Consistency of Large Language Models under Ambiguity [4.141513298907867]
This work presents an evaluation benchmark for self-consistency in cases of under-specification.
We conduct a series of behavioral experiments on the OpenAI model suite using an ambiguous integer sequence completion task.
We find that average consistency ranges from 67% to 82%, far higher than would be predicted if a model's consistency were random.
arXiv Detail & Related papers (2023-10-20T11:57:56Z)
- Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation [92.42032403795879]
We show that pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts.
We attribute their overestimation of token-level repetition probabilities to the learning bias.
We find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
arXiv Detail & Related papers (2023-07-04T07:53:55Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Learning to Reason With Relational Abstractions [65.89553417442049]
We study how to build stronger reasoning capability in language models using the idea of relational abstractions.
We find that models supplied with such sequences as prompts can solve tasks with significantly higher accuracy.
arXiv Detail & Related papers (2022-10-06T00:27:50Z)
- A Non-monotonic Self-terminating Language Model [62.93465126911921]
In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm.
We first define an incomplete probable decoding algorithm which includes greedy search, top-$k$ sampling, and nucleus sampling.
We then propose a non-monotonic self-terminating language model, which relaxes the constraint of monotonically increasing termination probability.
arXiv Detail & Related papers (2022-10-03T00:28:44Z)
- Calibrating Sequence Likelihood Improves Conditional Language Generation [39.35161650538767]
Conditional language models are predominantly trained with maximum likelihood estimation (MLE).
While MLE-trained models assign high probability to plausible sequences given the context, the model probabilities often do not accurately rank-order generated sequences by quality.
We introduce sequence likelihood calibration (SLiC), in which the likelihoods of model-generated sequences are calibrated to better align with reference sequences in the model's latent space.
arXiv Detail & Related papers (2022-09-30T19:16:16Z)
- Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models [11.258630552727432]
We analyze how ambiguity (also known as intrinsic uncertainty) shapes the distribution learned by neural sequence models.
We show that well-known pathologies such as a high number of beam search errors, the inadequacy of the mode, and the drop in system performance with large beam sizes apply to tasks with a high level of ambiguity.
arXiv Detail & Related papers (2022-04-01T14:30:19Z)
- Diverse Counterfactual Explanations for Anomaly Detection in Time Series [26.88575131193757]
We propose a model-agnostic algorithm that generates counterfactual ensemble explanations for time series anomaly detection models.
Our method generates a set of diverse counterfactual examples, i.e., multiple versions of the original time series that are not considered anomalous by the detection model.
Our algorithm is applicable to any differentiable anomaly detection model.
arXiv Detail & Related papers (2022-03-21T16:30:34Z)
- Determinantal Beam Search [75.84501052642361]
Beam search is a go-to strategy for decoding neural sequence models.
In use-cases that call for multiple solutions, a diverse or representative set is often desired.
By posing iterations in beam search as a series of subdeterminant problems, we can turn the algorithm into a diverse subset selection process (see the sketch after this list).
arXiv Detail & Related papers (2021-06-14T13:01:46Z)
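The Determinantal Beam Search entry above recasts beam search as diverse subset selection. The toy sketch below illustrates only that subset-selection step, greedily picking candidates that maximize the log-determinant of a DPP-style kernel built from hypothetical quality and similarity scores; it is not the paper's algorithm, which interleaves this kind of selection with the usual beam-expansion steps.

```python
import numpy as np

def greedy_logdet_subset(quality, similarity, k):
    """Greedily select k candidates that balance quality and diversity by
    maximizing the log-determinant of a DPP-style kernel submatrix.

    quality:    shape [n], nonnegative per-candidate scores (hypothetical inputs)
    similarity: shape [n, n], PSD similarity matrix with unit diagonal (hypothetical inputs)
    """
    L = np.outer(quality, quality) * similarity   # kernel: L[i, j] = q_i * S_ij * q_j
    selected = []
    for _ in range(k):
        best_gain, best_i = -np.inf, None
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            sub = L[np.ix_(idx, idx)] + 1e-9 * np.eye(len(idx))  # jitter for stability
            _, logdet = np.linalg.slogdet(sub)
            if logdet > best_gain:
                best_gain, best_i = logdet, i
        selected.append(best_i)
    return selected
```

For example, given two near-duplicate high-quality candidates and one distinct lower-quality candidate, the log-determinant objective prefers one duplicate plus the distinct candidate over both duplicates, since near-duplicate rows drive the determinant toward zero.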
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.