Consistency of a Recurrent Language Model With Respect to Incomplete
Decoding
- URL: http://arxiv.org/abs/2002.02492v2
- Date: Fri, 2 Oct 2020 22:36:49 GMT
- Title: Consistency of a Recurrent Language Model With Respect to Incomplete
Decoding
- Authors: Sean Welleck, Ilia Kulikov, Jaedeok Kim, Richard Yuanzhe Pang,
Kyunghyun Cho
- Abstract summary: We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
- Score: 67.54760086239514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite strong performance on a variety of tasks, neural sequence models
trained with maximum likelihood have been shown to exhibit issues such as
length bias and degenerate repetition. We study the related issue of receiving
infinite-length sequences from a recurrent language model when using common
decoding algorithms. To analyze this issue, we first define inconsistency of a
decoding algorithm, meaning that the algorithm can yield an infinite-length
sequence that has zero probability under the model. We prove that commonly used
incomplete decoding algorithms - greedy search, beam search, top-k sampling,
and nucleus sampling - are inconsistent, despite the fact that recurrent
language models are trained to produce sequences of finite length. Based on
these insights, we propose two remedies which address inconsistency: consistent
variants of top-k and nucleus sampling, and a self-terminating recurrent
language model. Empirical results show that inconsistency occurs in practice,
and that the proposed methods prevent inconsistency.
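As a concrete illustration of the first remedy, the sketch below shows one way to implement the consistent top-k sampling variant described above: the eos token is forced into the candidate set at every step, so the probability of terminating never drops to zero. This is a minimal sketch, not the authors' released implementation; the `next_token_logits` callable, `eos_id`, and `k` are assumed interfaces for the example.

```python
import torch

def consistent_top_k_sample(next_token_logits, prefix, eos_id, k, max_len=None):
    """Top-k sampling that always keeps eos among the candidates.

    next_token_logits(tokens) -> 1D tensor of vocabulary logits is an
    assumed model interface, not part of the paper's code release.
    """
    tokens = list(prefix)
    while True:
        logits = next_token_logits(tokens)              # [vocab_size]
        topk_vals, topk_idx = torch.topk(logits, k)     # ordinary top-k candidates
        if eos_id not in topk_idx:
            # Consistent variant: force eos into the candidate set.
            topk_idx = torch.cat([topk_idx, torch.tensor([eos_id])])
            topk_vals = torch.cat([topk_vals, logits[eos_id].unsqueeze(0)])
        probs = torch.softmax(topk_vals, dim=-1)        # renormalize over candidates
        next_id = topk_idx[torch.multinomial(probs, 1)].item()
        tokens.append(next_id)
        if next_id == eos_id or (max_len is not None and len(tokens) >= max_len):
            return tokens
```

The same one-line change, adding eos to the nucleus before renormalizing, gives a consistent variant of nucleus sampling; the optional max_len cap here is only a safety guard and is not needed for the consistency argument.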
Related papers
- Self-Consistency of Large Language Models under Ambiguity [4.141513298907867]
This work presents an evaluation benchmark for self-consistency in cases of under-specification.
We conduct a series of behavioral experiments on the OpenAI model suite using an ambiguous integer sequence completion task.
We find that average consistency ranges from 67% to 82%, far higher than would be predicted if a model's consistency were random.
arXiv Detail & Related papers (2023-10-20T11:57:56Z)
- Mitigating the Learning Bias towards Repetition by Self-Contrastive Training for Open-Ended Generation [92.42032403795879]
We show that pretrained language models (LMs) such as GPT2 still tend to generate repetitive texts.
We attribute their overestimation of token-level repetition probabilities to the learning bias.
We find that LMs use longer-range dependencies to predict repetitive tokens than non-repetitive ones, which may be the cause of sentence-level repetition loops.
arXiv Detail & Related papers (2023-07-04T07:53:55Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Learning to Reason With Relational Abstractions [65.89553417442049]
We study how to build stronger reasoning capability in language models using the idea of relational abstractions.
We find that models supplied with such sequences as prompts can solve tasks with significantly higher accuracy.
arXiv Detail & Related papers (2022-10-06T00:27:50Z)
- A Non-monotonic Self-terminating Language Model [62.93465126911921]
In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm.
We first define an incomplete probable decoding algorithm which includes greedy search, top-$k$ sampling, and nucleus sampling.
We then propose a non-monotonic self-terminating language model, which relaxes the constraint of monotonically increasing termination probability.
arXiv Detail & Related papers (2022-10-03T00:28:44Z)
- Calibrating Sequence Likelihood Improves Conditional Language Generation [39.35161650538767]
Conditional language models are predominantly trained with maximum likelihood estimation (MLE).
While MLE-trained models assign high probability to plausible sequences given the context, the model probabilities often do not accurately rank-order generated sequences by quality.
We introduce sequence likelihood calibration (SLiC), in which the likelihoods of model-generated sequences are calibrated to better align with reference sequences in the model's latent space.
arXiv Detail & Related papers (2022-09-30T19:16:16Z)
- Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models [11.258630552727432]
We analyze how ambiguity (also known as intrinsic uncertainty) shapes the distribution learned by neural sequence models.
We show that well-known pathologies such as a high number of beam search errors, the inadequacy of the mode, and the drop in system performance with large beam sizes apply to tasks with a high level of ambiguity.
arXiv Detail & Related papers (2022-04-01T14:30:19Z)
- Diverse Counterfactual Explanations for Anomaly Detection in Time Series [26.88575131193757]
We propose a model-agnostic algorithm that generates counterfactual ensemble explanations for time series anomaly detection models.
Our method generates a set of diverse counterfactual examples, i.e., multiple versions of the original time series that are not considered anomalous by the detection model.
Our algorithm is applicable to any differentiable anomaly detection model.
arXiv Detail & Related papers (2022-03-21T16:30:34Z)
- Determinantal Beam Search [75.84501052642361]
Beam search is a go-to strategy for decoding neural sequence models.
In use-cases that call for multiple solutions, a diverse or representative set is often desired.
By posing iterations in beam search as a series of subdeterminant problems, we can turn the algorithm into a diverse subset selection process (see the sketch after this list).
arXiv Detail & Related papers (2021-06-14T13:01:46Z)
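The Determinantal Beam Search entry above recasts beam search as diverse subset selection. The toy sketch below illustrates only that subset-selection step, greedily picking candidates that maximize the log-determinant of a DPP-style kernel built from hypothetical quality and similarity scores; it is not the paper's algorithm, which interleaves this kind of selection with the usual beam-expansion steps.

```python
import numpy as np

def greedy_logdet_subset(quality, similarity, k):
    """Greedily select k candidates that balance quality and diversity by
    maximizing the log-determinant of a DPP-style kernel submatrix.

    quality:    shape [n], nonnegative per-candidate scores (hypothetical inputs)
    similarity: shape [n, n], PSD similarity matrix with unit diagonal (hypothetical inputs)
    """
    L = np.outer(quality, quality) * similarity   # kernel: L[i, j] = q_i * S_ij * q_j
    selected = []
    for _ in range(k):
        best_gain, best_i = -np.inf, None
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            sub = L[np.ix_(idx, idx)] + 1e-9 * np.eye(len(idx))  # jitter for stability
            _, logdet = np.linalg.slogdet(sub)
            if logdet > best_gain:
                best_gain, best_i = logdet, i
        selected.append(best_i)
    return selected
```

For example, given two near-duplicate high-quality candidates and one distinct lower-quality candidate, the log-determinant objective prefers one duplicate plus the distinct candidate over both duplicates, since near-duplicate rows drive the determinant toward zero.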
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.