A Non-monotonic Self-terminating Language Model
- URL: http://arxiv.org/abs/2210.00660v1
- Date: Mon, 3 Oct 2022 00:28:44 GMT
- Title: A Non-monotonic Self-terminating Language Model
- Authors: Eugene Choi, Cheolhyoung Lee, Kyunghyun Cho
- Abstract summary: In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm.
We first define an incomplete probable decoding algorithm which includes greedy search, top-$k$ sampling, and nucleus sampling.
We then propose a non-monotonic self-terminating language model, which relaxes the constraint of monotonically increasing termination probability.
- Score: 62.93465126911921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent large-scale neural autoregressive sequence models have shown
impressive performances on a variety of natural language generation tasks.
However, their generated sequences often exhibit degenerate properties such as
non-termination, undesirable repetition, and premature termination, when
generated with decoding algorithms such as greedy search, beam search, top-$k$
sampling, and nucleus sampling. In this paper, we focus on the problem of
non-terminating sequences resulting from an incomplete decoding algorithm. We
first define an incomplete probable decoding algorithm which includes greedy
search, top-$k$ sampling, and nucleus sampling, beyond the incomplete decoding
algorithm originally put forward by Welleck et al. (2020). We then propose a
non-monotonic self-terminating language model, which significantly relaxes the
constraint of monotonically increasing termination probability in the
originally proposed self-terminating language model by Welleck et al. (2020),
to address the issue of non-terminating sequences when using incomplete
probable decoding algorithms. We prove that our proposed model prevents
non-terminating sequences when using not only incomplete probable decoding
algorithms but also beam search. We empirically validate our model on sequence
completion tasks with various architectures.
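To make the failure mode and the proposed fix concrete, the sketch below (toy vocabulary, hypothetical numbers, and not the authors' exact parameterization) first shows a decoding step at which greedy search, top-$k$ sampling, and nucleus sampling all drop <eos> from their candidate sets, so the sequence cannot terminate there, and then shows a termination probability that is lower-bounded by a curve converging to 1 while remaining free to fluctuate non-monotonically.

```python
# Illustrative sketch only (toy vocabulary, hypothetical numbers): why
# incomplete probable decoding can fail to terminate, and how a lower
# bound on the <eos> probability that converges to 1, without having to
# increase monotonically, rules that failure out. This is NOT the
# authors' exact parameterization.

EOS = "<eos>"

def top_k_candidates(probs, k):
    """Candidate set kept by top-k sampling (greedy search is k = 1)."""
    return set(sorted(probs, key=probs.get, reverse=True)[:k])

def nucleus_candidates(probs, p):
    """Smallest set of most probable tokens with cumulative mass >= p."""
    kept, mass = set(), 0.0
    for tok in sorted(probs, key=probs.get, reverse=True):
        kept.add(tok)
        mass += probs[tok]
        if mass >= p:
            break
    return kept

# (a) A decoding step at which every incomplete decoder drops <eos>,
#     so the sequence cannot terminate here.
probs = {"the": 0.40, "a": 0.30, "cat": 0.20, EOS: 0.10}
for name, kept in [("greedy (k=1)", top_k_candidates(probs, 1)),
                   ("top-k (k=2)", top_k_candidates(probs, 2)),
                   ("nucleus (p=0.9)", nucleus_candidates(probs, 0.9))]:
    print(f"{name:15s} keeps {sorted(kept)}; can terminate: {EOS in kept}")

# (b) A termination probability that converges to 1 without increasing
#     monotonically: p_t = floor_t + (1 - floor_t) * s_t, where
#     floor_t = 1 - (1 - eps)^t grows toward 1 and s_t (a stand-in for
#     whatever the network outputs) is free to fluctuate.
eps = 0.05
for t, s_t in enumerate([0.30, 0.10, 0.40, 0.05, 0.20], start=1):
    floor_t = 1.0 - (1.0 - eps) ** t
    p_eos = floor_t + (1.0 - floor_t) * s_t
    print(f"t={t}: floor={floor_t:.3f}  p(<eos>)={p_eos:.3f}")
```

Once floor_t exceeds 0.5, <eos> holds more than half of the probability mass, so it survives greedy search and any top-$k$ or nucleus truncation; that is the intuition behind the termination guarantee, while the actual construction and proofs are given in the paper.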
Related papers
- GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples permutation from decoding via a dedicated permutation network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z)
- Mutual Exclusivity Training and Primitive Augmentation to Induce Compositionality [84.94877848357896]
Recent datasets expose the lack of systematic generalization ability in standard sequence-to-sequence models.
We analyze this behavior of seq2seq models and identify two contributing factors: a lack of mutual exclusivity bias and the tendency to memorize whole examples.
We show substantial empirical improvements using standard sequence-to-sequence models on two widely-used compositionality datasets.
arXiv Detail & Related papers (2022-11-28T17:36:41Z)
- Calibrating Sequence Likelihood Improves Conditional Language Generation [39.35161650538767]
Conditional language models are predominantly trained with maximum likelihood estimation (MLE).
While MLE-trained models assign high probability to plausible sequences given the context, the model probabilities often do not accurately rank-order generated sequences by quality.
We introduce sequence likelihood calibration (SLiC), where the likelihood of model-generated sequences is calibrated to better align with reference sequences in the model's latent space.
arXiv Detail & Related papers (2022-09-30T19:16:16Z)
- Uncertainty Determines the Adequacy of the Mode and the Tractability of Decoding in Sequence-to-Sequence Models [11.258630552727432]
We analyze how ambiguity (also known as intrinsic uncertainty) shapes the distribution learned by neural sequence models.
We show that well-known pathologies such as a high number of beam search errors, the inadequacy of the mode, and the drop in system performance with large beam sizes apply to tasks with a high level of ambiguity.
arXiv Detail & Related papers (2022-04-01T14:30:19Z)
- Infinite-Dimensional Sparse Learning in Linear System Identification [0.2867517731896504]
This paper proposes an infinite-dimensional sparse learning algorithm based on atomic norm regularization.
The difficulty in solving the problem lies in the fact that there are infinitely many possible atomic models.
arXiv Detail & Related papers (2022-03-28T13:18:48Z)
- Determinantal Beam Search [75.84501052642361]
Beam search is a go-to strategy for decoding neural sequence models.
In use-cases that call for multiple solutions, a diverse or representative set is often desired.
By posing iterations in beam search as a series of subdeterminant problems, we can turn the algorithm into a diverse subset selection process.
arXiv Detail & Related papers (2021-06-14T13:01:46Z)
- Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)
- Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of a recurrent language model yielding infinite-length sequences under incomplete decoding.
We propose two remedies that address this inconsistency: consistent variants of top-$k$ and nucleus sampling, and a self-terminating recurrent language model (a brief sketch of the consistent variants follows this list).
arXiv Detail & Related papers (2020-02-06T19:56:15Z)
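The last entry above (Welleck et al., 2020) proposes consistent variants of top-$k$ and nucleus sampling; the sketch below illustrates one plausible reading of that remedy, in which the candidate set is forced to contain <eos> at every step so termination is never ruled out. The function names and the toy distribution are hypothetical, not the authors' code.

```python
# Illustrative sketch only, assuming "consistent" top-k / nucleus
# sampling means the <eos> token is always kept in the candidate set.
import random

EOS = "<eos>"

def consistent_top_k(probs, k):
    """Top-k candidate set that always retains <eos>."""
    kept = set(sorted(probs, key=probs.get, reverse=True)[:k])
    kept.add(EOS)  # termination remains reachable at every step
    return kept

def consistent_nucleus(probs, p):
    """Nucleus candidate set that always retains <eos>."""
    kept, mass = set(), 0.0
    for tok in sorted(probs, key=probs.get, reverse=True):
        kept.add(tok)
        mass += probs[tok]
        if mass >= p:
            break
    kept.add(EOS)
    return kept

def sample(probs, kept):
    """Renormalize over the kept tokens and draw one of them."""
    total = sum(probs[t] for t in kept)
    r, acc = random.random() * total, 0.0
    for tok in kept:
        acc += probs[tok]
        if acc >= r:
            return tok
    return EOS  # guard against floating-point underflow

probs = {"the": 0.40, "a": 0.30, "cat": 0.20, EOS: 0.10}
print(sample(probs, consistent_top_k(probs, 2)))      # <eos> stays reachable
print(sample(probs, consistent_nucleus(probs, 0.9)))  # <eos> stays reachable
```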
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.