Closing the Curious Case of Neural Text Degeneration
- URL: http://arxiv.org/abs/2310.01693v1
- Date: Mon, 2 Oct 2023 23:16:25 GMT
- Title: Closing the Curious Case of Neural Text Degeneration
- Authors: Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta,
Ashish Sabharwal
- Abstract summary: We provide a theoretical explanation for the effectiveness of the truncation sampling.
We show that we can leverage a known source of model errors, the softmax bottleneck, to prove that certain tokens have nonzero true probability.
Our evaluations show that our method outperforms its threshold-based counterparts for low-entropy text generation.
- Score: 91.22954750742183
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their ubiquity in language generation, it remains unknown why
truncation sampling heuristics like nucleus sampling are so effective. We
provide a theoretical explanation for the effectiveness of the truncation
sampling by proving that truncation methods that discard tokens below some
probability threshold (the most common type of truncation) can guarantee that
all sampled tokens have nonzero true probability. However, thresholds are a
coarse heuristic, and necessarily discard some tokens with nonzero true
probability as well. In pursuit of a more precise sampling strategy, we show
that we can leverage a known source of model errors, the softmax bottleneck, to
prove that certain tokens have nonzero true probability, without relying on a
threshold. Based on our findings, we develop an experimental truncation
strategy and the present pilot studies demonstrating the promise of this type
of algorithm. Our evaluations show that our method outperforms its
threshold-based counterparts under automatic and human evaluation metrics for
low-entropy (i.e., close to greedy) open-ended text generation. Our theoretical
findings and pilot experiments provide both insight into why truncation
sampling works, and make progress toward more expressive sampling algorithms
that better surface the generative capabilities of large language models.
Related papers
- Estimating the Probabilities of Rare Outputs in Language Models [8.585890569162267]
We study low probability estimation in the context of argmax sampling from small transformer language models.
We find that importance sampling outperforms activation extrapolation, but both outperform naive sampling.
We argue that new methods for low probability estimation are needed to provide stronger guarantees about worst-case performance.
arXiv Detail & Related papers (2024-10-17T04:31:18Z) - A Pseudo-Semantic Loss for Autoregressive Models with Logical
Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z) - Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method.
We develop practical bounds to apply it to language generation.
We introduce the TaiLr objective that balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z) - Truncation Sampling as Language Model Desmoothing [115.28983143361681]
Long samples of text from neural language models can be of poor quality.
Truncation sampling algorithms set some words' probabilities to zero at each step.
We introduce $eta$-sampling, which truncates words below an entropy-dependent probability threshold.
arXiv Detail & Related papers (2022-10-27T05:52:35Z) - Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z) - Improving Diversity of Neural Text Generation via Inverse Probability
Weighting [43.36560720793425]
We propose a sampling method inspired by inverse probability weighting.
We show might contain tedious or even repetitive candidates with high probability that lead to repetition loops.
Results show that our algorithm can effectively increase the diversity of generated samples while achieving close resemblance to human text.
arXiv Detail & Related papers (2021-03-13T08:17:40Z) - Certifying Neural Network Robustness to Random Input Noise from Samples [14.191310794366075]
Methods to certify the robustness of neural networks in the presence of input uncertainty are vital in safety-critical settings.
We propose a novel robustness certification method that upper bounds the probability of misclassification when the input noise follows an arbitrary probability distribution.
arXiv Detail & Related papers (2020-10-15T05:27:21Z) - BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adrial attacks for discrete data (such as texts) are more challenging than continuous data (such as images)
We propose textbfBERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.