Related papers: Closing the Curious Case of Neural Text Degeneration

URL: http://arxiv.org/abs/2310.01693v1
Date: Mon, 2 Oct 2023 23:16:25 GMT
Title: Closing the Curious Case of Neural Text Degeneration
Authors: Matthew Finlayson, John Hewitt, Alexander Koller, Swabha Swayamdipta, Ashish Sabharwal
Abstract summary: We provide a theoretical explanation for the effectiveness of truncation sampling. We show that we can leverage a known source of model errors, the softmax bottleneck, to prove that certain tokens have nonzero true probability. Our evaluations show that our method outperforms its threshold-based counterparts for low-entropy text generation.
Score: 91.22954750742183
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite their ubiquity in language generation, it remains unknown why truncation sampling heuristics like nucleus sampling are so effective. We provide a theoretical explanation for the effectiveness of truncation sampling by proving that truncation methods that discard tokens below some probability threshold (the most common type of truncation) can guarantee that all sampled tokens have nonzero true probability. However, thresholds are a coarse heuristic, and necessarily discard some tokens with nonzero true probability as well. In pursuit of a more precise sampling strategy, we show that we can leverage a known source of model errors, the softmax bottleneck, to prove that certain tokens have nonzero true probability, without relying on a threshold. Based on our findings, we develop an experimental truncation strategy and present pilot studies demonstrating the promise of this type of algorithm. Our evaluations show that our method outperforms its threshold-based counterparts under automatic and human evaluation metrics for low-entropy (i.e., close to greedy) open-ended text generation. Our theoretical findings and pilot experiments both provide insight into why truncation sampling works and make progress toward more expressive sampling algorithms that better surface the generative capabilities of large language models.
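Example (illustrative, not from the paper): the threshold-based truncation described in the abstract can be sketched as below. This is a minimal sketch that assumes a fixed probability threshold; the function names, the 0.05 threshold, and the argmax fallback are hypothetical choices made for illustration. It is not the paper's softmax-bottleneck method, only the threshold-style baseline that the abstract compares against.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def threshold_truncate(probs, threshold):
    """Zero out tokens with probability below `threshold`, then renormalize.

    The fixed threshold is an assumption made for this sketch.
    """
    kept = [p if p >= threshold else 0.0 for p in probs]
    if sum(kept) == 0.0:
        # Assumed fallback: if everything was discarded, keep the top token.
        best = max(range(len(probs)), key=probs.__getitem__)
        kept[best] = probs[best]
    total = sum(kept)
    return [p / total for p in kept]


def sample_token(probs, rng):
    """Draw one token index from the (truncated) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = softmax([2.0, 1.0, 0.0, -3.0])
truncated = threshold_truncate(probs, threshold=0.05)
rng = random.Random(0)
samples = [sample_token(truncated, rng) for _ in range(1000)]
```

In this toy distribution the last token falls below the threshold, so it receives zero probability after truncation and is never sampled. The abstract's point is that such a cutoff is coarse: it may also discard tokens whose true probability is nonzero.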

Related papers

Language Models Can Predict Their Own Behavior [28.80639362933004]
We show that internal representation of input tokens alone can often precisely predict, not just the next token, but eventual behavior over the entire output sequence. We leverage this capacity and learn probes on internal states to create early warning (and exit) systems. Specifically, if the probes can confidently estimate the way the LM is going to behave, then the system will avoid generating tokens altogether and return the estimated behavior instead.
arXiv Detail & Related papers (2025-02-18T23:13:16Z)
Estimating the Probabilities of Rare Outputs in Language Models [8.585890569162267]
We study low probability estimation in the context of argmax sampling from small transformer language models. We find that importance sampling outperforms activation extrapolation, but both outperform naive sampling. We argue that new methods for low probability estimation are needed to provide stronger guarantees about worst-case performance.
arXiv Detail & Related papers (2024-10-17T04:31:18Z)
A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning. We show how to maximize the likelihood of a symbolic constraint w.r.t the neural network's output distribution. We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z)
Tailoring Language Generation Models under Total Variation Distance [55.89964205594829]
The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as the optimizing method. We develop practical bounds to apply total variation distance (TVD) to language generation. We introduce the TaiLr objective, which balances the tradeoff of estimating TVD.
arXiv Detail & Related papers (2023-02-26T16:32:52Z)
Truncation Sampling as Language Model Desmoothing [115.28983143361681]
Long samples of text from neural language models can be of poor quality. Truncation sampling algorithms set some words' probabilities to zero at each step. We introduce $\eta$-sampling, which truncates words below an entropy-dependent probability threshold.
arXiv Detail & Related papers (2022-10-27T05:52:35Z)
Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive. We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
Improving Diversity of Neural Text Generation via Inverse Probability Weighting [43.36560720793425]
We propose a sampling method inspired by inverse probability weighting. We show that the "head" of the distribution might contain tedious or even repetitive candidates with high probability that lead to repetition loops. Results show that our algorithm can effectively increase the diversity of generated samples while achieving close resemblance to human text.
arXiv Detail & Related papers (2021-03-13T08:17:40Z)
Certifying Neural Network Robustness to Random Input Noise from Samples [14.191310794366075]
Methods to certify the robustness of neural networks in the presence of input uncertainty are vital in safety-critical settings. We propose a novel robustness certification method that upper bounds the probability of misclassification when the input noise follows an arbitrary probability distribution.
arXiv Detail & Related papers (2020-10-15T05:27:21Z)
BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks for discrete data (such as texts) are more challenging than for continuous data (such as images). We propose BERT-Attack, a high-quality and effective method to generate adversarial samples. Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
