Improving Diversity of Neural Text Generation via Inverse Probability
Weighting
- URL: http://arxiv.org/abs/2103.07649v1
- Date: Sat, 13 Mar 2021 08:17:40 GMT
- Title: Improving Diversity of Neural Text Generation via Inverse Probability
Weighting
- Authors: Xinran Zhang, Maosong Sun, Jiafeng Liu and Xiaobing Li
- Abstract summary: We propose a sampling method inspired by inverse probability weighting.
We show that the high-probability "head" of the distribution might contain tedious or even repetitive candidates that lead to repetition loops.
Results show that our algorithm can effectively increase the diversity of generated samples while achieving close resemblance to human text.
- Score: 43.36560720793425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural-network-based text generation suffers from text degeneration
issues such as repetition. Although top-k sampling and nucleus sampling
outperform beam search based decoding methods, they only focus on truncating
the "tail" of the distribution and do not address the "head" part, which we
show might contain tedious or even repetitive candidates with high probability
that lead to repetition loops. They also do not fully address the issue that
human text does not always favor high probability words. To explore improved
diversity for text generation, we propose a heuristic sampling method inspired
by inverse probability weighting. We propose to use the interquartile range of
the predicted distribution to determine the "head" part, then permute and
rescale the "head" with inverse probability. This aims to decrease the
probability of the tedious and possibly repetitive high-probability candidates
and to increase the probability of the rational but more surprising
low-probability candidates. The proposed algorithm provides a controllable variation on
the predicted distribution which enhances diversity without compromising
rationality of the distribution. We use a pre-trained language model to compare
our algorithm with nucleus sampling. Results show that our algorithm can
effectively increase the diversity of generated samples while achieving close
resemblance to human text.
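
As a rough illustration of the idea described in the abstract, the sketch below reweights the high-probability "head" of a next-token distribution with inverse probabilities. It is a minimal sketch, not the authors' implementation: the head criterion (probabilities above Q3 + 1.5 * IQR of the distribution), the mass-preserving rescaling, and the function name `inverse_probability_sample` are assumptions made for illustration.

```python
# Minimal sketch of inverse-probability-weighted sampling; the head criterion
# (probabilities above Q3 + 1.5 * IQR) and the mass-preserving rescaling are
# assumptions made for illustration, not the paper's exact formulation.
import numpy as np

def inverse_probability_sample(probs: np.ndarray, rng: np.random.Generator) -> int:
    """Sample a token id after reweighting the high-probability head of `probs`."""
    q1, q3 = np.percentile(probs, [25, 75])
    head = probs > q3 + 1.5 * (q3 - q1)          # assumed IQR-based head criterion
    probs = probs.copy()
    if head.sum() > 1:
        head_mass = probs[head].sum()
        inv_weights = 1.0 / probs[head]          # inverse probability weights
        probs[head] = inv_weights / inv_weights.sum() * head_mass  # keep head mass fixed
    probs = probs / probs.sum()                  # guard against floating-point drift
    return int(rng.choice(len(probs), p=probs))

# Toy usage: with this peaked 12-token distribution the two head tokens
# (0.30 and 0.25) exchange probabilities, while the tail is untouched.
rng = np.random.default_rng(0)
p = np.array([0.30, 0.25, 0.15, 0.08, 0.05, 0.04,
              0.03, 0.03, 0.03, 0.02, 0.01, 0.01])
print(inverse_probability_sample(p, rng))
```

Because the inverse weights reverse the ranking inside the head while leaving the tail and the total head mass unchanged, the variation on the predicted distribution stays controllable.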
Related papers
- Estimating the Probabilities of Rare Outputs in Language Models [8.585890569162267]
We study low probability estimation in the context of argmax sampling from small transformer language models.
We find that importance sampling outperforms activation extrapolation, but both outperform naive sampling.
We argue that new methods for low probability estimation are needed to provide stronger guarantees about worst-case performance.
arXiv Detail & Related papers (2024-10-17T04:31:18Z)
- Closing the Curious Case of Neural Text Degeneration [91.22954750742183]
We provide a theoretical explanation for the effectiveness of the truncation sampling.
We show that we can leverage a known source of model errors, the softmax bottleneck, to prove that certain tokens have nonzero true probability.
Our evaluations show that our method outperforms its threshold-based counterparts for low-entropy text generation.
arXiv Detail & Related papers (2023-10-02T23:16:25Z)
- On the Efficacy of Sampling Adapters [82.5941326570812]
We propose a unified framework for understanding sampling adapters.
We argue that the shift they enforce can be viewed as a trade-off between precision and recall.
We find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.
arXiv Detail & Related papers (2023-07-07T17:59:12Z)
- Truncation Sampling as Language Model Desmoothing [115.28983143361681]
Long samples of text from neural language models can be of poor quality.
Truncation sampling algorithms set some words' probabilities to zero at each step.
We introduce $\eta$-sampling, which truncates words below an entropy-dependent probability threshold (a minimal sketch of such a rule appears after this list).
arXiv Detail & Related papers (2022-10-27T05:52:35Z)
- Diverse Human Motion Prediction via Gumbel-Softmax Sampling from an Auxiliary Space [34.83587750498361]
Diverse human motion prediction aims at predicting multiple possible future pose sequences from a sequence of observed poses.
Previous approaches usually employ deep generative networks to model the conditional distribution of data, and then randomly sample outcomes from the distribution.
We propose a novel sampling strategy for sampling very diverse results from an imbalanced multimodal distribution.
arXiv Detail & Related papers (2022-07-15T09:03:57Z)
- On the probability-quality paradox in language generation [76.69397802617064]
We analyze language generation through an information-theoretic lens.
We posit that human-like language should contain an amount of information close to the entropy of the distribution over natural strings.
arXiv Detail & Related papers (2022-03-31T17:43:53Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
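
For the Truncation Sampling as Language Model Desmoothing entry above, the sketch below illustrates an entropy-dependent truncation rule in the spirit of $\eta$-sampling. The threshold form min(eps, sqrt(eps) * exp(-entropy)), the choice of eps, and the helper name `eta_truncate` follow a commonly cited formulation but are assumptions here; consult the paper for the exact rule.

```python
# Minimal sketch of entropy-dependent truncation (eta-sampling style); the
# threshold eta = min(eps, sqrt(eps) * exp(-entropy)) is an assumed formulation.
import numpy as np

def eta_truncate(probs: np.ndarray, eps: float = 3e-4) -> np.ndarray:
    """Zero out tokens below an entropy-dependent threshold and renormalize."""
    entropy = -np.sum(probs * np.log(np.clip(probs, 1e-12, None)))
    eta = min(eps, np.sqrt(eps) * np.exp(-entropy))
    kept = np.where(probs >= eta, probs, 0.0)
    if kept.sum() == 0.0:                        # guard: never cut every token
        kept[np.argmax(probs)] = 1.0
    return kept / kept.sum()

# Toy usage: with eps = 0.02 the two smallest tokens fall below the
# adaptive threshold and are removed before renormalization.
p = np.array([0.50, 0.30, 0.15, 0.04, 0.009, 0.001])
print(eta_truncate(p, eps=0.02))
```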