Improving Diversity of Neural Text Generation via Inverse Probability
Weighting
- URL: http://arxiv.org/abs/2103.07649v1
- Date: Sat, 13 Mar 2021 08:17:40 GMT
- Title: Improving Diversity of Neural Text Generation via Inverse Probability
Weighting
- Authors: Xinran Zhang, Maosong Sun, Jiafeng Liu and Xiaobing Li
- Abstract summary: We propose a sampling method inspired by inverse probability weighting.
We show that the high-probability "head" of the distribution might contain tedious or even repetitive candidates that lead to repetition loops.
Results show that our algorithm can effectively increase the diversity of generated samples while achieving close resemblance to human text.
- Score: 43.36560720793425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural-network-based text generation suffers from text degeneration
issues such as repetition. Although top-k sampling and nucleus sampling
outperform beam search based decoding methods, they only focus on truncating
the "tail" of the distribution and do not address the "head" part, which we
show might contain tedious or even repetitive candidates with high probability
that lead to repetition loops. They also do not fully address the issue that
human text does not always favor high probability words. To explore improved
diversity for text generation, we propose a heuristic sampling method inspired
by inverse probability weighting. We propose to use the interquartile range of
the predicted distribution to determine the "head" part, then permute and
rescale the "head" with inverse probability. This aims to decrease the
probability of the tedious and possibly repetitive high-probability candidates
and to increase the probability of the rational but more surprising
low-probability candidates. The proposed algorithm provides a controllable variation on
the predicted distribution which enhances diversity without compromising
rationality of the distribution. We use a pre-trained language model to compare
our algorithm with nucleus sampling. Results show that our algorithm can
effectively increase the diversity of generated samples while achieving close
resemblance to human text.
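
As a rough illustration of the idea described in the abstract, the sketch below reweights the high-probability "head" of a next-token distribution with inverse probabilities. It is a minimal sketch, not the authors' implementation: the head criterion (probabilities above Q3 + 1.5 * IQR of the distribution), the mass-preserving rescaling, and the function name `inverse_probability_sample` are assumptions made for illustration.

```python
# Minimal sketch of inverse-probability-weighted sampling; the head criterion
# (probabilities above Q3 + 1.5 * IQR) and the mass-preserving rescaling are
# assumptions made for illustration, not the paper's exact formulation.
import numpy as np

def inverse_probability_sample(probs: np.ndarray, rng: np.random.Generator) -> int:
    """Sample a token id after reweighting the high-probability head of `probs`."""
    q1, q3 = np.percentile(probs, [25, 75])
    head = probs > q3 + 1.5 * (q3 - q1)          # assumed IQR-based head criterion
    probs = probs.copy()
    if head.sum() > 1:
        head_mass = probs[head].sum()
        inv_weights = 1.0 / probs[head]          # inverse probability weights
        probs[head] = inv_weights / inv_weights.sum() * head_mass  # keep head mass fixed
    probs = probs / probs.sum()                  # guard against floating-point drift
    return int(rng.choice(len(probs), p=probs))

# Toy usage: with this peaked 12-token distribution the two head tokens
# (0.30 and 0.25) exchange probabilities, while the tail is untouched.
rng = np.random.default_rng(0)
p = np.array([0.30, 0.25, 0.15, 0.08, 0.05, 0.04,
              0.03, 0.03, 0.03, 0.02, 0.01, 0.01])
print(inverse_probability_sample(p, rng))
```

Because the inverse weights reverse the ranking inside the head while leaving the tail and the total head mass unchanged, the variation on the predicted distribution stays controllable.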
Related papers
- Estimating the Probabilities of Rare Outputs in Language Models [8.585890569162267]
We study low probability estimation in the context of argmax sampling from small transformer language models.
We find that importance sampling outperforms activation extrapolation, but both outperform naive sampling.
We argue that new methods for low probability estimation are needed to provide stronger guarantees about worst-case performance.
arXiv Detail & Related papers (2024-10-17T04:31:18Z)
- Closing the Curious Case of Neural Text Degeneration [91.22954750742183]
We provide a theoretical explanation for the effectiveness of the truncation sampling.
We show that we can leverage a known source of model errors, the softmax bottleneck, to prove that certain tokens have nonzero true probability.
Our evaluations show that our method outperforms its threshold-based counterparts for low-entropy text generation.
arXiv Detail & Related papers (2023-10-02T23:16:25Z)
- On the Efficacy of Sampling Adapters [82.5941326570812]
We propose a unified framework for understanding sampling adapters.
We argue that the shift they enforce can be viewed as a trade-off between precision and recall.
We find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.
arXiv Detail & Related papers (2023-07-07T17:59:12Z)
- Truncation Sampling as Language Model Desmoothing [115.28983143361681]
Long samples of text from neural language models can be of poor quality.
Truncation sampling algorithms set some words' probabilities to zero at each step.
We introduce $\eta$-sampling, which truncates words below an entropy-dependent probability threshold (a minimal sketch of such a rule appears after this list).
arXiv Detail & Related papers (2022-10-27T05:52:35Z)
- Diverse Human Motion Prediction via Gumbel-Softmax Sampling from an Auxiliary Space [34.83587750498361]
Diverse human motion prediction aims at predicting multiple possible future pose sequences from a sequence of observed poses.
Previous approaches usually employ deep generative networks to model the conditional distribution of data, and then randomly sample outcomes from the distribution.
We propose a novel sampling strategy for sampling very diverse results from an imbalanced multimodal distribution.
arXiv Detail & Related papers (2022-07-15T09:03:57Z)
- On the probability-quality paradox in language generation [76.69397802617064]
We analyze language generation through an information-theoretic lens.
We posit that human-like language should contain an amount of information close to the entropy of the distribution over natural strings.
arXiv Detail & Related papers (2022-03-31T17:43:53Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
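
For the Truncation Sampling as Language Model Desmoothing entry above, the sketch below illustrates an entropy-dependent truncation rule in the spirit of $\eta$-sampling. The threshold form min(eps, sqrt(eps) * exp(-entropy)), the choice of eps, and the helper name `eta_truncate` follow a commonly cited formulation but are assumptions here; consult the paper for the exact rule.

```python
# Minimal sketch of entropy-dependent truncation (eta-sampling style); the
# threshold eta = min(eps, sqrt(eps) * exp(-entropy)) is an assumed formulation.
import numpy as np

def eta_truncate(probs: np.ndarray, eps: float = 3e-4) -> np.ndarray:
    """Zero out tokens below an entropy-dependent threshold and renormalize."""
    entropy = -np.sum(probs * np.log(np.clip(probs, 1e-12, None)))
    eta = min(eps, np.sqrt(eps) * np.exp(-entropy))
    kept = np.where(probs >= eta, probs, 0.0)
    if kept.sum() == 0.0:                        # guard: never cut every token
        kept[np.argmax(probs)] = 1.0
    return kept / kept.sum()

# Toy usage: with eps = 0.02 the two smallest tokens fall below the
# adaptive threshold and are removed before renormalization.
p = np.array([0.50, 0.30, 0.15, 0.04, 0.009, 0.001])
print(eta_truncate(p, eps=0.02))
```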