On the probability-quality paradox in language generation
- URL: http://arxiv.org/abs/2203.17217v1
- Date: Thu, 31 Mar 2022 17:43:53 GMT
- Title: On the probability-quality paradox in language generation
- Authors: Clara Meister and Gian Wiher and Tiago Pimentel and Ryan Cotterell
- Abstract summary: We analyze language generation through an information-theoretic lens.
We posit that human-like language should contain an amount of information close to the entropy of the distribution over natural strings.
- Score: 76.69397802617064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When generating natural language from neural probabilistic models, high
probability does not always coincide with high quality: It has often been
observed that mode-seeking decoding methods, i.e., those that produce
high-probability text under the model, lead to unnatural language. On the other
hand, the lower-probability text generated by stochastic methods is perceived
as more human-like. In this note, we offer an explanation for this phenomenon
by analyzing language generation through an information-theoretic lens.
Specifically, we posit that human-like language should contain an amount of
information (quantified as negative log-probability) that is close to the
entropy of the distribution over natural strings. Further, we posit that
language with substantially more (or less) information is undesirable. We
provide preliminary empirical evidence in favor of this hypothesis; quality
ratings of both human and machine-generated text -- covering multiple tasks and
common decoding strategies -- suggest high-quality text has an information
content significantly closer to the entropy than we would expect by chance.
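The core hypothesis can be made concrete with a toy sketch. This assumes an invented unigram distribution over a four-word vocabulary, not the paper's experimental setup: compare a string's per-token information content, -log p, with the entropy of the model distribution.

```python
import math

# Toy unigram "language model" over a tiny vocabulary; the numbers are
# invented for illustration and are not the paper's experimental setup.
probs = {"the": 0.4, "cat": 0.3, "sat": 0.2, "mat": 0.1}

# Entropy H(p) = -sum_x p(x) log p(x), in nats.
entropy = -sum(p * math.log(p) for p in probs.values())

def information_content(tokens):
    """Negative log-probability of a token sequence under the unigram model (nats)."""
    return -sum(math.log(probs[t]) for t in tokens)

# The hypothesis: human-like text has per-token information close to H(p).
sample = ["the", "cat", "sat"]
per_token_info = information_content(sample) / len(sample)
```

Under the hypothesis, text whose `per_token_info` deviates far from `entropy` in either direction would be judged lower quality.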
Related papers
- A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
Given a general language model and its aligned version, there exists a trade-off between the average reward and average log-likelihood of the strings under the general language model.
We provide a formal treatment of this issue and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
- Mutual Information Alleviates Hallucinations in Abstractive Summarization [73.48162198041884]
We find a simple criterion under which models are significantly more likely to assign more probability to hallucinated content during generation: high model uncertainty.
This finding offers a potential explanation for hallucinations: when uncertain about a continuation, models default to favoring text with high marginal probability.
We propose a decoding strategy that switches to optimizing for pointwise mutual information of the source and target token--rather than purely the probability of the target token--when the model exhibits uncertainty.
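The switching idea above can be sketched in a few lines. This is a hedged illustration with invented distributions and an assumed uncertainty threshold `TAU`, not the paper's exact decoder: when the conditional next-token distribution is high-entropy, rank tokens by pointwise mutual information log p(y|source) - log p(y) rather than by p(y|source) alone.

```python
import math

# Invented toy distributions over a restricted candidate set.
p_cond = {"paris": 0.30, "london": 0.28, "the": 0.42}  # p(y | source)
p_marg = {"paris": 0.02, "london": 0.02, "the": 0.50}  # p(y), context-free

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values())

TAU = 1.0  # uncertainty threshold in nats; a free hyperparameter in this sketch

greedy = max(p_cond, key=p_cond.get)  # plain mode-seeking picks "the"
if entropy(p_cond) > TAU:
    # Model is uncertain: prefer the token most informative about the source.
    choice = max(p_cond, key=lambda y: math.log(p_cond[y]) - math.log(p_marg[y]))
else:
    choice = greedy
```

On these numbers the conditional entropy exceeds `TAU`, so the PMI criterion selects the source-specific token "paris" where plain probability maximization would pick the generic "the".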
arXiv Detail & Related papers (2022-10-24T13:30:54Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy-tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
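Typical sampling can be sketched as follows. This is a toy version for illustration, assuming a single next-token distribution rather than a full autoregressive model: keep the tokens whose information content -log p is closest to the conditional entropy, up to a cumulative mass `tau`, then renormalize and sample.

```python
import math
import random

def typical_sampling(probs, tau=0.9):
    """Toy sketch of locally typical sampling: rank tokens by how close
    their surprisal -log p is to the entropy, keep the closest ones up to
    cumulative mass tau, renormalize, and sample."""
    h = -sum(p * math.log(p) for p in probs.values())
    ranked = sorted(probs, key=lambda t: abs(-math.log(probs[t]) - h))
    kept, mass = [], 0.0
    for t in ranked:
        kept.append(t)
        mass += probs[t]
        if mass >= tau:
            break
    z = sum(probs[t] for t in kept)
    return random.choices(kept, weights=[probs[t] / z for t in kept])[0]

token = typical_sampling({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05})
```

Note that, unlike top-p sampling, this criterion can also exclude the single most probable token when its surprisal sits far below the entropy.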
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
- Improving Diversity of Neural Text Generation via Inverse Probability Weighting [43.36560720793425]
We propose a sampling method inspired by inverse probability weighting.
We show that the candidate set may contain tedious or even repetitive high-probability candidates that lead to repetition loops.
Results show that our algorithm can effectively increase the diversity of generated samples while achieving close resemblance to human text.
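One generic way to reweight with inverse probabilities can be sketched as below. This is NOT the cited paper's exact algorithm, only an illustration of the idea: mixing a next-token distribution with its normalized inverse shifts mass toward lower-probability tokens. The interpolation weight `lam` is an invented hyperparameter.

```python
def inverse_prob_reweight(probs, lam=0.3):
    """Mix a distribution with its normalized inverse so that low-probability
    tokens gain mass; lam controls the strength of the flattening."""
    inv = {t: 1.0 / p for t, p in probs.items()}
    z = sum(inv.values())
    return {t: (1 - lam) * p + lam * inv[t] / z for t, p in probs.items()}

q = inverse_prob_reweight({"a": 0.7, "b": 0.2, "c": 0.1})
```

The result is still a valid distribution (the mixture of two normalized distributions sums to one), but its mode is damped and its tail is lifted, which is the diversity effect the paper targets.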
arXiv Detail & Related papers (2021-03-13T08:17:40Z)
- Statistical patterns of word frequency suggesting the probabilistic nature of human languages [5.059800023492045]
The study shows that important linguistic issues, such as linguistic universals, diachronic drift, and language variation, can be translated into probability and frequency patterns in parole.
These findings suggest that human languages may well be probabilistic systems by nature, and that statistical patterns may well be inherent properties of human languages.
arXiv Detail & Related papers (2020-12-01T00:48:27Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
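A minimal version of the hypothesis-testing framing can be sketched as follows. All numbers here are invented and the Gaussian surprisal model is a simplifying assumption, not the paper's construction: treat per-token surprisal as Gaussian with mean `MU0` under H0 ("machine-generated") and mean `MU1` under H1 ("human-written"), and classify by the log-likelihood ratio.

```python
# Assumed means and shared standard deviation of per-token surprisal (nats);
# machine samples tend to be less surprising to the model that produced them.
MU0, MU1, SIGMA = 3.0, 4.0, 0.8

def log_likelihood_ratio(surprisals):
    """Sum over tokens of log N(x; MU1, SIGMA) - log N(x; MU0, SIGMA);
    a positive value favors 'human'."""
    return sum(((x - MU0) ** 2 - (x - MU1) ** 2) / (2 * SIGMA ** 2)
               for x in surprisals)

def classify(surprisals, threshold=0.0):
    return "human" if log_likelihood_ratio(surprisals) > threshold else "generated"
```

The threshold trades off false positives against false negatives in the usual Neyman-Pearson sense, which is what makes fundamental detection limits analyzable in this framing.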
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.