On the probability-quality paradox in language generation
- URL: http://arxiv.org/abs/2203.17217v1
- Date: Thu, 31 Mar 2022 17:43:53 GMT
- Title: On the probability-quality paradox in language generation
- Authors: Clara Meister and Gian Wiher and Tiago Pimentel and Ryan Cotterell
- Abstract summary: We analyze language generation through an information-theoretic lens.
We posit that human-like language should contain an amount of information close to the entropy of the distribution over natural strings.
- Score: 76.69397802617064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When generating natural language from neural probabilistic models, high
probability does not always coincide with high quality: It has often been
observed that mode-seeking decoding methods, i.e., those that produce
high-probability text under the model, lead to unnatural language. On the other
hand, the lower-probability text generated by stochastic methods is perceived
as more human-like. In this note, we offer an explanation for this phenomenon
by analyzing language generation through an information-theoretic lens.
Specifically, we posit that human-like language should contain an amount of
information (quantified as negative log-probability) that is close to the
entropy of the distribution over natural strings. Further, we posit that
language with substantially more (or less) information is undesirable. We
provide preliminary empirical evidence in favor of this hypothesis; quality
ratings of both human and machine-generated text -- covering multiple tasks and
common decoding strategies -- suggest high-quality text has an information
content significantly closer to the entropy than we would expect by chance.
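To make the stated hypothesis concrete, here is a minimal, self-contained Python sketch over a toy distribution of strings; the strings, probabilities, and tolerance epsilon are hypothetical illustrations (the paper itself works with neural language models and human quality ratings).

```python
# A minimal, self-contained sketch of the paper's hypothesis over a toy,
# hypothetical distribution of strings.  The real setting uses a neural
# language model's distribution and human quality ratings; the strings,
# probabilities, and tolerance below are illustrative assumptions only.
import math

p = {
    "the cat sat on the mat": 0.40,
    "a dog chased the ball": 0.35,
    "she read quietly": 0.20,
    "colorless green ideas sleep furiously": 0.05,
}

# Entropy of the distribution over strings, in nats: H(p) = -sum_y p(y) log p(y).
entropy = -sum(prob * math.log(prob) for prob in p.values())

# Information content (surprisal) of each string: -log p(y).  The hypothesis:
# human-like text has |(-log p(y)) - H(p)| small; text with substantially more
# information (very low probability) or substantially less (the over-probable
# mode) is predicted to be judged lower quality.
epsilon = 0.5  # arbitrary tolerance, chosen only for this toy example
for string, prob in p.items():
    info = -math.log(prob)
    near_entropy = abs(info - entropy) < epsilon
    print(f"{info:4.2f} nats vs H = {entropy:.2f}: near-entropy={near_entropy}  {string!r}")
```

In this toy case only the very low-probability string carries information far above the entropy, which is the kind of text the hypothesis predicts will read as unnatural.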
Related papers
- QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios [15.193544498311603]
We present QUITE, a dataset of real-world Bayesian reasoning scenarios with categorical random variables and complex relationships.
We conduct an extensive set of experiments, finding that logic-based models outperform out-of-the-box large language models on all reasoning types.
Our results provide evidence that neuro-symbolic models are a promising direction for improving complex reasoning.
arXiv Detail & Related papers (2024-10-14T12:44:59Z)
- A Probability--Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
We show that when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood.
We provide a formal treatment of this phenomenon and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
- Mutual Information Alleviates Hallucinations in Abstractive Summarization [73.48162198041884]
We find a simple criterion under which models are significantly more likely to assign more probability to hallucinated content during generation: high model uncertainty.
This finding offers a potential explanation for hallucinations: when uncertain about a continuation, models default to favoring text with high marginal probability.
We propose a decoding strategy that switches to optimizing for pointwise mutual information of the source and target token--rather than purely the probability of the target token--when the model exhibits uncertainty.
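A rough sketch of that switching rule follows, with made-up next-token distributions standing in for a real summarization model; the vocabulary, probabilities, and uncertainty threshold are assumptions for illustration only.

```python
# A rough, self-contained sketch of a decoding step that switches to a
# pointwise-mutual-information (PMI) objective when the model is uncertain.
# All names and numbers below are illustrative, not from the paper.
import math

vocab = ["paris", "london", "city", "the"]

# Hypothetical next-token distributions at one decoding step:
p_cond = [0.30, 0.28, 0.22, 0.20]   # p(y_t | source document, prefix)
p_marg = [0.05, 0.05, 0.30, 0.60]   # p(y_t | prefix), with no source conditioning

def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)

uncertainty_threshold = 1.0  # arbitrary; high entropy triggers the PMI objective

if entropy(p_cond) > uncertainty_threshold:
    # Model is uncertain: prefer tokens supported by the source, i.e. maximize
    # PMI = log p(y | source, prefix) - log p(y | prefix).
    scores = [math.log(c) - math.log(m) for c, m in zip(p_cond, p_marg)]
else:
    # Model is confident: fall back to ordinary log-probability.
    scores = [math.log(c) for c in p_cond]

best = max(range(len(vocab)), key=lambda i: scores[i])
print("chosen token:", vocab[best])
```

With these toy numbers the PMI branch picks "paris", a source-supported token, over "the", which is generically likely but uninformative about the source.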
arXiv Detail & Related papers (2022-10-24T13:30:54Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions over language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
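For context, here is a compact sketch of one step of locally typical sampling as that paper describes it, using a toy next-token distribution; the probabilities and the mass parameter tau are illustrative assumptions.

```python
# A compact sketch of (locally) typical sampling for one decoding step, using
# a toy next-token distribution; tau and the probabilities are illustrative.
import math, random

p_next = {"dog": 0.45, "cat": 0.30, "aardvark": 0.05, "runs": 0.20}
tau = 0.8  # keep the smallest "typical" set covering this much probability mass

# Conditional entropy of the next-token distribution.
H = -sum(p * math.log(p) for p in p_next.values())

# Rank tokens by how close their surprisal -log p is to the entropy H,
# then keep tokens until tau of the probability mass is covered.
ranked = sorted(p_next.items(), key=lambda kv: abs(-math.log(kv[1]) - H))
kept, mass = [], 0.0
for token, prob in ranked:
    kept.append((token, prob))
    mass += prob
    if mass >= tau:
        break

# Renormalize over the kept set and sample the next token from it.
total = sum(prob for _, prob in kept)
tokens, weights = zip(*[(t, p / total) for t, p in kept])
print("typical set:", tokens)
print("sampled:", random.choices(tokens, weights=weights, k=1)[0])
```

Tokens whose surprisal sits far from the local entropy (here the very rare "aardvark") are filtered out before sampling, which is how the method avoids both dull and wildly improbable continuations.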
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Large-scale language models that can generate long, coherent passages of text are seen by some as dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem to classify text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)
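The hypothesis-testing framing in that last entry can be written generically as a likelihood-ratio test; the following is only a sketch in standard notation, not necessarily the paper's exact formulation.

```latex
% Generic likelihood-ratio formulation of generated-text detection (sketch;
% the notation and threshold are illustrative, not the paper's own).
\[
  H_0:\; \boldsymbol{y} \sim p_{\text{human}}
  \qquad\text{vs.}\qquad
  H_1:\; \boldsymbol{y} \sim p_{\text{model}},
\]
\[
  \text{decide } H_1 \;\text{ iff }\;
  \Lambda(\boldsymbol{y}) \;=\;
  \frac{p_{\text{model}}(\boldsymbol{y})}{p_{\text{human}}(\boldsymbol{y})}
  \;>\; \tau .
\]
```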