Typical Decoding for Natural Language Generation
- URL: http://arxiv.org/abs/2202.00666v1
- Date: Tue, 1 Feb 2022 18:58:45 GMT
- Title: Typical Decoding for Natural Language Generation
- Authors: Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell
- Abstract summary: We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
- Score: 76.69397802617064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite achieving incredibly low perplexities on myriad natural language
corpora, today's language models still often underperform when used to generate
text. This dichotomy has puzzled the language generation community for the last
few years. In this work, we posit that the abstraction of natural language as a
communication channel (à la Shannon, 1948) can provide new insights into the
behaviors of probabilistic language generators, e.g., why high-probability
texts can be dull or repetitive. Humans use language as a means of
communicating information, and do so in an efficient yet error-minimizing
manner, choosing each word in a string with this (perhaps subconscious) goal in
mind. We propose that generation from probabilistic models should mimic this
behavior. Rather than always choosing words from the high-probability region of
the distribution, which have a low Shannon information content, we sample from
the set of words with an information content close to its expected value, i.e.,
close to the conditional entropy of our model. This decision criterion can be
realized through a simple and efficient implementation, which we call typical
sampling. Automatic and human evaluations show that, in comparison to nucleus
and top-k sampling, typical sampling offers competitive performance in terms of
quality while consistently reducing the number of degenerate repetitions.
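As an illustration of this criterion, the sketch below filters a next-token distribution down to the tokens whose information content lies near the conditional entropy, then samples from the renormalized remainder. This is a minimal sketch, not the authors' reference implementation; the function name `typical_sample` and the `mass` cutoff (the cumulative probability retained, analogous to the p of nucleus sampling) are assumptions for illustration.

```python
import torch

def typical_sample(logits: torch.Tensor, mass: float = 0.95) -> torch.Tensor:
    """Sample one token id from a 1-D tensor of next-token logits
    (a sketch of typical sampling, not the paper's reference code)."""
    log_p = torch.log_softmax(logits, dim=-1)
    p = log_p.exp()
    # Conditional entropy: the expected information content of the next token.
    entropy = -(p * log_p).sum()
    # Distance of each token's information content (-log p) from that expectation.
    deviation = (-log_p - entropy).abs()
    # Admit tokens in order of increasing deviation until their cumulative
    # probability reaches `mass`, then sample from the renormalized set.
    order = deviation.argsort()
    cumulative = p[order].cumsum(dim=-1)
    keep = order[: int((cumulative < mass).sum().item()) + 1]
    filtered = torch.full_like(logits, float("-inf"))
    filtered[keep] = logits[keep]
    return torch.distributions.Categorical(logits=filtered).sample()
```

In practice, `logits` would be a model's final-position output, e.g. `model(input_ids).logits[0, -1]`; Hugging Face transformers exposes a comparable filter via the `typical_p` argument of its generation API.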
Related papers
- A Probability-Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
We show that when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood.
We provide a formal treatment of this phenomenon and demonstrate how the choice of sampling adaptor allows one to select how much likelihood is exchanged for reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
- A Natural Bias for Language Generation Models [31.44752136404971]
We show that we can endow standard neural language generation models with a separate module that reflects unigram frequency statistics as prior knowledge.
We use neural machine translation as a test bed for this simple technique and observe that it (i) improves learning efficiency, (ii) achieves better overall performance, and, perhaps most importantly, (iii) appears to disentangle strong frequency effects.
arXiv Detail & Related papers (2022-12-19T18:14:36Z)
- Quark: Controllable Text Generation with Reinforced Unlearning [68.07749519374089]
Large-scale language models often learn behaviors that are misaligned with user expectations.
We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property.
For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods.
arXiv Detail & Related papers (2022-05-26T21:11:51Z)
- On the probability-quality paradox in language generation [76.69397802617064]
We analyze language generation through an information-theoretic lens.
We posit that human-like language should contain an amount of information close to the entropy of the distribution over natural strings.
arXiv Detail & Related papers (2022-03-31T17:43:53Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard metrics such as perplexity quantify the performance of language models (LMs) only in aggregate.
We develop a controlled evaluation scheme that uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Evaluating Models of Robust Word Recognition with Serial Reproduction [8.17947290421835]
We compare several broad-coverage probabilistic generative language models in their ability to capture human linguistic expectations.
We find that models that make use of abstract representations of the preceding linguistic context best predict the changes people make during serial reproduction.
arXiv Detail & Related papers (2021-01-24T20:16:12Z)
- Limits of Detecting Text Generated by Large-Scale Language Models [65.46403462928319]
Some consider large-scale language models that can generate long, coherent pieces of text dangerous, since they may be used in misinformation campaigns.
Here we formulate large-scale language model output detection as a hypothesis testing problem, classifying text as genuine or generated.
arXiv Detail & Related papers (2020-02-09T19:53:23Z)