The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis
and Algorithm for Robust Natural Language Generation
- URL: http://arxiv.org/abs/2302.06784v1
- Date: Tue, 14 Feb 2023 02:02:33 GMT
- Title: The Stable Entropy Hypothesis and Entropy-Aware Decoding: An Analysis
and Algorithm for Robust Natural Language Generation
- Authors: Kushal Arora, Timothy J. O'Donnell, Doina Precup, Jason Weston, Jackie
C.K. Cheung
- Abstract summary: We show that "human-like" generations usually lie in a narrow and nearly flat entropy band.
We propose an entropy-aware decoding algorithm that respects these entropy bounds.
- Score: 59.7381286976957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art language generation models can degenerate when applied to
open-ended generation problems such as text completion, story generation, or
dialog modeling. This degeneration usually shows up in the form of incoherence,
lack of vocabulary diversity, and self-repetition or copying from the context.
In this paper, we postulate that "human-like" generations usually lie in a
narrow and nearly flat entropy band, and violation of these entropy bounds
correlates with degenerate behavior. Our experiments show that this stable
narrow entropy zone exists across models, tasks, and domains and confirm the
hypothesis that violations of this zone correlate with degeneration. We then
use this insight to propose an entropy-aware decoding algorithm that respects
these entropy bounds resulting in less degenerate, more contextual, and
"human-like" language generation in open-ended text generation settings.
Related papers
- Vector-Quantized Prompt Learning for Paraphrase Generation [18.40940464497253]
This paper proposes to generate diverse and high-quality paraphrases by exploiting pre-trained models with instance-dependent prompts.
Extensive experiments demonstrate that the proposed method achieves new state-of-the-art results on three benchmark datasets.
arXiv Detail & Related papers (2023-11-25T07:13:06Z)
- Generating Sequences by Learning to Self-Correct [64.0249217590888]
Self-Correction decouples an imperfect base generator from a separate corrector that learns to iteratively correct imperfect generations.
We show that Self-Correction improves upon the base generator in three diverse generation tasks.
arXiv Detail & Related papers (2022-10-31T18:09:51Z)
- Why is constrained neural language generation particularly challenging? [13.62873478165553]
We present an extensive survey on the emerging topic of constrained neural language generation.
We distinguish between conditions and constraints, present constrained text generation tasks, and review existing methods and evaluation metrics for constrained text generation.
Our aim is to highlight recent progress and trends in this emerging field, informing on the most promising directions and limitations towards advancing the state-of-the-art of constrained neural language generation research.
arXiv Detail & Related papers (2022-06-11T02:07:33Z)
- On the probability-quality paradox in language generation [76.69397802617064]
We analyze language generation through an information-theoretic lens.
We posit that human-like language should contain an amount of information close to the entropy of the distribution over natural strings.
arXiv Detail & Related papers (2022-03-31T17:43:53Z)
- Reconciling the Discrete-Continuous Divide: Towards a Mathematical Theory of Sparse Communication [0.0]
We build rigorous theoretical foundations for discrete/continuous hybrids.
We introduce "mixed languages" as strings of hybrid symbols and a new mixed weighted finite state automaton.
arXiv Detail & Related papers (2021-04-01T20:31:13Z)
- Lexically-constrained Text Generation through Commonsense Knowledge Extraction and Injection [62.071938098215085]
We focus on the CommonGen benchmark, wherein the aim is to generate a plausible sentence for a given set of input concepts.
We propose strategies for enhancing the semantic correctness of the generated text.
arXiv Detail & Related papers (2020-12-19T23:23:40Z)
- Evaluating Factuality in Generation with Dependency-level Entailment [57.5316011554622]
We propose a new formulation of entailment that decomposes it at the level of dependency arcs.
We show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods.
arXiv Detail & Related papers (2020-10-12T06:43:10Z)
- On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
arXiv Detail & Related papers (2020-10-10T07:00:57Z)