What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability
- URL: http://arxiv.org/abs/2305.11707v2
- Date: Fri, 20 Oct 2023 14:31:21 GMT
- Title: What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability
- Authors: Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank
- Abstract summary: We characterise the extent to which human production varies lexically, syntactically, and semantically across four Natural Language Generation (NLG) tasks.
We then inspect the space of output strings shaped by a generation system's predicted probability distribution and decoding algorithm to probe its uncertainty.
We analyse NLG models and decoding strategies, demonstrating that probing a generator with multiple samples provides the level of detail necessary to gain understanding of a model's representation of uncertainty.
- Score: 28.403105682913374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways. We characterise the extent to which human production varies lexically, syntactically, and semantically across four NLG tasks, connecting human production variability to aleatoric or data uncertainty. We then inspect the space of output strings shaped by a generation system's predicted probability distribution and decoding algorithm to probe its uncertainty. For each test input, we measure the generator's calibration to human production variability. Following this instance-level approach, we analyse NLG models and decoding strategies, demonstrating that probing a generator with multiple samples and, when possible, multiple references, provides the level of detail necessary to gain understanding of a model's representation of uncertainty. Code available at https://github.com/dmg-illc/nlg-uncertainty-probes.
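As an illustration of the instance-level probe described above, here is a minimal sketch that samples-versus-references comparison might take. The unigram Jaccard distance and SciPy's 1-Wasserstein distance are illustrative choices, and all function names are ours; the released probes at the repository above implement the paper's actual measures.

```python
# Hypothetical sketch: compare the spread of a generator's samples to the
# spread of human references for a single test input.
from itertools import combinations
from scipy.stats import wasserstein_distance

def jaccard_distance(a: str, b: str) -> float:
    """Lexical distance between two strings over their unigram sets."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return (1.0 - len(ta & tb) / len(union)) if union else 0.0

def pairwise_distances(texts):
    """All pairwise distances within one set of productions."""
    return [jaccard_distance(a, b) for a, b in combinations(texts, 2)]

def variability_gap(samples, references):
    """1-Wasserstein distance between the model's and the humans'
    pairwise-distance distributions; values near 0 mean the generator's
    spread matches human production variability for this input."""
    return wasserstein_distance(pairwise_distances(samples),
                                pairwise_distances(references))

# Usage (generate() is a stand-in for any sampling-based decoder):
# samples = [generate(source) for _ in range(10)]
# gap = variability_gap(samples, human_references)
```

A gap near zero for a given input suggests the generator's samples spread out about as much as human productions do; large gaps flag over- or under-dispersed predictive distributions.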
Related papers
- Evaluation Metrics of Language Generation Models for Synthetic Traffic Generation Tasks [22.629816738693254]
We show that common NLG metrics, like BLEU, are not suitable for evaluating Synthetic Traffic Generation (STG).
We propose and evaluate several metrics designed to compare the generated traffic to the distribution of real user texts.
arXiv Detail & Related papers (2023-11-21T11:26:26Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses from a large set of real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors [21.71928935339393]
We present a novel latent structured variable model to generate high-quality texts.
Specifically, we introduce a function to map deterministic encoder hidden states into random context variables.
To address the learning challenge of Gaussian processes, we propose an efficient variational inference approach.
arXiv Detail & Related papers (2022-04-04T04:09:15Z)
- On the probability-quality paradox in language generation [76.69397802617064]
We analyze language generation through an information-theoretic lens.
We posit that human-like language should contain an amount of information close to the entropy of the distribution over natural strings.
arXiv Detail & Related papers (2022-03-31T17:43:53Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
arXiv Detail & Related papers (2022-03-24T01:09:46Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality (see the sketch after this list).
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
- Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z)
- Informed Sampling for Diversity in Concept-to-Text NLG [8.883733362171034]
We propose an Imitation Learning approach to explore the level of diversity that a language generation model can reliably produce.
Specifically, we augment the decoding process with a meta-classifier trained to distinguish which words at any given timestep will lead to high-quality output.
arXiv Detail & Related papers (2020-04-29T17:43:24Z)
- Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
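The typical-decoding entry above (and, implicitly, the probability-quality paradox entry) centres on keeping the information content of generated tokens close to the distribution's entropy. As a concrete illustration, here is a hedged Python sketch of locally typical sampling; the parameter name tau and the implementation details are ours, not taken from either paper.

```python
# Illustrative sketch of locally typical sampling: keep the tokens whose
# information content -log p is closest to the conditional entropy H,
# truncate to the smallest such set with probability mass >= tau, renormalise,
# and sample from it.
import numpy as np

def typical_sample(probs: np.ndarray, tau: float = 0.95, rng=None) -> int:
    rng = rng or np.random.default_rng()
    logp = np.log(probs + 1e-12)
    entropy = -(probs * logp).sum()          # H(p) = -sum p log p
    deviation = np.abs(-logp - entropy)      # |I(token) - H|
    order = np.argsort(deviation)            # most "typical" tokens first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, tau) + 1   # smallest set with mass >= tau
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()      # renormalise over the kept set
    return int(rng.choice(keep, p=p))
```

Sweeping tau trades diversity against quality; probes like the one sketched earlier can then check whether a given setting matches human production variability.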
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.