Evaluating the Evaluation of Diversity in Natural Language Generation
- URL: http://arxiv.org/abs/2004.02990v3
- Date: Sun, 24 Jan 2021 09:49:19 GMT
- Title: Evaluating the Evaluation of Diversity in Natural Language Generation
- Authors: Guy Tevet, Jonathan Berant
- Abstract summary: We propose a framework for evaluating diversity metrics in natural language generation systems.
Our framework can advance the understanding of different diversity metrics, an essential step on the road towards better NLG systems.
- Score: 43.05127848086264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite growing interest in natural language generation (NLG) models that
produce diverse outputs, there is currently no principled method for evaluating
the diversity of an NLG system. In this work, we propose a framework for
evaluating diversity metrics. The framework measures the correlation between a
proposed diversity metric and a diversity parameter, a single parameter that
controls some aspect of diversity in generated text. For example, a diversity
parameter might be a binary variable used to instruct crowdsourcing workers to
generate text with either low or high content diversity. We demonstrate the
utility of our framework by: (a) establishing best practices for eliciting
diversity judgments from humans, (b) showing that humans substantially
outperform automatic metrics in estimating content diversity, and (c)
demonstrating that existing methods for controlling diversity by tuning a
"decoding parameter" mostly affect form but not meaning. Our framework can
advance the understanding of different diversity metrics, an essential step on
the road towards better NLG systems.
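The framework described in the abstract can be illustrated with a small sketch: measure the correlation between a diversity parameter (e.g. sampling temperature) and a candidate diversity metric computed on sets of generations. The metric below is distinct-n, a common baseline used here purely for illustration; the toy data, function names, and choice of Pearson correlation are assumptions, not taken from the paper's code.

```python
# Sketch of the evaluation framework: does a candidate diversity metric
# (here distinct-n) track a diversity parameter (here a temperature-like knob)?

def distinct_n(texts, n=1):
    """Fraction of unique n-grams across a set of generated texts."""
    ngrams = []
    for text in texts:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

def pearson(xs, ys):
    """Plain Pearson correlation, avoiding a SciPy dependency."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Toy "generations" at three settings of a diversity parameter.
generation_sets = {
    0.2: ["the cat sat", "the cat sat", "the cat sat"],
    0.7: ["the cat sat", "a dog ran", "the cat sat"],
    1.2: ["the cat sat", "a dog ran", "birds fly south"],
}

params = sorted(generation_sets)
scores = [distinct_n(generation_sets[p], n=1) for p in params]
print(pearson(params, scores))  # high correlation -> metric tracks the parameter
```

In the paper's terms, a metric that correlates strongly with the controlled parameter is a better estimator of that aspect of diversity; on real data the parameter can also be a binary instruction to crowdworkers rather than a decoding knob.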
Related papers
- Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting [28.971248570622603]
We propose a diversity metric called structural diversity, where the user provides a mapping from generated text to features capturing the kinds of diversity that they care about.
In our experiments, we show that for structural diversity in the poetry and code domains, CoS significantly improves diversity compared to several baselines.
arXiv Detail & Related papers (2024-08-12T14:34:06Z)
- Enhancing LLM-Based Human-Robot Interaction with Nuances for Diversity Awareness [0.0]
This paper presents a system for diversity-aware autonomous conversation leveraging the capabilities of large language models (LLMs).
The system adapts to diverse populations and individuals, considering factors like background, personality, age, gender, and culture.
To assess the system's performance, we conducted both controlled and real-world experiments, measuring a wide range of performance indicators.
arXiv Detail & Related papers (2024-06-25T13:15:36Z)
- Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL [51.48239006107272]
In this paper, we discuss how to measure and improve the diversity of the demonstrations for text-to-SQL.
We propose fusing iteratively for demonstrations (Fused) to build a high-diversity demonstration pool.
Our method achieves an average improvement of 3.2% and 5.0% with and without human labeling on several mainstream datasets.
arXiv Detail & Related papers (2024-02-16T13:13:18Z)
- Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting [19.79214899011072]
This paper formalizes diversity of representation in generative large language models.
We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes.
We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal.
arXiv Detail & Related papers (2023-10-25T10:17:17Z)
- Diversify Question Generation with Retrieval-Augmented Style Transfer [68.00794669873196]
We propose RAST, a framework for Retrieval-Augmented Style Transfer.
The objective is to utilize the style of diverse templates for question generation.
We develop a novel Reinforcement Learning (RL) based approach that maximizes a weighted combination of diversity reward and consistency reward.
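The weighted reward combination described for RAST can be written as a one-liner; the weight name, its value, and the sample reward values below are illustrative, not from the paper.

```python
# Illustrative sketch of a weighted mix of a diversity reward and a
# consistency reward, as described for RAST. All names/values are assumptions.
def combined_reward(diversity_r, consistency_r, weight=0.5):
    """Convex combination: weight on diversity, remainder on consistency."""
    return weight * diversity_r + (1.0 - weight) * consistency_r

print(combined_reward(0.8, 0.6, weight=0.25))
```

Tuning the weight trades off how strongly the RL policy is pushed toward stylistically diverse questions versus questions consistent with the source.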
arXiv Detail & Related papers (2023-10-23T02:27:31Z)
- Exploring Diversity in Back Translation for Low-Resource Machine Translation [85.03257601325183]
Back translation is one of the most widely used methods for improving the performance of neural machine translation systems.
Recent research has sought to enhance the effectiveness of this method by increasing the 'diversity' of the generated translations.
This work puts forward a more nuanced framework for understanding diversity in training data, splitting it into lexical diversity and syntactic diversity.
arXiv Detail & Related papers (2022-06-01T15:21:16Z)
- Semantic Diversity in Dialogue with Natural Language Inference [19.74618235525502]
This paper makes two substantial contributions to improving diversity in dialogue generation.
First, we propose a novel metric which uses Natural Language Inference (NLI) to measure the semantic diversity of a set of model responses for a conversation.
Second, we demonstrate how to iteratively improve the semantic diversity of a sampled set of responses via a new generation procedure called Diversity Threshold Generation.
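A minimal sketch of what an NLI-style pairwise semantic diversity score could look like: average "non-entailment" over response pairs. The entailment scorer below is a toy word-overlap stub standing in for a real NLI model, and the exact formulation in the paper may differ.

```python
# Toy sketch of NLI-based semantic diversity: 1 - entailment, averaged
# over all response pairs. entailment_prob is a placeholder, NOT a real NLI model.
from itertools import combinations

def entailment_prob(premise, hypothesis):
    """Word-overlap stub standing in for an NLI entailment probability."""
    p, h = set(premise.split()), set(hypothesis.split())
    return len(p & h) / len(h) if h else 0.0

def semantic_diversity(responses):
    """Mean pairwise semantic distance over a set of model responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 0.0
    dist = lambda a, b: 1.0 - max(entailment_prob(a, b), entailment_prob(b, a))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

same = ["i am fine", "i am fine", "i am fine"]
varied = ["i am fine", "the weather is awful", "let us play chess"]
print(semantic_diversity(same), semantic_diversity(varied))  # 0.0 1.0
```

A threshold-based generation loop in this spirit would resample responses until the set's diversity score clears a target value, echoing the Diversity Threshold Generation procedure the summary describes.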
arXiv Detail & Related papers (2022-05-03T13:56:32Z)
- Decoding and Diversity in Machine Translation [90.33636694717954]
We characterize the cost in diversity paid for the BLEU scores enjoyed by NMT systems.
Our study implicates search as a salient source of known biases when translating gender pronouns.
arXiv Detail & Related papers (2020-11-26T21:09:38Z)
- Random Network Distillation as a Diversity Metric for Both Image and Text Generation [62.13444904851029]
We develop a new diversity metric that can be applied to data, both synthetic and natural, of any type.
We validate and deploy this metric on both images and text.
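A toy illustration of the Random Network Distillation idea as a diversity score: fit a simple predictor to a fixed, randomly initialized target function on the sample set, and take the fitting error as the diversity estimate. Real RND uses neural networks over high-dimensional embeddings; this 1-D linear-fit version (entirely an assumption of this sketch) only shows why spread-out samples are harder to distill than clustered ones.

```python
# Toy RND-style diversity score: error of a simple predictor trying to
# imitate a frozen random target function on the given samples.
import math
import random

rng = random.Random(0)
A, B = rng.uniform(1, 3), rng.uniform(-1, 1)  # frozen random target weights

def target(x):
    """Fixed, randomly initialized function the predictor must imitate."""
    return math.tanh(A * x + B)

def rnd_diversity(xs):
    """Mean squared error of the best linear fit to the random target."""
    ys = [target(x) for x in xs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    resid = [(y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, ys)]
    return sum(resid) / n

clustered = [0.50, 0.51, 0.52, 0.49]  # low-diversity sample set
spread = [-2.0, -0.5, 0.5, 2.0]       # high-diversity sample set
print(rnd_diversity(clustered) < rnd_diversity(spread))  # True
```

Over a narrow input range the random target looks nearly linear, so the predictor fits it almost perfectly; over a wide range its nonlinearity leaves a residual, which is what the diversity score picks up.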
arXiv Detail & Related papers (2020-10-13T22:03:52Z)
- Informed Sampling for Diversity in Concept-to-Text NLG [8.883733362171034]
We propose an Imitation Learning approach to explore the level of diversity that a language generation model can reliably produce.
Specifically, we augment the decoding process with a meta-classifier trained to distinguish which words at any given timestep will lead to high-quality output.
arXiv Detail & Related papers (2020-04-29T17:43:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.