Compression, Transduction, and Creation: A Unified Framework for
Evaluating Natural Language Generation
- URL: http://arxiv.org/abs/2109.06379v1
- Date: Tue, 14 Sep 2021 01:00:42 GMT
- Title: Compression, Transduction, and Creation: A Unified Framework for
Evaluating Natural Language Generation
- Authors: Mingkai Deng, Bowen Tan, Zhengzhong Liu, Eric P. Xing, Zhiting Hu
- Abstract summary: Natural language generation (NLG) spans a broad range of tasks, each of which serves for specific objectives.
We propose a unifying perspective based on the nature of information change in NLG tasks.
We develop a family of interpretable metrics that are suitable for evaluating key aspects of different NLG tasks.
- Score: 85.32991360774447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language generation (NLG) spans a broad range of tasks, each of which
serves for specific objectives and desires different properties of generated
text. The complexity makes automatic evaluation of NLG particularly
challenging. Previous work has typically focused on a single task and developed
individual evaluation metrics based on specific intuitions. In this paper, we
propose a unifying perspective based on the nature of information change in NLG
tasks, including compression (e.g., summarization), transduction (e.g., text
rewriting), and creation (e.g., dialog). Information alignment between input,
context, and output text plays a common central role in characterizing the
generation. With automatic alignment prediction models, we develop a family of
interpretable metrics that are suitable for evaluating key aspects of different
NLG tasks, often without need of gold reference data. Experiments show the
uniformly designed metrics achieve stronger or comparable correlations with
human judgement compared to state-of-the-art metrics in each of diverse tasks,
including text summarization, style transfer, and knowledge-grounded dialog.
Related papers
- UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems [43.266153244137215]
Large Language Models (LLMs) has shown exceptional capabilities in many natual language understanding and generation tasks.
We decompose the use of multiple sources in generating personalized response into three sub-tasks: Knowledge Source Selection, Knowledge Retrieval, and Response Generation.
We propose a novel Unified Multi-Source Retrieval-Augmented Generation system (UniMS-RAG)
arXiv Detail & Related papers (2024-01-24T06:50:20Z) - Evaluation Metrics of Language Generation Models for Synthetic Traffic
Generation Tasks [22.629816738693254]
We show that common NLG metrics, like BLEU, are not suitable for evaluating Synthetic Traffic Generation (STG)
We propose and evaluate several metrics designed to compare the generated traffic to the distribution of real user texts.
arXiv Detail & Related papers (2023-11-21T11:26:26Z) - Automatic and Human-AI Interactive Text Generation [27.05024520190722]
This tutorial aims to provide an overview of the state-of-the-art natural language generation research.
Text-to-text generation tasks are more constrained in terms of semantic consistency and targeted language styles.
arXiv Detail & Related papers (2023-10-05T20:26:15Z) - Large Language Models are Diverse Role-Players for Summarization
Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most of the automatic evaluation methods like BLUE/ROUGE may be not able to adequately capture the above dimensions.
We propose a new evaluation framework based on LLMs, which provides a comprehensive evaluation framework by comparing generated text and reference text from both objective and subjective aspects.
arXiv Detail & Related papers (2023-03-27T10:40:59Z) - Unsupervised Neural Stylistic Text Generation using Transfer learning
and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only $0.3%$ of model parameters to learn style specific attributes for response generation.
We learn style specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z) - On Decoding Strategies for Neural Text Generators [73.48162198041884]
We study the interaction between language generation tasks and decoding strategies.
We measure changes in attributes of generated text as a function of both decoding strategy and task.
Our results reveal both previously-observed and surprising findings.
arXiv Detail & Related papers (2022-03-29T16:25:30Z) - TextFlint: Unified Multilingual Robustness Evaluation Toolkit for
Natural Language Processing [73.16475763422446]
We propose a multilingual robustness evaluation platform for NLP tasks (TextFlint)
It incorporates universal text transformation, task-specific transformation, adversarial attack, subpopulation, and their combinations to provide comprehensive robustness analysis.
TextFlint generates complete analytical reports as well as targeted augmented data to address the shortcomings of the model's robustness.
arXiv Detail & Related papers (2021-03-21T17:20:38Z) - GRUEN for Evaluating Linguistic Quality of Generated Text [17.234442722611803]
We propose GRUEN for evaluating Grammaticality, non-Redundancy, focUs, structure and coherENce of generated text.
GRUEN utilizes a BERT-based model and a class of syntactic, semantic, and contextual features to examine the system output.
arXiv Detail & Related papers (2020-10-06T05:59:25Z) - Few-shot Natural Language Generation for Task-Oriented Dialog [113.07438787659859]
We present FewShotWoz, the first NLG benchmark to simulate the few-shot learning setting in task-oriented dialog systems.
We develop the SC-GPT model, which is pre-trained on a large set of annotated NLG corpus to acquire the controllable generation ability.
Experiments on FewShotWoz and the large Multi-Domain-WOZ datasets show that the proposed SC-GPT significantly outperforms existing methods.
arXiv Detail & Related papers (2020-02-27T18:48:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.