Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
- URL: http://arxiv.org/abs/2212.10179v1
- Date: Tue, 20 Dec 2022 11:36:22 GMT
- Title: Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
- Authors: Qingyu Lu, Liang Ding, Liping Xie, Kanjian Zhang, Derek F. Wong,
Dacheng Tao
- Abstract summary: Recent studies show that considering both major errors (e.g., mistranslated tokens) and minor errors (e.g., imperfections in fluency) can produce high-quality human judgments. This inspires us to approach the final goal of evaluation metrics (human-like evaluation) through automatic error analysis. We augment BARTScore with human-like error analysis strategies, yielding BARTScore++, whose final score combines evaluations of both major and minor errors.
- Score: 93.34894810865364
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: State-of-the-art language model-based automatic metrics, e.g.,
BARTScore, benefit from large-scale contextualized pre-training and have been
successfully applied to a wide range of natural language generation (NLG)
tasks, including machine translation, text summarization, and data-to-text
generation. Recent studies show that considering both major errors (e.g.,
mistranslated tokens) and minor errors (e.g., imperfections in fluency) can
produce high-quality human judgments. This inspires us to approach the final
goal of evaluation metrics (human-like evaluation) through automatic error
analysis. To this end, we augment BARTScore with human-like error analysis
strategies, yielding BARTScore++, whose final score combines evaluations of
both major and minor errors. Experimental results show that BARTScore++
consistently improves vanilla BARTScore and outperforms existing top-scoring
metrics in 20 out of 25 test settings. We hope our technique can also be
extended to other pre-trained model-based metrics. We will release our code
and scripts to facilitate the community.
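
The abstract sketches the key idea: keep BARTScore's likelihood-based scoring, but fold in an error-analysis step so that major and minor errors are evaluated separately before being combined into the final score. Below is a minimal, hypothetical sketch of that combination in Python. The bart_score function follows the published BARTScore formulation (average token log-likelihood of the hypothesis given the source, computed with the transformers library); the corrected argument, the major/minor split, and the weight alpha are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a BARTScore++-style score. The bart_score part
# follows the published BARTScore formulation; the error split and the
# weighting below are illustrative assumptions, not the paper's exact method.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
model.eval()

def bart_score(source: str, hypothesis: str) -> float:
    """Average token log-likelihood of generating `hypothesis` from `source`."""
    src = tokenizer(source, return_tensors="pt", truncation=True)
    tgt = tokenizer(hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # The model's loss is the mean token cross-entropy, so its negation
        # is the mean token log-likelihood that BARTScore uses.
        loss = model(**src, labels=tgt["input_ids"]).loss
    return -loss.item()

def bartscore_pp(source: str, hypothesis: str, corrected: str,
                 alpha: float = 0.5) -> float:
    """Combine major- and minor-error evaluations (assumed weighting).

    `corrected` is a version of the hypothesis with its major errors fixed;
    in the paper this comes from an automatic error-analysis step, but here
    it is supplied by the caller. Scoring the original hypothesis reflects
    both error types, scoring the corrected one isolates minor errors, and
    `alpha` balancing the two is an illustrative choice.
    """
    major_and_minor = bart_score(source, hypothesis)
    minor_only = bart_score(source, corrected)
    return alpha * major_and_minor + (1.0 - alpha) * minor_only
```

Under this sketch, the gap between the corrected and original hypothesis scores reflects the major errors the error-analysis step identified, while the corrected score itself captures remaining minor imperfections such as fluency.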