SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization
Evaluation
- URL: http://arxiv.org/abs/2305.13194v2
- Date: Wed, 1 Nov 2023 22:29:53 GMT
- Title: SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization
Evaluation
- Authors: Elizabeth Clark, Shruti Rijhwani, Sebastian Gehrmann, Joshua Maynez,
Roee Aharoni, Vitaly Nikolaev, Thibault Sellam, Aditya Siddhant, Dipanjan
Das, Ankur P. Parikh
- Abstract summary: We introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation.
SEAHORSE consists of 96K summaries with human ratings along 6 dimensions of text quality.
We show that metrics trained with SEAHORSE achieve strong performance on the out-of-domain meta-evaluation benchmarks TRUE and mFACE.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reliable automatic evaluation of summarization systems is challenging due to
the multifaceted and subjective nature of the task. This is especially the case
for languages other than English, where human evaluations are scarce. In this
work, we introduce SEAHORSE, a dataset for multilingual, multifaceted
summarization evaluation. SEAHORSE consists of 96K summaries with human ratings
along 6 dimensions of text quality: comprehensibility, repetition, grammar,
attribution, main ideas, and conciseness, covering 6 languages, 9 systems and 4
datasets. As a result of its size and scope, SEAHORSE can serve both as a
benchmark to evaluate learnt metrics and as a large-scale resource for
training such metrics. We show that metrics trained with SEAHORSE achieve
strong performance on the out-of-domain meta-evaluation benchmarks TRUE
(Honovich et al., 2022) and mFACE (Aharoni et al., 2022). We make the SEAHORSE
dataset and metrics publicly available for future research on multilingual and
multifaceted summarization evaluation.