Simplicity Level Estimate (SLE): A Learned Reference-Less Metric for
Sentence Simplification
- URL: http://arxiv.org/abs/2310.08170v1
- Date: Thu, 12 Oct 2023 09:49:10 GMT
- Title: Simplicity Level Estimate (SLE): A Learned Reference-Less Metric for
Sentence Simplification
- Authors: Liam Cripwell, Joël Legrand, Claire Gardent
- Abstract summary: We propose a new learned evaluation metric (SLE) for sentence simplification.
SLE focuses on simplicity, outperforming almost all existing metrics in terms of correlation with human judgements.
- Score: 8.479659578608233
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic evaluation for sentence simplification remains a challenging
problem. Most popular evaluation metrics require multiple high-quality
references -- something not readily available for simplification -- which makes
it difficult to test performance on unseen domains. Furthermore, most existing
metrics conflate simplicity with correlated attributes such as fluency or
meaning preservation. We propose a new learned evaluation metric (SLE) which
focuses on simplicity, outperforming almost all existing metrics in terms of
correlation with human judgements.
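The abstract does not detail SLE's architecture or released checkpoints, but the general recipe of a learned, reference-less simplicity metric can be sketched: a transformer with a regression head predicts a simplicity level from the sentence alone, and a simplification system can then be judged by the gain of its output over the source. A minimal sketch, assuming a hypothetical fine-tuned checkpoint (not the authors' released model):

```python
# Minimal sketch of a learned, reference-less simplicity scorer.
# "my-org/simplicity-regressor" is a hypothetical fine-tuned checkpoint,
# standing in for whatever model actually implements SLE.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "my-org/simplicity-regressor"  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=1)
model.eval()

def simplicity_score(sentence: str) -> float:
    """Return a scalar simplicity estimate for one sentence (higher = simpler)."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape (1, 1) for a regression head
    return logits.squeeze().item()

# The gain of a system output over its source gives a reference-less
# simplification score for that output.
src = "The committee deliberated at considerable length before reaching a verdict."
out = "The committee talked for a long time before deciding."
print(simplicity_score(out) - simplicity_score(src))
```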
Related papers
- Evaluating Document Simplification: On the Importance of Separately Assessing Simplicity and Meaning Preservation [9.618393813409266]
This paper focuses on the evaluation of document-level text simplification.
We compare existing models using distinct metrics for meaning preservation and simplification.
We introduce a reference-less metric variant for simplicity, showing that models are mostly biased towards either simplification or meaning preservation.
arXiv Detail & Related papers (2024-04-04T08:04:24Z)
- REFeREE: A REference-FREE Model-Based Metric for Text Simplification [10.863256257378172]
REFeREE is a model-based metric with a 3-stage curriculum.
Our experiments show that REFeREE outperforms existing reference-based metrics in predicting overall ratings and reaches competitive and consistent performance in predicting specific ratings while requiring no reference simplifications at inference time.
arXiv Detail & Related papers (2024-03-26T12:21:51Z)
- An In-depth Evaluation of GPT-4 in Sentence Simplification with Error-based Human Assessment [10.816677544269782]
We design an error-based human annotation framework to assess GPT-4's simplification capabilities.
Results show that GPT-4 generally generates fewer erroneous simplification outputs compared to the current state-of-the-art.
arXiv Detail & Related papers (2024-03-08T00:19:24Z)
- FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction [85.26780391682894]
We propose Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction (FENICE).
FENICE leverages an NLI-based alignment between information in the source document and a set of atomic facts, referred to as claims, extracted from the summary.
Our metric sets a new state of the art on AGGREFACT, the de facto benchmark for factuality evaluation.
arXiv Detail & Related papers (2024-03-04T17:57:18Z)
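FENICE's own claim extractor and aggregation are not reproduced here; the sketch below only illustrates the NLI-alignment idea, scoring already-extracted claims against the source with a generic off-the-shelf NLI model ("roberta-large-mnli") and averaging the entailment probabilities. Both the claim list and the mean aggregation are simplifying assumptions.

```python
# Rough illustration of NLI-based claim verification (not the official FENICE
# scorer): each extracted claim is checked for entailment against the source.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NLI_ID = "roberta-large-mnli"  # generic NLI model, not FENICE's own components
tokenizer = AutoTokenizer.from_pretrained(NLI_ID)
nli = AutoModelForSequenceClassification.from_pretrained(NLI_ID).eval()

def entailment_prob(premise: str, hypothesis: str) -> float:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli(**inputs).logits.softmax(dim=-1)[0]
    return probs[2].item()  # label order: contradiction, neutral, entailment

source = "The plant closed in 2019 after years of declining orders."
claims = ["The plant closed in 2019.", "Orders had been declining for years."]

# A plain mean over claims stands in for FENICE's alignment and aggregation.
score = sum(entailment_prob(source, c) for c in claims) / len(claims)
print(score)
```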
- Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References [123.39034752499076]
Div-Ref is a method to enhance evaluation benchmarks by enriching the number of references.
We conduct experiments to empirically demonstrate that diversifying the expression of reference can significantly enhance the correlation between automatic evaluation and human evaluation.
arXiv Detail & Related papers (2023-05-24T11:53:29Z)
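The evaluation loop Div-Ref argues for can be sketched directly: score each hypothesis against an enlarged, diverse reference set (best match wins) and measure how well the metric tracks human ratings. The data below is toy, and the use of sentence BLEU and Pearson correlation is an illustrative choice, not the paper's exact setup.

```python
# Sketch of multi-reference evaluation: take the best score over a diversified
# reference set, then correlate the metric with human ratings. Toy data only.
from sacrebleu import sentence_bleu
from scipy.stats import pearsonr

hypotheses = [
    "The cat sat on the mat.",
    "A feline was positioned upon the rug.",
    "Cat mat sit.",
]
human_scores = [0.95, 0.60, 0.20]
# The same enlarged reference set is reused for each toy hypothesis.
references = [["The cat sat on the mat.", "A cat was sitting on the mat."]] * 3

def best_ref_bleu(hyp: str, refs: list[str]) -> float:
    """Best sentence-BLEU over all references for one hypothesis."""
    return max(sentence_bleu(hyp, [r]).score for r in refs)

metric_scores = [best_ref_bleu(h, refs) for h, refs in zip(hypotheses, references)]
corr, _ = pearsonr(metric_scores, human_scores)
print(round(corr, 3))
```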
- SMART: Sentences as Basic Units for Text Evaluation [48.5999587529085]
In this paper, we introduce a new metric called SMART to mitigate the limitations of token-level matching.
We treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences.
Our results show that our proposed metric with a model-based matching function outperforms all competing metrics in terms of system-level correlation.
arXiv Detail & Related papers (2022-08-01T17:58:05Z)
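A reading of the soft-matching idea can be sketched with sentence embeddings: split the candidate and reference into sentences, compute pairwise similarities, and combine the best matches into precision, recall, and F1. This uses the sentence-transformers library and is an illustration of the idea, not the published SMART formulation.

```python
# Illustration of sentence-level soft matching (not the exact SMART definition).
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def soft_match_f1(candidate_sents, reference_sents):
    cand = encoder.encode(candidate_sents, convert_to_tensor=True)
    ref = encoder.encode(reference_sents, convert_to_tensor=True)
    sim = util.cos_sim(cand, ref)              # (num_cand, num_ref) similarities
    precision = sim.max(dim=1).values.mean()   # best reference match per candidate sentence
    recall = sim.max(dim=0).values.mean()      # best candidate match per reference sentence
    return (2 * precision * recall / (precision + recall)).item()

candidate = ["The law was passed in 2020.", "It lowered taxes for small firms."]
reference = ["In 2020 a new law was passed.", "Small businesses received a tax cut."]
print(soft_match_f1(candidate, reference))
```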
- Document-Level Text Simplification: Dataset, Criteria and Baseline [75.58761130635824]
We define and investigate a new task of document-level text simplification.
Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task.
arXiv Detail & Related papers (2021-10-11T08:15:31Z)
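D-SARI itself is defined in the paper; as a point of reference, the sentence-level SARI it extends can be computed with the Hugging Face evaluate library, assuming the "sari" metric is available on the hub:

```python
# Sentence-level SARI (the metric D-SARI extends with document-level penalties),
# computed with the Hugging Face `evaluate` package; the metric id "sari" is
# assumed to be available.
import evaluate

sari = evaluate.load("sari")
result = sari.compute(
    sources=["About 95 species are currently accepted."],
    predictions=["About 95 species are currently known."],
    references=[["About 95 species are currently known.",
                 "About 95 species are now accepted."]],
)
print(result)  # e.g. {"sari": ...}
```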
- Rethinking Automatic Evaluation in Sentence Simplification [10.398614920404727]
We propose a simple modification of QuestEval allowing it to tackle Sentence Simplification.
We show that the adapted metric obtains state-of-the-art correlations, outperforming standard metrics like BLEU and SARI.
We release a new corpus of evaluated simplifications, this time not generated by systems but instead, written by humans.
arXiv Detail & Related papers (2021-04-15T16:13:50Z)
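A rough sketch of calling the released QuestEval package on simplification-style data follows; the questeval import path, constructor arguments, and corpus_questeval signature are assumptions based on the authors' public repository and may differ across versions.

```python
# Hedged sketch of scoring with the QuestEval package; the import path, the
# `task` value, and the `corpus_questeval` signature are assumptions that may
# not match every released version.
from questeval.questeval_metric import QuestEval

questeval = QuestEval(task="text2text", do_weighter=False)
scores = questeval.corpus_questeval(
    hypothesis=["The cat sat on the mat."],
    sources=["A small cat was sitting quietly on the mat."],
    list_references=[["The cat sat on the mat."]],
)
print(scores["corpus_score"])
```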
- Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary [65.37544133256499]
We propose a metric to evaluate the content quality of a summary using question-answering (QA).
We demonstrate the experimental benefits of QA-based metrics through an analysis of our proposed metric, QAEval.
arXiv Detail & Related papers (2020-10-01T15:33:09Z)
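QAEval's automatic question generation is not reproduced here; the sketch below only illustrates the QA-based scoring idea, answering hand-written reference questions from the candidate summary with an extractive QA pipeline and comparing the answers by token F1. The questions, answers, and F1 comparison are simplifications of the full pipeline.

```python
# Illustration of QA-based content evaluation (not the released QAEval pipeline):
# questions derived from the reference are answered from the candidate summary,
# and answer overlap stands in for content quality.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

candidate_summary = "The company reported record profits of $2 billion in 2021."
# In QAEval the question-answer pairs are generated automatically from the
# reference; here they are written by hand to keep the sketch short.
qa_pairs = [
    ("How much profit did the company report?", "$2 billion"),
    ("In which year were the profits reported?", "2021"),
]

def token_f1(pred: str, gold: str) -> float:
    p, g = pred.lower().split(), gold.lower().split()
    common = len(set(p) & set(g))
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

scores = [token_f1(qa(question=q, context=candidate_summary)["answer"], a)
          for q, a in qa_pairs]
print(sum(scores) / len(scores))
```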
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z)
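ASSET is publicly distributed; the sketch below loads it through the Hugging Face datasets library, assuming it is hosted on the hub under the "facebook/asset" id with a "simplification" configuration (the exact id, configuration, and field names may differ).

```python
# Sketch of loading ASSET with Hugging Face `datasets`; the hub id, config
# name, and field names are assumptions and may differ from the actual hosting.
from datasets import load_dataset

asset = load_dataset("facebook/asset", "simplification", split="validation")
example = asset[0]
print(example["original"])         # source sentence
print(example["simplifications"])  # multiple crowdsourced simplifications
```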