Evaluating Document Simplification: On the Importance of Separately Assessing Simplicity and Meaning Preservation
- URL: http://arxiv.org/abs/2404.03278v1
- Date: Thu, 4 Apr 2024 08:04:24 GMT
- Authors: Liam Cripwell, Joël Legrand, Claire Gardent
- Abstract summary: This paper focuses on the evaluation of document-level text simplification.
We compare existing models using distinct metrics for meaning preservation and simplification.
We introduce a reference-less metric variant for simplicity, showing that models are mostly biased towards either simplification or meaning preservation.
- Score: 9.618393813409266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text simplification aims to make a text easier to read while preserving its core meaning. Intuitively, and as shown in previous work, these two dimensions (simplification and meaning preservation) are often inversely correlated. An overly conservative text will fail to simplify sufficiently, whereas extreme simplification will degrade meaning preservation. Yet, popular evaluation metrics either aggregate meaning preservation and simplification into a single score (SARI, LENS), or target meaning preservation alone (BERTScore, QuestEval). Moreover, these metrics usually require a set of references, and most previous work has focused only on sentence-level simplification. In this paper, we focus on the evaluation of document-level text simplification and compare existing models using distinct metrics for meaning preservation and simplification. We leverage existing metrics from similar tasks and introduce a reference-less metric variant for simplicity, showing that models are mostly biased towards either simplification or meaning preservation, seldom performing well on both dimensions. Making use of the fact that the metrics we use are all reference-less, we also investigate the performance of existing models when applied to unseen data (where reference simplifications are unavailable).
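The abstract's distinction between reference-based and reference-less evaluation can be made concrete with a minimal example. The sketch below computes the classic Flesch Reading Ease score, a purely surface-level, reference-less simplicity proxy; it is an illustration of the reference-less idea only, not the learned metric the paper introduces, and the syllable counter is a rough heuristic:

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count groups of consecutive vowels.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # Reference-less simplicity proxy: needs only the text itself,
    # no human-written reference simplifications.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

complex_text = ("Notwithstanding considerable methodological heterogeneity, "
                "the intervention demonstrated efficacy.")
simple_text = "The method worked well. It helped people."
# Higher score means easier to read.
print(flesch_reading_ease(simple_text) > flesch_reading_ease(complex_text))
```

Note that a score like this captures only simplicity: it would happily reward a drastically shortened output that discards the source's meaning, which is exactly why the paper argues for assessing simplicity and meaning preservation separately.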
Related papers
- Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918]
We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability.
Results show that all tested models struggle to simplify sentences due to models' limitations and characteristics of the source sentences.
Our experiments also highlight the need for better automatic evaluation metrics tailored to readability-controlled text simplification (RCTS).
arXiv Detail & Related papers (2024-09-30T12:36:25Z) - REFeREE: A REference-FREE Model-Based Metric for Text Simplification [10.863256257378172]
REFeREE is a model-based metric with a 3-stage curriculum.
Our experiments show that REFeREE outperforms existing reference-based metrics in predicting overall ratings and reaches competitive and consistent performance in predicting specific ratings while requiring no reference simplifications at inference time.
arXiv Detail & Related papers (2024-03-26T12:21:51Z) - Simplicity Level Estimate (SLE): A Learned Reference-Less Metric for Sentence Simplification [8.479659578608233]
We propose a new learned evaluation metric (SLE) for sentence simplification.
SLE focuses on simplicity, outperforming almost all existing metrics in terms of correlation with human judgements.
arXiv Detail & Related papers (2023-10-12T09:49:10Z) - On the Limitations of Reference-Free Evaluations of Generated Text [64.81682222169113]
We show that reference-free metrics are inherently biased and limited in their ability to evaluate generated text.
We argue that they should not be used to measure progress on tasks like machine translation or summarization.
arXiv Detail & Related papers (2022-10-22T22:12:06Z) - SMART: Sentences as Basic Units for Text Evaluation [48.5999587529085]
In this paper, we introduce a new metric called SMART to mitigate such limitations.
We treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences.
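SMART's idea of soft-matching candidate and reference sentences can be sketched with a toy matching function. The code below is an illustration only: it uses stdlib string overlap (`difflib.SequenceMatcher`) as a stand-in, whereas SMART's actual matching functions include model-based similarity; the F1-style aggregation here is likewise a simplified assumption, not SMART's exact formulation:

```python
from difflib import SequenceMatcher

def sent_sim(a: str, b: str) -> float:
    # Placeholder similarity: character-level overlap ratio in [0, 1].
    # SMART proper would plug in a stronger (e.g. model-based) function.
    return SequenceMatcher(None, a, b).ratio()

def soft_f1(candidate_sents: list[str], reference_sents: list[str]) -> float:
    # Soft precision: each candidate sentence matched to its best reference.
    prec = sum(max(sent_sim(c, r) for r in reference_sents)
               for c in candidate_sents) / len(candidate_sents)
    # Soft recall: each reference sentence matched to its best candidate.
    rec = sum(max(sent_sim(r, c) for c in candidate_sents)
              for r in reference_sents) / len(reference_sents)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

cand = ["The cat sat on the mat.", "It was sunny."]
ref = ["A cat sat on the mat.", "The weather was sunny."]
print(round(soft_f1(cand, ref), 3))
```

Treating sentences rather than tokens as the matching unit lets the metric tolerate legitimate rewording inside a sentence while still penalizing dropped or hallucinated sentences.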
Our results show that the system-level correlations of our proposed metric with a model-based matching function outperform those of all competing metrics.
arXiv Detail & Related papers (2022-08-01T17:58:05Z) - Document-Level Text Simplification: Dataset, Criteria and Baseline [75.58761130635824]
We define and investigate a new task of document-level text simplification.
Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task.
arXiv Detail & Related papers (2021-10-11T08:15:31Z) - Rethinking Automatic Evaluation in Sentence Simplification [10.398614920404727]
We propose a simple modification of QuestEval allowing it to tackle Sentence Simplification.
We show that the latter obtains state-of-the-art correlations, outperforming standard metrics like BLEU and SARI.
We release a new corpus of evaluated simplifications, this time written by humans rather than generated by systems.
arXiv Detail & Related papers (2021-04-15T16:13:50Z) - Understanding the Extent to which Summarization Evaluation Metrics Measure the Information Quality of Summaries [74.28810048824519]
We analyze the token alignments used by ROUGE and BERTScore to compare summaries.
We argue that their scores largely cannot be interpreted as measuring information overlap.
arXiv Detail & Related papers (2020-10-23T15:55:15Z) - Small but Mighty: New Benchmarks for Split and Rephrase [18.959219419951083]
Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones.
We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues.
We show that even a simple rule-based model can perform on par with the state-of-the-art model.
arXiv Detail & Related papers (2020-09-17T23:37:33Z) - ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.