Do Text Simplification Systems Preserve Meaning? A Human Evaluation via
Reading Comprehension
- URL: http://arxiv.org/abs/2312.10126v2
- Date: Wed, 28 Feb 2024 11:16:29 GMT
- Title: Do Text Simplification Systems Preserve Meaning? A Human Evaluation via
Reading Comprehension
- Authors: Sweta Agrawal, Marine Carpuat
- Abstract summary: We introduce a human evaluation framework to assess whether simplified texts preserve meaning using reading comprehension questions.
We conduct a thorough human evaluation of texts simplified by humans and by nine automatic systems.
- Score: 22.154454849167077
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic text simplification (TS) aims to automate the process of rewriting
text to make it easier for people to read. A pre-requisite for TS to be useful
is that it should convey information that is consistent with the meaning of the
original text. However, current TS evaluation protocols assess system outputs
for simplicity and meaning preservation without regard for the document context
in which output sentences occur and for how people understand them. In this
work, we introduce a human evaluation framework to assess whether simplified
texts preserve meaning using reading comprehension questions. With this
framework, we conduct a thorough human evaluation of texts by humans and by
nine automatic systems. Supervised systems that leverage pre-training knowledge
achieve the highest scores on the reading comprehension (RC) tasks amongst the
automatic controllable TS systems. However, even the best-performing supervised
system struggles with at least 14% of the questions, marking them as
"unanswerable'' based on simplified content. We further investigate how
existing TS evaluation metrics and automatic question-answering systems
approximate the human judgments we obtained.
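As an illustration of the protocol described above, the sketch below aggregates hypothetical annotator responses into a per-system RC accuracy and an "unanswerable" rate. The record format and function names are assumptions made for exposition, not the authors' released evaluation code.

```python
# Minimal sketch: score reading-comprehension (RC) responses collected on
# simplified texts. Field names ('system', 'correct', 'unanswerable') are
# illustrative assumptions, not the paper's actual data schema.
from collections import defaultdict

def score_rc_responses(responses):
    """responses: iterable of dicts with keys 'system', 'correct' (bool),
    and 'unanswerable' (bool: the annotator could not answer the question
    from the simplified text alone)."""
    stats = defaultdict(lambda: {"total": 0, "correct": 0, "unanswerable": 0})
    for r in responses:
        s = stats[r["system"]]
        s["total"] += 1
        s["correct"] += int(r["correct"])
        s["unanswerable"] += int(r["unanswerable"])
    return {
        system: {
            "rc_accuracy": s["correct"] / s["total"],
            "unanswerable_rate": s["unanswerable"] / s["total"],
        }
        for system, s in stats.items()
    }

# Toy usage with two annotator responses for one hypothetical system.
demo = [
    {"system": "supervised-ts", "correct": True, "unanswerable": False},
    {"system": "supervised-ts", "correct": False, "unanswerable": True},
]
print(score_rc_responses(demo))
```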
Related papers
- Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts [53.421616210871704]
A lack of context and unfamiliarity with difficult concepts are major reasons why adult readers struggle with domain-specific text.
We introduce "targeted concept simplification," a simplification task that rewrites text to help readers comprehend passages containing unfamiliar concepts.
We benchmark the performance of open-source and commercial LLMs and a simple dictionary baseline on this task.
arXiv Detail & Related papers (2024-10-28T05:56:51Z)
- Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918]
We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability.
Results show that all tested models struggle to simplify sentences to the desired readability, owing both to model limitations and to characteristics of the source sentences.
Our experiments also highlight the need for better automatic evaluation metrics tailored to readability-controlled text simplification (RCTS).
arXiv Detail & Related papers (2024-09-30T12:36:25Z)
- Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities [2.446971913303003]
We conducted an evaluation study of text comprehensibility including participants with and without intellectual disabilities reading German texts on a tablet computer.
We explored four different approaches to measuring comprehensibility: multiple-choice comprehension questions, perceived difficulty ratings, response time, and reading speed.
For the target group of persons with intellectual disabilities, comprehension questions emerged as the most reliable measure, while analyzing reading speed provided valuable insights into participants' reading behavior.
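The sketch below shows how the four measures could be computed from a hypothetical per-participant session log; the field names and scales are assumptions, not the study's actual data format.

```python
# Illustrative computation of the four comprehensibility measures: multiple-choice
# accuracy, perceived difficulty, response time, and reading speed.
def summarize_session(session):
    n_items = len(session["mc_answers"])
    accuracy = sum(a == g for a, g in zip(session["mc_answers"],
                                          session["mc_gold"])) / n_items
    mean_difficulty = sum(session["difficulty_ratings"]) / n_items   # e.g. 1-5 scale
    mean_response_time = sum(session["response_times_s"]) / n_items  # seconds per question
    words_per_minute = session["text_word_count"] / (session["reading_time_s"] / 60)
    return {
        "mc_accuracy": accuracy,
        "perceived_difficulty": mean_difficulty,
        "response_time_s": mean_response_time,
        "reading_speed_wpm": words_per_minute,
    }

# Toy session: 3 questions, a 240-word text read in 3 minutes.
example = {
    "mc_answers": ["a", "c", "b"], "mc_gold": ["a", "b", "b"],
    "difficulty_ratings": [2, 3, 2], "response_times_s": [12.0, 20.5, 9.0],
    "text_word_count": 240, "reading_time_s": 180,
}
print(summarize_session(example))
```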
arXiv Detail & Related papers (2024-02-20T15:37:08Z)
- ChatPRCS: A Personalized Support System for English Reading Comprehension based on ChatGPT [3.847982502219679]
This paper presents a novel personalized support system for reading comprehension, referred to as ChatPRCS.
ChatPRCS employs methods including reading comprehension proficiency prediction, question generation, and automatic evaluation.
arXiv Detail & Related papers (2023-09-22T11:46:44Z)
- DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering [95.89707479748161]
Existing evaluation metrics for natural language generation (NLG) tasks face the challenges on generalization ability and interpretability.
We propose a metric called DecompEval that formulates NLG evaluation as an instruction-style question answering task.
We decompose our devised instruction-style question about the quality of generated texts into subquestions that measure the quality of each sentence.
The subquestions with their answers generated by PLMs are then recomposed as evidence to obtain the evaluation result.
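The sketch below gives a rough rendering of this decompose-then-recompose idea; the prompt wording, the naive sentence splitting, and the `answer_yes_no` callable are placeholders rather than DecompEval's exact setup.

```python
# Decompose one instruction-style quality question into per-sentence yes/no
# subquestions, answer each with a PLM, and recompose the answers into a score.
def decomposed_eval(context, generated_text, dimension, answer_yes_no):
    """answer_yes_no: callable mapping a prompt string to True/False,
    e.g. a thin wrapper around a prompted pre-trained language model."""
    sentences = [s.strip() for s in generated_text.split(".") if s.strip()]
    answers = []
    for sent in sentences:
        prompt = (
            f"Context: {context}\n"
            f"Sentence: {sent}\n"
            f"Question: Is this sentence {dimension} with respect to the context? "
            "Answer yes or no."
        )
        answers.append(answer_yes_no(prompt))
    # Recompose the subquestion answers as evidence: here, the fraction of "yes".
    return sum(answers) / len(answers) if answers else 0.0

# Toy usage with a dummy "PLM" that answers yes to everything.
print(decomposed_eval("A short context.", "It is fluent. It is short.",
                      "consistent", lambda prompt: True))
```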
arXiv Detail & Related papers (2023-07-13T16:16:51Z)
- Controlling Pre-trained Language Models for Grade-Specific Text Simplification [22.154454849167077]
We study how different control mechanisms impact the adequacy and simplicity of text simplification systems.
We introduce a simple method that predicts the edit operations required for simplifying a text for a specific grade level on an instance-per-instance basis.
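Below is a hedged sketch of this instance-level control idea: a (deliberately trivial) predictor picks control-token values for a source sentence and target grade level, and the tokens are prepended to the input of a controllable simplification model. The token names and heuristic are illustrative placeholders, not the paper's trained predictor.

```python
# Predict per-instance control values, then build a control-prefixed model input.
def predict_control_values(source: str, target_grade: int) -> dict:
    # Placeholder heuristic: lower grade levels get stronger compression.
    compression = max(0.4, min(1.0, target_grade / 12))
    return {"W": round(compression, 2),   # hypothetical length/deletion control
            "DTD": 0.5}                   # hypothetical syntactic-depth control

def build_controlled_input(source: str, target_grade: int) -> str:
    controls = predict_control_values(source, target_grade)
    prefix = " ".join(f"<{name}_{value}>" for name, value in controls.items())
    return f"{prefix} {source}"

print(build_controlled_input("The committee deliberated at length.", target_grade=4))
# -> "<W_0.4> <DTD_0.5> The committee deliberated at length."
```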
arXiv Detail & Related papers (2023-05-24T10:29:45Z)
- LENS: A Learnable Evaluation Metric for Text Simplification [17.48383068498169]
We present LENS, a learnable evaluation metric for text simplification.
We also introduce Rank and Rate, a human evaluation framework that rates simplifications from several models in a list-wise manner.
arXiv Detail & Related papers (2022-12-19T18:56:52Z)
- Open-Retrieval Conversational Machine Reading [80.13988353794586]
In conversational machine reading, systems need to interpret natural language rules, answer high-level questions, and ask follow-up clarification questions.
Existing works assume the rule text is provided for each user question, which neglects the essential retrieval step in real scenarios.
In this work, we propose and investigate an open-retrieval setting of conversational machine reading.
arXiv Detail & Related papers (2021-02-17T08:55:01Z)
- Simple-QE: Better Automatic Quality Estimation for Text Simplification [22.222195626377907]
We propose Simple-QE, a BERT-based quality estimation (QE) model adapted from prior summarization QE work.
We show that Simple-QE correlates well with human quality judgments.
We also show that we can adapt this approach to accurately predict the complexity of human-written texts.
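The sketch below shows the general shape of such a BERT-based QE regressor using the Hugging Face transformers API; it is not the released Simple-QE model, and the freshly initialized regression head would still need fine-tuning on human quality judgments.

```python
# Encode a (source, simplification) pair with BERT and predict a scalar quality score.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1  # single regression output
)

source = "The committee deliberated at length before reaching a verdict."
simplification = "The committee talked for a long time before deciding."

inputs = tokenizer(source, simplification, return_tensors="pt", truncation=True)
with torch.no_grad():
    quality_score = model(**inputs).logits.squeeze().item()
print(quality_score)  # untrained head: the value is meaningless until fine-tuned
```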
arXiv Detail & Related papers (2020-12-22T22:02:37Z)
- Re-evaluating Evaluation in Text Summarization [77.4601291738445]
We re-evaluate the evaluation method for text summarization using top-scoring system outputs.
We find that conclusions about evaluation metrics on older datasets do not necessarily hold on modern datasets and systems.
arXiv Detail & Related papers (2020-10-14T13:58:53Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
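The sketch below frames APR as sequence-to-sequence rewriting with a pre-trained encoder-decoder; "t5-small" and the task prefix are placeholders, not the paper's fine-tuned checkpoints or its pipeline baseline.

```python
# Rewrite noisy ASR output into readable text with a seq2seq model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

asr_output = "uh so the meeting is gonna be at um three pm on friday i think"
inputs = tokenizer("make readable: " + asr_output, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```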
arXiv Detail & Related papers (2020-04-09T09:26:42Z)