Evaluating Factuality in Text Simplification
- URL: http://arxiv.org/abs/2204.07562v1
- Date: Fri, 15 Apr 2022 17:37:09 GMT
- Title: Evaluating Factuality in Text Simplification
- Authors: Ashwin Devaraj, William Sheffield, Byron C. Wallace, Junyi Jessy Li
- Abstract summary: We introduce a taxonomy of errors that we use to analyze both references drawn from standard simplification datasets and state-of-the-art model outputs.
We find that errors not captured by existing evaluation metrics often appear in both references and model outputs.
- Score: 43.94402649899681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated simplification models aim to make input texts more readable. Such
methods have the potential to make complex information accessible to a wider
audience, e.g., providing access to recent medical literature which might
otherwise be impenetrable for a lay reader. However, such models risk
introducing errors into automatically simplified texts, for instance by
inserting statements unsupported by the corresponding original text, or by
omitting key information. Providing more readable but inaccurate versions of
texts may in many cases be worse than providing no such access at all. The
problem of factual accuracy (and the lack thereof) has received heightened
attention in the context of summarization models, but the factuality of
automatically simplified texts has not been investigated. We introduce a
taxonomy of errors that we use to analyze both references drawn from standard
simplification datasets and state-of-the-art model outputs. We find that both
often contain errors that are not captured by existing evaluation metrics,
motivating a need for research into ensuring the factual accuracy of automated
simplification models.
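To make the taxonomy concrete, the sketch below encodes error annotations in Python; the category names (insertion, deletion, substitution) and the severity scale are illustrative placeholders rather than the paper's exact scheme.

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative error categories; the paper's actual taxonomy may be
# more fine-grained (placeholder names here).
class ErrorType(Enum):
    INSERTION = "insertion"        # output states something unsupported by the source
    DELETION = "deletion"          # output omits key information from the source
    SUBSTITUTION = "substitution"  # output contradicts or alters a source fact

@dataclass
class ErrorAnnotation:
    source: str          # original (complex) text
    simplification: str  # simplified output or reference
    error_type: ErrorType
    span: str            # the problematic text in the simplification
    severity: int        # hypothetical scale, e.g. 1 (minor) to 3 (critical)

example = ErrorAnnotation(
    source="The trial enrolled 412 adults with type 2 diabetes.",
    simplification="The study looked at a few hundred children with diabetes.",
    error_type=ErrorType.SUBSTITUTION,
    span="children",
    severity=3,
)
print(example.error_type.value, "->", example.span)
```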
Related papers
- Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies (an illustrative sketch follows this entry).
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
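As a rough illustration of localized consistency checking (not the QASemConsistency pipeline itself, which derives QA pairs automatically), the sketch below tests hand-written propositions against a source with an off-the-shelf NLI model; the propositions and the choice of roberta-large-mnli are assumptions.

```python
from transformers import pipeline

# Off-the-shelf NLI model; any MNLI-style classifier would do.
nli = pipeline("text-classification", model="roberta-large-mnli")

source = "The drug reduced systolic blood pressure by 5 mmHg on average."

# Hand-written decomposition of a generated sentence into minimal
# propositions (QASemConsistency derives these from QA pairs instead).
propositions = [
    "The drug reduced blood pressure.",
    "The drug cured hypertension.",  # unsupported claim
]

for prop in propositions:
    result = nli({"text": source, "text_pair": prop})[0]
    # A proposition whose top label is not ENTAILMENT is flagged as
    # a localized factual inconsistency.
    flag = "OK " if result["label"] == "ENTAILMENT" else "FLAG"
    print(f"[{flag}] {prop} ({result['label']}, {result['score']:.2f})")
```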
- Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918]
We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability.
Results show that all tested models struggle to simplify sentences, owing both to the models' limitations and to characteristics of the source sentences.
Our experiments also highlight the need for better automatic evaluation metrics tailored to readability-controlled text simplification (RCTS); a prompting sketch follows this entry.
arXiv Detail & Related papers (2024-09-30T12:36:25Z)
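A minimal sketch of zero-shot readability-controlled prompting; the prompt wording, the grade-level target, and the use of the third-party textstat package to check the output are illustrative assumptions, not the paper's exact setup.

```python
import textstat  # pip install textstat; used only to check the output's readability

def build_prompt(sentence: str, target_grade: int) -> str:
    # Illustrative zero-shot prompt; the paper studies how added context
    # changes a model's ability to hit the requested readability level.
    return (
        f"Rewrite the following sentence so that it is readable by a "
        f"student at US grade level {target_grade}. Keep the meaning.\n\n"
        f"Sentence: {sentence}\nSimplified:"
    )

prompt = build_prompt(
    "Myocardial infarction results from occlusion of a coronary artery.", 6
)
print(prompt)

# After generating with any LLM, one can verify whether the target was met:
candidate = "A heart attack happens when a heart artery gets blocked."
print("FKGL:", textstat.flesch_kincaid_grade(candidate))
```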
- Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors [11.07539342949602]
We propose an end-to-end framework for detecting factual errors in text summarization.
Our framework uses a diverse set of LLM prompts to identify factual inconsistencies.
We calibrate the ensembled models to produce empirically accurate probabilities that a text is factually consistent or free of hallucination (see the sketch after this entry).
arXiv Detail & Related papers (2024-06-18T18:59:37Z)
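The sketch below shows one plausible shape for prompt ensembling with calibration, assuming a placeholder LLM call and Platt-style logistic calibration; it is not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical prompt variants, each asking an LLM to judge consistency.
PROMPTS = [
    "Does the summary contain any claim not supported by the document? Answer yes or no.",
    "Is every statement in the summary entailed by the document? Answer yes or no.",
    "List factual errors in the summary, or say 'none'.",
]

def query_llm(prompt: str, document: str, summary: str) -> int:
    """Placeholder for an LLM call; returns 1 if the response indicates
    the summary is consistent, else 0. Swap in a real API here."""
    raise NotImplementedError

def ensemble_features(document: str, summary: str) -> np.ndarray:
    # One binary vote per prompt becomes the feature vector.
    return np.array([query_llm(p, document, summary) for p in PROMPTS])

def calibrate(vote_matrix: np.ndarray, labels: np.ndarray) -> LogisticRegression:
    # Fit on a labelled dev set so ensemble votes map to empirically
    # accurate probabilities (Platt scaling; DEEP's calibration may differ).
    return LogisticRegression().fit(vote_matrix, labels)

# calibrator.predict_proba(ensemble_features(doc, summ).reshape(1, -1))[:, 1]
# then yields P(consistent) for a new (document, summary) pair.
```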
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases, insignificant changes in the input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks (a toy attack sketch follows this entry).
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
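In the spirit of what BODEGA benchmarks, the sketch below runs a toy character-swap attack against a stand-in victim classifier; the heuristic victim and the greedy search are illustrative only.

```python
import random

def victim_predict(text: str) -> float:
    """Stand-in for a credibility classifier: returns P(low credibility).
    Replace with a real model to run an actual attack."""
    # Toy keyword heuristic, only so the example is self-contained.
    triggers = {"miracle", "cure", "shocking"}
    hits = sum(w in triggers for w in text.lower().split())
    return min(1.0, 0.2 + 0.4 * hits)

def perturb(text: str, rng: random.Random) -> str:
    # Swap two adjacent characters inside one random word: a classic
    # minimal edit that can flip brittle text classifiers.
    words = text.split()
    i = rng.randrange(len(words))
    w = words[i]
    if len(w) > 3:
        j = rng.randrange(1, len(w) - 2)
        words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

def attack(text: str, budget: int = 50, seed: int = 0) -> str | None:
    rng = random.Random(seed)
    original = victim_predict(text) >= 0.5
    current = text
    for _ in range(budget):
        candidate = perturb(current, rng)
        if victim_predict(candidate) <= victim_predict(current):
            current = candidate  # greedy: keep edits that lower the score
        if (victim_predict(current) >= 0.5) != original:
            return current  # small edits flipped the prediction
    return None

print(attack("Shocking miracle cure discovered by doctors"))
```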
- Label-Descriptive Patterns and their Application to Characterizing Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms not only gives insight into whether a model is prone to making systematic errors, but also suggests ways to act on and improve the model.
In this paper we propose a method that does so for arbitrary classifiers: we mine a small set of patterns that together succinctly describe the input data, partitioned according to correctness of prediction (a simplified sketch follows this entry).
arXiv Detail & Related papers (2021-10-18T19:42:21Z)
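The paper mines patterns with an MDL-based method; the sketch below conveys the flavor with a much cruder frequency-ratio ranking over token patterns, split by prediction correctness.

```python
from collections import Counter
from itertools import combinations

# Toy data: (input features, was the prediction correct?)
data = [
    (["negation", "long", "clinical"], False),
    (["negation", "short"], False),
    (["clinical", "long"], True),
    (["short", "simple"], True),
    (["negation", "clinical"], False),
]

def pattern_scores(data, max_size=2):
    err, ok = Counter(), Counter()
    for tokens, correct in data:
        feats = set(tokens)
        for size in range(1, max_size + 1):
            for pat in combinations(sorted(feats), size):
                (ok if correct else err)[pat] += 1
    # Rank patterns by how much more often they occur among errors.
    n_err = sum(1 for _, c in data if not c)
    n_ok = len(data) - n_err
    scores = {pat: err[pat] / n_err - ok.get(pat, 0) / n_ok for pat in err}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Prints ('negation',) first: it characterizes the misclassified inputs.
for pat, score in pattern_scores(data)[:3]:
    print(pat, round(score, 2))
```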
- Document-Level Text Simplification: Dataset, Criteria and Baseline [75.58761130635824]
We define and investigate a new task of document-level text simplification.
Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task (a sketch of the underlying SARI idea follows this entry).
arXiv Detail & Related papers (2021-10-11T08:15:31Z)
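D-SARI extends the sentence-level SARI metric to documents. The sketch below is a simplified unigram SARI-like score to convey the idea; real SARI averages n-gram F1 for keep/add/delete operations over n = 1..4, and D-SARI adds document-level penalty terms.

```python
def simple_sari(source: str, output: str, references: list[str]) -> float:
    """Simplified unigram SARI-like score: rewards words correctly kept,
    added, and deleted relative to the source, judged against references.
    Illustration only; not the exact SARI or D-SARI formula."""
    src = set(source.lower().split())
    out = set(output.lower().split())
    ref_union = set().union(*(set(r.lower().split()) for r in references))

    def f1(pred: set, gold: set) -> float:
        if not pred and not gold:
            return 1.0
        tp = len(pred & gold)
        p = tp / len(pred) if pred else 0.0
        r = tp / len(gold) if gold else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    keep = f1(out & src, src & ref_union)    # kept words the refs also keep
    add = f1(out - src, ref_union - src)     # new words the refs also add
    delete = f1(src - out, src - ref_union)  # dropped words the refs also drop
    return (keep + add + delete) / 3

print(simple_sari(
    "the cat perched upon the mat",
    "the cat sat on the mat",
    ["the cat sat on the mat", "a cat sat on a mat"],
))  # ~0.93
```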
- Simple-QE: Better Automatic Quality Estimation for Text Simplification [22.222195626377907]
We propose Simple-QE, a BERT-based quality estimation (QE) model adapted from prior summarization QE work.
We show that Simple-QE correlates well with human quality judgments.
We also show that we can adapt this approach to accurately predict the complexity of human-written texts (a sketch of the general model shape follows this entry).
arXiv Detail & Related papers (2020-12-22T22:02:37Z)
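The sketch below shows the general shape of a BERT-based QE regressor on (source, simplification) pairs using standard Hugging Face APIs; the base model and training details are generic assumptions, not the authors' exact configuration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A single-output regression head on top of BERT (num_labels=1 makes
# transformers use an MSE loss when float labels are provided).
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=1)

source = "Myocardial infarction results from occlusion of a coronary artery."
simplification = "A heart attack happens when a heart artery is blocked."

batch = tokenizer(source, simplification, return_tensors="pt", truncation=True)
labels = torch.tensor([4.5])  # human quality rating used as training signal

out = model(**batch, labels=labels)
print(float(out.loss), float(out.logits))  # MSE loss and predicted score
```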
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task (a loading sketch follows this entry).
arXiv Detail & Related papers (2020-05-01T16:44:54Z)
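ASSET is mirrored on the Hugging Face Hub; a minimal loading sketch follows, assuming the dataset id "asset" with the "simplification" config and the fields "original"/"simplifications" (worth verifying locally).

```python
from datasets import load_dataset

# Dataset id, config, and split names as hosted on the Hub (assumptions).
asset = load_dataset("asset", "simplification", split="validation")

ex = asset[0]
print("Original:      ", ex["original"])
# Each source sentence has multiple crowdsourced simplifications, enabling
# multi-reference evaluation of rewriting operations beyond deletion.
for ref in ex["simplifications"][:3]:
    print("Simplification:", ref)
```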
- Text as Environment: A Deep Reinforcement Learning Text Readability Assessment Model [2.826553192869411]
The efficiency of state-of-the-art text readability assessment models can be further improved with deep reinforcement learning.
A comparison on the WeeBit and Cambridge Exams datasets against state-of-the-art models, such as a BERT-based readability model, shows that the approach achieves state-of-the-art accuracy while reading significantly less input text than other models (a toy sketch of the setup follows this entry).
arXiv Detail & Related papers (2019-12-12T13:54:09Z)
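A toy rendering of the "text as environment" idea: the agent consumes one token per step and may stop early to classify, so the reward trades accuracy against how much text was read. The environment and the random policy below are illustrative, not the paper's model.

```python
import random

class ReadabilityEnv:
    """Toy environment: the agent sees one token per step and chooses
    READ (consume the next token) or STOP (emit a readability label)."""
    READ, STOP = 0, 1

    def __init__(self, tokens: list[str], label: int):
        self.tokens, self.label, self.pos = tokens, label, 0

    def step(self, action: int, prediction: int | None = None):
        if action == self.READ and self.pos < len(self.tokens):
            self.pos += 1
            return self.tokens[self.pos - 1], 0.0, False
        # Reward a correct label, minus a small cost per token read:
        # this is what pushes the agent to classify from less text.
        reward = (1.0 if prediction == self.label else -1.0) - 0.01 * self.pos
        return None, reward, True

env = ReadabilityEnv("the cat sat on the mat".split(), label=0)
rng = random.Random(0)
done, n_read, reward = False, 0, 0.0
while not done:
    stop = n_read > 0 and rng.random() < 0.3  # random policy stand-in
    if stop or n_read == len(env.tokens):
        _, reward, done = env.step(env.STOP, prediction=rng.choice([0, 1]))
    else:
        _, reward, done = env.step(env.READ)
        n_read += 1
print("tokens read:", n_read, "final reward:", round(reward, 2))
```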