ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification
Models with Multiple Rewriting Transformations
- URL: http://arxiv.org/abs/2005.00481v1
- Date: Fri, 1 May 2020 16:44:54 GMT
- Title: ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification
Models with Multiple Rewriting Transformations
- Authors: Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina
Scarton, Benoît Sagot, Lucia Specia
- Abstract summary: This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
- Score: 97.27005783856285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to simplify a sentence, human editors perform multiple rewriting
transformations: they split it into several shorter sentences, paraphrase words
(i.e., replacing complex words or phrases with simpler synonyms), reorder
components, and/or delete information deemed unnecessary. Despite this varied
range of possible text alterations, current models for automatic sentence
simplification are evaluated using datasets that focus on a single
transformation, such as lexical paraphrasing or splitting. This makes it
impossible to assess how simplification models perform in more realistic
settings. To alleviate this limitation, this paper introduces ASSET, a new
dataset for assessing sentence simplification in English. ASSET is a
crowdsourced multi-reference corpus where each simplification was produced by
executing several rewriting transformations. Through quantitative and
qualitative experiments, we show that simplifications in ASSET are better at
capturing characteristics of simplicity when compared to other standard
evaluation datasets for the task. Furthermore, we motivate the need for
developing better methods for automatic evaluation using ASSET, since we show
that current popular metrics may not be suitable when multiple simplification
transformations are performed.
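As a concrete illustration of multi-reference evaluation, the sketch below scores a trivial identity system against ASSET's references with SARI. The Hugging Face hub id facebook/asset, the field names original and simplifications, and the sari metric id in the evaluate package are assumptions taken from public dataset and metric cards, not from the paper itself.

    # Minimal sketch: multi-reference SARI on ASSET (hub id, field names,
    # and metric id are assumptions to verify against current hub listings).
    from datasets import load_dataset
    import evaluate

    asset = load_dataset("facebook/asset", "simplification", split="validation")
    sari = evaluate.load("sari")

    sources = [ex["original"] for ex in asset]
    references = [ex["simplifications"] for ex in asset]  # multiple references per source

    # Stand-in "system": copy the input unchanged; SARI penalizes this
    # for keeping everything and adding or deleting nothing.
    predictions = list(sources)

    print(sari.compute(sources=sources, predictions=predictions, references=references))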
Related papers
- Inexact Simplification of Symbolic Regression Expressions with Locality-sensitive Hashing [0.7373617024876725]
Symbolic regression searches for parametric models that accurately fit a dataset, prioritizing simplicity and interpretability.
Applying a fast algebraic simplification may not fully simplify the expression, and exact methods can be infeasible depending on the size or complexity of the expressions.
We propose a novel simplification and bloat control for SR employing efficient memoization with locality-sensitive hashing (LSH).
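As a rough illustration of the memoization idea (not the paper's actual construction), the sketch below buckets expression strings by a MinHash signature so that near-duplicate subexpressions reuse one cached simplification.

    # Illustrative only: memoize an expensive simplifier behind an LSH-style
    # MinHash signature so near-identical expressions share a cached result.
    import hashlib

    def minhash_signature(expr, num_hashes=8, shingle=3):
        shingles = {expr[i:i + shingle] for i in range(max(1, len(expr) - shingle + 1))}
        return tuple(
            min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in shingles)
            for seed in range(num_hashes)
        )

    _memo = {}

    def simplify_cached(expr, simplify):
        # Colliding (near-duplicate) expressions map to the same signature
        # and therefore reuse the first computed simplification.
        key = minhash_signature(expr)
        if key not in _memo:
            _memo[key] = simplify(expr)
        return _memo[key]

    print(simplify_cached("x + x + 0", lambda e: "2*x"))  # computed
    print(simplify_cached("x + x + 0", lambda e: "2*x"))  # served from the memo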
arXiv Detail & Related papers (2024-04-08T22:54:14Z)
- SWiPE: A Dataset for Document-Level Simplification of Wikipedia Pages [87.08880616654258]
We introduce the SWiPE dataset, which reconstructs the document-level editing process from English Wikipedia (EW) articles to paired Simple English Wikipedia (SEW) articles.
We work with Wikipedia editors to annotate 5,000 EW-SEW document pairs, labeling more than 40,000 edits with 19 proposed categories.
We find that SWiPE-trained models generate more complex edits while reducing unwanted edits.
arXiv Detail & Related papers (2023-05-30T16:52:42Z)
- SASS: Data and Methods for Subject Aware Sentence Simplification [0.0]
This paper provides a dataset aimed at training models that perform subject-aware sentence simplification.
We also test models on this dataset that are inspired by architectures used in abstractive summarization.
arXiv Detail & Related papers (2023-03-26T00:02:25Z)
- Exploiting Summarization Data to Help Text Simplification [50.0624778757462]
We analyzed the similarity between text summarization and text simplification and exploited summarization data to aid simplification.
We named the resulting pairs Sum4Simp (S4S) and conducted human evaluations to show that S4S is of high quality.
arXiv Detail & Related papers (2023-02-14T15:32:04Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attention for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
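To make "sparse attention" concrete, here is a generic local-window-plus-global-tokens mask in the Longformer style; HETFORMER's actual granularities differ, so treat this purely as an illustration.

    # Generic sparse-attention mask: local windows plus a few global tokens.
    # This is a common pattern, not HETFORMER's exact attention layout.
    import numpy as np

    def sparse_attention_mask(seq_len, window, global_tokens):
        mask = np.zeros((seq_len, seq_len), dtype=bool)
        for i in range(seq_len):
            lo, hi = max(0, i - window), min(seq_len, i + window + 1)
            mask[i, lo:hi] = True   # each token sees a local window
        for g in global_tokens:
            mask[g, :] = True       # global token attends everywhere
            mask[:, g] = True       # and everyone attends to it
        return mask

    print(sparse_attention_mask(8, window=1, global_tokens=[0]).astype(int))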
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
- Document-Level Text Simplification: Dataset, Criteria and Baseline [75.58761130635824]
We define and investigate a new task of document-level text simplification.
Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task.
arXiv Detail & Related papers (2021-10-11T08:15:31Z)
- Controllable Text Simplification with Explicit Paraphrasing [88.02804405275785]
Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting.
Current simplification systems are predominantly sequence-to-sequence models that are trained end-to-end to perform all these operations simultaneously.
We propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles.
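The hybrid division of labor can be sketched as follows: deterministic rules split clauses, and a separate learned model paraphrases each piece. The splitting rule and the stubbed paraphraser below are placeholders, not the paper's system.

    # Sketch of the rules-plus-neural split: rule-based clause splitting,
    # then a (stubbed) neural paraphraser applied per clause.
    import re

    def rule_based_split(sentence):
        # Toy rule: split on '; ' or ', and ' clause boundaries.
        clauses = re.split(r";\s+|,\s+and\s+", sentence)
        return [c.strip().rstrip(".") for c in clauses if c.strip()]

    def neural_paraphrase(clause):
        # Placeholder for a seq2seq paraphrasing model.
        return clause

    def simplify(sentence):
        out = []
        for clause in rule_based_split(sentence):
            p = neural_paraphrase(clause)
            out.append(p[0].upper() + p[1:] + ".")
        return out

    print(simplify("The committee approved the proposal, and the funds were "
                   "released; construction began in May."))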
arXiv Detail & Related papers (2020-10-21T13:44:40Z)
- Neural CRF Model for Sentence Alignment in Text Simplification [31.62648025127563]
We create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia.
Experiments demonstrate that our proposed approach outperforms all previous work on the monolingual sentence alignment task by more than 5 points in F1.
A Transformer-based seq2seq model trained on our datasets establishes a new state-of-the-art for text simplification in both automatic and human evaluation.
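For contrast with the neural CRF, the simplest baseline for this alignment task pairs each simple sentence with its most similar complex sentence by TF-IDF cosine similarity, as in the toy sketch below (standard scikit-learn APIs; the example sentences are invented).

    # Naive alignment baseline (not the paper's neural CRF): pick the most
    # TF-IDF-similar complex sentence for each simple sentence.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    complex_sents = [
        "The ordinance was ratified by the municipal council in 1998.",
        "Its provisions restrict construction within the floodplain.",
    ]
    simple_sents = [
        "The city council approved the law in 1998.",
        "The law limits building in flood areas.",
    ]

    vec = TfidfVectorizer().fit(complex_sents + simple_sents)
    sims = cosine_similarity(vec.transform(simple_sents), vec.transform(complex_sents))

    for i, row in enumerate(sims):
        print(f"simple[{i}] -> complex[{row.argmax()}] (cosine={row.max():.2f})")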
arXiv Detail & Related papers (2020-05-05T16:47:51Z)
- MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases [20.84836431084352]
We introduce MUSS, a Multilingual Unsupervised Sentence Simplification system that does not require labeled simplification data.
MUSS uses a novel approach to sentence simplification that trains strong models using sentence-level paraphrase data instead of proper simplification data; a sketch of the general mining recipe follows this entry.
We evaluate our approach on English, French, and Spanish simplification benchmarks and closely match or outperform the previous best supervised results.
arXiv Detail & Related papers (2020-05-01T12:54:30Z)
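The mining recipe referenced above can be approximated with off-the-shelf sentence embeddings. MUSS itself mines paraphrases from large web corpora with multilingual embeddings, so the package, model name, and threshold below are stand-ins, not the paper's setup.

    # Stand-in for large-scale paraphrase mining, using sentence-transformers'
    # built-in miner on a toy corpus (MUSS operates at web scale instead).
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    corpus = [
        "The medication should be taken twice daily.",
        "Take the medicine two times a day.",
        "The museum is closed on Mondays.",
    ]

    for score, i, j in util.paraphrase_mining(model, corpus):
        if score > 0.7:  # keep only high-similarity pairs; threshold is illustrative
            print(f"{score:.2f}: {corpus[i]} <-> {corpus[j]}")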