Exploiting Summarization Data to Help Text Simplification
- URL: http://arxiv.org/abs/2302.07124v1
- Date: Tue, 14 Feb 2023 15:32:04 GMT
- Title: Exploiting Summarization Data to Help Text Simplification
- Authors: Renliang Sun, Zhixian Yang, Xiaojun Wan
- Abstract summary: We analyzed the similarity between text summarization and text simplification and exploited summarization data to aid simplification.
We extracted and filtered sentence pairs from summarization datasets, named these pairs Sum4Simp (S4S), and conducted human evaluations to show that S4S is high-quality.
- Score: 50.0624778757462
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: One of the major problems with text simplification is the lack of
high-quality data. The sources of simplification datasets are limited to
Wikipedia and Newsela, restricting further development of this field. In this
paper, we analyzed the similarity between text summarization and text
simplification and exploited summarization data to aid simplification. First, we
proposed an alignment algorithm to extract sentence pairs from summarization
datasets. Then, we designed four attributes to characterize the degree of
simplification and proposed a method to filter suitable pairs. We named these
pairs Sum4Simp (S4S). Next, we conducted human evaluations to show that S4S is
high-quality and compared it with a real simplification dataset. Finally, we
conducted experiments to illustrate that S4S can improve the performance of
several mainstream simplification models, especially in low-resource scenarios.
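
The abstract describes the pipeline only at a high level: align sentence pairs from summarization data, then filter them by attributes that characterize the degree of simplification. As a rough illustration of that extract-then-filter idea, here is a minimal Python sketch; the similarity measure, attributes, and thresholds below are assumptions for illustration, not the authors' actual alignment algorithm or four attributes.

```python
# Illustrative sketch only (not the authors' released code): align summary sentences
# to source sentences by surface similarity, then keep pairs that look like genuine
# simplifications. The similarity measure, attributes, and thresholds are assumptions.
from difflib import SequenceMatcher


def align_pairs(source_sents, summary_sents, min_sim=0.5):
    """Greedily pair each summary sentence with its most similar source sentence."""
    pairs = []
    if not source_sents:
        return pairs
    for summ in summary_sents:
        scored = [(SequenceMatcher(None, src, summ).ratio(), src) for src in source_sents]
        best_sim, best_src = max(scored)
        if best_sim >= min_sim:
            pairs.append((best_src, summ))  # (original, candidate simplification)
    return pairs


def looks_simplified(src, summ):
    """Toy stand-in for the paper's four attributes: the aligned sentence should be
    shorter, use shorter words on average, and still overlap lexically with the source."""
    src_tok, summ_tok = src.lower().split(), summ.lower().split()
    if not src_tok or not summ_tok:
        return False
    shorter = len(summ_tok) < len(src_tok)
    simpler_words = sum(map(len, summ_tok)) / len(summ_tok) <= sum(map(len, src_tok)) / len(src_tok)
    overlap = len(set(summ_tok) & set(src_tok)) / len(set(summ_tok))
    return shorter and simpler_words and overlap >= 0.4


def build_s4s_like_pairs(source_sents, summary_sents):
    """Extract-then-filter: candidate pairs from alignment, kept only if they
    appear simplified according to the toy attributes above."""
    return [p for p in align_pairs(source_sents, summary_sents) if looks_simplified(*p)]
```

The paper itself designs four dedicated attributes and a filtering method for selecting suitable pairs; the sketch above only illustrates the two-stage extract-then-filter structure.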
Related papers
- A New Dataset and Empirical Study for Sentence Simplification in Chinese [50.0624778757462]
This paper introduces CSS, a new dataset for assessing sentence simplification in Chinese.
We collect manual simplifications from human annotators and perform data analysis to show the difference between English and Chinese sentence simplifications.
In the end, we explore whether Large Language Models can serve as high-quality Chinese sentence simplification systems by evaluating them on CSS.
arXiv Detail & Related papers (2023-06-07T06:47:34Z)
- SASS: Data and Methods for Subject Aware Sentence Simplification [0.0]
This paper provides a dataset aimed at training models that perform subject-aware sentence simplification.
We also test models on that dataset that are inspired by architectures used in abstractive summarization.
arXiv Detail & Related papers (2023-03-26T00:02:25Z)
- NapSS: Paragraph-level Medical Text Simplification via Narrative Prompting and Sentence-matching Summarization [46.772517928718216]
We propose a summarize-then-simplify two-stage strategy, which we call NapSS.
NapSS identifies the relevant content to simplify while ensuring that the original narrative flow is preserved.
Our model performs significantly better than the seq2seq baseline on an English medical corpus.
arXiv Detail & Related papers (2023-02-11T02:20:25Z)
- Document-Level Text Simplification: Dataset, Criteria and Baseline [75.58761130635824]
We define and investigate a new task of document-level text simplification.
Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task.
arXiv Detail & Related papers (2021-10-11T08:15:31Z)
- Text Simplification for Comprehension-based Question-Answering [7.144235435987265]
We release Simple-SQuAD, a simplified version of the widely-used SQuAD dataset.
We benchmark the newly created corpus and perform an ablation study for examining the effect of the simplification process in the SQuAD-based question answering task.
arXiv Detail & Related papers (2021-09-28T18:48:00Z)
- Neural CRF Model for Sentence Alignment in Text Simplification [31.62648025127563]
We create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia.
Experiments demonstrate that our proposed approach outperforms all previous work on the monolingual sentence alignment task by more than 5 points in F1.
A Transformer-based seq2seq model trained on our datasets establishes a new state-of-the-art for text simplification in both automatic and human evaluation.
arXiv Detail & Related papers (2020-05-05T16:47:51Z)
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z)
- Extractive Summarization as Text Matching [123.09816729675838]
This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.
We formulate the extractive summarization task as a semantic text matching problem.
We have driven the state-of-the-art extractive result on CNN/DailyMail to a new level (44.41 in ROUGE-1).
arXiv Detail & Related papers (2020-04-19T08:27:57Z)