Explain to me like I am five -- Sentence Simplification Using
Transformers
- URL: http://arxiv.org/abs/2212.04595v1
- Date: Thu, 8 Dec 2022 22:57:18 GMT
- Title: Explain to me like I am five -- Sentence Simplification Using
Transformers
- Authors: Aman Agarwal
- Abstract summary: Sentence simplification aims at making the structure of text easier to read and understand while maintaining its original meaning.
This can be helpful for people with disabilities, new language learners, or those with low literacy.
Previous research has focused on tackling this task either by using external linguistic databases for simplification or by using control tokens for desired fine-tuning of sentences.
We experiment with a combination of GPT-2 and BERT models, achieving the best SARI score of 46.80 on the Mechanical Turk dataset.
- Score: 2.017876577978849
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Sentence simplification aims at making the structure of text easier to read
and understand while maintaining its original meaning. This can be helpful for
people with disabilities, new language learners, or those with low literacy.
Simplification often involves removing difficult words and rephrasing the
sentence. Previous research has focused on tackling this task either by using
external linguistic databases for simplification or by using control tokens for
desired fine-tuning of sentences. In this paper, however, we rely purely on
pre-trained transformer models. We experiment with a combination of GPT-2 and
BERT models, achieving the best SARI score of 46.80 on the Mechanical Turk
dataset, which is significantly better than previous state-of-the-art results.
The code can be found at https://github.com/amanbasu/sentence-simplification.
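For reference, SARI (the metric behind the 46.80 reported above) can be computed with the Hugging Face evaluate library. The snippet below is a minimal sketch on toy sentences, not the paper's actual evaluation pipeline:

```python
# Minimal SARI computation sketch using the Hugging Face `evaluate` library.
# The sentences are toy examples; the paper's evaluation setup may differ.
import evaluate

sari = evaluate.load("sari")

sources = ["About 95 species are currently accepted."]
predictions = ["About 95 species are currently known."]
# SARI supports multiple reference simplifications per source sentence.
references = [[
    "About 95 species are currently known.",
    "About 95 species are now accepted.",
]]

# SARI rewards n-grams that are correctly added, kept, and deleted
# relative to both the source sentence and the references.
print(sari.compute(sources=sources, predictions=predictions, references=references))
```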
Related papers
- Simplifying Translations for Children: Iterative Simplification Considering Age of Acquisition with LLMs [19.023628411128406]
We propose a method that replaces words with a high Age of Acquisition (AoA) in translations with simpler words, matching the translations to the user's level.
The experimental results obtained from the dataset show that our method effectively replaces high-AoA words with lower-AoA words.
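A toy sketch of this AoA-driven substitution idea; the lexicon and synonym table are illustrative stand-ins, not the paper's resources:

```python
# Illustrative AoA-based word substitution: words acquired later than the
# target age are replaced with simpler synonyms. Both tables are toy data.
AOA = {"purchase": 9.1, "buy": 4.2, "inquire": 10.3, "ask": 3.8}
SIMPLER = {"purchase": "buy", "inquire": "ask"}

def simplify_for_age(sentence: str, target_age: float) -> str:
    out = []
    for word in sentence.split():
        key = word.lower().strip(".,!?")
        # Swap only when the word exceeds the target age and a simpler synonym exists.
        if AOA.get(key, 0.0) > target_age and key in SIMPLER:
            word = word.replace(key, SIMPLER[key])
        out.append(word)
    return " ".join(out)

print(simplify_for_age("Please inquire before you purchase it.", target_age=6.0))
# -> Please ask before you buy it.
```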
arXiv Detail & Related papers (2024-08-08T04:57:36Z)
- A New Dataset and Empirical Study for Sentence Simplification in Chinese [50.0624778757462]
This paper introduces CSS, a new dataset for assessing sentence simplification in Chinese.
We collect manual simplifications from human annotators and perform data analysis to show the difference between English and Chinese sentence simplifications.
In the end, we explore whether Large Language Models can serve as high-quality Chinese sentence simplification systems by evaluating them on CSS.
arXiv Detail & Related papers (2023-06-07T06:47:34Z)
- Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a Simple method named Self-Contrastive Learning (SSCL) to alleviate the over-smoothing issue.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
arXiv Detail & Related papers (2023-05-09T11:00:02Z)
- Syntactic Complexity Identification, Measurement, and Reduction Through Controlled Syntactic Simplification [0.0]
We present a classical syntactic dependency-based approach to split and rephrase compound and complex sentences into a set of simplified sentences.
The paper also introduces an algorithm to identify and measure a sentence's syntactic complexity.
This work was accepted and presented at the International Workshop on Learning with Knowledge Graphs (IWLKG) at the WSDM-2023 conference.
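As a rough illustration of the dependency-based split-and-rephrase idea, here is an assumed reconstruction using spaCy (not the authors' implementation):

```python
# Toy split of a compound sentence at a clause-level conjunction, using a
# spaCy dependency parse. Assumes `pip install spacy` plus the
# `en_core_web_sm` model; a sketch, not the paper's algorithm.
import spacy

nlp = spacy.load("en_core_web_sm")

def split_compound(sentence: str) -> list[str]:
    doc = nlp(sentence)
    root = next(t for t in doc if t.dep_ == "ROOT")
    # A verbal conjunct of the root signals a second independent clause.
    conj = [t for t in root.children if t.dep_ == "conj" and t.pos_ in ("VERB", "AUX")]
    if not conj:
        return [sentence]
    second = set(conj[0].subtree)
    clause2 = " ".join(t.text for t in sorted(second, key=lambda t: t.i)
                       if t.dep_ != "cc" and not t.is_punct)
    clause1 = " ".join(t.text for t in doc
                       if t not in second and t.dep_ != "cc" and not t.is_punct)
    return [clause1 + ".", clause2[0].upper() + clause2[1:] + "."]

print(split_compound("The sun rose and the birds began to sing."))
# -> ['The sun rose.', 'The birds began to sing.']
```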
arXiv Detail & Related papers (2023-04-16T13:13:58Z)
- SimpLex: a lexical text simplification architecture [0.5156484100374059]
We present SimpLex, a novel simplification architecture for generating simplified English sentences.
The proposed architecture uses either word embeddings (i.e., Word2Vec) together with perplexity, or sentence transformers (i.e., BERT, RoBERTa, and GPT2) together with cosine similarity.
The solution is incorporated into a user-friendly and simple-to-use software.
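A brief sketch of the sentence-transformer and cosine-similarity variant, assuming the sentence-transformers library and a common default model (neither is taken from the paper):

```python
# Rank candidate simplifications by cosine similarity to the original
# sentence, in the spirit of SimpLex's transformer variant. The model
# name is an assumption; the paper mentions BERT, RoBERTa, and GPT2.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

original = "The committee endeavoured to ameliorate the situation."
candidates = [
    "The committee tried to make the situation better.",
    "The committee met on Tuesday.",
]

orig_emb = model.encode(original, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(orig_emb, cand_embs)[0]

# Keep the candidate that preserves the original meaning best.
print(candidates[int(scores.argmax())])
```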
arXiv Detail & Related papers (2023-04-14T08:52:31Z)
- Exploiting Summarization Data to Help Text Simplification [50.0624778757462]
We analyzed the similarity between text summarization and text simplification and exploited summarization data to aid simplification.
We named these pairs Sum4Simp (S4S) and conducted human evaluations to show that S4S is of high quality.
arXiv Detail & Related papers (2023-02-14T15:32:04Z)
- Phrase-level Active Learning for Neural Machine Translation [107.28450614074002]
We propose an active learning setting where we can spend a given budget on translating in-domain data.
We select both full sentences and individual phrases from unlabelled data in the new domain for routing to human translators.
In a German-English translation task, our active learning approach achieves consistent improvements over uncertainty-based sentence selection methods.
arXiv Detail & Related papers (2021-06-21T19:20:42Z)
- Three Sentences Are All You Need: Local Path Enhanced Document Relation Extraction [54.95848026576076]
We present an embarrassingly simple but effective method to select evidence sentences for document-level RE.
We have released our code at https://github.com/AndrewZhe/Three-Sentences-Are-All-You-Need.
arXiv Detail & Related papers (2021-06-03T12:29:40Z)
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations [97.27005783856285]
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z)
- MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases [20.84836431084352]
We introduce MUSS, a Multilingual Unsupervised Sentence Simplification system that does not require labeled simplification data.
MUSS uses a novel approach to sentence simplification that trains strong models using sentence-level paraphrase data instead of proper simplification data.
We evaluate our approach on English, French, and Spanish simplification benchmarks and closely match or outperform the previous best supervised results.
arXiv Detail & Related papers (2020-05-01T12:54:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.