SimpLex: a lexical text simplification architecture
- URL: http://arxiv.org/abs/2304.07002v1
- Date: Fri, 14 Apr 2023 08:52:31 GMT
- Title: SimpLex: a lexical text simplification architecture
- Authors: Ciprian-Octavian Truică, Andrei-Ionut Stan, Elena-Simona Apostol
- Abstract summary: We present SimpLex, a novel simplification architecture for generating simplified English sentences.
The proposed architecture uses either word embeddings (i.e., Word2Vec) and perplexity, or sentence transformers (i.e., BERT, RoBERTa, and GPT2) and cosine similarity.
The solution is incorporated into a user-friendly and simple-to-use software.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Text simplification (TS) is the process of generating easy-to-understand
sentences from a given sentence or piece of text. The aim of TS is to reduce
both the lexical (which refers to vocabulary complexity and meaning) and
syntactic (which refers to the sentence structure) complexity of a given text
or sentence without the loss of meaning or nuance. In this paper, we present
\textsc{SimpLex}, a novel simplification architecture for generating simplified
English sentences. To generate a simplified sentence, the proposed architecture
uses either word embeddings (i.e., Word2Vec) and perplexity, or sentence
transformers (i.e., BERT, RoBERTa, and GPT2) and cosine similarity. The
solution is incorporated into user-friendly, simple-to-use software. We
evaluate our system using two metrics: SARI and Perplexity Decrease.
Experimentally, we observe that the transformer models outperform the other
models in terms of the SARI score. However, in terms of Perplexity, the
Word-Embeddings-based models achieve the largest decrease. Thus, the main
contributions of this paper are: (1) We propose a new Word-Embedding- and
Transformer-based algorithm for text simplification; (2) We design
\textsc{SimpLex}, a novel, modular text simplification system that can
provide a baseline for further research; and (3) We perform an in-depth
analysis of our solution and compare our results with two state-of-the-art
models, i.e., LightLS [19] and NTS-w2v [44]. We also make the code publicly
available online.
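The transformer-based variant described above ranks candidate substitutions by cosine similarity between sentence embeddings. The following is a minimal illustrative sketch of that ranking step only, using a hand-made toy embedding table in place of Word2Vec or sentence-transformer vectors; all words, vectors, and function names here are invented for the example and are not taken from the paper's implementation.

```python
import math

# Toy embedding table standing in for real Word2Vec / sentence-transformer
# vectors. These values are illustrative only.
EMBED = {
    "utilize": [0.9, 0.1, 0.3],
    "use":     [0.88, 0.12, 0.28],
    "employ":  [0.7, 0.3, 0.4],
    "eat":     [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(complex_word, candidates):
    """Rank candidate substitutes by similarity to the complex word,
    so the substitution that best preserves meaning comes first."""
    target = EMBED[complex_word]
    return sorted(candidates,
                  key=lambda w: cosine(EMBED[w], target),
                  reverse=True)

print(rank_candidates("utilize", ["use", "employ", "eat"]))
```

In the paper's word-embedding variant, the final choice would instead be scored by the perplexity of the resulting sentence; the cosine ranking shown here corresponds to the sentence-transformer variant.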
Related papers
- Syntactic Complexity Identification, Measurement, and Reduction Through
Controlled Syntactic Simplification
We present a classical syntactic dependency-based approach to split and rephrase a compound and complex sentence into a set of simplified sentences.
The paper also introduces an algorithm to identify and measure a sentence's syntactic complexity.
This work is accepted and presented in International workshop on Learning with Knowledge Graphs (IWLKG) at WSDM-2023 Conference.
arXiv Detail & Related papers (2023-04-16T13:13:58Z)
- Exploiting Summarization Data to Help Text Simplification
We analyzed the similarity between text summarization and text simplification and exploited summarization data to aid simplification.
We named these pairs Sum4Simp (S4S) and conducted human evaluations to show that S4S is high-quality.
arXiv Detail & Related papers (2023-02-14T15:32:04Z)
- Lexical Simplification using multi level and modular approach
This paper explains the work done by our team "teamPN" for the English sub-task.
We created a modular pipeline that combines modern transformer-based models with traditional NLP methods.
arXiv Detail & Related papers (2023-02-03T15:57:54Z)
- Text Revision by On-the-Fly Representation Optimization
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z)
- CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision.
In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module.
We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text.
arXiv Detail & Related papers (2021-12-14T16:22:25Z)
- How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z)
- Text Simplification for Comprehension-based Question-Answering
We release Simple-SQuAD, a simplified version of the widely-used SQuAD dataset.
We benchmark the newly created corpus and perform an ablation study for examining the effect of the simplification process in the SQuAD-based question answering task.
arXiv Detail & Related papers (2021-09-28T18:48:00Z)
- Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization
We adapt TP-TRANSFORMER with the explicitly compositional Tensor-Product Representation (TPR) for the task of abstractive summarization.
The key feature of our model is a structural bias that we introduce by encoding two separate representations for each token.
We show that our TP-TRANSFORMER outperforms the Transformer and the original TP-TRANSFORMER significantly on several abstractive summarization datasets.
arXiv Detail & Related papers (2021-06-02T17:32:33Z)
- Explainable Prediction of Text Complexity: The Missing Preliminaries for Text Simplification
Text simplification reduces the language complexity of professional content for accessibility purposes.
End-to-end neural network models have been widely adopted to directly generate the simplified version of input text.
We show that text simplification can be decomposed into a compact pipeline of tasks to ensure the transparency and explainability of the process.
arXiv Detail & Related papers (2020-07-31T03:33:37Z)
- Neural CRF Model for Sentence Alignment in Text Simplification
We create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia.
Experiments demonstrate that our proposed approach outperforms all previous work on the monolingual sentence alignment task by more than 5 points in F1.
A Transformer-based seq2seq model trained on our datasets establishes a new state-of-the-art for text simplification in both automatic and human evaluation.
arXiv Detail & Related papers (2020-05-05T16:47:51Z)
- ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations
This paper introduces ASSET, a new dataset for assessing sentence simplification in English.
We show that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
arXiv Detail & Related papers (2020-05-01T16:44:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.