Lexical Simplification using multi level and modular approach
- URL: http://arxiv.org/abs/2302.01823v1
- Date: Fri, 3 Feb 2023 15:57:54 GMT
- Title: Lexical Simplification using multi level and modular approach
- Authors: Nikita Katyal, Pawan Kumar Rajpoot
- Abstract summary: This paper explains the work done by our team "teamPN" for the English sub-task.
We created a modular pipeline which combines modern-day transformer-based models with traditional NLP methods.
- Score: 1.9559144041082446
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text Simplification is an ongoing problem in Natural Language
Processing, a solution to which has varied implications. In conjunction with
the TSAR-2022 Workshop @EMNLP2022, Lexical Simplification is the process of
reducing the lexical complexity of a text by replacing difficult words with
easier-to-read (or easier-to-understand) expressions while preserving the
original information and meaning. This paper explains the work done by our
team "teamPN" for the English sub-task. We created a multi-level, modular
pipeline which combines modern-day transformer-based models with traditional
NLP methods like paraphrasing and verb sense disambiguation, and in which the
target text is treated according to its semantics (Part-of-Speech tag). The
pipeline is multi-level because we utilize multiple source models to find
potential replacement candidates, and modular because we can switch the source
models and their weightage in the final re-ranking.
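The abstract describes the pipeline only at a high level. Below is a minimal sketch of the multi-level, modular idea: several source models propose substitution candidates for a complex word, routed by its Part-of-Speech tag, and a weighted sum re-ranks the merged candidates. The generator functions, weights, and toy lexicon are illustrative assumptions, not the authors' actual models or data.

```python
"""Minimal sketch of a multi-level, modular lexical-simplification pipeline.
All module names, weights, and the toy lexicon below are illustrative
assumptions, not the paper's actual implementation."""

from typing import Callable, Dict, List, Tuple

# A candidate generator maps (sentence, complex_word, pos_tag) to a list of
# (candidate, score) pairs. Real generators could be a masked language model,
# a paraphraser, or a verb-sense-disambiguation lexicon.
Generator = Callable[[str, str, str], List[Tuple[str, float]]]


def lexicon_generator(sentence: str, word: str, pos: str) -> List[Tuple[str, float]]:
    """Toy stand-in for a synonym lexicon such as WordNet."""
    toy_lexicon = {
        ("arduous", "ADJ"): [("hard", 0.9), ("difficult", 0.8)],
        ("commence", "VERB"): [("start", 0.9), ("begin", 0.85)],
    }
    return toy_lexicon.get((word, pos), [])


def masked_lm_generator(sentence: str, word: str, pos: str) -> List[Tuple[str, float]]:
    """Placeholder for a transformer fill-mask model scoring in-context substitutes."""
    toy_predictions = {
        "arduous": [("difficult", 0.7), ("long", 0.4)],
        "commence": [("begin", 0.75), ("open", 0.3)],
    }
    return toy_predictions.get(word, [])


# Modularity: the source models and their weights can be swapped per POS tag.
GENERATORS_BY_POS: Dict[str, List[Tuple[Generator, float]]] = {
    "ADJ": [(lexicon_generator, 0.4), (masked_lm_generator, 0.6)],
    "VERB": [(lexicon_generator, 0.5), (masked_lm_generator, 0.5)],
}


def simplify(sentence: str, word: str, pos: str, top_k: int = 3) -> List[str]:
    """Collect candidates from every source model and re-rank by weighted score."""
    scores: Dict[str, float] = {}
    for generator, weight in GENERATORS_BY_POS.get(pos, []):
        for candidate, score in generator(sentence, word, pos):
            scores[candidate] = scores.get(candidate, 0.0) + weight * score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    print(simplify("The climb was arduous.", "arduous", "ADJ"))
    # -> ['difficult', 'hard', 'long']
```

Swapping a source model or changing its weight only touches the GENERATORS_BY_POS table, which is the sense in which such a pipeline is modular.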
Related papers
- External Knowledge Augmented Polyphone Disambiguation Using Large
Language Model [3.372242769313867]
We introduce a novel method to solve the problem as a generation task.
The retrieval module incorporates external knowledge in the form of a multi-level semantic dictionary of Chinese polyphonic characters.
The generation module adopts a decoder-only Transformer architecture to induce the target text.
The post-processing module corrects the generated text into a valid result if needed.
arXiv Detail & Related papers (2023-12-19T08:00:10Z) - Multilingual Lexical Simplification via Paraphrase Generation [19.275642346073557]
We propose a novel multilingual LS method via paraphrase generation.
We regard paraphrasing as a zero-shot translation task within multilingual neural machine translation.
Our approach significantly surpasses BERT-based methods and a zero-shot GPT-3-based method on English, Spanish, and Portuguese.
arXiv Detail & Related papers (2023-07-28T03:47:44Z) - On Conditional and Compositional Language Model Differentiable Prompting [75.76546041094436]
Prompts have been shown to be an effective method to adapt a frozen Pretrained Language Model (PLM) to perform well on downstream tasks.
We propose a new model, Prompt Production System (PRopS), which learns to transform task instructions or input metadata, into continuous prompts.
arXiv Detail & Related papers (2023-07-04T02:47:42Z) - SimpLex: a lexical text simplification architecture [0.5156484100374059]
We present SimpLex, a novel simplification architecture for generating simplified English sentences.
The proposed architecture uses either word embeddings (i.e., Word2Vec) and perplexity, or sentence transformers (i.e., BERT, RoBERTa, and GPT2) and cosine similarity.
The solution is incorporated into a user-friendly and simple-to-use software.
arXiv Detail & Related papers (2023-04-14T08:52:31Z) - UniHD at TSAR-2022 Shared Task: Is Compute All We Need for Lexical
Simplification? [2.931632009516441]
We describe a pipeline based on prompted GPT-3 responses that beats competing approaches by a wide margin in settings with few training instances (see the hedged prompt sketch after this list).
When applied to the Spanish and Portuguese subsets, we achieve state-of-the-art results with only minor modifications to the original prompts.
arXiv Detail & Related papers (2023-01-04T18:59:20Z) - Finstreder: Simple and fast Spoken Language Understanding with Finite
State Transducers using modern Speech-to-Text models [69.35569554213679]
In Spoken Language Understanding (SLU) the task is to extract important information from audio commands.
This paper presents a simple method for embedding intents and entities into Finite State Transducers.
arXiv Detail & Related papers (2022-06-29T12:49:53Z) - Charformer: Fast Character Transformers via Gradient-based Subword
Tokenization [50.16128796194463]
We propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model.
We introduce a soft gradient-based subword tokenization module (GBST) that automatically learns latent subword representations from characters.
We additionally introduce Charformer, a deep Transformer model that integrates GBST and operates on the byte level.
arXiv Detail & Related papers (2021-06-23T22:24:14Z) - SML: a new Semantic Embedding Alignment Transformer for efficient
cross-lingual Natural Language Inference [71.57324258813674]
The ability of Transformers to perform a variety of tasks with precision, such as question answering, Natural Language Inference (NLI) or summarising, has enabled them to be ranked among the best paradigms for addressing such tasks at present.
NLI is one of the best scenarios in which to test these architectures, due to the knowledge required to understand complex sentences and establish a relation between a hypothesis and a premise.
In this paper, we propose a new architecture, a siamese multilingual transformer, to efficiently align multilingual embeddings for Natural Language Inference.
arXiv Detail & Related papers (2021-03-17T13:23:53Z) - Controllable Text Simplification with Explicit Paraphrasing [88.02804405275785]
Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting.
Current simplification systems are predominantly sequence-to-sequence models that are trained end-to-end to perform all these operations simultaneously.
We propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles.
arXiv Detail & Related papers (2020-10-21T13:44:40Z) - Deep Transformer based Data Augmentation with Subword Units for
Morphologically Rich Online ASR [0.0]
Deep Transformer models have proven to be particularly powerful in language modeling tasks for ASR.
Recent studies showed that a considerable part of the knowledge of neural network Language Models (LM) can be transferred to traditional n-grams by using neural text generation based data augmentation.
We show that although data augmentation with Transformer-generated text works well for isolating languages, it causes a vocabulary explosion in a morphologically rich language.
We propose a new method called subword-based neural text augmentation, where we retokenize the generated text into statistically derived subwords.
arXiv Detail & Related papers (2020-07-14T10:22:05Z) - Neural Syntactic Preordering for Controlled Paraphrase Generation [57.5316011554622]
Our work uses syntactic transformations to softly "reorder" the source sentence and guide our neural paraphrasing model.
First, given an input sentence, we derive a set of feasible syntactic rearrangements using an encoder-decoder model.
Next, we use each proposed rearrangement to produce a sequence of position embeddings, which encourages our final encoder-decoder paraphrase model to attend to the source words in a particular order.
arXiv Detail & Related papers (2020-05-05T09:02:25Z)
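The UniHD entry above reports lexical simplification driven purely by prompting GPT-3. The sketch below illustrates what such a prompt-and-parse step might look like; the prompt wording, the query_llm stub, and the canned response are hypothetical and are not taken from that paper.

```python
"""Hedged sketch of prompt-based lexical simplification, in the spirit of the
UniHD TSAR-2022 entry. The prompt text and query_llm stub are hypothetical."""

from typing import List


def build_prompt(sentence: str, complex_word: str, n: int = 5) -> str:
    """Assemble an instruction prompt asking a large language model for substitutes."""
    return (
        f"Sentence: {sentence}\n"
        f"Give {n} simpler English substitutes for the word \"{complex_word}\" "
        "that keep the sentence's meaning. Answer as a comma-separated list."
    )


def query_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned response for this demo."""
    return "hard, tough, difficult, tiring, demanding"


def parse_candidates(response: str) -> List[str]:
    """Split the model's comma-separated answer into ranked candidates."""
    return [c.strip() for c in response.split(",") if c.strip()]


if __name__ == "__main__":
    prompt = build_prompt("The climb was arduous.", "arduous")
    print(parse_candidates(query_llm(prompt)))
    # -> ['hard', 'tough', 'difficult', 'tiring', 'demanding']
```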