Unsupervised Lexical Simplification with Context Augmentation
- URL: http://arxiv.org/abs/2311.00310v1
- Date: Wed, 1 Nov 2023 05:48:05 GMT
- Title: Unsupervised Lexical Simplification with Context Augmentation
- Authors: Takashi Wada, Timothy Baldwin, Jey Han Lau
- Abstract summary: Given a target word and its context, our method generates substitutes based on the target context and additional contexts sampled from monolingual data.
We conduct experiments in English, Portuguese, and Spanish on the TSAR-2022 shared task, and show that our model substantially outperforms other unsupervised systems across all languages.
- Score: 55.318201742039
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new unsupervised lexical simplification method that uses only
monolingual data and pre-trained language models. Given a target word and its
context, our method generates substitutes based on the target context and also
additional contexts sampled from monolingual data. We conduct experiments in
English, Portuguese, and Spanish on the TSAR-2022 shared task, and show that
our model substantially outperforms other unsupervised systems across all
languages. We also establish a new state-of-the-art by ensembling our model
with GPT-3.5. Lastly, we evaluate our model on the SWORDS lexical substitution
data set, achieving a state-of-the-art result.
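As a rough illustration of the approach described in the abstract, the sketch below generates substitutes for a target word from its original sentence plus additional sentences containing the same word, using an off-the-shelf masked language model. The helper sample_contexts and the score-summing aggregation are illustrative assumptions, not the paper's exact generation and ranking procedure.

```python
# Hedged sketch of context-augmented substitute generation with a masked LM.
from collections import defaultdict
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def sample_contexts(target_word, corpus, k=5):
    """Hypothetical helper: return up to k monolingual sentences containing the target word."""
    return [s for s in corpus if target_word in s.lower().split()][:k]

def generate_substitutes(sentence, target_word, corpus, top_k=10):
    # Pool predictions from the original context and the sampled contexts.
    contexts = [sentence] + sample_contexts(target_word, corpus)
    scores = defaultdict(float)
    for ctx in contexts:
        masked = ctx.replace(target_word, fill_mask.tokenizer.mask_token, 1)
        for pred in fill_mask(masked, top_k=top_k):
            candidate = pred["token_str"].strip().lower()
            if candidate != target_word:
                scores[candidate] += pred["score"]  # sum evidence across contexts
    return sorted(scores, key=scores.get, reverse=True)
```

Summing prediction scores over several contexts is one simple way to favour substitutes that reflect the target word's general meaning rather than a single sentence; the paper additionally reports further gains from ensembling with GPT-3.5.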
Related papers
- OCHADAI at SemEval-2022 Task 2: Adversarial Training for Multilingual Idiomaticity Detection [4.111899441919165]
We propose a multilingual adversarial training model for determining whether a sentence contains an idiomatic expression.
Our model relies on pre-trained contextual representations from several state-of-the-art multilingual transformer-based language models.
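The summary does not specify the adversarial scheme; as a generic, minimal sketch (not the OCHADAI recipe), embedding-space adversarial training in the style of FGM perturbs the input embeddings along the loss gradient and trains on both the clean and perturbed passes:

```python
# Generic FGM-style adversarial training step on the input embedding matrix.
# `batch` is assumed to hold input ids, attention masks, and labels for a
# Hugging Face sequence classification model; this is illustrative only.
import torch

def adversarial_step(model, batch, optimizer, epsilon=1e-2):
    emb = model.get_input_embeddings()

    loss = model(**batch).loss            # clean forward/backward pass
    loss.backward()

    grad = emb.weight.grad
    if grad is not None and grad.norm() > 0:
        backup = emb.weight.data.clone()
        emb.weight.data.add_(epsilon * grad / grad.norm())  # perturb embeddings
        model(**batch).loss.backward()                       # accumulate adversarial gradients
        emb.weight.data.copy_(backup)                        # restore clean embeddings

    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```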
arXiv Detail & Related papers (2022-06-07T05:52:43Z)
- Injecting Text and Cross-lingual Supervision in Few-shot Learning from Self-Supervised Models [33.66135770490531]
We show how universal phoneset acoustic models can leverage cross-lingual supervision to improve transfer of self-supervised representations to new languages.
We also show how target-language text can be used to enable and improve fine-tuning with the lattice-free maximum mutual information objective.
arXiv Detail & Related papers (2021-10-10T17:33:44Z)
- Neural semi-Markov CRF for Monolingual Word Alignment [20.897157172049877]
We present a novel neural semi-Markov CRF alignment model, which unifies word and phrase alignments through variable-length spans.
We also create a new benchmark with human annotations that cover four different text genres to evaluate monolingual word alignment models.
arXiv Detail & Related papers (2021-06-04T16:04:00Z)
- Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical and higher-level representations of the two languages.
Previous research has shown that unsupervised machine translation suffers when these representations are not sufficiently aligned.
In this paper, we enhance bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
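As a minimal sketch of one way such type-level vectors could be injected, assuming cross-lingually aligned subword vectors are already available in a word2vec-style text file (aligned_subword_vectors.vec is a hypothetical path), each aligned vector is copied into the corresponding row of the masked language model's input embedding matrix before bilingual pretraining; the paper's actual alignment and integration procedure may differ.

```python
# Sketch: seed a masked LM's input embeddings with pre-aligned subword vectors.
import numpy as np
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

def load_vectors(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the "count dim" header line
        for line in f:
            token, *values = line.rstrip().split(" ")
            vectors[token] = np.asarray(values, dtype=np.float32)
    return vectors

aligned = load_vectors("aligned_subword_vectors.vec")  # hypothetical file
embeddings = model.get_input_embeddings()
with torch.no_grad():
    for token, vector in aligned.items():
        idx = tokenizer.convert_tokens_to_ids(token)
        if idx != tokenizer.unk_token_id and vector.shape[0] == embeddings.weight.shape[1]:
            embeddings.weight[idx] = torch.from_numpy(vector)  # overwrite the row for this subword
```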
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective that learns a discourse-level language representation by restoring the original order of shuffled sentences.
We show that this objective improves the performance of the original BERT by large margins.
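The summary does not spell out how the unshuffling objective is set up; below is a minimal sketch of how such training examples could be constructed, with the original sentence order as the prediction target (the actual SLM head and loss may be formulated differently).

```python
# Sketch: build a sentence-unshuffling example from a document.
import random

def make_unshuffling_example(document_sentences, seed=None):
    """Shuffle the sentences; labels[j] is the original index of the sentence now at position j."""
    rng = random.Random(seed)
    order = list(range(len(document_sentences)))
    rng.shuffle(order)
    shuffled = [document_sentences[i] for i in order]
    return {"input_sentences": shuffled, "labels": order}

example = make_unshuffling_example(
    ["The sky darkened.", "Rain began to fall.", "Everyone ran for cover."], seed=0
)
```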
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work so that they can learn solely from bilingual text (bitext).
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
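As a loose illustration of representation-level alignment regularization, the sketch below uses CORAL-style moment matching between batches of source- and target-language features; this generic regularizer is a stand-in for the idea and is not the specific scheme proposed in the paper.

```python
# Sketch: penalise statistical differences between representations of two languages.
import torch

def alignment_loss(src_feats: torch.Tensor, tgt_feats: torch.Tensor) -> torch.Tensor:
    """src_feats, tgt_feats: (batch, dim) word- or sentence-level representations."""
    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)
        return x.t() @ x / max(x.size(0) - 1, 1)

    mean_gap = (src_feats.mean(dim=0) - tgt_feats.mean(dim=0)).pow(2).sum()
    cov_gap = (covariance(src_feats) - covariance(tgt_feats)).pow(2).sum()
    return mean_gap + cov_gap

# total_loss = task_loss + alignment_weight * alignment_loss(en_repr, es_repr)
```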
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- Towards Making the Most of Context in Neural Machine Translation [112.9845226123306]
We argue that previous research has not made full use of the global document context.
We propose a new document-level NMT framework that deliberately models the local context of each sentence while remaining aware of the global context of the document.
arXiv Detail & Related papers (2020-02-19T03:30:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.