Multilingual Lexical Simplification via Paraphrase Generation
- URL: http://arxiv.org/abs/2307.15286v1
- Date: Fri, 28 Jul 2023 03:47:44 GMT
- Title: Multilingual Lexical Simplification via Paraphrase Generation
- Authors: Kang Liu, Jipeng Qiang, Yun Li, Yunhao Yuan, Yi Zhu, Kaixun Hua
- Abstract summary: We propose a novel multilingual LS method via paraphrase generation.
We regard paraphrasing as a zero-shot translation task within multilingual neural machine translation.
Our approach significantly outperforms BERT-based methods and a zero-shot GPT-3-based method on English, Spanish, and Portuguese.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Lexical simplification (LS) methods based on pretrained language models have
made remarkable progress, generating potential substitutes for a complex word
through analysis of its contextual surroundings. However, these methods require
separate pretrained models for different languages and disregard the
preservation of sentence meaning. In this paper, we propose a novel
multilingual LS method via paraphrase generation, as paraphrases provide
diversity in word selection while preserving the sentence's meaning. We regard
paraphrasing as a zero-shot translation task within multilingual neural machine
translation that supports hundreds of languages. After feeding the input
sentence into the encoder of paraphrase modeling, we generate the substitutes
based on a novel decoding strategy that concentrates solely on the lexical
variations of the complex word. Experimental results demonstrate that our
approach significantly outperforms BERT-based methods and a zero-shot
GPT-3-based method on English, Spanish, and Portuguese.
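To make the decoding idea concrete, the following is a minimal sketch (not the authors' released code) of drawing substitutes from a multilingual NMT paraphraser: the sentence is "translated" into its own language, the decoder is forced to reproduce the words preceding the complex word, and the next-token distribution is read off as substitute candidates. The M2M100 checkpoint and the prefix-forcing details are assumptions; the paper's actual decoding strategy is more elaborate.

```python
# Minimal sketch: lexical substitutes via zero-shot paraphrase with a
# multilingual NMT model. Assumes Hugging Face `transformers`; the
# M2M100 checkpoint is an illustrative stand-in, not the paper's model.
import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

name = "facebook/m2m100_418M"
tok = M2M100Tokenizer.from_pretrained(name)
model = M2M100ForConditionalGeneration.from_pretrained(name).eval()

def substitutes(sentence: str, prefix: str, lang: str = "en", k: int = 10):
    """Candidate replacements for the word that follows `prefix`.

    Paraphrasing is cast as zero-shot translation from `lang` to `lang`;
    forcing the decoder to copy `prefix` (the words before the complex
    word) concentrates the next-token distribution on lexical variants
    at the complex word's position.
    """
    tok.src_lang = lang
    enc = tok(sentence, return_tensors="pt")
    # M2M100 decoder input: decoder-start token, target-language tag,
    # then the forced prefix tokens.
    forced = [model.config.decoder_start_token_id, tok.get_lang_id(lang)]
    forced += tok(prefix, add_special_tokens=False).input_ids
    with torch.no_grad():
        out = model(**enc, decoder_input_ids=torch.tensor([forced]))
    # Top-k tokens right after the forced prefix = substitute candidates.
    top = torch.topk(out.logits[0, -1], k).indices.tolist()
    return [tok.decode([t]).strip() for t in top]

print(substitutes("The cat perched on the branch.", "The cat", k=5))
```

A real system would also filter out the complex word itself and its inflections and rank the surviving candidates; the sketch simply returns the raw top-k tokens.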
Related papers
- ParaLS: Lexical Substitution via Pretrained Paraphraser
This study explores how to generate the substitute candidates from a paraphraser.
We propose two simple decoding strategies that focus on the variations of the target word during decoding.
arXiv Detail & Related papers (2023-05-14T12:49:16Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
arXiv Detail & Related papers (2023-04-26T19:55:52Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training
Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text.
We call attention to a new setting named multilingual keyphrase generation.
We propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages.
arXiv Detail & Related papers (2022-05-21T00:45:21Z)
- Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders
We probe multilingual sentence encoders for the amount of cross-lingual lexical knowledge stored in their parameters and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z)
- Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs
We present a novel technique for zero-shot paraphrase generation.
Our key contribution is an end-to-end multilingual paraphrasing model that is trained using translated parallel corpora.
arXiv Detail & Related papers (2021-10-25T19:33:38Z)
- Revisiting Language Encoding in Learning Multilingual Representations
We propose a new approach called Cross-lingual Language Projection (XLP) to replace the language embedding.
XLP projects the word embeddings into a language-specific semantic space, and the projected embeddings are then fed into the Transformer model (a minimal sketch follows this entry).
Experiments show that XLP significantly boosts model performance on a wide range of multilingual benchmark datasets.
arXiv Detail & Related papers (2021-02-16T18:47:10Z)
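As a rough illustration of the XLP summary above, this PyTorch sketch replaces an additive language embedding with a per-language projection of the word embeddings; all names and dimensions are assumptions rather than the paper's implementation.

```python
# Illustrative sketch of a per-language projection of word embeddings
# (the XLP idea as summarized above); sizes are arbitrary assumptions.
import torch
import torch.nn as nn

class XLPEmbedding(nn.Module):
    def __init__(self, vocab_size: int, n_langs: int, d_model: int):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        # One projection matrix per language, initialized to the identity.
        self.lang_proj = nn.Parameter(torch.eye(d_model).repeat(n_langs, 1, 1))

    def forward(self, token_ids, lang_id):
        x = self.word_emb(token_ids)       # (batch, seq, d_model)
        # Project into the language-specific semantic space; the result
        # would then be fed to the Transformer encoder as usual.
        return x @ self.lang_proj[lang_id]

emb = XLPEmbedding(vocab_size=32000, n_langs=100, d_model=512)
h = emb(torch.randint(0, 32000, (2, 8)), lang_id=3)  # -> (2, 8, 512)
```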
- Chinese Lexical Simplification
There has been no prior research on the Chinese lexical simplification (CLS) task.
To circumvent difficulties in acquiring annotations, we manually create the first benchmark dataset for CLS.
We present five different types of methods as baselines to generate substitute candidates for the complex word.
arXiv Detail & Related papers (2020-10-14T12:55:36Z)
- Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity
We introduce a simple paraphrase generation algorithm which discourages producing n-grams that appear in the input (see the sketch after this entry).
Our approach enables paraphrase generation in many languages from a single multilingual NMT model.
arXiv Detail & Related papers (2020-08-11T18:05:34Z)
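Hugging Face's generation API exposes a closely related mechanism, `encoder_no_repeat_ngram_size`, which forbids n-grams of the encoder input from reappearing in the output; the sketch below uses it as a stand-in for the paper's algorithm (the M2M100 checkpoint and the same-language zero-shot translation setup are assumptions).

```python
# Sketch: paraphrase with input n-gram blocking via Hugging Face's
# `encoder_no_repeat_ngram_size`, used here as a stand-in for the
# paper's algorithm. The checkpoint choice is an assumption.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tok = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tok.src_lang = "en"
inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
out = model.generate(
    **inputs,
    forced_bos_token_id=tok.get_lang_id("en"),  # zero-shot en->en "translation"
    encoder_no_repeat_ngram_size=3,             # no input trigram may be copied
    num_beams=5,
    max_new_tokens=40,
)
print(tok.batch_decode(out, skip_special_tokens=True)[0])
```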
- Multilingual Chart-based Constituency Parse Extraction from Pre-trained Language Models
We propose a novel method for extracting complete (binary) parses from pre-trained language models.
By applying our method to multilingual PLMs, we can induce non-trivial parses for sentences in nine languages.
arXiv Detail & Related papers (2020-04-08T05:42:26Z)
- Robust Cross-lingual Embeddings from Parallel Sentences
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)