Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs
- URL: http://arxiv.org/abs/2110.13231v1
- Date: Mon, 25 Oct 2021 19:33:38 GMT
- Title: Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs
- Authors: Monisha Jegadeesan, Sachin Kumar, John Wieting, Yulia Tsvetkov
- Abstract summary: We present a novel technique for zero-shot paraphrase generation.
The key contribution is an end-to-end multilingual paraphrasing model trained on translated parallel corpora.
- Score: 28.16894664889912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel technique for zero-shot paraphrase generation. The key
contribution is an end-to-end multilingual paraphrasing model that is trained
using translated parallel corpora to generate paraphrases into "meaning spaces"
-- replacing the final softmax layer with word embeddings. This architectural
modification, plus a training procedure that incorporates an autoencoding
objective, enables effective parameter sharing across languages for more fluent
monolingual rewriting, and facilitates fluency and diversity in generation. Our
continuous-output paraphrase generation models outperform zero-shot
paraphrasing baselines when evaluated on two languages using a battery of
computational metrics as well as in human assessment.
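Below is a minimal PyTorch sketch of the continuous-output idea described in the abstract: the decoder predicts a point in a fixed word-embedding space instead of a softmax distribution over the vocabulary, and decoding picks the nearest word embedding. This is not the authors' code; the module name, the cosine-distance training loss, and the nearest-neighbour decode step are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingOutputHead(nn.Module):
    """Replaces the final softmax layer: the decoder predicts an embedding,
    and generation looks up the nearest word in a fixed embedding table."""

    def __init__(self, hidden_dim: int, pretrained_emb: torch.Tensor):
        super().__init__()
        emb_dim = pretrained_emb.size(1)
        self.proj = nn.Linear(hidden_dim, emb_dim)          # hidden state -> embedding space
        self.register_buffer("emb_table", pretrained_emb)   # |V| x emb_dim, kept fixed

    def loss(self, hidden: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
        # Cosine distance between the predicted embedding and the gold word's embedding.
        pred = F.normalize(self.proj(hidden), dim=-1)
        gold = F.normalize(self.emb_table[target_ids], dim=-1)
        return (1.0 - (pred * gold).sum(dim=-1)).mean()

    @torch.no_grad()
    def decode(self, hidden: torch.Tensor) -> torch.Tensor:
        # Nearest-neighbour lookup over the embedding table by cosine similarity.
        pred = F.normalize(self.proj(hidden), dim=-1)
        table = F.normalize(self.emb_table, dim=-1)
        return (pred @ table.t()).argmax(dim=-1)
```

In the paper's setup such a head would be trained jointly with the multilingual translation and autoencoding objectives; only the output side is sketched here.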
Related papers
- Multilingual Lexical Simplification via Paraphrase Generation [19.275642346073557]
We propose a novel multilingual lexical simplification (LS) method via paraphrase generation.
We regard paraphrasing as a zero-shot translation task within multilingual neural machine translation.
Our approach significantly surpasses BERT-based methods and a zero-shot GPT-3-based method on English, Spanish, and Portuguese.
arXiv Detail & Related papers (2023-07-28T03:47:44Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
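The entry above describes an LSTM encoder-decoder that simultaneously translates and reconstructs its input sentence. The sketch below shows one way such a model could be wired: a shared encoder whose final state initialises two decoders, one for translation and one for reconstruction. All names, sizes, and the single-layer layout are assumptions, not the paper's implementation.

```python
import torch.nn as nn

class TranslateAndReconstruct(nn.Module):
    """Shared LSTM encoder feeding two decoders: one translates the input,
    the other reconstructs it (illustrative layout, not the paper's code)."""

    def __init__(self, src_vocab: int, tgt_vocab: int, emb: int = 256, hid: int = 512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.translate_dec = nn.LSTM(emb, hid, batch_first=True)
        self.reconstruct_dec = nn.LSTM(emb, hid, batch_first=True)
        self.translate_out = nn.Linear(hid, tgt_vocab)
        self.reconstruct_out = nn.Linear(hid, src_vocab)

    def forward(self, src_ids, tgt_shifted_ids, src_shifted_ids):
        # Encode once; both decoders start from the same encoder state.
        _, state = self.encoder(self.src_emb(src_ids))
        trans_h, _ = self.translate_dec(self.tgt_emb(tgt_shifted_ids), state)
        recon_h, _ = self.reconstruct_dec(self.src_emb(src_shifted_ids), state)
        return self.translate_out(trans_h), self.reconstruct_out(recon_h)
```

Training would sum cross-entropy losses over the two outputs, and the cross-lingual word embeddings would then be read off the jointly trained model.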
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
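Dynamic Blocking is only named in the entry above, not described here. The sketch below shows one plausible surface-form blocking strategy consistent with that name: for a random subset of adjacent source-token pairs, once the decoder emits the first token of a pair, the second token is excluded at the next step, which discourages verbatim copying. Treat the whole sketch, including the blocking probability, as an assumption rather than the paper's exact algorithm.

```python
import random

def build_block_map(source_tokens, block_prob=0.5, seed=0):
    """For a randomly sampled subset of adjacent source pairs (w, w_next),
    remember that w_next should be blocked right after the model emits w."""
    rng = random.Random(seed)
    block_map = {}
    for w, w_next in zip(source_tokens, source_tokens[1:]):
        if rng.random() < block_prob:
            block_map.setdefault(w, set()).add(w_next)
    return block_map

def blocked_next_tokens(block_map, last_generated_token):
    """Tokens to exclude from the decoder's next step."""
    return block_map.get(last_generated_token, set())

# Example: if the map contains "quick" -> {"brown"} and the model has just
# produced "quick", then "brown" is suppressed at the next step, nudging the
# model away from copying the source phrase verbatim.
```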
- Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work so that they can learn solely from bilingual text (bitext).
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z)
- Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity [11.564158965143418]
We introduce a simple paraphrase generation algorithm which discourages the production of n-grams that are present in the input.
Our approach enables paraphrase generation in many languages from a single multilingual NMT model.
arXiv Detail & Related papers (2020-08-11T18:05:34Z)
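The n-gram discouragement in the entry above can be sketched as a simple decoding-time penalty: any candidate token that would complete an n-gram already present in the input has its score reduced. The n-gram size and penalty value below are illustrative assumptions.

```python
def input_ngrams(tokens, n=3):
    """All n-grams of the input sentence, to be discouraged in the output."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def penalize_copied_ngrams(next_token_scores, generated, banned, n=3, penalty=5.0):
    """Lower the score of any next token that would complete an input n-gram.

    next_token_scores: dict mapping candidate token -> log-probability.
    generated: tokens produced so far; banned: output of input_ngrams().
    """
    prefix = tuple(generated[-(n - 1):]) if n > 1 else ()
    return {tok: (score - penalty if prefix + (tok,) in banned else score)
            for tok, score in next_token_scores.items()}
```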
- Pre-training via Paraphrasing [96.79972492585112]
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual paraphrasing objective.
We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization.
For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation.
arXiv Detail & Related papers (2020-06-26T14:43:43Z)
- Informed Sampling for Diversity in Concept-to-Text NLG [8.883733362171034]
We propose an Imitation Learning approach to explore the level of diversity that a language generation model can reliably produce.
Specifically, we augment the decoding process with a meta-classifier trained to distinguish which words at any given timestep will lead to high-quality output.
arXiv Detail & Related papers (2020-04-29T17:43:24Z)
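The informed-sampling entry above augments decoding with a meta-classifier that judges which candidate words lead to high-quality output; one way to sketch that is to restrict sampling to classifier-approved candidates. The `quality_clf(context, token)` interface, the threshold, and the fallback behaviour are assumptions made for illustration.

```python
import math
import random

def informed_sample(next_token_logprobs, context, quality_clf, threshold=0.5, seed=None):
    """Sample the next token only from candidates the meta-classifier approves.

    next_token_logprobs: dict mapping candidate token -> log-probability.
    quality_clf(context, token): assumed interface returning the probability
    that continuing with `token` leads to high-quality output.
    """
    rng = random.Random(seed)
    approved = {t: lp for t, lp in next_token_logprobs.items()
                if quality_clf(context, t) >= threshold}
    pool = approved or next_token_logprobs   # fall back if nothing passes the filter
    tokens = list(pool)
    weights = [math.exp(pool[t]) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```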
- Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation [5.958653653305609]
We incorporate widely available bilingual dictionaries that yield word-by-word translations to generate synthetic sentences.
This automatically expands the vocabulary of the model while maintaining high quality content.
arXiv Detail & Related papers (2020-04-05T02:14:14Z)
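A minimal sketch of the word-by-word augmentation described above: each source word is replaced by its dictionary translation to form a synthetic sentence for training. The dictionary format and the copy-through fallback for out-of-vocabulary words are assumptions.

```python
def word_by_word_translate(sentence, bilingual_dict):
    """Build a synthetic target-side sentence by looking up each source word in
    a bilingual dictionary; unknown words are copied through unchanged."""
    return [bilingual_dict.get(word, word) for word in sentence]

# Tiny made-up Spanish-English lexicon, purely for illustration.
lexicon = {"el": "the", "gato": "cat", "negro": "black"}
print(word_by_word_translate(["el", "gato", "negro"], lexicon))
# -> ['the', 'cat', 'black']
```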
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language might fail to handle target languages with different word orders.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)