Pre-training via Paraphrasing
- URL: http://arxiv.org/abs/2006.15020v1
- Date: Fri, 26 Jun 2020 14:43:43 GMT
- Title: Pre-training via Paraphrasing
- Authors: Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida
Wang, Luke Zettlemoyer
- Abstract summary: We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual paraphrasing objective.
We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization.
For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation.
- Score: 96.79972492585112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce MARGE, a pre-trained sequence-to-sequence model learned with an
unsupervised multi-lingual multi-document paraphrasing objective. MARGE
provides an alternative to the dominant masked language modeling paradigm,
where we self-supervise the reconstruction of target text by retrieving a set
of related texts (in many languages) and conditioning on them to maximize the
likelihood of generating the original. We show it is possible to jointly learn
to do retrieval and reconstruction, given only a random initialization. The
objective noisily captures aspects of paraphrase, translation, multi-document
summarization, and information retrieval, allowing for strong zero-shot
performance on several tasks. For example, with no additional task-specific
training we achieve BLEU scores of up to 35.8 for document translation. We
further show that fine-tuning gives strong performance on a range of
discriminative and generative tasks in many languages, making MARGE the most
generally applicable pre-training method to date.
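As a concrete illustration of the objective described above, here is a minimal NumPy sketch of a relevance-weighted reconstruction loss in the spirit of MARGE. It is a toy simplification, not the authors' implementation: the bag-of-words `embed`, the linear `W` "decoder", and the mixture over evidence documents are assumptions standing in for the shared seq2seq Transformer, in which the relevance scores instead bias the decoder's cross-attention over the retrieved documents so that the retriever receives gradients from the reconstruction loss.
```python
import numpy as np

# Toy sketch (not the authors' code): reconstruct a target document x from a set
# of retrieved evidence documents z_1..z_M, with relevance scores f(x, z_j)
# weighting how much each evidence document contributes. Because the loss depends
# on the scores, retrieval and reconstruction can be learned jointly from a
# random initialization.

rng = np.random.default_rng(0)
d, V = 16, 100                           # toy embedding size and vocabulary size

def embed(doc_ids, E):
    """Mean-pooled bag-of-words embedding (stand-in for the document encoder)."""
    return E[doc_ids].mean(axis=0)

def relevance(x_ids, evidence, E):
    """Cosine similarity between the target and each retrieved document."""
    x = embed(x_ids, E)
    zs = np.stack([embed(z, E) for z in evidence])
    x = x / np.linalg.norm(x)
    zs = zs / np.linalg.norm(zs, axis=1, keepdims=True)
    return zs @ x                         # shape (M,)

def reconstruction_nll(x_ids, evidence, E, W):
    """NLL of x under a relevance-weighted mixture of per-evidence distributions
    (a crude stand-in for relevance-biased cross-attention)."""
    scores = relevance(x_ids, evidence, E)
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over evidence docs
    nll = 0.0
    for w, z in zip(weights, evidence):
        logits = embed(z, E) @ W                      # (V,) toy "decoder"
        logp = logits - np.log(np.exp(logits).sum())  # log-softmax over vocabulary
        nll -= w * logp[x_ids].sum()                  # weight by relevance
    return nll

E = rng.normal(size=(V, d))                           # shared token embeddings
W = rng.normal(size=(d, V))                           # toy output projection
x = rng.integers(0, V, size=12)                       # target document (token ids)
evidence = [rng.integers(0, V, size=15) for _ in range(4)]   # retrieved documents
print("reconstruction NLL:", reconstruction_nll(x, evidence, E, W))
```
The key property the sketch preserves is that the loss is a function of the relevance scores, so minimizing it improves both the reconstruction model and the (initially random) retrieval embeddings.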
Related papers
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval [87.11836738011007]
We propose a multilingual language model called the masked sentence model (MSM).
MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.
To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives.
arXiv Detail & Related papers (2023-02-03T09:54:27Z)
- Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval [16.592276887533714]
Masked Language Modeling (MLM) is a major sub-task of the pre-training process.
Traditional random masking strategies tend to select many tokens that have little effect on the passage retrieval task.
We propose an alternative retrieval-oriented masking (ROM) strategy in which more important tokens have a higher probability of being masked out (see the toy sketch after this list).
arXiv Detail & Related papers (2022-10-27T02:43:48Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pre-training and fine-tuning stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- DOCmT5: Document-Level Pretraining of Multilingual Language Models [9.072507490639218]
We introduce DOCmT5, a multilingual sequence-to-sequence language model pre-trained with large-scale parallel documents.
We propose a simple and effective pre-training objective: Document Reordering Machine Translation (DrMT).
DrMT brings consistent improvements over strong baselines on a variety of document-level generation tasks.
arXiv Detail & Related papers (2021-12-16T08:58:52Z)
- Few-Shot Text Generation with Pattern-Exploiting Training [12.919486518128734]
In this paper, we show that the underlying idea can also be applied to text generation tasks.
We adapt Pattern-Exploiting Training (PET), a recently proposed few-shot approach, for finetuning generative language models on text generation tasks.
arXiv Detail & Related papers (2020-12-22T10:53:07Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
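The sketch referenced from the Retrieval Oriented Masking entry above: a toy, hypothetical illustration of importance-weighted masking. The importance weights here are placeholders supplied by the caller (e.g. IDF-like term weights), not the retrieval-derived scores used by ROM itself.
```python
import numpy as np

# Hypothetical illustration of importance-weighted masking in the spirit of ROM:
# instead of masking tokens uniformly at random, tokens with higher importance
# weights are masked with higher probability. The weights below are toy values.

def retrieval_oriented_mask(tokens, importance, mask_rate=0.15,
                            mask_token="[MASK]", seed=0):
    """Mask roughly `mask_rate` of the tokens, biased toward important ones."""
    rng = np.random.default_rng(seed)
    importance = np.asarray(importance, dtype=float)
    probs = importance / importance.sum()                  # sampling distribution
    n_mask = max(1, int(round(mask_rate * len(tokens))))
    masked_idx = rng.choice(len(tokens), size=n_mask, replace=False, p=probs)
    out = list(tokens)
    for i in masked_idx:
        out[i] = mask_token
    return out, sorted(masked_idx.tolist())

tokens = ["dense", "passage", "retrieval", "uses", "a", "bi", "encoder"]
importance = [3.0, 4.0, 5.0, 0.5, 0.1, 1.0, 2.0]           # toy term weights
print(retrieval_oriented_mask(tokens, importance))
```
Compared with uniform random masking, biasing the selection toward high-weight terms forces the model to predict exactly the tokens that matter most for the retrieval task.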
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.