Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval
- URL: http://arxiv.org/abs/2210.15133v1
- Date: Thu, 27 Oct 2022 02:43:48 GMT
- Title: Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval
- Authors: Dingkun Long, Yanzhao Zhang, Guangwei Xu, Pengjun Xie
- Abstract summary: Masked Language Modeling (MLM) is a major sub-task of the pre-training process.
The conventional random masking strategy tends to select a large number of tokens that have limited effect on the passage retrieval task.
We propose an alternative retrieval-oriented masking (dubbed ROM) strategy in which more important tokens have a higher probability of being masked out.
- Score: 16.592276887533714
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pre-trained language models (PTMs) have been shown to yield powerful text representations for the dense passage retrieval task. Masked Language Modeling (MLM) is a major sub-task of the pre-training process. However, we found that the conventional random masking strategy tends to select a large number of tokens that have limited effect on the passage retrieval task (e.g., stop-words and punctuation). Noticing that term importance weights can provide valuable information for passage retrieval, we propose an alternative retrieval-oriented masking (dubbed ROM) strategy in which more important tokens have a higher probability of being masked out, so that this straightforward yet essential information is captured to facilitate language model pre-training. Notably, the proposed token masking method does not change the architecture or learning objective of the original PTM. Our experiments verify that ROM enables term importance information to aid language model pre-training, thus achieving better performance on multiple passage retrieval benchmarks.
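As a rough illustration of the idea only (the abstract gives no implementation details, and this is not the authors' code), the sketch below masks tokens with probability proportional to a hypothetical per-token importance weight such as an IDF score, while keeping the expected number of masked tokens near a fixed budget; stop-words and punctuation get near-zero weight and are therefore rarely masked.

```python
# Illustrative sketch only (not the authors' implementation): mask tokens with
# probability proportional to a per-token importance weight (e.g. an IDF-like
# score), keeping the expected number of masked tokens close to a fixed budget.
import random

def rom_style_mask(tokens, importance, mask_rate=0.15, mask_token="[MASK]"):
    """Return a masked copy of `tokens` and the list of masked positions.

    `importance` maps each token to a non-negative weight; a higher weight
    means a higher chance of being masked. Stop-words and punctuation can be
    given weights near zero so they are rarely selected.
    """
    total = sum(importance.get(t, 0.0) for t in tokens)
    if total == 0.0:
        return list(tokens), []
    budget = mask_rate * len(tokens)  # expected number of masked tokens
    masked, positions = list(tokens), []
    for i, tok in enumerate(tokens):
        # Scale each token's masking probability by its share of the total weight.
        p = min(1.0, budget * importance.get(tok, 0.0) / total)
        if random.random() < p:
            masked[i] = mask_token
            positions.append(i)
    return masked, positions

# Example with hypothetical IDF-like weights: content words dominate the masking.
weights = {"the": 0.01, ".": 0.0, "passage": 2.3, "retrieval": 2.8, "dense": 2.1}
print(rom_style_mask(["dense", "passage", "retrieval", "the", "."], weights, mask_rate=0.4))
```

Consistent with the abstract's claim, such a weighted sampler would presumably replace only the token-selection step of standard MLM, leaving the model architecture and learning objective untouched.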
Related papers
- Self-Evolution Learning for Discriminative Language Model Pretraining [103.57103957631067]
Self-Evolution learning (SE) is a simple and effective token masking and learning method.
SE focuses on learning the informative yet under-explored tokens and adaptively regularizes the training by introducing a novel Token-specific Label Smoothing approach.
arXiv Detail & Related papers (2023-05-24T16:00:54Z)
- Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z)
- Sample Efficient Approaches for Idiomaticity Detection [6.481818246474555]
This work explores sample efficient methods of idiomaticity detection.
In particular, we study the impact of Pattern Exploit Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings.
Our experiments show that while PET improves performance on English, it is much less effective on Portuguese and Galician, leading to an overall performance about on par with vanilla mBERT.
arXiv Detail & Related papers (2022-05-23T13:46:35Z)
- KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive Question Answering [28.18555591429343]
We propose a novel framework named Knowledge Enhanced Contrastive Prompt-tuning (KECP).
Instead of adding pointer heads to PLMs, we transform the task into a non-autoregressive Masked Language Modeling (MLM) generation problem.
Our method consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.
arXiv Detail & Related papers (2022-05-06T08:31:02Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling (xSL) tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between the representations of input parallel sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- Data Efficient Masked Language Modeling for Vision and Language [16.95631509102115]
Masked language modeling (MLM) is one of the key sub-tasks in vision-language training.
In the cross-modal setting, tokens in the sentence are masked at random, and the model predicts the masked tokens given the image and the text.
We investigate a range of alternative masking strategies specific to the cross-modal setting that address the shortcomings of random masking.
arXiv Detail & Related papers (2021-09-05T11:27:53Z)
- Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation [63.195935452646815]
We propose a method to automatically generate domain- and task-adaptive maskings of the given text for self-supervised pre-training.
We present a novel reinforcement learning-based framework which learns the masking policy.
We validate our Neural Mask Generator (NMG) on several question answering and text classification datasets.
arXiv Detail & Related papers (2020-10-06T13:27:01Z)
- Pre-training via Paraphrasing [96.79972492585112]
We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual paraphrasing objective.
We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization.
For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation.
arXiv Detail & Related papers (2020-06-26T14:43:43Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)