Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval
- URL: http://arxiv.org/abs/2401.11248v2
- Date: Mon, 22 Apr 2024 10:44:14 GMT
- Title: Drop your Decoder: Pre-training with Bag-of-Word Prediction for Dense Passage Retrieval
- Authors: Guangyuan Ma, Xing Wu, Zijia Lin, Songlin Hu
- Abstract summary: Masked auto-encoder pre-training has emerged as a prevalent technique for initializing and enhancing dense retrieval systems.
We propose a modification to the traditional MAE by replacing the decoder of a masked auto-encoder with a completely simplified Bag-of-Word prediction task.
Our proposed method achieves state-of-the-art retrieval performance on several large-scale retrieval benchmarks without requiring any additional parameters.
- Score: 26.00149743478937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked auto-encoder pre-training has emerged as a prevalent technique for initializing and enhancing dense retrieval systems. It generally utilizes additional Transformer decoder blocks to provide sustainable supervision signals and compress contextual information into dense representations. However, the underlying reasons for the effectiveness of such a pre-training technique remain unclear. The usage of additional Transformer-based decoders also incurs significant computational costs. In this study, we aim to shed light on this issue by revealing that masked auto-encoder (MAE) pre-training with enhanced decoding significantly improves the term coverage of input tokens in dense representations, compared to vanilla BERT checkpoints. Building upon this observation, we propose a modification to the traditional MAE by replacing the decoder of a masked auto-encoder with a completely simplified Bag-of-Word prediction task. This modification enables the efficient compression of lexical signals into dense representations through unsupervised pre-training. Remarkably, our proposed method achieves state-of-the-art retrieval performance on several large-scale retrieval benchmarks without requiring any additional parameters, which provides a 67% training speed-up compared to standard masked auto-encoder pre-training with enhanced decoding.
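- Illustrative sketch: To make the idea concrete, here is a minimal sketch (not the authors' released code) of how such a bag-of-word objective could be wired up. Assumed details: the [CLS] dense vector is scored against the tied input-embedding matrix (so no extra decoder parameters), and the target is simply the set of tokens occurring in each passage; the loss form and the helper name `bow_pretraining_loss` are invented for illustration.

```python
# Minimal sketch of bag-of-word (BoW) prediction pre-training for a dense
# retriever. Assumed details: the [CLS] vector is scored against the tied
# input-embedding matrix (no extra parameters) and the loss is the average
# negative log-likelihood of the distinct tokens appearing in each passage.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")


def bow_pretraining_loss(passages: list[str]) -> torch.Tensor:
    batch = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
    cls = encoder(**batch).last_hidden_state[:, 0]               # [B, H] dense vectors
    logits = cls @ encoder.embeddings.word_embeddings.weight.T   # [B, V] via tied weights
    log_probs = F.log_softmax(logits, dim=-1)

    # Multi-hot bag-of-words target over the input tokens (padding excluded;
    # special tokens like [CLS]/[SEP] are kept for simplicity).
    bow = torch.zeros_like(log_probs)
    bow.scatter_(1, batch["input_ids"], 1.0)
    bow[:, tokenizer.pad_token_id] = 0.0
    loss = -(bow * log_probs).sum(dim=-1) / bow.sum(dim=-1).clamp(min=1.0)
    return loss.mean()


loss = bow_pretraining_loss(["dense passage retrieval ...", "bag of words ..."])
```

In full MAE-style pre-training the input would additionally be masked; the sketch omits that step for brevity.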
Related papers
- Challenging Decoder helps in Masked Auto-Encoder Pre-training for Dense Passage Retrieval [10.905033385938982]
The masked auto-encoder (MAE) pre-training architecture has emerged as the most promising option for dense passage retrieval.
We propose a novel token-importance-aware masking strategy based on pointwise mutual information to intensify the challenge of the decoder.
arXiv Detail & Related papers (2023-05-22T16:27:10Z)
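As a rough illustration of that masking idea (not the paper's exact recipe), the sketch below draws a mask biased toward high-importance tokens; the importance scores are assumed to come from precomputed PMI-style statistics, and the function name and ratio are invented.

```python
# Illustrative token-importance-aware masking: tokens with higher importance
# (e.g. PMI-style scores, assumed precomputed) are masked with higher
# probability, which makes reconstruction harder for the decoder.
import torch


def importance_aware_mask(importance: torch.Tensor, mask_ratio: float = 0.5) -> torch.Tensor:
    """importance: [L] non-negative scores for one tokenized passage."""
    num_to_mask = max(1, int(mask_ratio * importance.numel()))
    probs = importance.clamp(min=1e-6)
    probs = probs / probs.sum()
    picked = torch.multinomial(probs, num_to_mask, replacement=False)   # biased sampling
    mask = torch.zeros_like(importance, dtype=torch.bool)
    mask[picked] = True
    return mask


# Example: positions 1 and 3 (highest scores) are masked most often.
print(importance_aware_mask(torch.tensor([0.1, 2.3, 0.4, 1.8, 0.2, 0.9])))
```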
- Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving [74.28510044056706]
Existing methods usually adopt the decoupled encoder-decoder paradigm.
In this work, we aim to alleviate the problem by two principles.
We first predict a coarse-grained future position and action based on the encoder features.
Then, conditioned on the predicted position and action, the future scene is imagined to check the ramifications of driving accordingly.
arXiv Detail & Related papers (2023-05-10T15:22:02Z)
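Though from a different domain, the two-stage decoding described above can be sketched generically: a coarse head predicts a future position and action from encoder features, and a refinement module conditions on that coarse guess. The module names and sizes below are invented, not the paper's architecture.

```python
# Generic coarse-then-refine decoding sketch (invented names and dimensions):
# stage 1 guesses a coarse future position/action from encoder features,
# stage 2 re-reads the features conditioned on that guess to refine it.
import torch
import torch.nn as nn


class CoarseThenRefineDecoder(nn.Module):
    def __init__(self, feat_dim: int = 256, out_dim: int = 4):   # out = (x, y) + 2-dim action
        super().__init__()
        self.coarse_head = nn.Linear(feat_dim, out_dim)
        self.refine_head = nn.Sequential(
            nn.Linear(feat_dim + out_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, out_dim))

    def forward(self, enc_feat: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        coarse = self.coarse_head(enc_feat)
        refined = self.refine_head(torch.cat([enc_feat, coarse], dim=-1))
        return coarse, refined


coarse, refined = CoarseThenRefineDecoder()(torch.randn(4, 256))
```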
- Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encoders [89.29256833403169]
We introduce Kullback Leibler Alignment of Embeddings (KALE), an efficient and accurate method for increasing the inference efficiency of dense retrieval methods.
KALE extends traditional Knowledge Distillation after bi-encoder training, allowing for effective query encoder compression without full retraining or index generation.
Using KALE and asymmetric training, we can generate models which exceed the performance of DistilBERT despite having 3x faster inference.
arXiv Detail & Related papers (2023-03-31T15:44:13Z)
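The exact KALE objective is not reproduced here; as an assumption-laden sketch, one plausible post-training alignment loss pulls the compressed query encoder's query-passage similarity distribution toward the frozen teacher's via KL divergence. All tensors and the temperature below are placeholders.

```python
# Hedged sketch of an embedding-alignment distillation loss for a compressed
# query encoder (the actual KALE formulation may differ): the student's
# similarity distribution over candidate passages is matched to the teacher's.
import torch
import torch.nn.functional as F


def alignment_kl(student_q: torch.Tensor, teacher_q: torch.Tensor,
                 passage_emb: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """student_q, teacher_q: [B, H] query embeddings; passage_emb: [N, H]."""
    student_scores = student_q @ passage_emb.T / temperature
    with torch.no_grad():
        teacher_scores = teacher_q @ passage_emb.T / temperature
    return F.kl_div(F.log_softmax(student_scores, dim=-1),
                    F.log_softmax(teacher_scores, dim=-1),
                    log_target=True, reduction="batchmean")


loss = alignment_kl(torch.randn(8, 768), torch.randn(8, 768), torch.randn(32, 768))
```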
- ConTextual Mask Auto-Encoder for Dense Passage Retrieval [49.49460769701308]
CoT-MAE is a simple yet effective generative pre-training method for dense passage retrieval.
It learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding.
We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines.
arXiv Detail & Related papers (2022-08-16T11:17:22Z)
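As a toy illustration of the context-supervised part (not CoT-MAE's implementation), the dense [CLS] vector of one span can be prepended to the masked embeddings of a neighbouring span before a shallow decoder tries to recover the masked tokens; all sizes below are placeholders.

```python
# Toy context-supervised masked auto-encoding step (sizes are placeholders):
# the dense vector of span A is the only bridge through which the shallow
# decoder can recover span B's masked tokens, so contextual semantics must
# be compressed into that vector.
import torch
import torch.nn as nn

hidden, vocab = 256, 30522
shallow_decoder = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
lm_head = nn.Linear(hidden, vocab)

cls_of_span_a = torch.randn(2, 1, hidden)      # dense representation from the encoder
masked_span_b = torch.randn(2, 32, hidden)     # embeddings of a neighbouring span, with masks
logits = lm_head(shallow_decoder(torch.cat([cls_of_span_a, masked_span_b], dim=1)))
```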
- RetroMAE: Pre-training Retrieval-oriented Transformers via Masked Auto-Encoder [15.24707645921207]
We propose a novel pre-training framework for dense retrieval based on the Masked Auto-Encoder, known as RetroMAE.
We pre-train a BERT-like encoder on English Wikipedia and BookCorpus, where it notably outperforms the existing pre-trained models on a wide range of dense retrieval benchmarks.
arXiv Detail & Related papers (2022-05-24T12:43:04Z)
- MAE-AST: Masked Autoencoding Audio Spectrogram Transformer [11.814012909512307]
We propose a simple yet powerful improvement over the recent Self-Supervised Audio Spectrogram Transformer (SSAST) model for speech and audio classification.
We leverage the insight that the SSAST uses a very high masking ratio (75%) during pretraining, meaning that the vast majority of self-attention compute is performed on mask tokens.
We find that MAE-like pretraining can provide a 3x speedup and 2x memory usage reduction over the vanilla SSAST.
arXiv Detail & Related papers (2022-03-30T22:06:13Z)
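The reported savings follow from the standard MAE trick of running the encoder only on visible patches; the sketch below isolates that gather step with made-up shapes.

```python
# With a 75% masking ratio, an MAE-style encoder only attends over the ~25%
# visible patches, which is where the speed and memory savings come from.
# Shapes below are illustrative.
import torch
import torch.nn as nn

batch, num_patches, dim, mask_ratio = 8, 512, 256, 0.75
patches = torch.randn(batch, num_patches, dim)

num_keep = int(num_patches * (1 - mask_ratio))
keep_idx = torch.rand(batch, num_patches).argsort(dim=1)[:, :num_keep]   # random visible subset
visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=2)
encoded_visible = encoder(visible)   # self-attention over only 128 of 512 patches
```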
- Context Autoencoder for Self-Supervised Representation Learning [64.63908944426224]
We pretrain an encoder by making predictions in the encoded representation space.
The network is an encoder-regressor-decoder architecture.
We demonstrate the effectiveness of our CAE through superior transfer performance in downstream tasks.
arXiv Detail & Related papers (2022-02-07T09:33:45Z)
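As a loose caricature of that encoder-regressor-decoder layout (not the released CAE), the regressor below predicts latents for masked patches purely in the encoded representation space, and only those predicted latents are decoded; every module and shape is a placeholder.

```python
# Toy encoder-regressor-decoder sketch (placeholder modules and shapes):
# the regressor predicts masked-patch representations in latent space from the
# visible latents plus mask queries, and a small head decodes those predictions.
import torch
import torch.nn as nn

dim = 256
encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
regressor = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
decode_head = nn.Linear(dim, dim)

visible = torch.randn(2, 48, dim)          # embeddings of visible patches
mask_queries = torch.randn(2, 16, dim)     # queries standing in for masked positions

z_visible = encoder(visible)
z_all = regressor(torch.cat([z_visible, mask_queries], dim=1))   # prediction in latent space
reconstruction = decode_head(z_all[:, -16:])                     # decode only masked predictions
```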
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training sequence encoders.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)