CoT-MoTE: Exploring ConTextual Masked Auto-Encoder Pre-training with
Mixture-of-Textual-Experts for Passage Retrieval
- URL: http://arxiv.org/abs/2304.10195v1
- Date: Thu, 20 Apr 2023 10:12:09 GMT
- Authors: Guangyuan Ma, Xing Wu, Peng Wang, Songlin Hu
- Abstract summary: Contextual Masked Auto-Encoder has been proven effective in representation bottleneck pre-training of a monolithic dual-encoder for passage retrieval.
We propose to pre-train Contextual Masked Auto-Encoder with Mixture-of-Textual-Experts (CoT-MoTE)
- Score: 23.69812399753584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Passage retrieval aims to retrieve relevant passages from large
open-domain corpora. Contextual Masked Auto-Encoding has been proven
effective in representation bottleneck pre-training of a monolithic
dual-encoder for passage retrieval. Siamese or fully separated dual-encoders
are often adopted as basic retrieval architecture in the pre-training and
fine-tuning stages for encoding queries and passages into their latent
embedding spaces. However, simply sharing or separating the parameters of the
dual-encoder results in an imbalanced discrimination of the embedding spaces.
In this work, we propose to pre-train Contextual Masked Auto-Encoder with
Mixture-of-Textual-Experts (CoT-MoTE). Specifically, we incorporate
textual-specific experts for individually encoding the distinct properties of
queries and passages. Meanwhile, a shared self-attention layer is still kept
for unified attention modeling. Results on large-scale passage retrieval
benchmarks show steady improvement in retrieval performance. Quantitative
analysis also shows a more balanced discrimination of the latent embedding
spaces.
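The core architectural idea in the abstract, a shared self-attention layer combined with textual-specific experts routed by input type, can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: the hidden size, routing key, and residual placement are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d = 8  # toy hidden size

# Shared self-attention projections: one set of weights for both input types,
# keeping unified attention modeling as described in the abstract.
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

# Textual-specific feed-forward experts: one expert per input type, so
# queries and passages are encoded by separate parameters after attention.
experts = {
    "query":   rng.standard_normal((d, d)) * 0.1,
    "passage": rng.standard_normal((d, d)) * 0.1,
}

def mote_layer(x, text_type):
    """x: (seq_len, d). Shared attention, then the expert for `text_type`."""
    attn = softmax((x @ Wq) @ (x @ Wk).T / np.sqrt(d)) @ (x @ Wv)
    h = x + attn                       # residual around the shared attention
    return h + h @ experts[text_type]  # residual around the routed expert

query = rng.standard_normal((3, d))
passage = rng.standard_normal((5, d))
q_out = mote_layer(query, "query")
p_out = mote_layer(passage, "passage")
```

The point of the sketch is the routing: both branches attend with the same weights, but the post-attention transformation differs per text type, which is what lets the two embedding spaces specialize without fully separating the dual-encoder.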
Related papers
- Attributable and Scalable Opinion Summarization [79.87892048285819]
We generate abstractive summaries by decoding frequent encodings, and extractive summaries by selecting the sentences assigned to the same frequent encodings.
Our method is attributable, because the model identifies sentences used to generate the summary as part of the summarization process.
It scales easily to many hundreds of input reviews, because aggregation is performed in the latent space rather than over long sequences of tokens.
arXiv Detail & Related papers (2023-05-19T11:30:37Z)
- BERM: Training the Balanced and Extractable Representation for Matching to
Improve Generalization Ability of Dense Retrieval [54.66399120084227]
We propose BERM, a novel method that improves the generalization of dense retrieval by capturing the matching signal.
Dense retrieval has shown promise in the first-stage retrieval process when trained on in-domain labeled datasets.
arXiv Detail & Related papers (2023-05-18T15:43:09Z)
- RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented
Language Models [12.37229805276939]
We propose a novel pre-training method called Duplex Masked Auto-Encoder, a.k.a. DupMAE.
It is designed to improve the quality of semantic representation so that all contextualized embeddings of the pre-trained model can be leveraged.
arXiv Detail & Related papers (2023-05-04T05:37:22Z)
- CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for
Passage Retrieval [34.08763911138496]
This study brings multi-view modeling to the contextual masked auto-encoder.
We refer to this multi-view pretraining method as CoT-MAE v2.
arXiv Detail & Related papers (2023-04-05T08:00:38Z)
- Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler
Alignment of Embeddings for Asymmetrical dual encoders [89.29256833403169]
We introduce Kullback Leibler Alignment of Embeddings (KALE), an efficient and accurate method for increasing the inference efficiency of dense retrieval methods.
KALE extends traditional Knowledge Distillation after bi-encoder training, allowing for effective query encoder compression without full retraining or index generation.
Using KALE and asymmetric training, we can generate models that exceed the performance of DistilBERT while having 3x faster inference.
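The KL-alignment idea in the KALE summary can be illustrated with a small NumPy sketch: compare the distributions over a passage batch induced by teacher and student query embeddings, and penalize their divergence. This is a hypothetical sketch of such an objective, not the paper's actual loss; the function names and batch setup are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_alignment_loss(teacher_q, student_q, passages):
    """Mean KL divergence between the passage-similarity distributions
    induced by teacher and student query embeddings (batch-level sketch)."""
    p = softmax(teacher_q @ passages.T)  # teacher's distribution over passages
    q = softmax(student_q @ passages.T)  # student's distribution
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

rng = np.random.default_rng(1)
teacher_q = rng.standard_normal((4, 16))
passages = rng.standard_normal((8, 16))

# A perfectly aligned student incurs zero loss; a perturbed one does not.
aligned = kl_alignment_loss(teacher_q, teacher_q, passages)
perturbed = kl_alignment_loss(
    teacher_q, teacher_q + 0.5 * rng.standard_normal((4, 16)), passages
)
```

Because only the query side is aligned, a smaller query encoder can be trained after the fact without retraining the passage encoder or rebuilding the index, which is the efficiency argument the summary makes.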
arXiv Detail & Related papers (2023-03-31T15:44:13Z)
- RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented
Language Models [3.4523793651427113]
We propose the duplex masked auto-encoder, a.k.a. DupMAE, which aims to improve the semantic representation capacity of the contextualized embeddings for both [CLS] and ordinary tokens.
DupMAE is simple but empirically competitive: with a small decoding cost, it substantially improves the model's representation capability and transferability.
arXiv Detail & Related papers (2022-11-16T08:57:55Z)
- ConTextual Mask Auto-Encoder for Dense Passage Retrieval [49.49460769701308]
CoT-MAE is a simple yet effective generative pre-training method for dense passage retrieval.
It learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding.
We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines.
arXiv Detail & Related papers (2022-08-16T11:17:22Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Analysis of Joint Speech-Text Embeddings for Semantic Matching [3.6423306784901235]
We study a joint speech-text embedding space trained for semantic matching by minimizing the distance between paired utterance and transcription inputs.
We extend our method to incorporate automatic speech recognition through both pretraining and multitask scenarios.
arXiv Detail & Related papers (2022-04-04T04:50:32Z)
- Dual Encoding for Video Retrieval by Text [49.34356217787656]
We propose a dual deep encoding network that encodes videos and queries into powerful dense representations of their own.
Our novelty is two-fold. First, different from prior art that resorts to a specific single-level encoder, the proposed network performs multi-level encoding.
Second, different from a conventional common space learning algorithm which is either concept based or latent space based, we introduce hybrid space learning.
arXiv Detail & Related papers (2020-09-10T15:49:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.