CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for
Passage Retrieval
- URL: http://arxiv.org/abs/2304.03158v1
- Date: Wed, 5 Apr 2023 08:00:38 GMT
- Title: CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for
Passage Retrieval
- Authors: Xing Wu, Guangyuan Ma, Peng Wang, Meng Lin, Zijia Lin, Fuzheng Zhang
and Songlin Hu
- Abstract summary: This study brings multi-view modeling to the contextual masked auto-encoder.
We refer to this multi-view pretraining method as CoT-MAE v2.
- Score: 34.08763911138496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A growing number of techniques have emerged to improve the performance
of passage retrieval. As an effective representation bottleneck pre-training
technique, the contextual masked auto-encoder utilizes contextual embedding to
assist in the reconstruction of passages. However, it only uses a single
auto-encoding pre-task for dense representation pre-training. This study brings
multi-view modeling to the contextual masked auto-encoder. Firstly, multi-view
representation utilizes both dense and sparse vectors, aiming to capture
sentence semantics from different aspects. Moreover, the multi-view decoding
paradigm utilizes both auto-encoding and auto-regressive decoders in
representation bottleneck pre-training, aiming to provide both reconstructive
and generative signals for better contextual representation pre-training. We
refer to this multi-view pre-training method as
CoT-MAE v2. Through extensive experiments, we show that CoT-MAE v2 is effective
and robust on large-scale passage retrieval benchmarks and out-of-domain
zero-shot benchmarks.
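To make the two multi-view contributions above concrete, the following is a
minimal PyTorch-style sketch of how a dense-plus-sparse encoder and a pair of
auto-encoding / auto-regressive decoders could be wired together. The class
names, the max-pooled sparse view, the single-layer decoders, and the Hugging
Face backbone are illustrative assumptions rather than the authors' released
implementation; in the paper the decoders reconstruct a contextual
(neighboring) span conditioned on the passage embedding.

import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM

class MultiViewEncoder(nn.Module):
    """Encodes a passage into two views: a dense vector (the [CLS] hidden
    state) and a sparse vector (max-pooled MLM logits over the vocabulary)."""

    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.backbone = AutoModelForMaskedLM.from_pretrained(name)

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids,
                            attention_mask=attention_mask,
                            output_hidden_states=True)
        hidden = out.hidden_states[-1]                        # [B, L, H]
        dense = hidden[:, 0]                                  # dense view, [B, H]
        # Sparse view: per-token vocabulary logits, max-pooled over positions;
        # padding positions are pushed to a large negative value first.
        logits = out.logits.masked_fill(attention_mask.unsqueeze(-1) == 0, -1e4)
        sparse = torch.log1p(torch.relu(logits)).max(dim=1).values  # [B, V]
        return dense, sparse

class MultiViewDecoders(nn.Module):
    """Two shallow decoding views over the encoder's dense embedding: an
    auto-encoding (bidirectional reconstruction) head and an auto-regressive
    (left-to-right generation) head. Both receive the dense embedding as an
    extra leading position, so they can only succeed if it is informative."""

    def __init__(self, hidden=768, vocab=30522, heads=12):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.ae_dec = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
        self.ar_dec = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, dense, decoder_input_ids):
        tok = self.embed(decoder_input_ids)                   # [B, L, H]
        seq = torch.cat([dense.unsqueeze(1), tok], dim=1)     # [B, L+1, H]
        ae_logits = self.lm_head(self.ae_dec(seq))            # reconstructive signal
        causal = nn.Transformer.generate_square_subsequent_mask(
            seq.size(1)).to(seq.device)
        ar_logits = self.lm_head(self.ar_dec(seq, src_mask=causal))  # generative signal
        return ae_logits, ar_logits

At search time the dense view is typically scored by inner product and the
sparse view by weighted term overlap; how the two scores are combined is not
specified in this summary, so the sketch stops at producing the two views and
the two decoding signals.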
Related papers
- Multimodal Autoregressive Pre-training of Large Vision Encoders [85.39154488397931]
We present AIMV2, a family of generalist vision encoders characterized by a straightforward pre-training process.
Our encoders excel not only in multimodal evaluations but also in vision benchmarks such as localization, grounding, and classification.
arXiv Detail & Related papers (2024-11-21T18:31:25Z)
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- Few-shot Action Recognition with Captioning Foundation Models [61.40271046233581]
CapFSAR is a framework to exploit knowledge of multimodal models without manually annotating text.
A visual-text aggregation module based on Transformer is further designed to incorporate cross-modal-temporal complementary information.
Experiments on multiple standard few-shot benchmarks demonstrate that the proposed CapFSAR performs favorably against existing methods.
arXiv Detail & Related papers (2023-10-16T07:08:39Z)
- RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models [12.37229805276939]
We propose a novel pre-training method called Duplex Masked Auto-Encoder, a.k.a. DupMAE.
It is designed to improve the quality of semantic representation, where all contextualized embeddings of the pre-trained model can be leveraged.
arXiv Detail & Related papers (2023-05-04T05:37:22Z)
- CoT-MoTE: Exploring ConTextual Masked Auto-Encoder Pre-training with Mixture-of-Textual-Experts for Passage Retrieval [23.69812399753584]
Contextual Masked Auto-Encoder has been proven effective in representation bottleneck pre-training of a monolithic dual-encoder for passage retrieval.
We propose to pre-train the Contextual Masked Auto-Encoder with Mixture-of-Textual-Experts (CoT-MoTE).
arXiv Detail & Related papers (2023-04-20T10:12:09Z)
- TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders [55.00904795497786]
We propose TimeMAE, a novel self-supervised paradigm for learning transferrable time series representations based on transformer networks.
The TimeMAE learns enriched contextual representations of time series with a bidirectional encoding scheme.
To solve the discrepancy issue incurred by newly injected masked embeddings, we design a decoupled autoencoder architecture.
arXiv Detail & Related papers (2023-03-01T08:33:16Z)
- RetroMAE v2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models [3.4523793651427113]
We propose the duplex masked auto-encoder, a.k.a. DupMAE, which aims to improve the semantic representation capacity of the contextualized embeddings of both [CLS] and ordinary tokens (see the sketch after this list).
DupMAE is simple but empirically competitive: with a small decoding cost, it substantially contributes to the model's representation capability and transferability.
arXiv Detail & Related papers (2022-11-16T08:57:55Z)
- ConTextual Mask Auto-Encoder for Dense Passage Retrieval [49.49460769701308]
CoT-MAE is a simple yet effective generative pre-training method for dense passage retrieval.
It learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding.
We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines.
arXiv Detail & Related papers (2022-08-16T11:17:22Z)
- Enhanced Modality Transition for Image Captioning [51.72997126838352]
We build a Modality Transition Module (MTM) to transfer visual features into semantic representations before forwarding them to the language model.
During the training phase, the modality transition network is optimised by the proposed modality loss.
Experiments have been conducted on the MS-COCO dataset, demonstrating the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-02-23T07:20:12Z)
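As referenced in the RetroMAE-2 / RetroMAE v2 (DupMAE) entries above, the
following is a rough sketch of the duplex idea of leveraging both the [CLS]
embedding and the ordinary token embeddings of one pre-trained encoder. The
class name and the bag-of-words style head are assumptions made for
illustration, not the DupMAE authors' code; the separate reconstruction losses
that train each view during pre-training are omitted.

import torch
import torch.nn as nn
from transformers import AutoModel

class DuplexRepresentation(nn.Module):
    """Derives two complementary representations from one encoder: the [CLS]
    embedding and a vocabulary-space vector aggregated from ordinary tokens."""

    def __init__(self, name="bert-base-uncased", vocab_size=30522):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        hidden = self.encoder.config.hidden_size
        # Hypothetical head: projects each ordinary token embedding onto the
        # vocabulary so the passage can also act as a bag-of-words vector.
        self.bow_head = nn.Linear(hidden, vocab_size)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        cls_vec = hidden[:, 0]                                # [CLS] view, [B, H]
        token_logits = self.bow_head(hidden[:, 1:])           # ordinary-token view
        pad = attention_mask[:, 1:].unsqueeze(-1) == 0
        bow_vec = token_logits.masked_fill(pad, -1e4).max(dim=1).values  # [B, V]
        return cls_vec, bow_vec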