MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are
Better Dense Retrievers
- URL: http://arxiv.org/abs/2212.07841v2
- Date: Mon, 19 Jun 2023 10:03:59 GMT
- Title: MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are
Better Dense Retrievers
- Authors: Kun Zhou, Xiao Liu, Yeyun Gong, Wayne Xin Zhao, Daxin Jiang, Nan Duan,
Ji-Rong Wen
- Abstract summary: In this work, we aim to unify a variety of pre-training tasks into a multi-task pre-trained model, namely MASTER.
MASTER utilizes a shared-encoder multi-decoder architecture that can construct a representation bottleneck to compress the abundant semantic information across tasks into dense vectors.
- Score: 140.0479479231558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained Transformers (e.g., BERT) are commonly used in existing dense retrieval methods for parameter initialization, and recent studies explore more effective pre-training tasks to further improve the quality of dense vectors. Although various novel and effective tasks have been proposed, their different input formats and learning objectives make them hard to integrate for jointly improving model performance. In this work, we aim to unify a variety of pre-training tasks under the bottlenecked masked autoencoder paradigm and integrate them into a multi-task pre-trained model, namely MASTER. Concretely, MASTER utilizes a shared-encoder multi-decoder architecture that constructs a representation bottleneck to compress the abundant semantic information across tasks into dense vectors. On top of this architecture, we integrate three types of representative pre-training tasks: corrupted passage recovery, related passage recovery, and PLM output recovery, which characterize inner-passage information, inter-passage relations, and PLM knowledge, respectively. Extensive experiments show that our approach outperforms competitive dense retrieval methods. Our code and data are publicly released at \url{https://github.com/microsoft/SimXNS}.
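The core architectural idea is that the [CLS] vector of a single deep encoder is the only memory handed to several shallow, task-specific decoders, so that one dense vector is forced to carry enough semantics to recover each decoder's target. As a rough illustration, the PyTorch sketch below wires up such a shared-encoder multi-decoder bottleneck; the class name, layer counts, and task keys are assumptions for exposition, not the released MASTER code (see the repository linked above for that).

```python
# A minimal sketch of a bottlenecked shared-encoder / multi-decoder pre-training
# setup in the spirit of MASTER. Module names, layer counts, and task keys are
# illustrative assumptions, not the released implementation.
import torch
import torch.nn as nn


class BottleneckedMultiDecoderMAE(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, n_heads=12,
                 n_encoder_layers=12, n_decoder_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        # Deep shared encoder: the part kept as the dense retriever after pre-training.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, n_heads, batch_first=True),
            n_encoder_layers,
        )
        # One shallow decoder per pre-training task (corrupted-passage recovery,
        # related-passage recovery, PLM-output recovery). Each decoder sees only
        # the single bottleneck vector as memory, so all information needed for
        # recovery must be compressed into that dense representation.
        self.decoders = nn.ModuleDict({
            task: nn.TransformerDecoder(
                nn.TransformerDecoderLayer(hidden, n_heads, batch_first=True),
                n_decoder_layers,
            )
            for task in ("corrupted", "related", "plm_outputs")
        })
        self.lm_head = nn.Linear(hidden, vocab_size)

    def forward(self, encoder_ids, decoder_ids, task):
        # Positional embeddings and masking details are omitted for brevity.
        hidden_states = self.encoder(self.embed(encoder_ids))
        bottleneck = hidden_states[:, :1, :]  # [CLS] vector used as the dense passage vector
        decoded = self.decoders[task](self.embed(decoder_ids), memory=bottleneck)
        return self.lm_head(decoded)  # token logits for the task's recovery objective


# Usage: route each batch to the decoder of its pre-training task, e.g.
# logits = model(masked_passage_ids, target_passage_ids, task="corrupted")
```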
Related papers
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- CL-MAE: Curriculum-Learned Masked Autoencoders [49.24994655813455]
We propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task.
We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE.
arXiv Detail & Related papers (2023-08-31T09:13:30Z)
- Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks.
We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
- TransCoder: Towards Unified Transferable Code Representation Learning Inspired by Human Skills [31.75121546422898]
We present TransCoder, a unified Transferable fine-tuning strategy for Code representation learning.
We employ a tunable prefix encoder as the meta-learner to capture cross-task and cross-language transferable knowledge.
Our method can lead to superior performance on various code-related tasks and encourage mutual reinforcement.
arXiv Detail & Related papers (2023-05-23T06:59:22Z)
- KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Few-Shot NLP [68.43279384561352]
Existing data augmentation algorithms leverage task-independent rules or fine-tune general-purpose pre-trained language models.
These methods carry little task-specific knowledge and tend to yield low-quality synthetic data that only helps weak baselines on simple tasks.
We propose the Knowledge Mixture Data Augmentation Model (KnowDA): an encoder-decoder LM pretrained on a mixture of diverse NLP tasks.
arXiv Detail & Related papers (2022-06-21T11:34:02Z)
- Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
- Efficient Retrieval Optimized Multi-task Learning [16.189136169520424]
We propose a novel Retrieval Optimized Multi-task (ROM) framework for jointly training self-supervised tasks, knowledge retrieval, and extractive question answering.
Our ROM approach presents a unified and generalizable framework that enables scaling efficiently to multiple tasks.
Using our framework, we achieve comparable or better performance than recent methods on QA, while drastically reducing the number of parameters.
arXiv Detail & Related papers (2021-04-20T17:16:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.