MOMA: Distill from Self-Supervised Teachers
- URL: http://arxiv.org/abs/2302.02089v1
- Date: Sat, 4 Feb 2023 04:23:52 GMT
- Title: MOMA: Distill from Self-Supervised Teachers
- Authors: Yuchong Yao, Nandakishor Desai, Marimuthu Palaniswami
- Abstract summary: We propose MOMA to distill from pre-trained MoCo and MAE in a self-supervised manner and combine the knowledge from both paradigms.
Experiments show MOMA delivers compact student models with comparable performance to existing state-of-the-art methods.
- Score: 6.737710830712818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive Learning and Masked Image Modelling have demonstrated exceptional
performance on self-supervised representation learning, where Momentum Contrast
(i.e., MoCo) and Masked AutoEncoder (i.e., MAE) are the state-of-the-art,
respectively. In this work, we propose MOMA to distill from pre-trained MoCo
and MAE in a self-supervised manner and combine the knowledge from both
paradigms. We introduce three mechanisms of knowledge transfer in the proposed
MOMA framework: (1) distill pre-trained MoCo to MAE; (2) distill pre-trained
MAE to MoCo; (3) distill pre-trained MoCo and MAE to a randomly initialized
student. During distillation, the teacher and the student are fed the original
inputs and masked inputs, respectively. Learning is driven by aligning the
normalized representations from the teacher with the projected representations
from the student. This simple design enables efficient computation with an
extremely high mask ratio and dramatically fewer training epochs, and requires
no extra design of the distillation target. Experiments show that MOMA delivers
compact student models with performance comparable to existing state-of-the-art
methods, combining the strengths of both self-supervised learning paradigms,
and achieving competitive results on a range of computer vision benchmarks. We
hope our method offers insight into transferring and adapting knowledge from
large-scale pre-trained models in a computationally efficient way.
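To make the mechanism concrete, below is a minimal, hypothetical sketch of the distillation objective as described in the abstract: a frozen pre-trained teacher sees the full input, the student sees a heavily masked input, and training aligns the normalized teacher representations with a projection of the student representations. The encoder stand-in, projector, mask ratio, and loss form here are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a MOMA-style distillation step (not the official code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a ViT backbone (e.g., a pre-trained MoCo or MAE encoder)."""
    def __init__(self, in_dim=768, out_dim=768):
        super().__init__()
        self.net = nn.Linear(in_dim, out_dim)

    def forward(self, x):                   # x: (batch, num_patches, in_dim)
        return self.net(x).mean(dim=1)      # pooled representation: (batch, out_dim)

def random_patch_mask(x, mask_ratio=0.75):
    """Drop a large fraction of patch tokens, keeping only the visible ones."""
    b, n, d = x.shape
    n_keep = max(1, int(n * (1.0 - mask_ratio)))
    keep = torch.rand(b, n, device=x.device).argsort(dim=1)[:, :n_keep]
    return torch.gather(x, 1, keep.unsqueeze(-1).expand(-1, -1, d))

teacher = ToyEncoder()                      # assumed frozen, pre-trained
student = ToyEncoder()
projector = nn.Linear(768, 768)             # maps student features to the teacher space
for p in teacher.parameters():
    p.requires_grad_(False)

patches = torch.randn(8, 196, 768)          # dummy patchified images
with torch.no_grad():
    t = F.normalize(teacher(patches), dim=-1)                        # teacher: full input
s = F.normalize(projector(student(random_patch_mask(patches))), dim=-1)  # student: masked input

loss = (2 - 2 * (s * t).sum(dim=-1)).mean()  # negative-cosine alignment of representations
loss.backward()
print(float(loss))
```

In the three transfer settings described above, the teacher would be a pre-trained MoCo or MAE encoder (or both), and the student either the other encoder or a randomly initialized network.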
Related papers
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z) - Understanding Masked Autoencoders From a Local Contrastive Perspective [80.57196495601826]
Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies.
We introduce a new empirical framework, called Local Contrastive MAE, to analyze both reconstructive and contrastive aspects of MAE.
arXiv Detail & Related papers (2023-10-03T12:08:15Z) - Mixed Autoencoder for Self-supervised Visual Representation Learning [95.98114940999653]
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks via randomly masking image patches and reconstruction.
This paper studies the prevailing mixing augmentation for MAE.
arXiv Detail & Related papers (2023-03-30T05:19:43Z) - Ensemble knowledge distillation of self-supervised speech models [84.69577440755457]
Distilled self-supervised models have shown competitive performance and efficiency in recent years.
We performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM.
Our method improves the performance of the distilled models on four downstream speech processing tasks.
arXiv Detail & Related papers (2023-02-24T17:15:39Z) - Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders [64.03000385267339]
Masked image modeling (MIM) has become a popular strategy for self-supervised learning (SSL) of visual representations with Vision Transformers.
We present a simple SSL method, the Reconstruction-Consistent Masked Auto-Encoder (RC-MAE), which adds an EMA teacher to MAE (a generic sketch of the EMA update appears after this list).
RC-MAE converges faster and requires less memory during pre-training than state-of-the-art self-distillation methods.
arXiv Detail & Related papers (2022-10-05T08:08:55Z) - Exploring Target Representations for Masked Autoencoders [78.57196600585462]
We show that a careful choice of the target representation is unnecessary for learning good representations.
We propose a multi-stage masked distillation pipeline and use a randomly initialized model as the teacher.
The proposed method, which performs masked knowledge distillation with bootstrapped teachers (dBOT), outperforms previous self-supervised methods by nontrivial margins.
arXiv Detail & Related papers (2022-09-08T16:55:19Z) - MimCo: Masked Image Modeling Pre-training with Contrastive Teacher [14.413674270588023]
Masked image modeling (MIM) has received much attention in self-supervised learning (SSL).
Visualizations show that the learned representations are less separable, especially compared to those based on contrastive learning pre-training.
We propose a novel and flexible pre-training framework, named MimCo, which combines MIM and contrastive learning through two-stage pre-training.
arXiv Detail & Related papers (2022-09-07T10:59:05Z)
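As referenced in the RC-MAE entry above, a mean (EMA) teacher simply tracks an exponential moving average of the student's weights. The snippet below is a generic sketch of that update; the momentum value and module names are illustrative assumptions, not taken from the RC-MAE paper.

```python
# Generic EMA (mean) teacher update, parameter-wise: teacher <- m*teacher + (1-m)*student.
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Update the teacher's parameters toward the student's with momentum m."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

# Usage with any pair of identically shaped modules:
teacher = torch.nn.Linear(16, 16)
student = torch.nn.Linear(16, 16)
ema_update(teacher, student)
```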