Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders
- URL: http://arxiv.org/abs/2210.02077v1
- Date: Wed, 5 Oct 2022 08:08:55 GMT
- Title: Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders
- Authors: Youngwan Lee, Jeffrey Willette, Jonghee Kim, Juho Lee, Sung Ju Hwang
- Abstract summary: Masked image modeling (MIM) has become a popular strategy for self-supervised learning (SSL) of visual representations with Vision Transformers.
We present a simple SSL method, the Reconstruction-Consistent Masked Auto-Encoder (RC-MAE), which adds an EMA teacher to MAE.
RC-MAE converges faster and requires less memory than state-of-the-art self-distillation methods during pre-training.
- Score: 64.03000385267339
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Masked image modeling (MIM) has become a popular strategy for self-supervised learning (SSL) of visual representations with Vision Transformers.
A representative MIM model, the masked auto-encoder (MAE), randomly masks a subset of image patches and reconstructs the masked patches given the unmasked patches.
Concurrently, many recent works in self-supervised learning utilize the student/teacher paradigm, which provides the student with an additional target based on the output of a teacher composed of an exponential moving average (EMA) of previous students.
Although common, relatively little is known about the dynamics of the interaction between the student and teacher.
Through analysis of a simple linear model, we find that the teacher conditionally removes previous gradient directions based on feature similarities, which effectively acts as a conditional momentum regularizer.
From this analysis, we present a simple SSL method, the Reconstruction-Consistent Masked Auto-Encoder (RC-MAE), which adds an EMA teacher to MAE.
We find that RC-MAE converges faster and requires less memory than state-of-the-art self-distillation methods during pre-training, which may help make the prohibitively expensive self-supervised pre-training of Vision Transformer models more practical.
Additionally, we show that RC-MAE is more robust and achieves better performance than MAE on downstream tasks such as ImageNet-1K classification, object detection, and instance segmentation.
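To make the abstract concrete, below is a minimal, hypothetical PyTorch-style sketch of the two ingredients it describes: a mean-teacher (EMA) update of the form theta_T <- m * theta_T + (1 - m) * theta_S, and a student loss that combines MAE-style reconstruction of masked patches with a consistency term toward the EMA teacher's prediction on the same masked input. The function names, the encoder/decoder interface, and the equal loss weighting are illustrative assumptions, not the paper's actual implementation.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # Mean-teacher update: theta_T <- m * theta_T + (1 - m) * theta_S.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

def rc_mae_loss(student, teacher, patches, mask, consistency_weight=1.0):
    # patches: (B, N, D) patchified image; mask: (B, N) boolean, True = masked patch.
    # Student reconstructs the masked patches from the visible ones (MAE objective).
    pred_s = student(patches, mask)                      # (B, N, D) reconstruction
    recon = ((pred_s - patches) ** 2)[mask].mean()
    # The EMA teacher sees the same masked input; its prediction serves as an
    # additional, gradient-free consistency target for the student.
    with torch.no_grad():
        pred_t = teacher(patches, mask)
    consist = ((pred_s - pred_t) ** 2)[mask].mean()
    # consistency_weight is an illustrative hyperparameter, not taken from the paper.
    return recon + consistency_weight * consist
```

In a full training loop, the teacher would typically start as a copy of the student with gradients disabled, and ema_update would be called after each optimizer step.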
Related papers
- Understanding Masked Autoencoders From a Local Contrastive Perspective [80.57196495601826]
Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies.
We introduce a new empirical framework, called Local Contrastive MAE, to analyze both reconstructive and contrastive aspects of MAE.
(arXiv, 2023-10-03)
- CL-MAE: Curriculum-Learned Masked Autoencoders [49.24994655813455]
We propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task.
We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE.
(arXiv, 2023-08-31)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process (a minimal sketch of Gumbel-Softmax patch masking appears after this list).
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
(arXiv, 2023-03-12)
- MOMA: Distill from Self-Supervised Teachers [6.737710830712818]
We propose MOMA, which distills from pre-trained MoCo and MAE in a self-supervised manner to combine the knowledge of both paradigms.
Experiments show MOMA delivers compact student models with comparable performance to existing state-of-the-art methods.
(arXiv, 2023-02-04)
- Stare at What You See: Masked Image Modeling without Reconstruction [154.74533119863864]
Masked Autoencoders (MAE) have become a prevailing paradigm for large-scale vision representation pre-training.
Recent approaches apply semantic-rich teacher models to extract image features as the reconstruction target, leading to better performance.
We argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image.
(arXiv, 2022-11-16)
- Exploring Target Representations for Masked Autoencoders [78.57196600585462]
We show that a careful choice of the target representation is unnecessary for learning good representations.
We propose a multi-stage masked distillation pipeline and use a randomly initialized model as the teacher.
The proposed method, which performs masked knowledge distillation with bootstrapped teachers (dBOT), outperforms previous self-supervised methods by nontrivial margins.
(arXiv, 2022-09-08)
- Adversarial Masking for Self-Supervised Learning [81.25999058340997]
ADIOS, a masked image modeling (MIM) framework for self-supervised learning, is proposed.
It simultaneously learns a masking function and an image encoder using an adversarial objective.
It consistently improves on state-of-the-art self-supervised learning (SSL) methods on a variety of tasks and datasets.
(arXiv, 2022-01-31)
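As a companion to the AutoMAE and ADIOS entries above, here is a minimal, hypothetical sketch of learned patch masking with a Gumbel-Softmax relaxation: a small scorer network assigns keep/mask logits to each patch, and a hard but differentiable decision is sampled per patch so the mask generator can be trained end to end. The scorer architecture, temperature, and the absence of a fixed masking ratio are illustrative assumptions and do not reproduce either paper's actual mask generator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchMaskSampler(nn.Module):
    """Scores patch embeddings and samples a differentiable binary mask per patch."""

    def __init__(self, dim, tau=1.0):
        super().__init__()
        self.scorer = nn.Linear(dim, 2)  # logits for (keep, mask) per patch
        self.tau = tau

    def forward(self, patch_embed):
        # patch_embed: (B, N, D) patch embeddings.
        logits = self.scorer(patch_embed)                  # (B, N, 2)
        # Straight-through Gumbel-Softmax: hard one-hot in the forward pass,
        # soft gradients in the backward pass, so the mask generator is trainable.
        onehot = F.gumbel_softmax(logits, tau=self.tau, hard=True, dim=-1)
        mask = onehot[..., 1]                              # (B, N), 1.0 = masked
        return mask

# Example: sample a mask for 196 patches of 768-dim ViT embeddings.
sampler = PatchMaskSampler(dim=768)
mask = sampler(torch.randn(2, 196, 768))
print(mask.shape, mask.sum(dim=1))  # per-image count of masked patches
```

In an adversarial setup like the ones summarized above, such a sampler would be optimized with an objective opposing the encoder's, e.g., to select patches whose removal makes reconstruction or representation learning hardest.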