CL-MAE: Curriculum-Learned Masked Autoencoders
- URL: http://arxiv.org/abs/2308.16572v3
- Date: Wed, 28 Feb 2024 08:31:17 GMT
- Title: CL-MAE: Curriculum-Learned Masked Autoencoders
- Authors: Neelu Madan, Nicolae-Catalin Ristea, Kamal Nasrollahi, Thomas B.
Moeslund, Radu Tudor Ionescu
- Abstract summary: We propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task.
We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE.
- Score: 49.24994655813455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked image modeling has been demonstrated as a powerful pretext task for
generating robust representations that can be effectively generalized across
multiple downstream tasks. Typically, this approach involves randomly masking
patches (tokens) in input images, with the masking strategy remaining unchanged
during training. In this paper, we propose a curriculum learning approach that
updates the masking strategy to continually increase the complexity of the
self-supervised reconstruction task. We conjecture that, by gradually
increasing the task complexity, the model can learn more sophisticated and
transferable representations. To facilitate this, we introduce a novel
learnable masking module that possesses the capability to generate masks of
different complexities, and integrate the proposed module into masked
autoencoders (MAE). Our module is jointly trained with the MAE, while adjusting
its behavior during training, transitioning from a partner of the MAE
(optimizing the same reconstruction loss) to an adversary (optimizing the
opposite loss), while passing through a neutral state. The transition between
these behaviors is smooth, being regulated by a factor that is multiplied with
the reconstruction loss of the masking module. The resulting training procedure
generates an easy-to-hard curriculum. We train our Curriculum-Learned Masked
Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior
representation learning capabilities compared to MAE. The empirical results on
five downstream tasks confirm our conjecture, demonstrating that curriculum
learning can be successfully used to self-supervise masked autoencoders. We
release our code at https://github.com/ristea/cl-mae.
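The curriculum described above is driven entirely by the factor that scales the masking module's reconstruction loss. Below is a minimal PyTorch-style sketch of that idea, not the released implementation: the names (`mae`, `mask_module`), the alternating update, and the linear schedule are assumptions made for illustration; see the linked repository for the authors' actual code.
```python
import torch

def curriculum_factor(step: int, total_steps: int) -> float:
    """Linearly move from +1 (partner) through 0 (neutral) to -1 (adversary)."""
    return 1.0 - 2.0 * step / max(total_steps - 1, 1)

def training_step(mae, mask_module, images: torch.Tensor, step: int,
                  total_steps: int, mae_opt, mask_opt):
    # Assumption: `mask_module(images)` returns per-patch masks and
    # `mae(images, masks)` returns the scalar reconstruction loss.
    lam = curriculum_factor(step, total_steps)

    # The MAE always minimizes the reconstruction loss on the current masks.
    recon_loss = mae(images, mask_module(images).detach())
    mae_opt.zero_grad()
    recon_loss.backward()
    mae_opt.step()

    # The masking module optimizes the same loss scaled by the curriculum
    # factor: cooperative while lam > 0, neutral near 0, adversarial when
    # lam < 0, so the reconstruction task becomes progressively harder.
    mask_loss = lam * mae(images, mask_module(images))
    mask_opt.zero_grad()
    mask_loss.backward()
    mask_opt.step()
    return recon_loss.item(), lam
```
Under this schedule, pre-training starts with cooperative, easy-to-reconstruct masks and ends against adversarial ones, which is the easy-to-hard curriculum described in the abstract.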
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization [42.82742477950748]
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning.
We introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that learns an optimal masking strategy during pretraining.
Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning.
arXiv Detail & Related papers (2024-02-28T07:37:26Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process (a generic Gumbel-Softmax masking sketch appears after this list).
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z)
- Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders [64.03000385267339]
Masked image modeling (MIM) has become a popular strategy for self-supervised learning (SSL) of visual representations with Vision Transformers.
We present a simple SSL method, the Reconstruction-Consistent Masked Auto-Encoder (RC-MAE) by adding an EMA teacher to MAE.
RC-MAE converges faster and requires less memory usage than state-of-the-art self-distillation methods during pre-training.
arXiv Detail & Related papers (2022-10-05T08:08:55Z)
- The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training [13.087987450384036]
We present a new Masked Image Modeling (MIM) method, termed Geminated Autoencoder (Ge$2$-AE), for visual pre-training.
Specifically, we equip our model with geminated decoders in charge of reconstructing image contents from both pixel and frequency space.
arXiv Detail & Related papers (2022-04-18T09:22:55Z)
- Adversarial Masking for Self-Supervised Learning [81.25999058340997]
ADIOS, a masked image modeling (MIM) framework for self-supervised learning, is proposed.
It simultaneously learns a masking function and an image encoder using an adversarial objective.
It consistently improves on state-of-the-art self-supervised learning (SSL) methods on a variety of tasks and datasets.
arXiv Detail & Related papers (2022-01-31T10:23:23Z)
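As referenced in the AutoMAE entry above, a common way to make discrete patch masking differentiable, so that a mask generator can be trained end to end, is Gumbel-Softmax sampling. The snippet below is a generic, hypothetical illustration of that trick (it is not taken from AutoMAE or CL-MAE); the function name and tensor shapes are assumptions.
```python
import torch
import torch.nn.functional as F

def sample_patch_mask(mask_logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Differentiable keep/mask decision per patch via Gumbel-Softmax.

    mask_logits: (batch, num_patches, 2) unnormalized scores for the
    [keep, mask] choice of each patch, e.g. produced by a small mask network.
    Returns a (batch, num_patches) binary mask (1 = masked) whose gradients
    reach the mask network through the straight-through estimator.
    """
    sample = F.gumbel_softmax(mask_logits, tau=tau, hard=True)
    return sample[..., 1]

# Toy usage with random logits standing in for a learned mask generator:
logits = torch.randn(4, 196, 2)   # 4 images, 14 x 14 = 196 patches
mask = sample_patch_mask(logits)  # shape (4, 196), entries in {0, 1}
```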