Downstream Task Guided Masking Learning in Masked Autoencoders Using
Multi-Level Optimization
- URL: http://arxiv.org/abs/2402.18128v1
- Date: Wed, 28 Feb 2024 07:37:26 GMT
- Title: Downstream Task Guided Masking Learning in Masked Autoencoders Using
Multi-Level Optimization
- Authors: Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak
Roy Chowdhury, Rajesh K. Gupta, Pengtao Xie
- Abstract summary: Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning.
We introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that learns an optimal masking strategy during pretraining.
Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning.
- Score: 42.82742477950748
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked Autoencoder (MAE) is a notable method for self-supervised pretraining
in visual representation learning. It operates by randomly masking image
patches and reconstructing these masked patches using the unmasked ones. A key
limitation of MAE lies in its disregard for the varying informativeness of
different patches, as it uniformly selects patches to mask. To overcome this,
some approaches propose masking based on patch informativeness. However, these
methods often do not consider the specific requirements of downstream tasks,
potentially leading to suboptimal representations for these tasks. In response,
we introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel
framework that leverages end-to-end feedback from downstream tasks to learn an
optimal masking strategy during pretraining. Our experimental findings
highlight MLO-MAE's significant advancements in visual representation learning.
Compared to existing methods, it demonstrates remarkable improvements across
diverse datasets and tasks, showcasing its adaptability and efficiency. Our
code is available at: https://github.com/Alexiland/MLOMAE
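The abstract describes MAE's baseline behavior: uniformly sampling a subset of image patches to mask and reconstructing them from the visible ones. A minimal sketch of that uniform random masking step is below; the function name and numpy usage are illustrative, not taken from the authors' code.

```python
import numpy as np

def random_patch_mask(num_patches: int, mask_ratio: float, rng=None):
    """Uniformly sample patch indices to mask, as in MAE's random masking."""
    rng = np.random.default_rng(rng)
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)
    masked = perm[:num_masked]    # indices hidden from the encoder
    visible = perm[num_masked:]   # indices fed to the encoder
    return visible, masked

# MAE typically masks 75% of patches; on a 14x14 patch grid (196 patches),
# 49 patches remain visible.
visible, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, rng=0)
```

Because the sampling is uniform, every patch is equally likely to be masked regardless of its information content, which is exactly the limitation that MLO-MAE and the informativeness-based approaches below target.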
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking.
arXiv Detail & Related papers (2024-07-17T22:04:00Z)
- CL-MAE: Curriculum-Learned Masked Autoencoders [49.24994655813455]
We propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task.
We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE.
arXiv Detail & Related papers (2023-08-31T09:13:30Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders [44.87786478095987]
Masked Autoencoders learn general representations for image, text, audio, video, etc., by reconstructing masked input data from tokens of the visible data.
This paper proposes an adaptive masking strategy for MAEs that is end-to-end trainable.
AdaMAE samples visible tokens based on the semantic context using an auxiliary sampling network.
arXiv Detail & Related papers (2022-11-16T18:59:48Z)
- What to Hide from Your Students: Attention-Guided Masked Image Modeling [32.402567373491834]
We argue that image token masking is fundamentally different from token masking in text.
We introduce a novel masking strategy, called attention-guided masking (AttMask).
arXiv Detail & Related papers (2022-03-23T20:52:50Z)
- Adversarial Masking for Self-Supervised Learning [81.25999058340997]
ADIOS, a masked image modeling (MIM) framework for self-supervised learning, is proposed.
It simultaneously learns a masking function and an image encoder using an adversarial objective.
It consistently improves on state-of-the-art self-supervised learning (SSL) methods on a variety of tasks and datasets.
arXiv Detail & Related papers (2022-01-31T10:23:23Z)
- KSM: Fast Multiple Task Adaption via Kernel-wise Soft Mask Learning [49.77278179376902]
Deep Neural Networks (DNNs) can forget knowledge about earlier tasks when learning new ones, a phenomenon known as catastrophic forgetting.
Recent continual learning methods are capable of alleviating catastrophic forgetting on toy-sized datasets.
We propose a new training method called Kernel-wise Soft Mask (KSM), which learns a kernel-wise hybrid binary and real-valued soft mask for each task.
arXiv Detail & Related papers (2020-09-11T21:48:39Z)
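Several of the papers above (e.g. AutoMAE) replace random masking with a learned mask generator trained end-to-end via the Gumbel-Softmax trick, which makes the discrete mask-or-keep decision differentiable. The sketch below shows that relaxation in isolation; it is an illustrative numpy fragment under assumed per-patch logits, not any of these authors' implementations.

```python
import numpy as np

def gumbel_softmax_mask(logits, tau=1.0, rng=None):
    """Draw a relaxed (differentiable) per-patch mask probability from
    mask logits via the Gumbel-Softmax trick (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    # Two-class logits per patch: [keep (fixed at 0), mask].
    two_class = np.stack([np.zeros_like(logits), logits], axis=-1)
    # Add Gumbel(0, 1) noise and apply a temperature-scaled softmax.
    g = rng.gumbel(size=two_class.shape)
    y = (two_class + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    y = y / y.sum(axis=-1, keepdims=True)
    return y[..., 1]  # soft probability that each patch is masked
```

As the temperature `tau` approaches zero the output approaches a hard 0/1 mask, while at higher temperatures gradients flow smoothly to the mask generator's logits.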
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.