How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders
- URL: http://arxiv.org/abs/2210.08344v2
- Date: Sun, 26 Mar 2023 06:47:53 GMT
- Title: How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders
- Authors: Qi Zhang, Yifei Wang, Yisen Wang
- Abstract summary: Masked Autoencoders (MAE) based on a reconstruction task have risen to be a promising paradigm for self-supervised learning (SSL).
We propose a theoretical understanding of how masking matters for MAE to learn meaningful features.
- Score: 21.849681446573257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked Autoencoders (MAE) based on a reconstruction task have risen to be a
promising paradigm for self-supervised learning (SSL) and achieve
state-of-the-art performance across different benchmark datasets. However,
despite its impressive empirical success, there is still limited theoretical
understanding of it. In this paper, we propose a theoretical understanding of
how masking matters for MAE to learn meaningful features. We establish a close
connection between MAE and contrastive learning, which shows that MAE implicitly
aligns the mask-induced positive pairs. Built upon this connection, we develop
the first downstream guarantees for MAE methods, and analyze the effect of mask
ratio. Moreover, as a result of the implicit alignment, we also point out the
dimensional collapse issue of MAE, and propose a Uniformity-enhanced MAE
(U-MAE) loss that can effectively address this issue and bring significant
improvements on real-world datasets, including CIFAR-10, ImageNet-100, and
ImageNet-1K. Code is available at https://github.com/zhangq327/U-MAE.
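For intuition, here is a minimal sketch of a loss in the spirit of U-MAE: the standard MAE reconstruction term plus a uniformity regularizer on encoder features. The `encoder`/`decoder` signatures, the masking convention, the Wang-and-Isola-style uniformity term, and `lambda_unif` are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def u_mae_loss(encoder, decoder, images, mask, lambda_unif=0.01):
    """Sketch of a Uniformity-enhanced MAE (U-MAE) style loss.

    All signatures here are hypothetical: `encoder` pools visible patches
    into one feature per image, `decoder` reconstructs full images, and
    `mask` is a boolean tensor marking masked pixels.
    """
    z = encoder(images, mask)            # (batch, dim) pooled features
    recon = decoder(z, mask)             # reconstructed images

    # Standard MAE term: pixel-wise MSE on the masked positions only.
    loss_rec = F.mse_loss(recon[mask], images[mask])

    # Uniformity term in the style of Wang & Isola (2020): spread the
    # normalized features on the hypersphere to counter dimensional collapse.
    z = F.normalize(z, dim=-1)
    sq_dists = torch.cdist(z, z).pow(2)  # pairwise squared distances
    loss_unif = torch.exp(-2.0 * sq_dists).mean().log()

    return loss_rec + lambda_unif * loss_unif
```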
Related papers
- ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders [53.3185750528969]
Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework.
We introduce a data-independent method, termed ColorMAE, which generates different binary mask patterns by filtering random noise.
We demonstrate our strategy's superiority in downstream tasks compared to random masking (see the sketch after this entry).
arXiv Detail & Related papers (2024-07-17T22:04:00Z) - Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z) - Understanding Masked Autoencoders From a Local Contrastive Perspective [80.57196495601826]
Masked AutoEncoder (MAE) has revolutionized the field of self-supervised learning with its simple yet effective masking and reconstruction strategies.
We introduce a new empirical framework, called Local Contrastive MAE, to analyze both reconstructive and contrastive aspects of MAE.
arXiv Detail & Related papers (2023-10-03T12:08:15Z) - Understanding Masked Autoencoders via Hierarchical Latent Variable
Models [109.35382136147349]
Masked autoencoder (MAE) has recently achieved prominent success in a variety of vision tasks.
Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking.
arXiv Detail & Related papers (2023-06-08T03:00:10Z) - Mixed Autoencoder for Self-supervised Visual Representation Learning [95.98114940999653]
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks via randomly masking image patches and reconstruction.
This paper studies the prevailing mixing augmentation for MAE.
arXiv Detail & Related papers (2023-03-30T05:19:43Z) - Efficient Masked Autoencoders with Self-Consistency [34.7076436760695]
Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision.
We propose efficient masked autoencoders with self-consistency (EMAE) to improve the pre-training efficiency.
EMAE consistently obtains state-of-the-art transfer ability on a variety of downstream tasks, such as image classification, object detection, and semantic segmentation.
arXiv Detail & Related papers (2023-02-28T09:21:12Z) - GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds [72.60362979456035]
Masked Autoencoders (MAE) are challenging to explore in large-scale 3D point clouds.
We propose a Generative Decoder for MAE (GD-MAE) to automatically merge the surrounding context.
We demonstrate the efficacy of the proposed method on several large-scale benchmarks: KITTI and ONCE.
arXiv Detail & Related papers (2022-12-06T14:32:55Z) - How to Understand Masked Autoencoders [15.775716869623992]
We propose a unified theoretical framework that provides a mathematical understanding of Masked Autoencoders (MAE).
Specifically, we explain the patch-based attention approaches of MAE using an integral kernel under a non-overlapping domain decomposition setting (sketched after this entry).
To help the research community further comprehend the main reasons for MAE's great success, we use our framework to pose five questions and answer them with mathematical rigor, drawing on insights from operator theory.
arXiv Detail & Related papers (2022-02-08T06:15:07Z)