Masked Autoencoders are Robust Data Augmentors
- URL: http://arxiv.org/abs/2206.04846v1
- Date: Fri, 10 Jun 2022 02:41:48 GMT
- Title: Masked Autoencoders are Robust Data Augmentors
- Authors: Haohang Xu, Shuangrui Ding, Xiaopeng Zhang, Hongkai Xiong, Qi Tian
- Abstract summary: Regularization techniques like image augmentation are necessary for deep neural networks to generalize well.
We propose a novel perspective of augmentation to regularize the training process.
We show that utilizing such model-based nonlinear transformation as data augmentation can improve high-level recognition tasks.
- Score: 90.34825840657774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are capable of learning powerful representations to
tackle complex vision tasks but exhibit undesirable properties such as
over-fitting. To this end, regularization techniques like image
augmentation are necessary for deep neural networks to generalize well.
Nevertheless, most prevalent image augmentation recipes confine themselves to
off-the-shelf linear transformations like scale, flip, and color jitter. Because
they are hand-crafted, these augmentations are insufficient to generate
truly hard augmented examples. In this paper, we propose a novel perspective of
augmentation to regularize the training process. Inspired by the recent success
of applying masked image modeling to self-supervised learning, we adopt the
self-supervised masked autoencoder to generate the distorted view of the input
images. We show that utilizing such model-based nonlinear transformation as
data augmentation can improve high-level recognition tasks. We term the
proposed method Mask-Reconstruct Augmentation (MRA). Extensive experiments
on various image classification benchmarks verify the effectiveness of the
proposed augmentation. Specifically, MRA consistently enhances performance
on supervised, semi-supervised, and few-shot classification. The code will
be available at https://github.com/haohang96/MRA.
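The mask-then-reconstruct idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says the distorted view comes from a self-supervised masked autoencoder, whereas the `mean_fill_reconstructor` below is a hypothetical stand-in that only fills masked patches with the mean of the visible pixels. Uniform random masking, the patch size, the mask ratio, and all function names are assumptions made for illustration.

```python
import random
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]   # H x W grayscale image as nested lists
Mask = List[List[bool]]     # patch grid; True marks a masked patch


def random_patch_mask(grid_h: int, grid_w: int, mask_ratio: float,
                      rng: random.Random) -> Mask:
    """Pick round(mask_ratio * num_patches) patches uniformly at random."""
    n = grid_h * grid_w
    n_masked = int(round(n * mask_ratio))
    masked = set(rng.sample(range(n), n_masked))
    return [[(r * grid_w + c) in masked for c in range(grid_w)]
            for r in range(grid_h)]


def mean_fill_reconstructor(image: Image, mask: Mask, patch: int) -> Image:
    """Hypothetical placeholder for a pretrained masked autoencoder.

    Masked patches are filled with the mean of all visible pixels;
    visible pixels are copied through unchanged.
    """
    h, w = len(image), len(image[0])
    visible = [image[y][x] for y in range(h) for x in range(w)
               if not mask[y // patch][x // patch]]
    fill = sum(visible) / len(visible) if visible else 0.0
    return [[fill if mask[y // patch][x // patch] else image[y][x]
             for x in range(w)] for y in range(h)]


def mask_reconstruct_augment(
    image: Image,
    patch: int = 4,
    mask_ratio: float = 0.5,
    reconstruct: Callable[[Image, Mask, int], Image] = mean_fill_reconstructor,
    seed: Optional[int] = None,
) -> Tuple[Image, Mask]:
    """Mask random patches, reconstruct them, and return the augmented view."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    rng = random.Random(seed)
    mask = random_patch_mask(h // patch, w // patch, mask_ratio, rng)
    return reconstruct(image, mask, patch), mask


if __name__ == "__main__":
    img = [[float(y * 8 + x) for x in range(8)] for y in range(8)]
    augmented, mask = mask_reconstruct_augment(img, patch=4, mask_ratio=0.5, seed=0)
    print("masked patches:", sum(cell for row in mask for cell in row))
```

In a real pipeline the `reconstruct` argument would wrap a trained model, and the augmented image would be fed to the downstream classifier in place of, or alongside, the original.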
Related papers
- HAT: Hybrid Attention Transformer for Image Restoration [61.74223315807691]
Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising.
We propose a new Hybrid Attention Transformer (HAT) to activate more input pixels for better restoration.
Our HAT achieves state-of-the-art performance both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-09-11T05:17:55Z)
- CL-MAE: Curriculum-Learned Masked Autoencoders [49.24994655813455]
We propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task.
We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE.
arXiv Detail & Related papers (2023-08-31T09:13:30Z)
- Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework, which consists of Masked Quantization VAE (MQ-VAE) and Stackformer, to relieve the model from modeling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis [33.46831766206675]
MAsked Generative Encoder (MAGE) is the first framework to unify SOTA image generation and self-supervised representation learning.
Inspired by previous generative models, MAGE uses semantic tokens learned by a vector-quantized GAN at inputs and outputs.
On ImageNet-1K, a single MAGE ViT-L model obtains 9.10 FID in the task of class-unconditional image generation.
arXiv Detail & Related papers (2022-11-16T18:59:02Z)
- Stare at What You See: Masked Image Modeling without Reconstruction [154.74533119863864]
Masked Autoencoders (MAE) have been prevailing paradigms for large-scale vision representation pre-training.
Recent approaches apply semantic-rich teacher models to extract image features as the reconstruction target, leading to better performance.
We argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image.
arXiv Detail & Related papers (2022-11-16T12:48:52Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling two core designs, an asymmetric encoder-decoder architecture and a high masking ratio, enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
- InAugment: Improving Classifiers via Internal Augmentation [14.281619356571724]
We present a novel augmentation operation that exploits image internal statistics.
We show improvement over state-of-the-art augmentation techniques.
We also demonstrate an increase in top-1 accuracy for ResNet50 and EfficientNet-B3 on the ImageNet dataset.
arXiv Detail & Related papers (2021-04-08T15:37:21Z)