Improvements to Self-Supervised Representation Learning for Masked Image
Modeling
- URL: http://arxiv.org/abs/2205.10546v1
- Date: Sat, 21 May 2022 09:45:50 GMT
- Authors: Jiawei Mao, Xuesong Yin, Yuanqi Chang, Honggu Zhou
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores improvements to the masked image modeling (MIM) paradigm.
The MIM paradigm enables the model to learn the main object features of the
image by masking the input image and predicting the masked part by the unmasked
part. We identified three main directions in which MIM can be improved.
First, although both the encoder and the decoder contribute to representation
learning, MIM uses only the encoder for downstream tasks, ignoring the
decoder's impact on representation learning. Although the MIM paradigm already
employs a small decoder with an asymmetric structure, we believe that further
reducing the decoder's parameters improves the representation learning
capability of the encoder. Second, MIM solves the image prediction task by
training the encoder and decoder together, without designing a separate task
for the encoder. To further enhance the encoder's performance on downstream
tasks, we assign the encoder the additional tasks of contrastive learning and
token position prediction. Third, since the input image may contain the
background and other objects, and the proportion of each object in the image
varies, reconstructing tokens related to the background or to other objects is
not meaningful for MIM's understanding of the main object representations.
Therefore, we use ContrastiveCrop to crop the input image so that it contains,
as far as possible, only the main objects. Based on these three improvements
to MIM, we propose a new model, Contrastive Masked AutoEncoders (CMAE). We
achieve a Top-1 accuracy of 65.84% on TinyImageNet with a ViT-B backbone,
outperforming the competing MAE baseline by +2.89 points under identical
conditions. Code will be made available.
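The mask-then-predict step at the heart of MIM can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper name `mask_patches`, the 75% mask ratio (a common MAE default), and the string placeholders standing in for patch embeddings are all illustrative assumptions.

```python
import random

def mask_patches(patches, mask_ratio=0.75, seed=0):
    """Split patch indices into visible and masked sets, MAE-style.

    The encoder would see only the visible subset; the decoder would
    predict the masked subset from the encoder's output.
    """
    rng = random.Random(seed)
    idx = list(range(len(patches)))
    rng.shuffle(idx)
    n_masked = int(len(patches) * mask_ratio)
    masked_idx = sorted(idx[:n_masked])
    visible_idx = sorted(idx[n_masked:])
    visible = [patches[i] for i in visible_idx]
    return visible, visible_idx, masked_idx

# Example: 16 patches from a 4x4 grid, 75% masked
patches = [f"p{i}" for i in range(16)]
visible, vis_idx, mask_idx = mask_patches(patches)
print(len(visible), len(mask_idx))  # prints "4 12"
```

In a real pipeline each patch would be an embedded image tile rather than a string, and the masked indices would index the reconstruction targets for the decoder's loss.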
Related papers
- Membership Inference Attack Against Masked Image Modeling [29.699606401861818]
Masked Image Modeling (MIM) has achieved significant success in the realm of self-supervised learning (SSL) for visual recognition.
In this work, we take a different angle by studying the pre-training data privacy of MIM.
We propose the first membership inference attack against image encoders pre-trained by MIM.
arXiv Detail & Related papers (2024-08-13T11:34:28Z)
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- Efficient Transformer Encoders for Mask2Former-style models [57.54752243522298]
ECO-M2F is a strategy to self-select the number of hidden layers in the encoder conditioned on the input image.
The proposed approach reduces expected encoder computational cost while maintaining performance.
It is flexible in architecture configurations, and can be extended beyond the segmentation task to object detection.
arXiv Detail & Related papers (2024-04-23T17:26:34Z)
- Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning [18.10704604275133]
Masked Autoencoders (MAE) have demonstrated promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Point Regress AutoEncoder (Point-RAE), a new scheme for regressive autoencoders for point cloud self-supervised learning.
Our approach is efficient during pre-training and generalizes well on various downstream tasks.
arXiv Detail & Related papers (2023-09-25T17:23:33Z)
- PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling [83.67628239775878]
Masked Image Modeling (MIM) has achieved promising progress with the advent of Masked Autoencoders (MAE) and BEiT.
This paper undertakes a fundamental analysis of MIM from the perspective of pixel reconstruction.
We propose a remarkably simple and effective method, PixMIM, that entails two strategies.
arXiv Detail & Related papers (2023-03-04T13:38:51Z)
- Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers [122.01591448013977]
Masked image modeling (MIM) has demonstrated promising results on downstream tasks.
In this paper, we investigate whether there exist other effective ways to learn by recovering missing contents.
We summarize a few design principles for token-based pre-training of vision transformers.
This design achieves superior performance over MIM in a series of downstream recognition tasks without extra computational cost.
arXiv Detail & Related papers (2022-03-27T14:23:29Z)
- Context Autoencoder for Self-Supervised Representation Learning [64.63908944426224]
We pretrain an encoder by making predictions in the encoded representation space.
The network is an encoder-regressor-decoder architecture.
We demonstrate the effectiveness of our CAE through superior transfer performance in downstream tasks.
arXiv Detail & Related papers (2022-02-07T09:33:45Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels.
Coupling these two designs enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.