Context Autoencoder for Self-Supervised Representation Learning
- URL: http://arxiv.org/abs/2202.03026v3
- Date: Thu, 10 Aug 2023 11:01:14 GMT
- Title: Context Autoencoder for Self-Supervised Representation Learning
- Authors: Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao
Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang
- Abstract summary: We pretrain an encoder by making predictions in the encoded representation space.
The network is an encoder-regressor-decoder architecture.
We demonstrate the effectiveness of our CAE through superior transfer performance in downstream tasks.
- Score: 64.63908944426224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel masked image modeling (MIM) approach, context autoencoder
(CAE), for self-supervised representation pretraining. We pretrain an encoder
by making predictions in the encoded representation space. Pretraining comprises
two tasks: masked representation prediction, which predicts the representations
of the masked patches, and masked patch reconstruction, which reconstructs the
masked patches. The network is an encoder-regressor-decoder architecture: the
encoder takes only the visible patches as input; the regressor predicts the
representations of the masked patches from the representations of the visible
patches and the positions of both the visible and masked patches, and these
predictions are expected to align with the representations the encoder computes
for the masked patches; the decoder reconstructs the masked patches from the
predicted representations. This design encourages separating representation
learning (the encoder) from the completion of the pretraining tasks (masked
representation prediction and masked patch reconstruction), and making
predictions in the encoded representation space empirically benefits
representation learning. We demonstrate the effectiveness of our CAE through
superior transfer performance in downstream tasks: semantic segmentation,
object detection and instance segmentation, and classification. The code will
be available at https://github.com/Atten4Vis/CAE.
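The encoder-regressor-decoder flow described above can be summarized in a short PyTorch sketch. This is a minimal illustration rather than the paper's implementation: the names (CAESketch, gather_patches), the module depths and dimensions, the cross-attention regressor built from nn.TransformerDecoder, the MSE form of both losses, and the stop-gradient on the alignment target are all simplifying assumptions; the official code is at https://github.com/Atten4Vis/CAE.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def gather_patches(x, idx):
    # Select the patches at indices `idx` (B, K) from x (B, N, D).
    return torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))


class CAESketch(nn.Module):
    def __init__(self, dim=192, patch_dim=768, num_patches=196):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        enc = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=4)
        # Regressor: mask-position queries cross-attend to visible latents.
        reg = nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True)
        self.regressor = nn.TransformerDecoder(reg, num_layers=2)
        dec = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec, num_layers=2)
        self.head = nn.Linear(dim, patch_dim)  # latents -> patch pixels

    def forward(self, patches, visible_idx, masked_idx):
        # patches: (B, N, patch_dim) flattened image patches.
        B = patches.size(0)
        x = self.patch_embed(patches) + self.pos_embed
        # 1) The encoder sees only the visible patches.
        z_visible = self.encoder(gather_patches(x, visible_idx))
        # 2) The regressor predicts latents for the masked positions from the
        #    visible latents plus the masked positions' embeddings.
        queries = self.mask_token + gather_patches(
            self.pos_embed.expand(B, -1, -1), masked_idx)
        z_pred = self.regressor(queries, z_visible)
        # 3) Alignment target: the latents the encoder computes for the masked
        #    patches themselves (stop-gradient here is a simplification).
        with torch.no_grad():
            z_target = self.encoder(gather_patches(x, masked_idx))
        align_loss = F.mse_loss(z_pred, z_target)
        # 4) The decoder reconstructs masked patch pixels from the predictions.
        recon = self.head(self.decoder(z_pred))
        recon_loss = F.mse_loss(recon, gather_patches(patches, masked_idx))
        return align_loss + recon_loss


# Toy usage: mask half of a 14x14 grid of flattened patches.
model = CAESketch()
patches = torch.randn(2, 196, 768)
perm = torch.rand(2, 196).argsort(dim=1)
loss = model(patches, perm[:, :98], perm[:, 98:])
loss.backward()
```

The property the sketch preserves is that prediction happens in the latent space (step 2), is aligned with what the encoder itself produces for the masked patches (step 3), and only then is decoded back to pixels (step 4), keeping representation learning in the encoder.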
Related papers
- Rethinking Patch Dependence for Masked Autoencoders [92.37365660775171]
We re-examine inter-patch dependencies in the decoding mechanism of masked autoencoders (MAE).
We propose a novel pretraining framework: Cross-Attention Masked Autoencoders (CrossMAE).
arXiv Detail & Related papers (2024-01-25T18:49:57Z)
- Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning [18.10704604275133]
Masked Autoencoders (MAE) have demonstrated promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Point Regress AutoEncoder (Point-RAE), a new scheme for regressive autoencoders for point cloud self-supervised learning.
Our approach is efficient during pre-training and generalizes well on various downstream tasks.
arXiv Detail & Related papers (2023-09-25T17:23:33Z)
- Siamese Masked Autoencoders [76.35448665609998]
We present Siamese Masked Autoencoders (SiamMAE) for learning visual correspondence from videos.
SiamMAE operates on pairs of randomly sampled video frames and asymmetrically masks them.
It outperforms state-of-the-art self-supervised methods on video object segmentation, pose keypoint propagation, and semantic part propagation tasks.
arXiv Detail & Related papers (2023-05-23T17:59:46Z)
- SdAE: Self-distillated Masked Autoencoder [95.3684955370897]
This paper proposes SdAE, a self-distillated masked autoencoder network.
With only 300 epochs pre-training, a vanilla ViT-Base model achieves an 84.1% fine-tuning accuracy on ImageNet-1k classification.
arXiv Detail & Related papers (2022-07-31T15:07:25Z)
- Masked Autoencoders that Listen [79.99280830830854]
This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms.
Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.
The decoder then re-orders and decodes the encoded context padded with mask tokens, in order to reconstruct the input spectrogram.
arXiv Detail & Related papers (2022-07-13T17:59:55Z)
- Improvements to Self-Supervised Representation Learning for Masked Image Modeling [0.0]
This paper explores improvements to the masked image modeling (MIM) paradigm.
The MIM paradigm enables the model to learn the main object features of the image by masking the input image and predicting the masked parts from the unmasked parts.
We propose a new model, Contrastive Masked AutoEncoders (CMAE).
arXiv Detail & Related papers (2022-05-21T09:45:50Z)
- SeMask: Semantically Masked Transformers for Semantic Segmentation [10.15763397352378]
SeMask is a framework that incorporates semantic information into the encoder with the help of a semantic attention operation.
Our framework achieves a new state-of-the-art of 58.22% mIoU on the ADE20K dataset and improvements of over 3% in the mIoU metric on the Cityscapes dataset.
arXiv Detail & Related papers (2021-12-23T18:56:02Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels (see the sketch after this list).
Coupling an asymmetric encoder-decoder with a high masking ratio enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
- OLED: One-Class Learned Encoder-Decoder Network with Adversarial Context Masking for Novelty Detection [1.933681537640272]
Novelty detection is the task of recognizing samples that do not belong to the distribution of the target class.
Deep autoencoders have been widely used as a base of many unsupervised novelty detection methods.
We have designed a framework consisting of two competing networks, a Mask Module and a Reconstructor.
arXiv Detail & Related papers (2021-03-27T17:59:40Z)
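Several of the entries above (MAE, Audio-MAE, SiamMAE) share the mask-and-reconstruct recipe summarized in the Masked Autoencoders entry. As referenced there, here is a minimal PyTorch sketch of the random-masking step: the 75% masking ratio follows the MAE paper, but the function name, tensor layout, and returned values are illustrative assumptions rather than any paper's official code.

```python
import torch


def random_masking(x, mask_ratio=0.75):
    # x: (B, N, D) patch embeddings; keep a random (1 - mask_ratio) subset.
    B, N, D = x.shape
    num_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=x.device)    # one random score per patch
    shuffle = noise.argsort(dim=1)               # random permutation per sample
    keep_idx = shuffle[:, :num_keep]             # indices of visible patches
    x_visible = torch.gather(x, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=x.device)     # 1 = masked, 0 = visible
    mask.scatter_(1, keep_idx, 0.0)
    return x_visible, mask, keep_idx


# The encoder runs only on the ~25% visible tokens; the decoder later
# re-inserts mask tokens at the masked positions and reconstructs the
# missing pixels (or spectrogram bins, in the Audio-MAE case).
patches = torch.randn(4, 196, 768)               # (batch, patches, patch_dim)
visible, mask, keep_idx = random_masking(patches)
print(visible.shape)                             # torch.Size([4, 49, 768])
```

Because the encoder processes only the kept quarter of the tokens, pretraining cost drops roughly in proportion, which is what lets these methods scale to large backbones.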
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.