Masked Contrastive Representation Learning
- URL: http://arxiv.org/abs/2211.06012v1
- Date: Fri, 11 Nov 2022 05:32:28 GMT
- Title: Masked Contrastive Representation Learning
- Authors: Yuchong Yao, Nandakishor Desai, Marimuthu Palaniswami
- Abstract summary: This work presents Masked Contrastive Representation Learning (MACRL) for self-supervised visual pre-training.
We adopt an asymmetric setting for the siamese network (i.e., an encoder-decoder structure in both branches), where one branch uses a higher mask ratio and stronger data augmentation, while the other adopts weaker data corruptions.
In our experiments, MACRL presents superior results on various vision benchmarks, including CIFAR-10, CIFAR-100, Tiny-ImageNet, and two other ImageNet subsets.
- Score: 6.737710830712818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked image modelling (e.g., Masked AutoEncoder) and contrastive learning
(e.g., Momentum Contrast) have shown impressive performance on unsupervised
visual representation learning. This work presents Masked Contrastive
Representation Learning (MACRL) for self-supervised visual pre-training. In
particular, MACRL leverages the effectiveness of both masked image modelling
and contrastive learning. We adopt an asymmetric setting for the siamese
network (i.e., an encoder-decoder structure in both branches), where one branch
uses a higher mask ratio and stronger data augmentation, while the other adopts
weaker data corruptions. We optimize a contrastive learning objective based on
the learned features from the encoder in both branches. Furthermore, we
minimize the $L_1$ reconstruction loss according to the decoders' outputs. In
our experiments, MACRL presents superior results on various vision benchmarks,
including CIFAR-10, CIFAR-100, Tiny-ImageNet, and two other ImageNet subsets.
Our framework provides unified insights on self-supervised visual pre-training
and future research.
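To make the two objectives concrete, below is a minimal PyTorch-style sketch of one MACRL-like training step. The `encoder`/`decoder` interfaces, the mask ratios, the mean-pooling of features, the InfoNCE temperature, and the loss weighting `lam` are all illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def info_nce(q, k, temperature=0.1):
    """InfoNCE contrastive loss between the two branches' features;
    matching batch indices are the positive pairs."""
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    logits = q @ k.t() / temperature                  # (B, B) similarities
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def macrl_step(images, encoder, decoder, strong_aug, weak_aug,
               high_ratio=0.75, low_ratio=0.25, lam=1.0):
    # Asymmetric corruption: one view is masked/augmented heavily, one mildly.
    v1, v2 = strong_aug(images), weak_aug(images)
    z1 = encoder(v1, mask_ratio=high_ratio)           # (B, N, D) visible tokens
    z2 = encoder(v2, mask_ratio=low_ratio)

    # Contrastive objective on pooled encoder features from both branches.
    loss_con = info_nce(z1.mean(dim=1), z2.mean(dim=1))

    # L1 reconstruction of the original image from both decoders' outputs.
    loss_rec = F.l1_loss(decoder(z1), images) + F.l1_loss(decoder(z2), images)

    return loss_con + lam * loss_rec
```

In this sketch the asymmetry comes only from the per-branch mask ratio and augmentation strength; both branches share the same encoder-decoder weights.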
Related papers
- Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z)
- Masked Autoencoders are Efficient Class Incremental Learners [64.90846899051164]
Class Incremental Learning (CIL) aims to sequentially learn new classes while avoiding catastrophic forgetting of previous knowledge.
We propose to use Masked Autoencoders (MAEs) as efficient learners for CIL.
arXiv Detail & Related papers (2023-08-24T02:49:30Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the greedy data needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results in low-shot settings and strong experimental results under various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis [33.46831766206675]
MAsked Generative Encoder (MAGE) is the first framework to unify SOTA image generation and self-supervised representation learning.
Inspired by previous generative models, MAGE uses semantic tokens learned by a vector-quantized GAN at both inputs and outputs.
On ImageNet-1K, a single MAGE ViT-L model obtains 9.10 FID in the task of class-unconditional image generation.
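The token-space masking the summary describes could be sketched as follows, with a fixed random codebook standing in for the trained VQGAN tokenizer; the codebook size, embedding dimension, and mask ratio are assumptions (MAGE itself varies the masking ratio during training).

```python
import torch

# Hypothetical stand-in for a trained VQGAN tokenizer: a fixed random codebook.
codebook = torch.randn(1024, 256)        # 1024 discrete tokens of dim 256

def tokenize(patch_embeddings):
    """Map each patch embedding (B, N, 256) to its nearest codebook index."""
    B = patch_embeddings.size(0)
    dists = torch.cdist(patch_embeddings, codebook.expand(B, -1, -1))
    return dists.argmin(dim=-1)           # (B, N) discrete token ids

def mask_tokens(token_ids, mask_ratio=0.6, mask_id=1024):
    """Corrupt a random subset of token ids with a [MASK] id; a transformer
    is then trained to predict the original ids at the masked positions."""
    mask = torch.rand(token_ids.shape, device=token_ids.device) < mask_ratio
    return token_ids.masked_fill(mask, mask_id), mask
```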
arXiv Detail & Related papers (2022-11-16T18:59:02Z)
- Contrastive Masked Autoencoders are Stronger Vision Learners [114.16568579208216]
Contrastive Masked Autoencoders (CMAE) is a new self-supervised pre-training method for learning more comprehensive and capable vision representations.
CMAE achieves state-of-the-art performance on highly competitive benchmarks for image classification, semantic segmentation and object detection.
arXiv Detail & Related papers (2022-07-27T14:04:22Z)
- The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training [13.087987450384036]
We present a new Masked Image Modeling (MIM) method, termed Geminated Gestalt Autoencoder (Ge$^2$-AE), for visual pre-training.
Specifically, we equip our model with geminated decoders in charge of reconstructing image contents from both pixel and frequency space.
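A minimal sketch of the dual-decoder idea, assuming separate `pixel_decoder` and `freq_decoder` heads over a shared latent; comparing magnitude spectra via `torch.fft.fft2` and the loss weighting `alpha` are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def geminated_loss(latent, pixel_decoder, freq_decoder, target, alpha=1.0):
    """Reconstruct the image in pixel space and, in parallel, match its
    2D Fourier spectrum: a sketch of the 'geminated decoder' idea."""
    # Pixel-space branch: plain reconstruction loss.
    pix_rec = pixel_decoder(latent)                   # (B, C, H, W)
    loss_pix = F.l1_loss(pix_rec, target)

    # Frequency-space branch: compare magnitude spectra of prediction/target.
    freq_rec = freq_decoder(latent)                   # (B, C, H, W)
    pred_spec = torch.fft.fft2(freq_rec).abs()
    tgt_spec = torch.fft.fft2(target).abs()
    loss_freq = F.l1_loss(pred_spec, tgt_spec)

    return loss_pix + alpha * loss_freq
```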
arXiv Detail & Related papers (2022-04-18T09:22:55Z)
- Adversarial Masking for Self-Supervised Learning [81.25999058340997]
ADIOS, a masked image modeling (MIM) framework for self-supervised learning, is proposed.
It simultaneously learns a masking function and an image encoder using an adversarial objective.
It consistently improves on state-of-the-art self-supervised learning (SSL) methods on a variety of tasks and datasets.
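The adversarial objective could be exercised with a single backward pass and a gradient flip, as in the toy step below; the soft per-pixel mask parameterisation and all module names are assumptions (ADIOS itself learns a set of occlusion masks, not just one).

```python
import torch

def adversarial_masking_step(images, encoder, masker, ssl_loss,
                             opt_enc, opt_mask):
    """One toy min-max update: the encoder descends the self-supervised
    loss on the occluded view while the masker ascends the same loss."""
    # Masker proposes a soft per-pixel occlusion mask in [0, 1].
    mask = torch.sigmoid(masker(images))
    occluded = images * (1.0 - mask)

    loss = ssl_loss(encoder(occluded), encoder(images))

    opt_enc.zero_grad()
    opt_mask.zero_grad()
    loss.backward()

    # Flip the masker's gradients so its optimizer step maximises the loss.
    for p in masker.parameters():
        if p.grad is not None:
            p.grad.neg_()

    opt_enc.step()
    opt_mask.step()
    return loss.detach()
```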
arXiv Detail & Related papers (2022-01-31T10:23:23Z)
- Masked Autoencoders Are Scalable Vision Learners [60.97703494764904]
Masked autoencoders (MAE) are scalable self-supervised learners for computer vision.
Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels, using a high mask ratio and an asymmetric encoder-decoder.
Coupling these two designs enables us to train large models efficiently and effectively.
arXiv Detail & Related papers (2021-11-11T18:46:40Z)
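The random patch masking at the core of MAE can be sketched as below; the 75% mask ratio matches the ratio the paper highlights, while the patchification and return interface are assumptions.

```python
import torch

def random_masking(images, patch_size=16, mask_ratio=0.75):
    """Patchify an image batch and keep a random 25% of the patches:
    the visible subset an MAE-style encoder would actually process."""
    B, C, H, W = images.shape
    # Patchify to (B, N, C * patch_size**2), N = (H // ps) * (W // ps).
    patches = images.unfold(2, patch_size, patch_size) \
                    .unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch_size ** 2)

    # Per-sample random shuffle of patch indices; keep only the first slice.
    N = patches.size(1)
    keep = int(N * (1 - mask_ratio))
    shuffle = torch.rand(B, N, device=images.device).argsort(dim=1)
    visible_idx = shuffle[:, :keep]                   # (B, keep)
    visible = torch.gather(
        patches, 1,
        visible_idx.unsqueeze(-1).expand(-1, -1, patches.size(-1)))
    return visible, visible_idx
```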