When Does Contrastive Visual Representation Learning Work?
- URL: http://arxiv.org/abs/2105.05837v1
- Date: Wed, 12 May 2021 17:52:42 GMT
- Title: When Does Contrastive Visual Representation Learning Work?
- Authors: Elijah Cole, Xuan Yang, Kimberly Wilber, Oisin Mac Aodha, Serge
Belongie
- Abstract summary: We study contrastive self-supervised learning on four diverse large-scale datasets.
Our key findings include: (i) the benefit of additional pretraining data beyond 500k images is modest, (ii) adding pretraining images from another domain does not lead to more general representations, and (iii) corrupted pretraining images have a disparate impact on supervised and self-supervised pretraining.
- Score: 13.247759411409936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent self-supervised representation learning techniques have largely closed
the gap between supervised and unsupervised learning on ImageNet
classification. While the particulars of pretraining on ImageNet are now
relatively well understood, the field still lacks widely accepted best
practices for replicating this success on other datasets. As a first step in
this direction, we study contrastive self-supervised learning on four diverse
large-scale datasets. By looking through the lenses of data quantity, data
domain, data quality, and task granularity, we provide new insights into the
necessary conditions for successful self-supervised learning. Our key findings
include observations such as: (i) the benefit of additional pretraining data
beyond 500k images is modest, (ii) adding pretraining images from another
domain does not lead to more general representations, (iii) corrupted
pretraining images have a disparate impact on supervised and self-supervised
pretraining, and (iv) contrastive learning lags far behind supervised learning
on fine-grained visual classification tasks.
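For readers unfamiliar with the setup being studied, contrastive pretraining of this kind is typically driven by an InfoNCE-style objective that pulls together two augmented views of the same image and pushes apart views of different images. Below is a minimal sketch of such a loss in PyTorch; the function name, shapes, and temperature are illustrative assumptions, not taken from the paper's experimental code.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.5):
    # z1, z2: (N, D) embeddings of two augmented views of the same image batch.
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit-normalized
    sim = z @ z.t() / temperature                        # (2N, 2N) cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # mask out self-similarity
    # For example i, the positive is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Usage: in practice z1 and z2 come from an encoder (e.g. a ResNet) applied to
# two random augmentations of the same image batch; random tensors stand in here.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = info_nce_loss(z1, z2)
```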
Related papers
- Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images
with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Despite this promise, unsupervised learning on diffusion-generated images remains underexplored.
We introduce customized solutions that fully exploit the attention masks these diffusion models provide for free.
arXiv Detail & Related papers (2023-08-13T10:07:46Z) - SLIP: Self-supervision meets Language-Image Pre-training [79.53764315471543]
We study whether self-supervised learning can aid in the use of language supervision for visual representation learning.
We introduce SLIP, a multi-task learning framework that combines self-supervised learning with CLIP pre-training (a minimal sketch of such a combined objective appears after this list).
We find that SLIP enjoys the best of both worlds: better performance than either self-supervision or language supervision alone.
arXiv Detail & Related papers (2021-12-23T18:07:13Z) - Continual Contrastive Self-supervised Learning for Image Classification [10.070132585425938]
Self-supervised learning shows tremendous potential for learning visual representations at scale without any labeled data.
To improve the visual representations learned by self-supervision, larger and more varied data are needed.
In this paper, we make the first attempt to implement continual contrastive self-supervised learning by proposing a rehearsal method.
arXiv Detail & Related papers (2021-07-05T03:53:42Z) - Unsupervised Object-Level Representation Learning from Scene Images [97.07686358706397]
Object-level Representation Learning (ORL) is a new self-supervised learning framework for scene images.
Our key insight is to leverage image-level self-supervised pre-training as the prior to discover object-level semantic correspondence.
ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.
arXiv Detail & Related papers (2021-06-22T17:51:24Z) - Revisiting Contrastive Methods for Unsupervised Learning of Visual
Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across (i) object- versus scene-centric, (ii) uniform versus long-tailed, and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z) - Few-Shot Learning with Part Discovery and Augmentation from Unlabeled
Images [79.34600869202373]
We show that inductive bias can be learned from a flat collection of unlabeled images, and instantiated as transferable representations among seen and unseen classes.
Specifically, we propose a novel part-based self-supervised representation learning scheme to learn transferable representations.
Our method yields impressive results, outperforming the previous best unsupervised methods by 7.74% and 9.24%.
arXiv Detail & Related papers (2021-05-25T12:22:11Z) - Curious Representation Learning for Embodied Intelligence [81.21764276106924]
Self-supervised representation learning has achieved remarkable success in recent years.
Yet to build truly intelligent agents, we must construct representation learning algorithms that can learn from environments.
We propose a framework, curious representation learning, which jointly learns a reinforcement learning policy and a visual representation model.
arXiv Detail & Related papers (2021-05-03T17:59:20Z) - Self-Supervised Learning of Remote Sensing Scene Representations Using
Contrastive Multiview Coding [0.0]
We conduct an analysis of the applicability of self-supervised learning in remote sensing image classification.
We show that, for the downstream task of remote sensing image classification, using self-supervised pre-training can give better results than using supervised pre-training on images of natural scenes.
arXiv Detail & Related papers (2021-04-14T18:25:43Z) - Self-Supervised Training Enhances Online Continual Learning [37.91734641808391]
In continual learning, a system must incrementally learn from a non-stationary data stream without catastrophic forgetting.
Self-supervised pre-training could yield features that generalize better than supervised pre-training.
Our best system achieves a 14.95% relative increase in top-1 accuracy on class incremental ImageNet over the prior state of the art for online continual learning.
arXiv Detail & Related papers (2021-03-25T17:45:27Z)
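As referenced in the SLIP entry above, a multi-task objective of that kind can be written as a weighted sum of a CLIP-style image-text contrastive term and a SimCLR-style image-image contrastive term. The sketch below is an illustrative PyTorch version under that assumption; the function names, temperatures, and weighting are not taken from the SLIP paper's released code.

```python
import torch
import torch.nn.functional as F

def clip_term(img_emb, txt_emb, temperature=0.07):
    # Symmetric image-text contrastive loss over a batch of paired embeddings.
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.t() / temperature              # (N, N) similarities
    targets = torch.arange(img.shape[0])              # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def ssl_term(z1, z2, temperature=0.1):
    # Image-image contrastive loss over two augmented views of the same batch.
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)       # (2N, D)
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(float("-inf"))                 # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)

def slip_style_loss(img_emb, txt_emb, z1, z2, ssl_weight=1.0):
    # Multi-task objective: language-image term plus a weighted self-supervised term.
    return clip_term(img_emb, txt_emb) + ssl_weight * ssl_term(z1, z2)
```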