A Simple Framework for Contrastive Learning of Visual Representations
- URL: http://arxiv.org/abs/2002.05709v3
- Date: Wed, 1 Jul 2020 00:09:08 GMT
- Title: A Simple Framework for Contrastive Learning of Visual Representations
- Authors: Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
- Abstract summary: This paper presents SimCLR: a simple framework for contrastive learning of visual representations.
We show that composition of data augmentations plays a critical role in defining effective predictive tasks.
We are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet.
- Score: 116.37752766922407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents SimCLR: a simple framework for contrastive learning of
visual representations. We simplify recently proposed contrastive
self-supervised learning algorithms without requiring specialized architectures
or a memory bank. In order to understand what enables the contrastive
prediction tasks to learn useful representations, we systematically study the
major components of our framework. We show that (1) composition of data
augmentations plays a critical role in defining effective predictive tasks, (2)
introducing a learnable nonlinear transformation between the representation and
the contrastive loss substantially improves the quality of the learned
representations, and (3) contrastive learning benefits from larger batch sizes
and more training steps compared to supervised learning. By combining these
findings, we are able to considerably outperform previous methods for
self-supervised and semi-supervised learning on ImageNet. A linear classifier
trained on self-supervised representations learned by SimCLR achieves 76.5%
top-1 accuracy, which is a 7% relative improvement over previous
state-of-the-art, matching the performance of a supervised ResNet-50. When
fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy,
outperforming AlexNet with 100X fewer labels.
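To ground the recipe, below is a minimal PyTorch sketch of the framework's core objective, the normalized temperature-scaled cross-entropy (NT-Xent) loss over two augmented views of each image. This is an illustrative reimplementation, not the authors' released code; the batch size, embedding dimension, and temperature are placeholders rather than the paper's tuned values, and `z1`/`z2` stand in for the projections g(f(x)) of the two views.

```python
# Minimal sketch of SimCLR's NT-Xent loss (illustrative, not the official code).
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1[i] and z2[i] are projections of two augmentations of the same image."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, d) unit vectors
    sim = z @ z.t() / temperature                       # cosine-similarity logits
    sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])  # index of each positive
    return F.cross_entropy(sim, targets)

# Toy usage with random stand-ins for g(f(augment(x))); sizes are placeholders.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2).item())
```

Each view's positive is the other view of the same image, while the remaining 2(N-1) examples in the batch act as negatives; this is why the paper finds that larger batches (more negatives) help.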
Related papers
- Addressing Sample Inefficiency in Multi-View Representation Learning [6.621303125642322]
Non-contrastive self-supervised learning (NC-SSL) methods have shown great promise for label-free representation learning in computer vision.
We provide theoretical insights on the implicit bias of the BarlowTwins and VICReg loss that can explain these empirical heuristics and guide the development of more principled recommendations.
arXiv Detail & Related papers (2023-12-17T14:14:31Z)
- Improving Visual Representation Learning through Perceptual Understanding [0.0]
We present an extension to masked autoencoders (MAE) which improves on the representations learnt by the model by explicitly encouraging the learning of higher scene-level features.
We achieve 78.1% top-1 accuracy linear probing on ImageNet-1K and up to 88.1% when fine-tuning, with similar results for other downstream tasks.
arXiv Detail & Related papers (2022-12-30T00:59:46Z)
- Generalized Supervised Contrastive Learning [3.499054375230853]
We introduce a generalized supervised contrastive loss, which measures cross-entropy between label similarity and latent similarity.
Building on this loss, we construct a tailored framework, Generalized Supervised Contrastive Learning (GenSCL).
GenSCL achieves a top-1 accuracy of 77.3% on ImageNet, a 4.1% improvement over traditional supervised contrastive learning.
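Taken literally, the loss described above admits a simple instantiation: use the row-normalized label-agreement matrix as the target distribution and take cross-entropy against softmax-normalized latent similarities. The sketch below is one such reading, assuming PyTorch; it is not necessarily the paper's exact formulation, and `temperature` is a hypothetical hyperparameter.

```python
# Illustrative reading of "cross-entropy between label similarity and latent
# similarity"; not the GenSCL paper's exact loss. `temperature` is hypothetical.
import torch
import torch.nn.functional as F

def generalized_supervised_contrastive_loss(z, labels, temperature=0.1):
    z = F.normalize(z, dim=1)
    logits = z @ z.t() / temperature                       # latent similarity
    logits.fill_diagonal_(float("-inf"))                   # ignore self-pairs
    target = (labels[:, None] == labels[None, :]).float()  # label similarity
    target.fill_diagonal_(0.0)
    target = target / target.sum(dim=1, keepdim=True).clamp(min=1.0)
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Toy usage with random embeddings and labels (placeholder sizes).
z = torch.randn(8, 64)
labels = torch.randint(0, 3, (8,))
print(generalized_supervised_contrastive_loss(z, labels).item())
```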
arXiv Detail & Related papers (2022-06-01T10:38:21Z)
- SLIP: Self-supervision meets Language-Image Pre-training [79.53764315471543]
We study whether self-supervised learning can aid in the use of language supervision for visual representation learning.
We introduce SLIP, a multi-task learning framework for combining self-supervised learning and CLIP pre-training.
We find that SLIP enjoys the best of both worlds: better performance than either self-supervision or language supervision alone.
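A rough sketch of the multi-task idea, assuming PyTorch: a CLIP-style image-text contrastive term is summed with a SimCLR-style term over two augmented views of the image. The projection details and `ssl_weight` below are assumptions, not the paper's settings; `nt_xent_loss` refers to the SimCLR sketch earlier on this page.

```python
# Illustrative sketch of a SLIP-style combined objective (assumed weighting).
# Reuses nt_xent_loss from the SimCLR sketch earlier on this page.
import torch
import torch.nn.functional as F

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text contrastive loss over matched (image, caption) pairs."""
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0))   # pair i matches caption i
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def slip_style_loss(img_emb, txt_emb, view1_proj, view2_proj, ssl_weight=1.0):
    """CLIP-style term plus a SimCLR-style term over two image augmentations."""
    return clip_style_loss(img_emb, txt_emb) + ssl_weight * nt_xent_loss(view1_proj, view2_proj)
```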
arXiv Detail & Related papers (2021-12-23T18:07:13Z)
- Weakly Supervised Contrastive Learning [68.47096022526927]
We introduce a weakly supervised contrastive learning framework (WCL) to address the class collision problem, in which semantically similar samples are treated as negatives.
With only 1% and 10% of ImageNet labels, WCL achieves 65% and 72% top-1 accuracy using ResNet-50, which is even higher than SimCLRv2 with ResNet-101.
arXiv Detail & Related papers (2021-10-10T12:03:52Z)
- With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations [87.72779294717267]
Using the nearest neighbor as a positive in contrastive losses significantly improves performance on ImageNet classification.
We demonstrate empirically that our method is less reliant on complex data augmentations.
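A minimal sketch of the nearest-neighbor positive idea, assuming PyTorch: one view's embedding is swapped for its nearest neighbor in a support set of past embeddings before the contrastive loss, so positives come from different images. The queue size below is illustrative, not the paper's setting.

```python
# Illustrative sketch of nearest-neighbor positives (NNCLR-style).
import torch
import torch.nn.functional as F

def nearest_neighbor(z: torch.Tensor, support_set: torch.Tensor) -> torch.Tensor:
    """Replace each row of z with its most similar row in the support set."""
    z = F.normalize(z, dim=1)
    support = F.normalize(support_set, dim=1)
    idx = (z @ support.t()).argmax(dim=1)   # index of each nearest neighbor
    return support[idx]

# Usage: nearest_neighbor(z1, queue) then serves as the positive for z2 in an
# InfoNCE-style loss, e.g. nt_xent_loss(nearest_neighbor(z1, queue), z2)
# with the SimCLR sketch earlier on this page. Sizes are placeholders.
queue = torch.randn(4096, 128)              # illustrative support-set queue
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
positives = nearest_neighbor(z1, queue)
```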
arXiv Detail & Related papers (2021-04-29T17:56:08Z)
- Big Self-Supervised Models are Strong Semi-Supervised Learners [116.00752519907725]
We show that unsupervised pretraining followed by supervised fine-tuning is surprisingly effective for semi-supervised learning on ImageNet.
A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning.
We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network.
arXiv Detail & Related papers (2020-06-17T17:48:22Z)
- Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [57.33699905852397]
We propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons.
Our method simultaneously clusters the data while enforcing consistency between cluster assignments.
Our method can be trained with large and small batches and can scale to unlimited amounts of data.
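A compact sketch of the swapped-prediction idea, assuming PyTorch: soft cluster assignments are computed for each view against a set of learnable prototypes (balanced via a few Sinkhorn-Knopp iterations, as in the paper), and each view is trained to predict the other view's assignment. The prototype count, temperature, and iteration count below are illustrative.

```python
# Illustrative sketch of SwAV-style swapped prediction (not the official code).
import torch
import torch.nn.functional as F

@torch.no_grad()
def sinkhorn(scores: torch.Tensor, n_iters: int = 3, eps: float = 0.05) -> torch.Tensor:
    """Balanced soft assignments Q from prototype scores of shape (B, K)."""
    q = torch.exp(scores / eps).t()               # (K, B)
    q /= q.sum()
    K, B = q.shape
    for _ in range(n_iters):
        q /= q.sum(dim=1, keepdim=True); q /= K   # normalize over prototypes
        q /= q.sum(dim=0, keepdim=True); q /= B   # normalize over samples
    return (q * B).t()                            # (B, K), rows sum to 1

def swav_loss(z1, z2, prototypes, temperature=0.1):
    p = F.normalize(prototypes, dim=1)
    s1 = F.normalize(z1, dim=1) @ p.t()           # view-1 prototype scores
    s2 = F.normalize(z2, dim=1) @ p.t()
    q1, q2 = sinkhorn(s1), sinkhorn(s2)           # assignments (no gradient)
    # Swapped prediction: view 2 predicts view 1's assignment, and vice versa.
    loss = -(q1 * F.log_softmax(s2 / temperature, dim=1)).sum(1).mean()
    loss += -(q2 * F.log_softmax(s1 / temperature, dim=1)).sum(1).mean()
    return loss / 2

# Toy usage with random embeddings and prototypes (placeholder sizes).
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
prototypes = torch.randn(32, 128)                 # illustrative prototype count
print(swav_loss(z1, z2, prototypes).item())
```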
arXiv Detail & Related papers (2020-06-17T14:00:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.