Divide and Contrast: Self-supervised Learning from Uncurated Data
- URL: http://arxiv.org/abs/2105.08054v1
- Date: Mon, 17 May 2021 17:59:03 GMT
- Title: Divide and Contrast: Self-supervised Learning from Uncurated Data
- Authors: Yonglong Tian, Olivier J. Henaff, Aaron van den Oord
- Abstract summary: Divide and Contrast (DnC) alternates between contrastive learning and clustering-based hard negative mining.
When pretrained on less curated datasets, DnC greatly improves the performance of self-supervised learning on downstream tasks.
- Score: 10.783599686099716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning holds promise in leveraging large amounts of
unlabeled data; however, much of its progress has thus far been limited to
highly curated pre-training data such as ImageNet. We explore the effects of
contrastive learning from larger, less-curated image datasets such as YFCC, and
find there is indeed a large difference in the resulting representation
quality. We hypothesize that this curation gap is due to a shift in the
distribution of image classes -- which is more diverse and heavy-tailed --
resulting in less relevant negative samples to learn from. We test this
hypothesis with a new approach, Divide and Contrast (DnC), which alternates
between contrastive learning and clustering-based hard negative mining. When
pretrained on less curated datasets, DnC greatly improves the performance of
self-supervised learning on downstream tasks, while remaining competitive with
the current state-of-the-art on curated datasets.
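The alternation described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm, the loss details, or the training schedule, so the sketch assumes plain k-means on embeddings and an InfoNCE loss whose negatives are restricted to samples from the same cluster. All function names (`l2_normalize`, `kmeans`, `info_nce`, `within_cluster_info_nce`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering step)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center: (n, k).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels


def info_nce(z1, z2, temperature=0.1):
    """InfoNCE over a batch: row i of z1 is positive with row i of z2;
    every other row of z2 serves as a negative."""
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def within_cluster_info_nce(z1, z2, labels, temperature=0.1):
    """Assumed form of clustering-based hard negative mining: compute the
    loss separately inside each cluster, so negatives are nearby samples."""
    losses = {}
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        if len(idx) >= 2:  # a cluster needs at least one negative
            losses[int(c)] = info_nce(z1[idx], z2[idx], temperature)
    return losses


# Toy usage: random vectors stand in for encoder outputs of two augmented views.
rng = np.random.default_rng(0)
view1 = rng.normal(size=(32, 8))
view2 = view1 + 0.01 * rng.normal(size=(32, 8))
labels = kmeans(l2_normalize(view1), k=4)
global_loss = info_nce(view1, view2)
cluster_losses = within_cluster_info_nce(view1, view2, labels)
```

In this toy setup the per-cluster losses use fewer but more similar negatives than the global loss, which is the intuition behind mining hard negatives through clustering. A real pipeline would recompute embeddings and clusters as the encoder trains.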
Related papers
- InfoNCE Loss Provably Learns Cluster-Preserving Representations [54.28112623495274]
Our main result shows that the representation learned by InfoNCE with a finite number of negative samples is consistent with respect to clusters in the data.
arXiv Detail & Related papers (2023-02-15T19:45:35Z)
- Joint Debiased Representation and Image Clustering Learning with Self-Supervision [3.1806743741013657]
We develop a novel joint clustering and contrastive learning framework.
We adapt the debiased contrastive loss to avoid under-clustering minority classes of imbalanced datasets.
arXiv Detail & Related papers (2022-09-14T21:23:41Z)
- Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, which makes it attractive to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we propose incorporating semi-supervised and positive-unlabeled (PU) learning to exploit unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z)
- Positional Contrastive Learning for Volumetric Medical Image Segmentation [13.086140606803408]
We propose a novel positional contrastive learning framework to generate contrastive data pairs.
The proposed PCL method can substantially improve segmentation performance compared to existing methods in both the semi-supervised setting and the transfer learning setting.
arXiv Detail & Related papers (2021-06-16T22:15:28Z)
- Incremental False Negative Detection for Contrastive Learning [95.68120675114878]
We introduce a novel incremental false negative detection method for self-supervised contrastive learning.
We discuss two strategies to explicitly remove the detected false negatives during contrastive learning.
Our proposed method outperforms other self-supervised contrastive learning frameworks on multiple benchmarks under a limited compute budget.
arXiv Detail & Related papers (2021-06-07T15:29:14Z)
- Solving Inefficiency of Self-supervised Representation Learning [87.30876679780532]
Existing contrastive learning methods suffer from very low learning efficiency.
Under-clustering and over-clustering problems are major obstacles to learning efficiency.
We propose a novel self-supervised learning framework using a median triplet loss.
arXiv Detail & Related papers (2021-04-18T07:47:10Z)
- Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification [31.647639786095993]
We propose a novel hybrid network structure composed of a supervised contrastive loss to learn image representations and a cross-entropy loss to learn classifiers.
Experiments on three long-tailed classification datasets demonstrate the advantage of the proposed contrastive learning based hybrid networks in long-tailed classification.
arXiv Detail & Related papers (2021-03-26T05:22:36Z)
- Negative Data Augmentation [127.28042046152954]
We show that negative data augmentation samples provide information on the support of the data distribution.
We introduce a new GAN training objective where we use NDA as an additional source of synthetic data for the discriminator.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
arXiv Detail & Related papers (2021-02-09T20:28:35Z)
- Hard Negative Mixing for Contrastive Learning [29.91220669060252]
We argue that an important aspect of contrastive learning, i.e., the effect of hard negatives, has so far been neglected.
We propose hard negative mixing strategies at the feature level that can be computed on-the-fly with minimal computational overhead.
arXiv Detail & Related papers (2020-10-02T14:34:58Z)
- Delving into Inter-Image Invariance for Unsupervised Visual Representations [108.33534231219464]
We present a study to better understand the role of inter-image invariance learning.
Online labels converge faster than offline labels.
Semi-hard negative samples are more reliable and unbiased than hard negative samples.
arXiv Detail & Related papers (2020-08-26T17:44:23Z)