A Theory-Driven Self-Labeling Refinement Method for Contrastive
Representation Learning
- URL: http://arxiv.org/abs/2106.14749v1
- Date: Mon, 28 Jun 2021 14:24:52 GMT
- Title: A Theory-Driven Self-Labeling Refinement Method for Contrastive
Representation Learning
- Authors: Pan Zhou, Caiming Xiong, Xiao-Tong Yuan, Steven Hoi
- Abstract summary: Unsupervised contrastive learning labels crops of the same image as positives, and other image crops as negatives.
In this work, we first prove that for contrastive learning, inaccurate label assignment heavily impairs its generalization for semantic instance discrimination.
Inspired by this theory, we propose a novel self-labeling refinement approach for contrastive learning.
- Score: 111.05365744744437
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: For an image query, unsupervised contrastive learning labels crops of the
same image as positives, and other image crops as negatives. Although
intuitive, such a naive label assignment strategy cannot reveal the underlying
semantic similarity between a query and its positives and negatives, and
impairs performance, since some negatives are semantically similar to the query
or even share the same semantic class as the query. In this work, we first
prove that for contrastive learning, inaccurate label assignment heavily
impairs its generalization for semantic instance discrimination, while accurate
labels benefit its generalization. Inspired by this theory, we propose a novel
self-labeling refinement approach for contrastive learning. It improves the
label quality via two complementary modules: (i) self-labeling refinery (SLR)
to generate accurate labels and (ii) momentum mixup (MM) to enhance the
similarity between a query and its positive. SLR uses a positive of a query to estimate
semantic similarity between a query and its positive and negatives, and
combines estimated similarity with vanilla label assignment in contrastive
learning to iteratively generate more accurate and informative soft labels. We
theoretically show that our SLR can exactly recover the true semantic labels of
label-corrupted data, and can supervise networks to achieve zero prediction error
on classification tasks. MM randomly combines queries and positives to increase
the semantic similarity between the generated virtual queries and their positives,
so as to improve label accuracy. Experimental results on CIFAR10, ImageNet,
VOC and COCO show the effectiveness of our method. PyTorch code and model will
be released online.
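The released PyTorch code is not part of this listing, so the following is only a minimal sketch of how the two modules described above could look; the function names, the refinement weight `lam`, the Beta mixing distribution, and the convention that index 0 of the label vector corresponds to the positive key are illustrative assumptions rather than details taken from the paper.

```python
# Minimal, illustrative sketch of SLR and MM as described in the abstract;
# this is NOT the authors' released implementation. Names, lam, alpha, and the
# index-0 positive convention are assumptions for illustration only.
import torch
import torch.nn.functional as F

def self_labeling_refinery(positive, keys, vanilla_labels, temperature=0.2, lam=0.5):
    """SLR sketch: use the positive view's similarity to all candidate keys as an
    estimate of semantic similarity, and blend it with the vanilla one-hot
    contrastive labels to obtain soft labels. Repeating this across training
    iterations would progressively refine the labels as embeddings improve."""
    p = F.normalize(positive, dim=-1)                      # (B, D)
    k = F.normalize(keys, dim=-1)                          # (B, 1 + K, D): positive + negatives
    estimated = F.softmax(torch.einsum('bd,bkd->bk', p, k) / temperature, dim=1)
    return (1.0 - lam) * vanilla_labels + lam * estimated  # soft labels, shape (B, 1 + K)

def momentum_mixup(query, positive, soft_labels, alpha=1.0):
    """MM sketch: convexly combine each query with a positive (shown here on
    embeddings of shape (B, D) for brevity), so the resulting virtual query is
    more similar to its positive; the label vector is mixed with the same
    coefficient."""
    beta = torch.distributions.Beta(alpha, alpha).sample((query.size(0), 1)).to(query.device)
    virtual_query = beta * query + (1.0 - beta) * positive
    positive_one_hot = torch.zeros_like(soft_labels)
    positive_one_hot[:, 0] = 1.0                           # assumed: positive key at index 0
    mixed_labels = beta * soft_labels + (1.0 - beta) * positive_one_hot
    return virtual_query, mixed_labels
```

Under this reading, the refined soft labels and the mixed virtual queries would replace the one-hot targets and raw queries in an otherwise standard InfoNCE-style contrastive loss, with the refinement repeated at each training iteration.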
Related papers
- GaussianMLR: Learning Implicit Class Significance via Calibrated
Multi-Label Ranking [0.0]
We propose a novel multi-label ranking method: GaussianMLR.
It aims to learn implicit class significance values that determine the positive label ranks.
We show that our method is able to accurately learn a representation of the incorporated positive rank order.
arXiv Detail & Related papers (2023-03-07T14:09:08Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this view, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Category-Adaptive Label Discovery and Noise Rejection for Multi-label Image Recognition with Partial Positive Labels [78.88007892742438]
Training multi-label models with partial positive labels (MLR-PPL) attracts increasing attention.
Previous works regard unknown labels as negative and adopt traditional MLR algorithms.
We propose to explore semantic correlation among different images to facilitate the MLR-PPL task.
arXiv Detail & Related papers (2022-11-15T02:11:20Z)
- Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning [0.41998444721319206]
We argue that a good data representation contains the relations, or semantic similarity, between the instances.
We propose a novel formulation of contrastive learning using semantic similarity between instances, called Similarity Contrastive Estimation (SCE).
Our training objective can be considered as soft contrastive learning.
arXiv Detail & Related papers (2021-11-29T15:19:15Z)
- Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning [80.05441565830726]
This paper addresses imbalanced semi-supervised learning, where heavily biased pseudo-labels can harm the model performance.
Motivated by this observation, we propose a general pseudo-labeling framework to address the bias.
We term this novel pseudo-labeling framework for imbalanced SSL the Distribution-Aware Semantics-Oriented (DASO) Pseudo-label.
arXiv Detail & Related papers (2021-06-10T11:58:25Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- CO2: Consistent Contrast for Unsupervised Visual Representation Learning [15.18275537384316]
We propose Consistent Contrast (CO2), which introduces a consistency regularization term into the current contrastive learning framework.
Regarding the similarity of the query crop to each crop from other images as "unlabeled", the consistency term takes the corresponding similarity of a positive crop as a pseudo label and encourages consistency between these two similarities (a sketch of such a term is given after this list).
Empirically, CO2 improves Momentum Contrast (MoCo) by 2.9% top-1 accuracy on the ImageNet linear protocol, and by 3.8% and 1.1% top-5 accuracy in the 1% and 10% labeled semi-supervised settings.
arXiv Detail & Related papers (2020-10-05T18:00:01Z)
- Debiased Contrastive Learning [64.98602526764599]
We develop a debiased contrastive objective that corrects for the sampling of same-label datapoints.
Empirically, the proposed objective consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks.
arXiv Detail & Related papers (2020-07-01T04:25:24Z)
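As referenced in the CO2 entry above, the following is a minimal sketch of how a consistency term of that kind could be written, assuming a MoCo-style setup with a shared queue of negative keys; the function name, the temperature value, and the one-sided KL divergence are illustrative assumptions rather than details from the CO2 paper.

```python
# Illustrative only: not the CO2 implementation. Assumes query/positive
# embeddings of shape (B, D) and a shared queue of negative keys of shape (K, D).
import torch
import torch.nn.functional as F

def consistency_loss(query, positive, negative_queue, temperature=0.2):
    """Treat the positive crop's similarities to the negatives as a pseudo label
    for the query crop's similarities, and penalize the divergence between them."""
    q = F.normalize(query, dim=-1)            # (B, D)
    p = F.normalize(positive, dim=-1)         # (B, D)
    n = F.normalize(negative_queue, dim=-1)   # (K, D)
    q_sim = q @ n.t() / temperature           # (B, K) query-vs-negatives similarities
    p_sim = p @ n.t() / temperature           # (B, K) positive-vs-negatives similarities
    pseudo = F.softmax(p_sim, dim=1).detach() # pseudo label taken from the positive crop
    log_pred = F.log_softmax(q_sim, dim=1)
    # KL(pseudo || prediction); in practice this term would be added to the
    # usual InfoNCE loss with a small weight.
    return F.kl_div(log_pred, pseudo, reduction='batchmean')
```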