Elucidating and Overcoming the Challenges of Label Noise in Supervised Contrastive Learning
- URL: http://arxiv.org/abs/2311.16481v1
- Date: Sat, 25 Nov 2023 10:04:42 GMT
- Title: Elucidating and Overcoming the Challenges of Label Noise in Supervised Contrastive Learning
- Authors: Zijun Long, George Killick, Lipeng Zhuang, Richard McCreadie, Gerardo Aragon Camarasa, Paul Henderson
- Abstract summary: We propose a novel Debiased Supervised Contrastive Learning objective designed to mitigate the bias introduced by labeling errors.
We demonstrate that D-SCL consistently outperforms state-of-the-art techniques for representation learning across diverse vision benchmarks.
- Score: 7.439049772394586
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image classification datasets exhibit a non-negligible fraction of mislabeled
examples, often due to human error when one class superficially resembles
another. This issue poses challenges in supervised contrastive learning (SCL),
where the goal is to cluster together data points of the same class in the
embedding space while distancing those of disparate classes. While such methods
outperform those based on cross-entropy, they are not immune to labeling
errors. However, while the detrimental effects of noisy labels in supervised
learning are well-researched, their influence on SCL remains largely
unexplored. Hence, we analyse the effect of label errors and examine how they
disrupt the SCL algorithm's ability to distinguish between positive and
negative sample pairs. Our analysis reveals that human labeling errors manifest
as easy positive samples in around 99% of cases. We, therefore, propose D-SCL,
a novel Debiased Supervised Contrastive Learning objective designed to mitigate
the bias introduced by labeling errors. We demonstrate that D-SCL consistently
outperforms state-of-the-art techniques for representation learning across
diverse vision benchmarks, offering improved robustness to label errors.
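To make the debiasing idea concrete, here is a minimal PyTorch sketch of a SupCon-style loss in which positive pairs are reweighted so that easy positives (where, per the analysis above, roughly 99% of human labeling errors fall) contribute less than hard positives. The exp(-similarity) weighting is an illustrative assumption, not the paper's exact D-SCL term.

```python
import torch


def debiased_supcon_loss(features: torch.Tensor,
                         labels: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """features: (N, D) L2-normalised embeddings; labels: (N,) class ids."""
    sim = features @ features.T / temperature            # pairwise similarities
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask

    # Down-weight "easy" positives: the weight decays with similarity, so a
    # pair that is already very close -- the regime where labelling errors
    # concentrate -- pulls the anchor less strongly than a hard positive.
    # (Illustrative choice; the paper defines its own debiasing term.)
    with torch.no_grad():
        pos_weight = torch.exp(-sim) * pos_mask
        pos_weight = pos_weight / pos_weight.sum(1, keepdim=True).clamp_min(1e-12)

    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    loss = -(pos_weight * log_prob).sum(1)               # per-anchor weighted loss
    return loss[pos_mask.any(1)].mean()                  # skip anchors without positives
```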
Related papers
- Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning [7.439049772394586]
We show that human-labelling errors pose unique challenges in Supervised Contrastive Learning (SCL).
Our results indicate they adversely impact the learning process in 99% of cases, where they occur as false positive samples.
Existing noise-mitigating methods primarily focus on synthetic label errors and tackle the unrealistic setting of very high synthetic noise rates.
We introduce SCL-RHE, a novel SCL objective that is robust to human-labelling errors.
arXiv Detail & Related papers (2024-03-10T19:05:12Z)
- CLAF: Contrastive Learning with Augmented Features for Imbalanced Semi-Supervised Learning [40.5117833362268]
Semi-supervised learning and contrastive learning have been progressively combined to achieve better performance in popular applications.
One common approach is to assign pseudo-labels to unlabeled samples and then select positive and negative pairs from the pseudo-labelled pool for contrastive learning.
We propose Contrastive Learning with Augmented Features (CLAF) to alleviate the scarcity of minority class samples in contrastive learning.
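A minimal sketch of the generic pseudo-labelling recipe described above (the baseline CLAF builds on, not CLAF's feature-augmentation step itself); the confidence threshold and helper name are illustrative.

```python
import torch


def select_contrastive_pairs(probs: torch.Tensor, threshold: float = 0.95):
    """probs: (N, C) softmax outputs on unlabeled samples."""
    conf, pseudo = probs.max(dim=1)        # confidence and argmax pseudo-label
    keep = conf >= threshold               # retain only confident predictions
    pseudo = pseudo[keep]
    pos_mask = pseudo[:, None] == pseudo[None, :]   # same pseudo-label: positive
    pos_mask.fill_diagonal_(False)                  # a sample is not its own pair
    return keep, pseudo, pos_mask, ~pos_mask        # negatives: the complement
```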
arXiv Detail & Related papers (2023-12-15T08:27:52Z)
- Virtual Category Learning: A Semi-Supervised Learning Method for Dense Prediction with Extremely Limited Labels [63.16824565919966]
This paper proposes to use confusing samples proactively without label correction.
A Virtual Category (VC) is assigned to each confusing sample in such a way that it can safely contribute to the model optimisation.
Our findings highlight the value of VC learning for dense vision tasks.
arXiv Detail & Related papers (2023-12-02T16:23:52Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
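One plausible reading of "class-aware" pseudo-labeling is sketched below, under the assumption that each class receives its own threshold chosen so the pseudo-label rate tracks an estimated class prior; the concrete CAP rule is specified in the paper.

```python
import torch


def class_aware_pseudo_labels(probs: torch.Tensor,
                              class_prior: torch.Tensor) -> torch.Tensor:
    """probs: (N, C) sigmoid outputs; class_prior: (C,) positive rate per class."""
    n, c = probs.shape
    labels = torch.zeros(n, c, dtype=torch.bool)
    for k in range(c):
        # Mark the top prior_k fraction of samples as positive for class k,
        # so the pseudo-label distribution follows the estimated class prior.
        num_pos = max(1, int(class_prior[k].item() * n))
        thresh = torch.topk(probs[:, k], num_pos).values.min()
        labels[:, k] = probs[:, k] >= thresh
    return labels
```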
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Complementary Labels Learning with Augmented Classes [22.460256396941528]
Complementary Labels Learning (CLL) arises in many real-world tasks such as private question classification and online learning.
We propose a novel problem setting called Complementary Labels Learning with Augmented Classes (CLLAC).
By using unlabeled data, we propose an unbiased estimator of the classification risk for CLLAC, which is provably consistent.
arXiv Detail & Related papers (2022-11-19T13:55:27Z)
- Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning provides a successful way to learn sample representations that enable effective discrimination of anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z)
- Adversarial Contrastive Learning via Asymmetric InfoNCE [64.42740292752069]
We propose to treat adversarial samples unequally when contrasted with an asymmetric InfoNCE objective.
In this asymmetric fashion, the adverse impacts of conflicting objectives between CL and adversarial learning can be effectively mitigated.
Experiments show that our approach consistently outperforms existing Adversarial CL methods.
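A hedged sketch of one way to make InfoNCE asymmetric: adversarial negatives are down-weighted by a factor lam < 1 so they repel the anchor less than clean negatives. The exact asymmetry used in the paper may differ; all names here are illustrative.

```python
import math

import torch
import torch.nn.functional as F


def asymmetric_infonce(anchor, clean, adv, lam: float = 0.5, t: float = 0.2):
    """anchor/clean/adv: (N, D) L2-normalised embeddings; row i of `clean` and
    `adv` are the clean and adversarial views of anchor i."""
    n = anchor.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=anchor.device)
    pos = (anchor * adv).sum(1, keepdim=True) / t        # adversarial positive
    neg_clean = (anchor @ clean.T / t).masked_fill(eye, float("-inf"))
    # Adding log(lam) multiplies each adversarial negative's softmax term by
    # lam < 1, so adversarial negatives repel the anchor less than clean ones.
    neg_adv = (anchor @ adv.T / t + math.log(lam)).masked_fill(eye, float("-inf"))
    logits = torch.cat([pos, neg_clean, neg_adv], dim=1)
    target = torch.zeros(n, dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, target)               # positive is column 0
```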
arXiv Detail & Related papers (2022-07-18T04:14:36Z)
- Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it appealing to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we suggest incorporating semi-supervised and positive-unlabeled (PU) learning to exploit unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z)
- Negative Selection by Clustering for Contrastive Learning in Human Activity Recognition [5.351176836203563]
We propose ClusterCLHAR, a new contrastive learning framework that performs negative selection by clustering for Human Activity Recognition (HAR).
Compared with SimCLR, it redefines the negative pairs in the contrastive loss: unsupervised clustering methods generate soft labels, and samples from the anchor's cluster are masked so they are not treated as negatives.
We evaluate ClusterCLHAR on three benchmark datasets, USC-HAD, MotionSense, and UCI-HAR, using mean F1-score as the evaluation metric.
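A minimal sketch of the cluster-based negative masking described above; KMeans stands in for whatever unsupervised clusterer is used, and the returned mask would be applied to the negative term of the contrastive loss.

```python
import torch
from sklearn.cluster import KMeans


def cluster_negative_mask(embeddings: torch.Tensor, n_clusters: int = 10):
    """embeddings: (N, D). True in the returned mask marks a valid negative."""
    assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        embeddings.detach().cpu().numpy())
    assign = torch.as_tensor(assign)
    same_cluster = assign[:, None] == assign[None, :]    # soft "same-class" label
    self_mask = torch.eye(len(assign), dtype=torch.bool)
    return ~(same_cluster | self_mask)   # negatives: different cluster, not self
```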
arXiv Detail & Related papers (2022-03-23T06:54:16Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.