Understanding and Mitigating Human-Labelling Errors in Supervised
Contrastive Learning
- URL: http://arxiv.org/abs/2403.06289v1
- Date: Sun, 10 Mar 2024 19:05:12 GMT
- Title: Understanding and Mitigating Human-Labelling Errors in Supervised
Contrastive Learning
- Authors: Zijun Long and Lipeng Zhuang and George Killick and Richard McCreadie
and Gerardo Aragon Camarasa and Paul Henderson
- Abstract summary: We show that human-labelling errors pose unique challenges in Supervised Contrastive Learning (SCL).
Our results indicate they adversely impact the learning process in ~99% of cases when they occur as false positive samples.
Existing noise-mitigating methods primarily focus on synthetic label errors and tackle the unrealistic setting of very high synthetic noise rates.
We introduce SCL-RHE, a novel SCL objective that is robust to human-labelling errors.
- Score: 7.439049772394586
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Human-annotated vision datasets inevitably contain a fraction of human
mislabelled examples. While the detrimental effects of such mislabelling on
supervised learning are well-researched, their influence on Supervised
Contrastive Learning (SCL) remains largely unexplored. In this paper, we show
that human-labelling errors not only differ significantly from synthetic label
errors, but also pose unique challenges in SCL, different to those in
traditional supervised learning methods. Specifically, our results indicate
they adversely impact the learning process in ~99% of cases when they occur
as false positive samples. Existing noise-mitigating methods primarily focus on
synthetic label errors and tackle the unrealistic setting of very high
synthetic noise rates (40-80%), but they often underperform on common image
datasets due to overfitting. To address this issue, we introduce a novel SCL
objective with robustness to human-labelling errors, SCL-RHE. SCL-RHE is
designed to mitigate the effects of real-world mislabelled examples, typically
characterized by much lower noise rates (<5%). We demonstrate that SCL-RHE
consistently outperforms state-of-the-art representation learning and
noise-mitigating methods across various vision benchmarks, by offering improved
resilience against human-labelling errors.
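The abstract does not spell out the SCL-RHE objective itself, but the false-positive failure mode it describes can be made concrete with the standard supervised contrastive (SupCon) loss that such methods build on. Below is a minimal PyTorch sketch of that baseline, not the authors' code; the comment marks where a human-mislabelled sample enters the positive set.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """Baseline supervised contrastive (SupCon) loss.

    features: (N, D) embeddings, one view per sample.
    labels:   (N,) integer class labels, possibly human-mislabelled.
    """
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature            # (N, N) similarities
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)

    # Positives are all other samples sharing the (possibly wrong) label.
    # A mislabelled sample is pulled towards the wrong class here: the
    # false-positive failure mode the paper reports in ~99% of error cases.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    sim = sim.masked_fill(self_mask, float("-inf"))      # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count
    return loss.mean()
```

Under the paper's premise, SCL-RHE adjusts how these positive (and negative) pairs are sampled or weighted at realistic noise rates (<5%); the exact objective is specified in the paper itself.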
Related papers
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample
Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Learning under Label Noise through Few-Shot Human-in-the-Loop Refinement [37.4838454216137]
Few-Shot Human-in-the-Loop Refinement (FHLR) is a novel solution to address noisy label learning.
We show that FHLR achieves significantly better performance when learning from noisy labels.
Our work not only achieves better generalization in high-stakes health sensing benchmarks but also sheds light on how noise affects commonly-used models.
arXiv Detail & Related papers (2024-01-25T11:43:35Z)
- Analyze the Robustness of Classifiers under Label Noise [5.708964539699851]
Label noise in supervised learning, characterized by erroneous or imprecise labels, significantly impairs model performance.
This research focuses on the increasingly pertinent issue of label noise's impact on practical applications.
arXiv Detail & Related papers (2023-12-12T13:51:25Z)
- Elucidating and Overcoming the Challenges of Label Noise in Supervised
Contrastive Learning [7.439049772394586]
We propose a novel Debiased Supervised Contrastive Learning objective designed to mitigate the bias introduced by labeling errors.
We demonstrate that D-SCL consistently outperforms state-of-the-art techniques for representation learning across diverse vision benchmarks.
arXiv Detail & Related papers (2023-11-25T10:04:42Z)
- Uncertainty-guided Boundary Learning for Imbalanced Social Event
Detection [64.4350027428928]
We propose a novel uncertainty-guided class imbalance learning framework for imbalanced social event detection tasks.
Our model significantly improves social event representation and classification in almost all classes, especially the uncertain ones.
arXiv Detail & Related papers (2023-10-30T03:32:04Z)
- Combating Label Noise With A General Surrogate Model For Sample
Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
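The entry above does not give the paper's actual filtering rule; one natural instantiation of a CLIP-based surrogate filter, sketched here under that assumption (the prompt template and threshold are illustrative), keeps a sample only when CLIP's zero-shot prediction assigns its given label enough probability.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def keep_sample(image: Image.Image, label: int, class_names: list,
                threshold: float = 0.3) -> bool:
    """Keep a sample when CLIP's zero-shot probability for its assigned
    label clears an (illustrative) threshold; otherwise flag it as noisy."""
    prompts = [f"a photo of a {name}" for name in class_names]
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)   # (1, C)
    return probs[0, label].item() >= threshold
```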
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Channel-Wise Contrastive Learning for Learning with Noisy Labels [60.46434734808148]
We introduce channel-wise contrastive learning (CWCL) to distinguish authentic label information from noise.
Unlike conventional instance-wise contrastive learning (IWCL), CWCL tends to yield more nuanced and resilient features aligned with the authentic labels.
Our strategy is twofold: firstly, using CWCL to extract pertinent features to identify cleanly labeled samples, and secondly, progressively fine-tuning using these samples.
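The channel-wise loss itself is not described in this summary, but the two-stage recipe is; the following is a generic sketch of that control flow, with clean_score_fn as a hypothetical stand-in for CWCL's (unspecified) selection criterion.

```python
import torch
from torch.utils.data import DataLoader, Subset

def select_clean(dataset, clean_score_fn, keep_ratio=0.8):
    """Stage 1: rank samples by a cleanliness score and keep the top ones.
    clean_score_fn stands in for CWCL's criterion; higher means the label
    is more likely authentic."""
    scores = torch.tensor([clean_score_fn(x, y) for x, y in dataset])
    k = int(keep_ratio * len(dataset))
    return Subset(dataset, scores.topk(k).indices.tolist())

def finetune_on_clean(model, clean_subset, epochs=5, lr=1e-4):
    """Stage 2: progressively fine-tune on the selected clean samples."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(clean_subset, batch_size=128, shuffle=True)
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()
```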
arXiv Detail & Related papers (2023-08-14T06:04:50Z)
- Investigating the Learning Behaviour of In-context Learning: A
Comparison with Supervised Learning [67.25698169440818]
Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL).
We train the same LLMs with the same demonstration examples via ICL and supervised learning (SL), respectively, and investigate their performance under label perturbations.
First, we find that gold labels have significant impacts on the downstream in-context performance, especially for large language models.
Second, when comparing with SL, we show empirically that ICL is less sensitive to label perturbations than SL, and ICL gradually attains comparable performance to SL as the model size increases.
arXiv Detail & Related papers (2023-07-28T09:03:19Z)
- Hierarchical Semi-Supervised Contrastive Learning for
Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to learn sample representations that enable effective discrimination of anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework, for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z)
- Contrastive Learning Improves Model Robustness Under Label Noise [3.756550107432323]
We show that initializing supervised robust methods with representations learned through contrastive learning leads to significantly improved performance under label noise.
With contrastive pre-training, even the simplest method can outperform the state-of-the-art SSL method by more than 50% under high label noise.
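A minimal sketch of that recipe, assuming a SimCLR-style checkpoint (the path is hypothetical) and picking generalized cross-entropy as the robust fine-tuning loss for illustration; the paper does not name a specific loss here.

```python
import torch
import torch.nn as nn
import torchvision

NUM_CLASSES = 10  # illustrative

# Load a contrastively pre-trained backbone (hypothetical checkpoint path).
encoder = torchvision.models.resnet50()
encoder.fc = nn.Identity()
encoder.load_state_dict(torch.load("simclr_pretrained.pt"), strict=False)
model = nn.Sequential(encoder, nn.Linear(2048, NUM_CLASSES))

def gce_loss(logits, targets, q=0.7):
    """Generalized cross-entropy (Zhang & Sabuncu, 2018): one common
    noise-robust loss the pre-trained model can be fine-tuned with."""
    p = logits.softmax(dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p.pow(q)) / q).mean()
```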
arXiv Detail & Related papers (2021-04-19T00:27:58Z)
- Which Strategies Matter for Noisy Label Classification? Insight into
Loss and Uncertainty [7.20844895799647]
Label noise is a critical factor that degrades the generalization performance of deep neural networks.
We present analytical results on how loss and uncertainty values of samples change throughout the training process.
We design a new robust training method that emphasizes clean and informative samples, while minimizing the influence of noise.
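A small sketch of the per-sample loss tracking this kind of analysis relies on; the loader is assumed to also yield sample indices, which is an assumption, not the paper's setup.

```python
import torch

@torch.no_grad()
def record_losses(model, loader, history):
    """Record each sample's loss once per epoch. Noisy-label samples tend
    to keep high loss early in training, which clean-sample-emphasising
    methods exploit when weighting or selecting examples."""
    criterion = torch.nn.CrossEntropyLoss(reduction="none")
    model.eval()
    for idx, x, y in loader:               # loader yields (index, x, y)
        losses = criterion(model(x), y)    # per-sample losses
        for i, l in zip(idx.tolist(), losses.tolist()):
            history.setdefault(i, []).append(l)
```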
arXiv Detail & Related papers (2020-08-14T07:34:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.