Mitigating Label Noise through Data Ambiguation
- URL: http://arxiv.org/abs/2305.13764v2
- Date: Thu, 25 Jan 2024 17:39:19 GMT
- Title: Mitigating Label Noise through Data Ambiguation
- Authors: Julian Lienen, Eyke Hüllermeier
- Abstract summary: Large models with high expressive power are prone to memorizing incorrect labels, thereby harming generalization performance.
In this paper, we suggest addressing the shortcomings of both methodologies by "ambiguating" the target information.
More precisely, we leverage the framework of so-called superset learning to construct set-valued targets based on a confidence threshold.
- Score: 9.51828574518325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Label noise poses an important challenge in machine learning, especially in
deep learning, in which large models with high expressive power dominate the
field. Models of that kind are prone to memorizing incorrect labels, thereby
harming generalization performance. Many methods have been proposed to address
this problem, including robust loss functions and more complex label correction
approaches. Robust loss functions are appealing due to their simplicity, but
typically lack flexibility, while label correction usually adds substantial
complexity to the training setup. In this paper, we suggest addressing the
shortcomings of both methodologies by "ambiguating" the target information,
adding complementary candidate labels in cases where the learner is not
sufficiently convinced of the observed training label. More precisely, we
leverage the framework of so-called superset learning to construct set-valued
targets based on a confidence threshold, which deliver imprecise yet more
reliable beliefs about the ground-truth, effectively helping the learner to
suppress the memorization effect. In an extensive empirical evaluation, our
method demonstrates favorable learning behavior on synthetic and real-world
noise, confirming the effectiveness in detecting and correcting erroneous
training labels.
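As a rough illustration of the set-valued target construction described in the abstract, the following sketch builds candidate-label sets from a confidence threshold. Function names, the thresholding rule, and the mask representation are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def ambiguate_targets(probs, observed, threshold=0.3):
    """Illustrative sketch of target ambiguation: the observed label is always
    a candidate; when the model is not sufficiently confident in the observed
    label, every class whose predicted probability exceeds `threshold` is
    added as a complementary candidate (a set-valued, superset-style target).
    probs: (n, k) predicted class probabilities; observed: (n,) labels.
    Returns a boolean (n, k) candidate mask."""
    n, k = probs.shape
    mask = np.zeros((n, k), dtype=bool)
    mask[np.arange(n), observed] = True          # observed label stays in the set
    unsure = probs[np.arange(n), observed] < threshold
    mask[unsure] |= probs[unsure] >= threshold   # add plausible alternatives
    return mask
```

The set-valued targets deliver imprecise yet more reliable beliefs about the ground truth: a suspect label is not flipped outright, merely widened into a candidate set.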
Related papers
- Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement [5.865750284677784]
Adversarial training (AT) is one of the most effective ways to obtain the robustness of deep neural networks against adversarial attacks.
AT methods suffer from robust overfitting, i.e., a significant generalization gap between the training and testing curves.
We propose a label refinement approach for AT, which self-refines a more accurate and informative label distribution from over-confident hard labels.
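One simple way to read "self-refining a label distribution from over-confident hard labels" is as interpolating the hard one-hot label with the model's own predictive distribution. The function below is an illustrative sketch under that assumption; the parameter name `alpha` is not from the paper.

```python
import numpy as np

def refine_labels(hard_onehot, model_probs, alpha=0.7):
    """Soften an over-confident hard label by mixing it with the model's own
    predictive distribution; alpha is the trust placed in the hard label
    (illustrative assumption, not the paper's exact refinement rule)."""
    return alpha * hard_onehot + (1.0 - alpha) * model_probs
```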
arXiv Detail & Related papers (2024-03-14T04:48:31Z)
- ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method can outperform multiple baselines with clear margins in broad noise levels and enjoy great scalability.
arXiv Detail & Related papers (2023-12-13T17:59:07Z)
- Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy [34.02350195269502]
We formalize the problem of data pruning with re-labeling.
We propose a novel data pruning algorithm, Prune4Rel, that finds a subset maximizing the total neighborhood confidence of all training examples.
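The objective "a subset maximizing the total neighborhood confidence of all training examples" suggests a greedy coverage-style selection. The sketch below is one simplified reading of that objective; the scoring rule and data layout are assumptions, not Prune4Rel's actual algorithm.

```python
import numpy as np

def greedy_prune(confidence, neighbors, budget):
    """Greedy sketch: repeatedly pick the example whose selection most raises
    the total neighborhood confidence, i.e. the sum over all examples of the
    best confidence among their selected neighbors (simplified assumption).
    confidence: (n,) scores; neighbors: list of index lists; budget: subset size."""
    n = len(confidence)
    covered = np.zeros(n)            # best confidence currently covering each example
    selected = []
    for _ in range(budget):
        best_gain, best_i = -1.0, -1
        for i in range(n):
            if i in selected:
                continue
            gain = sum(max(confidence[i] - covered[j], 0.0) for j in neighbors[i])
            if gain > best_gain:
                best_gain, best_i = gain, i
        selected.append(best_i)
        for j in neighbors[best_i]:
            covered[j] = max(covered[j], confidence[best_i])
    return selected
```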
arXiv Detail & Related papers (2023-11-02T05:40:26Z)
- Robust Representation Learning for Unreliable Partial Label Learning [86.909511808373]
Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth.
When the candidate sets may themselves be unreliable, the setting is known as Unreliable Partial Label Learning (UPLL), which introduces additional complexity due to the inherent unreliability and ambiguity of partial labels.
We propose the Unreliability-Robust Representation Learning framework (URRL) that leverages unreliability-robust contrastive learning to help the model fortify against unreliable partial labels effectively.
arXiv Detail & Related papers (2023-08-31T13:37:28Z)
- Robust Feature Learning Against Noisy Labels [0.2082426271304908]
Mislabeled samples can significantly degrade the generalization of models.
Progressive self-bootstrapping is introduced to minimize the negative impact of supervision from noisy labels.
Experimental results show that our proposed method can efficiently and effectively enhance model robustness under severely noisy labels.
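Self-bootstrapping against noisy labels is commonly formulated as the classic soft bootstrapping loss (Reed et al. style), shown below as a generic illustration; this is not necessarily the progressive variant the paper proposes.

```python
import numpy as np

def soft_bootstrap_loss(probs, onehot, beta=0.8):
    """Classic soft bootstrapping loss: the training target is a convex
    combination of the (possibly noisy) observed label and the model's own
    prediction, so noisy supervision is gradually discounted.
    probs, onehot: (..., k) arrays; beta weights the observed label."""
    target = beta * onehot + (1.0 - beta) * probs
    return -np.sum(target * np.log(probs + 1e-12), axis=-1)
```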
arXiv Detail & Related papers (2023-07-10T02:55:35Z)
- Adversary-Aware Partial Label Learning with Label Distillation [47.18584755798137]
We present Adversary-Aware Partial Label Learning and introduce the "rival", a set of noisy labels, into the collection of candidate labels for each instance.
Our method achieves promising results on the CIFAR10, CIFAR100 and CUB200 datasets.
arXiv Detail & Related papers (2023-04-02T10:18:30Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
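The quantity-quality trade-off is resolved in SoftMatch by weighting pseudo-labels softly rather than hard-thresholding them; the sketch below follows the spirit of its truncated-Gaussian weighting, with `mu` and `sigma` standing in for running estimates over the unlabeled data (an assumption about the exact bookkeeping).

```python
import numpy as np

def softmatch_weight(confidence, mu, sigma):
    """Soft sample weighting: samples at or above the mean confidence get full
    weight; lower-confidence samples are down-weighted by a truncated Gaussian
    instead of being dropped, keeping both quantity and quality of pseudo-labels."""
    confidence = np.asarray(confidence, dtype=float)
    w = np.exp(-((confidence - mu) ** 2) / (2.0 * sigma ** 2))
    return np.where(confidence >= mu, 1.0, w)
```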
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Prototype-Anchored Learning for Learning with Imperfect Annotations [83.7763875464011]
It is challenging to learn unbiased classification models from imperfectly annotated datasets.
We propose a prototype-anchored learning (PAL) method, which can be easily incorporated into various learning-based classification schemes.
We verify the effectiveness of PAL on class-imbalanced learning and noise-tolerant learning by extensive experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-06-23T10:25:37Z)
- Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning that considers both the uncertainty and the robustness of the detector.
Our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift.
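Pseudo-labeling "the very confident predictions" typically reduces to a confidence threshold on the model's output. The sketch below illustrates that generic step; the threshold value and function name are assumptions, not the paper's.

```python
import numpy as np

def pseudo_label(probs, threshold=0.95):
    """Keep pseudo-labels only for very confident predictions.
    probs: (n, k) predicted class probabilities.
    Returns predicted labels and a boolean mask of retained samples."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    mask = conf >= threshold
    return labels, mask
```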
arXiv Detail & Related papers (2021-06-22T16:53:09Z)
- Learning Not to Learn in the Presence of Noisy Labels [104.7655376309784]
We show that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption.
We show that training with this loss function encourages the model to "abstain" from learning on the data points with noisy labels.
arXiv Detail & Related papers (2020-02-16T09:12:27Z)
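The gambler's loss gives the model an explicit abstention option; a common formulation (from the "deep gamblers" line of work, given here as a hedged sketch rather than the paper's exact definition) is -log(p_y + p_abstain / payoff).

```python
import numpy as np

def gamblers_loss(probs_with_abstain, y, payoff=2.5):
    """Gambler's loss sketch: the model outputs k class probabilities plus an
    extra 'abstain' probability in the last column. Per-sample loss
        -log(p_y + p_abstain / payoff)
    lets the model hedge on suspect (possibly mislabeled) points by shifting
    mass to the abstain output; payoff > 1 controls how costly abstaining is."""
    p_y = probs_with_abstain[np.arange(len(y)), y]
    p_abstain = probs_with_abstain[:, -1]
    return -np.log(p_y + p_abstain / payoff)
```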
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.