Pi-DUAL: Using Privileged Information to Distinguish Clean from Noisy Labels
- URL: http://arxiv.org/abs/2310.06600v2
- Date: Tue, 28 May 2024 13:15:02 GMT
- Title: Pi-DUAL: Using Privileged Information to Distinguish Clean from Noisy Labels
- Authors: Ke Wang, Guillermo Ortiz-Jimenez, Rodolphe Jenatton, Mark Collier, Efi Kokiopoulou, Pascal Frossard
- Abstract summary: We introduce Pi-DUAL, an architecture designed to harness privileged information (PI) to distinguish clean from wrong labels.
Pi-DUAL achieves significant performance improvements on key PI benchmarks, establishing a new state-of-the-art test set accuracy.
Pi-DUAL is a simple, scalable and practical approach for mitigating the effects of label noise in a variety of real-world scenarios with PI.
- Score: 47.85182773875054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Label noise is a pervasive problem in deep learning that often compromises the generalization performance of trained models. Recently, leveraging privileged information (PI) -- information available only during training but not at test time -- has emerged as an effective approach to mitigate this issue. Yet, existing PI-based methods have failed to consistently outperform their no-PI counterparts in terms of preventing overfitting to label noise. To address this deficiency, we introduce Pi-DUAL, an architecture designed to harness PI to distinguish clean from wrong labels. Pi-DUAL decomposes the output logits into a prediction term, based on conventional input features, and a noise-fitting term influenced solely by PI. A gating mechanism steered by PI adaptively shifts focus between these terms, allowing the model to implicitly separate the learning paths of clean and wrong labels. Empirically, Pi-DUAL achieves significant performance improvements on key PI benchmarks (e.g., +6.8% on ImageNet-PI), establishing a new state-of-the-art test set accuracy. Additionally, Pi-DUAL is a potent method for identifying noisy samples post-training, outperforming other strong methods at this task. Overall, Pi-DUAL is a simple, scalable and practical approach for mitigating the effects of label noise in a variety of real-world scenarios with PI.
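To make the decomposition concrete, here is a minimal PyTorch sketch of a Pi-DUAL-style head. It is only an illustration of the abstract's description: the module names (PiDualHead, prediction_net, noise_net, gate_net), the layer sizes, and the exact way the gate combines the two terms are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PiDualHead(nn.Module):
    """Illustrative Pi-DUAL-style head: the output logits are a PI-gated
    combination of a prediction term (from input features) and a
    noise-fitting term (from privileged information only)."""

    def __init__(self, feat_dim: int, pi_dim: int, num_classes: int):
        super().__init__()
        self.prediction_net = nn.Linear(feat_dim, num_classes)  # fits the clean signal
        self.noise_net = nn.Sequential(                         # meant to absorb wrong labels
            nn.Linear(pi_dim, 128), nn.ReLU(), nn.Linear(128, num_classes)
        )
        self.gate_net = nn.Linear(pi_dim, 1)                    # gate steered by PI

    def forward(self, feats: torch.Tensor, pi: torch.Tensor) -> torch.Tensor:
        pred_logits = self.prediction_net(feats)
        noise_logits = self.noise_net(pi)
        gate = torch.sigmoid(self.gate_net(pi))  # near 1 on suspected noisy samples
        # Trained with cross-entropy against the observed (possibly wrong) labels;
        # at test time PI is unavailable, so prediction uses pred_logits alone.
        return (1.0 - gate) * pred_logits + gate * noise_logits

head = PiDualHead(feat_dim=512, pi_dim=16, num_classes=10)
logits = head(torch.randn(4, 512), torch.randn(4, 16))  # shape: (4, 10)
```

Because the gate depends only on PI, wrong-labeled examples can be routed through the noise term while the prediction term keeps fitting the clean signal, which is how the model implicitly separates the two learning paths.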
Related papers
- Bayesian Prediction-Powered Inference [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates by combining machine-learning predictions on abundant unlabeled data with limited human-labeled data.
We propose a framework for PPI based on Bayesian inference that allows researchers to develop new task-appropriate PPI methods easily.
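For intuition, the Bayesian framework above builds on classical prediction-powered inference. Below is a minimal numpy sketch of the classical (non-Bayesian) PPI point estimate of a mean; the function name and synthetic data are hypothetical.

```python
import numpy as np

def ppi_mean(y_labeled, preds_labeled, preds_unlabeled):
    """Classical PPI point estimate of a mean: predictions on a large
    unlabeled pool, corrected by a 'rectifier' that measures the model's
    average error on a small labeled set."""
    rectifier = np.mean(np.asarray(y_labeled) - np.asarray(preds_labeled))
    return np.mean(preds_unlabeled) + rectifier

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)          # small human-labeled sample (true mean 1.0)
f_lab = y + 0.2                            # model predictions with a +0.2 bias
f_unlab = rng.normal(1.2, 1.0, size=5000)  # biased predictions on the unlabeled pool
print(ppi_mean(y, f_lab, f_unlab))         # rectifier cancels the bias, estimate ~1.0
```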
arXiv Detail & Related papers (2024-05-09T18:08:58Z)
- ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method outperforms multiple baselines by clear margins across a broad range of noise levels and scales well.
arXiv Detail & Related papers (2023-12-13T17:59:07Z)
- Towards Effective Visual Representations for Partial-Label Learning [49.91355691337053]
Under partial-label learning (PLL), for each training instance, only a set of ambiguous labels containing the unknown true label is accessible.
Without access to true labels, positive points are predicted using pseudo-labels that are inherently noisy, and negative points often require large batches or momentum encoders.
In this paper, we rethink the state-of-the-art contrastive method PiCO [PiPi24], which demonstrates significant scope for improvement in representation learning.
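To make the PLL setting concrete, here is a minimal PyTorch sketch of a standard candidate-set likelihood loss, which maximizes the probability mass the model places on each instance's ambiguous candidate set. This is a common PLL baseline objective, not PiCO's contrastive method; the function name and tensor layout are assumptions.

```python
import torch
import torch.nn.functional as F

def candidate_set_loss(logits: torch.Tensor, candidate_mask: torch.Tensor) -> torch.Tensor:
    """loss = -log sum_{y in S(x)} p(y | x), computed stably in log space.

    logits:         (batch, num_classes)
    candidate_mask: (batch, num_classes), 1 where a class is in the candidate set
    """
    log_probs = F.log_softmax(logits, dim=-1)
    masked = log_probs.masked_fill(candidate_mask == 0, float("-inf"))
    return -torch.logsumexp(masked, dim=-1).mean()

logits = torch.randn(4, 10, requires_grad=True)
mask = torch.zeros(4, 10)
mask[:, :3] = 1  # classes 0-2 are the ambiguous candidates for every instance
print(candidate_set_loss(logits, mask))
```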
arXiv Detail & Related papers (2023-05-10T12:01:11Z)
- When does Privileged Information Explain Away Label Noise? [66.9725683097357]
We investigate the role played by different properties of the PI in explaining away label noise.
We find that PI is most helpful when it allows networks to easily distinguish clean from noisy data.
We propose several enhancements to the state-of-the-art PI methods and demonstrate the potential of PI as a means of tackling label noise.
arXiv Detail & Related papers (2023-03-03T09:25:39Z)
- Reliable Prediction Intervals with Directly Optimized Inductive Conformal Regression for Deep Learning [3.42658286826597]
Prediction intervals (PIs) are used to quantify the uncertainty of each prediction in deep learning regression.
Many approaches to improving the quality of PIs can effectively reduce their width, but they do not ensure that enough true labels are captured.
In this study, we use Directly Optimized Inductive Conformal Regression (DOICR), which takes only the average width of PIs as the loss function.
Benchmark experiments show that DOICR outperforms current state-of-the-art algorithms for regression problems.
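For background, here is a minimal numpy sketch of standard inductive (split) conformal regression, the framework DOICR builds on. This shows only the textbook post-hoc calibration step, not DOICR's width-based training objective, and the names are illustrative.

```python
import numpy as np

def split_conformal_interval(preds_calib, y_calib, preds_test, alpha=0.1):
    """Calibrate a symmetric interval half-width from held-out residuals so
    that, marginally, the PI contains the true label with prob. >= 1 - alpha."""
    scores = np.abs(np.asarray(y_calib) - np.asarray(preds_calib))  # nonconformity
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))         # conformal quantile rank
    q_hat = np.inf if k > n else np.sort(scores)[k - 1]
    return preds_test - q_hat, preds_test + q_hat   # lower, upper PI bounds
```

In this framework the interval width is 2 * q_hat; DOICR's idea is to train the underlying network so that the average width is minimized while conformal calibration maintains coverage.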
arXiv Detail & Related papers (2023-02-02T04:46:14Z)
- Towards Harnessing Feature Embedding for Robust Learning with Noisy Labels [44.133307197696446]
The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods.
We propose a novel feature embedding-based method for deep learning with label noise, termed LabEl NoiseDilution (LEND).
arXiv Detail & Related papers (2022-06-27T02:45:09Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements in robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)