An Embedding is Worth a Thousand Noisy Labels
- URL: http://arxiv.org/abs/2408.14358v1
- Date: Mon, 26 Aug 2024 15:32:31 GMT
- Title: An Embedding is Worth a Thousand Noisy Labels
- Authors: Francesco Di Salvo, Sebastian Doerrich, Ines Rieger, Christian Ledig
- Abstract summary: We propose WANN, a Weighted Adaptive Nearest Neighbor approach to address label noise.
We show WANN outperforms reference methods on diverse datasets of varying size and under various noise types and severities.
Our approach, emphasizing efficiency and explainability, emerges as a simple, robust solution to overcome the inherent limitations of deep neural network training.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of deep neural networks scales with dataset size and label quality, rendering the efficient mitigation of low-quality data annotations crucial for building robust and cost-effective systems. Existing strategies to address label noise exhibit severe limitations due to computational complexity and application dependency. In this work, we propose WANN, a Weighted Adaptive Nearest Neighbor approach that builds on self-supervised feature representations obtained from foundation models. To guide the weighted voting scheme, we introduce a reliability score, which measures the likelihood of a data label being correct. WANN outperforms reference methods, including a linear layer trained with robust loss functions, on diverse datasets of varying size and under various noise types and severities. WANN also exhibits superior generalization on imbalanced data compared to both Adaptive-NNs (ANN) and fixed k-NNs. Furthermore, the proposed weighting scheme enhances supervised dimensionality reduction under noisy labels. This yields a significant boost in classification performance with 10x and 100x smaller image embeddings, minimizing latency and storage requirements. Our approach, emphasizing efficiency and explainability, emerges as a simple, robust solution to overcome the inherent limitations of deep neural network training. The code is available at https://github.com/francescodisalvo05/wann-noisy-labels .
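To make the voting scheme concrete, here is a minimal sketch of reliability-weighted k-NN classification over frozen foundation-model embeddings. The reliability score used below (agreement between a sample's label and its embedding neighborhood) is a simplified stand-in for the paper's definition, and the function names are illustrative; the repository linked above contains the actual implementation, including the adaptive choice of k.

```python
import numpy as np

def knn_indices(X, q, k):
    """Indices of the k nearest rows of X to a query embedding q."""
    return np.argsort(np.linalg.norm(X - q, axis=1))[:k]

def reliability_scores(X, y, k=10):
    """Simplified reliability score: the fraction of a sample's k nearest
    neighbors (itself excluded) that share its label, so labels contradicted
    by their embedding neighborhood receive low weight. A stand-in for the
    score defined in the paper."""
    r = np.empty(len(X))
    for i in range(len(X)):
        nn = knn_indices(X, X[i], k + 1)[1:]  # drop the sample itself
        r[i] = np.mean(y[nn] == y[i])
    return r

def wann_predict(X, y, r, q, k=10):
    """Reliability-weighted k-NN vote for a single query embedding q."""
    nn = knn_indices(X, q, k)
    classes = np.unique(y)
    votes = [r[nn][y[nn] == c].sum() for c in classes]
    return classes[int(np.argmax(votes))]
```

With clean labels every weight approaches 1 and the scheme reduces to plain k-NN; mislabeled neighbors are smoothly down-weighted rather than discarded.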
Related papers
- ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method outperforms multiple baselines by clear margins across broad noise levels and scales well.
arXiv Detail & Related papers (2023-12-13T17:59:07Z)
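ERASE's "propagated denoised labels" can be pictured with classic graph label propagation. This is a generic sketch under assumed inputs (a symmetric adjacency matrix A and one-hot labels Y0), not ERASE's actual update rule:

```python
import numpy as np

def propagate_labels(A, Y0, alpha=0.9, iters=50):
    """Generic label propagation: smooth one-hot labels Y0 (n x c) over a
    graph with symmetric adjacency A (n x n), so labels that disagree with
    their neighborhood get softened. Illustrative only."""
    d = A.sum(axis=1) + 1e-12
    A_hat = A / np.sqrt(np.outer(d, d))      # symmetric normalization
    Y = Y0.astype(float)
    for _ in range(iters):
        Y = alpha * (A_hat @ Y) + (1 - alpha) * Y0
    return Y / Y.sum(axis=1, keepdims=True)  # row-normalize to distributions
```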
- All Points Matter: Entropy-Regularized Distribution Alignment for Weakly-supervised 3D Segmentation
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z)
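One hedged reading of this alignment strategy is a composite objective that pulls predictions toward the pseudo-labels while penalizing high-entropy predictions on all points. The loss below is an illustrative combination, not the paper's exact formulation:

```python
import numpy as np

def alignment_loss(pseudo, pred, lam=1.0, mu=0.1, eps=1e-12):
    """Illustrative objective: KL(pseudo || pred) narrows the gap between
    pseudo-labels and model predictions; the entropy term encourages
    confident predictions on every point. Inputs are (n_points, n_classes)
    rows of probabilities."""
    kl = np.sum(pseudo * (np.log(pseudo + eps) - np.log(pred + eps)), axis=1)
    ent = -np.sum(pred * np.log(pred + eps), axis=1)
    return float(np.mean(lam * kl + mu * ent))
```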
- Uncertainty-aware Self-training for Low-resource Neural Sequence Labeling
This paper presents SeqUST, a novel uncertainty-aware self-training framework for neural sequence labeling (NSL).
We incorporate Monte Carlo (MC) dropout in a Bayesian neural network (BNN) to perform uncertainty estimation at the token level and then select reliable tokens from the unlabeled data.
A well-designed masked sequence labeling task with a noise-robust loss supports robust training and suppresses the effect of noisy pseudo-labels.
arXiv Detail & Related papers (2023-02-17T02:40:04Z)
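SeqUST's token-selection step can be sketched as follows; `stochastic_forward` is a hypothetical model call that keeps dropout active at inference time, and the entropy threshold is illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_select(stochastic_forward, x, T=20, max_entropy=0.5):
    """Average softmax outputs over T stochastic passes (dropout kept on),
    then keep only tokens whose predictive entropy is below a threshold.
    `stochastic_forward(x)` is assumed to return (seq_len, n_classes)
    logits that vary across calls due to dropout."""
    p = np.mean([softmax(stochastic_forward(x)) for _ in range(T)], axis=0)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)  # one value per token
    reliable = entropy < max_entropy
    return p.argmax(axis=-1), reliable               # pseudo-labels, mask
```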
- Synergistic Network Learning and Label Correction for Noise-robust Image Classification
Deep Neural Networks (DNNs) tend to overfit training label noise, resulting in poorer model performance in practice.
We propose a robust label correction framework combining the ideas of small loss selection and noise correction.
We demonstrate our method on both synthetic and real-world datasets with different noise types and rates.
arXiv Detail & Related papers (2022-02-27T23:06:31Z)
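Small-loss selection with noise correction can be sketched generically: treat the lowest-loss samples as clean and relabel the rest only when the model is confident. The fraction and threshold below are illustrative, not the paper's:

```python
import numpy as np

def select_and_correct(losses, probs, labels, clean_frac=0.5, conf=0.9):
    """Keep the lowest-loss fraction of samples as presumed clean; for the
    remainder, replace the label with the model prediction whenever the
    predicted probability clears a confidence threshold."""
    order = np.argsort(losses)
    clean = np.zeros(len(losses), dtype=bool)
    clean[order[: int(clean_frac * len(losses))]] = True
    pred = probs.argmax(axis=1)
    confident = probs.max(axis=1) > conf
    corrected = np.where(~clean & confident, pred, labels)
    return clean, corrected
```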
- Constrained Instance and Class Reweighting for Robust Learning under Label Noise
We propose a principled approach for tackling label noise with the aim of assigning importance weights to individual instances and class labels.
Our method works by formulating a class of constrained optimization problems that yield simple closed form updates for these importance weights.
We evaluate our method on several benchmark datasets and observe considerable performance gains in the presence of label noise.
arXiv Detail & Related papers (2021-11-09T21:37:53Z)
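As one illustration of how a constrained problem yields closed-form weights (the paper's own formulations are not reproduced here): minimizing the weighted sum of losses plus an entropy regularizer over the probability simplex gives w_i proportional to exp(-loss_i / tau), which smoothly down-weights high-loss, likely mislabeled samples:

```python
import numpy as np

def instance_weights(losses, tau=1.0):
    """Closed-form solution of
        min_w  sum_i w_i * loss_i + tau * sum_i w_i * log(w_i)
        s.t.   w lies on the probability simplex,
    namely w_i proportional to exp(-loss_i / tau). Illustrative, not the
    paper's exact objective."""
    z = -np.asarray(losses, dtype=float) / tau
    z -= z.max()          # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()
```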
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements in robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Delving Deep into Label Smoothing
Label smoothing is an effective regularization tool for deep neural networks (DNNs).
We present an Online Label Smoothing (OLS) strategy, which generates soft labels based on the statistics of the model prediction for the target category.
arXiv Detail & Related papers (2020-11-25T08:03:11Z)
- Temporal Calibrated Regularization for Robust Noisy Label Learning
Deep neural networks (DNNs) achieve great success on many tasks with the help of large-scale, well-annotated datasets.
However, labeling large-scale data is costly and error-prone, making it difficult to guarantee annotation quality.
We propose Temporal Calibrated Regularization (TCR), which utilizes the original labels together with the predictions from the previous epoch.
arXiv Detail & Related papers (2020-07-01T04:48:49Z)
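A minimal sketch of combining original labels with previous-epoch predictions, using an assumed fixed mixing coefficient rather than TCR's actual rule:

```python
import numpy as np

def calibrated_targets(onehot_labels, prev_epoch_probs, beta=0.7):
    """Blend the (possibly noisy) original one-hot labels with the model's
    softmax predictions from the previous epoch to form soft training
    targets; beta controls trust in the original annotation."""
    return beta * onehot_labels + (1 - beta) * prev_epoch_probs
```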
- Rectified Meta-Learning from Noisy Labels for Robust Image-based Plant Disease Diagnosis
Plant diseases are one of the main threats to food security and crop production.
One popular approach is to cast this problem as a leaf image classification task, which can be addressed by powerful convolutional neural networks (CNNs).
We propose a novel framework that incorporates a rectified meta-learning module into the common CNN paradigm to train a noise-robust deep network without extra supervision information.
arXiv Detail & Related papers (2020-03-17T09:51:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.