Denoising after Entropy-based Debiasing: A Robust Training Method for
Dataset Bias with Noisy Labels
- URL: http://arxiv.org/abs/2212.01189v1
- Date: Thu, 1 Dec 2022 04:34:59 GMT
- Title: Denoising after Entropy-based Debiasing: A Robust Training Method for
Dataset Bias with Noisy Labels
- Authors: Sumyeong Ahn and Se-Young Yun
- Abstract summary: We propose an approach called denoising after entropy-based debiasing, i.e., DENEB, which has three main stages.
We find that running denoising algorithms before debiasing is ineffective because denoising algorithms reduce the impact of difficult-to-learn samples.
- Score: 12.335698325757491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improperly constructed datasets can result in inaccurate inferences. For
instance, models trained on biased datasets perform poorly in terms of
generalization (i.e., dataset bias). Recent debiasing techniques have
improved generalization by down-weighting easy-to-learn samples (i.e.,
bias-aligned samples) and emphasizing difficult-to-learn samples (i.e.,
bias-conflicting samples). However, these techniques can fail in the presence
of noisy labels, because the trained model treats noisy labels as
difficult-to-learn and therefore emphasizes them. In this study, we
find that earlier approaches that use the provided labels to quantify
difficulty can be disrupted by even a small proportion of noisy labels.
Furthermore, we find that running denoising algorithms before debiasing is
ineffective because denoising algorithms reduce the impact of
difficult-to-learn samples, including valuable bias-conflicting samples.
Therefore, we propose an approach called denoising after entropy-based
debiasing, i.e., DENEB, which has three main stages. (1) The prejudice model is
trained by emphasizing (bias-aligned, clean) samples, which are selected using
a Gaussian Mixture Model. (2) Using the per-sample entropy of the prejudice
model's output, a sampling probability proportional to that entropy is
computed for each sample. (3) The final model is trained with existing
denoising algorithms on mini-batches constructed according to the computed
sampling probabilities. Compared to existing debiasing and denoising
algorithms, our method achieves better debiasing performance on multiple
benchmarks.
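To make the pipeline concrete, below is a minimal PyTorch-style sketch of the three stages under stated assumptions: it is not the authors' implementation, the training of the prejudice model itself is omitted, and `denoising_train_step`, the GMM threshold, and the data-loading details are hypothetical placeholders. The loaders used for scoring are assumed to iterate the full dataset without shuffling so that per-sample statistics stay aligned with dataset indices.

```python
# Hypothetical sketch of the DENEB stages (not the reference implementation).
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, WeightedRandomSampler
from sklearn.mixture import GaussianMixture

def per_sample_losses(model, loader, device="cpu"):
    """Per-sample cross-entropy losses of a warmed-up model (no updates)."""
    model.eval()
    losses = []
    with torch.no_grad():
        for x, y in loader:  # non-shuffling loader over the full dataset
            logits = model(x.to(device))
            losses.append(F.cross_entropy(logits, y.to(device), reduction="none").cpu())
    return torch.cat(losses)

# Stage (1): select (bias-aligned, clean) samples with a two-component GMM
# fit on per-sample losses; the low-loss component is taken as "easy/clean".
# The prejudice model would then be trained while emphasizing this subset.
def select_bias_aligned_clean(losses, threshold=0.5):
    x = losses.reshape(-1, 1).numpy()
    gmm = GaussianMixture(n_components=2).fit(x)
    clean_component = gmm.means_.argmin()
    p_clean = gmm.predict_proba(x)[:, clean_component]
    return p_clean > threshold  # boolean mask over the dataset

# Stage (2): per-sample entropy of the prejudice model's predictions,
# normalized into sampling probabilities (high entropy -> sampled more often).
def entropy_sampling_probs(prejudice_model, loader, device="cpu"):
    prejudice_model.eval()
    entropies = []
    with torch.no_grad():
        for x, _ in loader:
            probs = F.softmax(prejudice_model(x.to(device)), dim=1)
            ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
            entropies.append(ent.cpu())
    entropies = torch.cat(entropies)
    return entropies / entropies.sum()

# Stage (3): train the final model with any off-the-shelf denoising algorithm,
# drawing mini-batches according to the entropy-proportional probabilities.
def train_final_model(final_model, dataset, probs, denoising_train_step,
                      batch_size=128, epochs=10):
    sampler = WeightedRandomSampler(probs, num_samples=len(dataset), replacement=True)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    for _ in range(epochs):
        for batch in loader:
            denoising_train_step(final_model, batch)  # placeholder denoising step
    return final_model
```

The ordering mirrored in this sketch is the point of the abstract: the entropy-based sampling is fixed before denoising starts, so the denoising algorithm cannot suppress the rare bias-conflicting samples it would otherwise down-weight.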
Related papers
- A Bilevel Optimization Framework for Imbalanced Data Classification [1.6385815610837167]
We propose a new undersampling approach that avoids the pitfalls of noise and overlap caused by synthetic data.
Instead of undersampling majority-class data at random, our method undersamples data points based on their ability to improve the model loss.
Using the improved model loss as a proxy for classification performance, our technique assesses each data point's impact on the loss and rejects those unable to improve it.
arXiv Detail & Related papers (2024-10-15T01:17:23Z) - Foster Adaptivity and Balance in Learning with Noisy Labels [26.309508654960354]
We propose a novel approach named SED to deal with label noise in a Self-adaptivE and class-balanceD manner.
A mean-teacher model is then employed to correct labels of noisy samples.
We additionally propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples.
arXiv Detail & Related papers (2024-07-03T03:10:24Z) - Double Correction Framework for Denoising Recommendation [45.98207284259792]
In implicit feedback, noisy samples can hinder the precise learning of user preferences.
A popular solution is based on dropping noisy samples in the model training phase.
We propose a Double Correction Framework for Denoising Recommendation.
arXiv Detail & Related papers (2024-05-18T12:15:10Z) - Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specified probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z) - Late Stopping: Avoiding Confidently Learning from Mislabeled Examples [61.00103151680946]
We propose a new framework, Late Stopping, which leverages the intrinsic robust learning ability of DNNs through a prolonged training process.
We empirically observe that mislabeled and clean examples exhibit differences in the number of epochs required for them to be consistently and correctly classified.
Experimental results on benchmark-simulated and real-world noisy datasets demonstrate that the proposed method outperforms state-of-the-art counterparts.
arXiv Detail & Related papers (2023-08-26T12:43:25Z) - Neighborhood Collective Estimation for Noisy Label Identification and
Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z) - Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets [23.4536532321199]
Inspired by our observations, we propose an Uncertainty-aware Label Correction framework to handle label noise on imbalanced datasets.
arXiv Detail & Related papers (2022-07-12T11:35:55Z) - UNICON: Combating Label Noise Through Uniform Selection and Contrastive
Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state-of-the-art on the CIFAR100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z) - Denoising Distantly Supervised Named Entity Recognition via a
Hypergeometric Probabilistic Model [26.76830553508229]
Hypergeometric Learning (HGL) is a denoising algorithm for distantly supervised named entity recognition.
HGL takes both noise distribution and instance-level confidence into consideration.
Experiments show that HGL can effectively denoise the weakly-labeled data retrieved from distant supervision.
arXiv Detail & Related papers (2021-06-17T04:01:25Z) - Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z) - Tackling Instance-Dependent Label Noise via a Universal Probabilistic
Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)