Learning from Training Dynamics: Identifying Mislabeled Data Beyond
Manually Designed Features
- URL: http://arxiv.org/abs/2212.09321v2
- Date: Tue, 20 Dec 2022 06:37:00 GMT
- Authors: Qingrui Jia, Xuhong Li, Lei Yu, Jiang Bian, Penghao Zhao, Shupeng Li,
Haoyi Xiong, Dejing Dou
- Abstract summary: We introduce a novel learning-based solution, leveraging a noise detector, instantiated as an LSTM network.
The proposed method trains the noise detector in a supervised manner using the dataset with synthesized label noises.
Results show that the proposed method precisely detects mislabeled samples on various datasets without further adaptation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While mislabeled or ambiguously labeled samples in the training set can
negatively affect the performance of deep models, diagnosing the dataset and
identifying mislabeled samples helps to improve generalization. Training
dynamics, i.e., the traces left by iterations of optimization algorithms, have
recently been proven effective for localizing mislabeled samples with
hand-crafted features. In this paper, going beyond manually designed features,
we introduce a novel learning-based solution that leverages a noise detector,
instantiated as an LSTM network, which learns to predict whether a sample was
mislabeled using the raw training dynamics as input. Specifically, the proposed
method trains the noise detector in a supervised manner on a dataset with
synthesized label noise, and the detector can then be applied to various
datasets (with either natural or synthesized label noise) without retraining.
We conduct extensive experiments to evaluate the proposed method: we train the
noise detector on the label-noise-synthesized CIFAR dataset and test it on
Tiny ImageNet, CUB-200, Caltech-256, WebVision, and Clothing1M. Results show
that the proposed method precisely detects mislabeled samples on various
datasets without further adaptation and outperforms state-of-the-art methods.
Further experiments demonstrate that mislabel identification can guide label
correction, namely data debugging, providing improvements from the data side
that are orthogonal to algorithm-centric state-of-the-art techniques.
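The pipeline the abstract describes (synthesize label noise, record raw per-sample training dynamics, then train a supervised detector on those traces) can be sketched in a few lines. The sketch below is a toy illustration under stated assumptions, not the paper's implementation: the data are made-up Gaussian blobs, the base model is logistic regression, and a plain logistic classifier over the flattened per-sample loss traces stands in for the paper's LSTM detector; all dimensions and hyperparameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: two Gaussian blobs.
n, d, epochs = 200, 5, 30
X = np.vstack([rng.normal(-1, 1, (n // 2, d)), rng.normal(1, 1, (n // 2, d))])
y = np.array([0] * (n // 2) + [1] * (n // 2))

# Synthesize label noise: flip ~15% of labels and remember which (ground truth).
noisy = rng.random(n) < 0.15
y_noisy = np.where(noisy, 1 - y, y)

# Train a logistic model with full-batch gradient descent, recording each
# sample's loss at every epoch: these traces are the "raw training dynamics".
w, b, lr = np.zeros(d), 0.0, 0.1
dynamics = np.zeros((n, epochs))
for t in range(epochs):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    dynamics[:, t] = -(y_noisy * np.log(p + 1e-9)
                       + (1 - y_noisy) * np.log(1 - p + 1e-9))
    grad = p - y_noisy
    w -= lr * (X.T @ grad) / n
    b -= lr * grad.mean()

# Noise detector: a second logistic model trained on the raw loss traces,
# supervised by the known flip mask (a linear stand-in for the paper's LSTM).
v, c = np.zeros(epochs), 0.0
for _ in range(500):
    q = 1 / (1 + np.exp(-(dynamics @ v + c)))
    g = q - noisy
    v -= 0.05 * (dynamics.T @ g) / n
    c -= 0.05 * g.mean()

pred = (1 / (1 + np.exp(-(dynamics @ v + c)))) > 0.5
acc = (pred == noisy).mean()
print(f"detector accuracy on the synthesized traces: {acc:.2f}")
```

In the paper, the detector trained on CIFAR traces transfers to other datasets without retraining; here the detector is trained and evaluated on the same traces purely to show the shape of the pipeline.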
Related papers
- Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection [84.78475642696137]
The existence of noisy labels in real-world data negatively impacts the performance of deep learning models.
We propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS).
SGPS constructs reliable positive pairs for noisy samples to enhance the sample utilization.
arXiv Detail & Related papers (2025-01-19T14:41:55Z)
- Mitigating Instance-Dependent Label Noise: Integrating Self-Supervised Pretraining with Pseudo-Label Refinement [3.272177633069322]
Real-world datasets often contain noisy labels due to human error, ambiguity, or resource constraints during the annotation process.
We propose a novel framework that combines self-supervised learning using SimCLR with iterative pseudo-label refinement.
Our approach significantly outperforms several state-of-the-art methods, particularly under high noise conditions.
arXiv Detail & Related papers (2024-12-06T09:56:49Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [77.45468386115306]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Learning with Noisy labels via Self-supervised Adversarial Noisy Masking [33.87292143223425]
We propose a novel training approach termed adversarial noisy masking.
It adaptively modulates the input data and labels simultaneously, preventing the model from overfitting to noisy samples.
It is tested on both synthetic and real-world noisy datasets.
arXiv Detail & Related papers (2023-02-14T03:13:26Z)
- Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling [22.62790706276081]
Training deep neural networks (DNNs) with noisy labels is practically challenging.
Previous efforts tend to handle part or full data in a unified denoising flow.
We propose a coarse-to-fine robust learning method called CREMA to handle noisy data in a divide-and-conquer manner.
arXiv Detail & Related papers (2022-08-23T02:06:38Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Towards Harnessing Feature Embedding for Robust Learning with Noisy Labels [44.133307197696446]
The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods.
We propose a novel feature embedding-based method for deep learning with label noise, termed LabEl NoiseDilution (LEND).
arXiv Detail & Related papers (2022-06-27T02:45:09Z)
- Robust Meta-learning with Sampling Noise and Label Noise via Eigen-Reptile [78.1212767880785]
The meta-learner is prone to overfitting since only a few samples are available.
When handling data with noisy labels, the meta-learner can be extremely sensitive to label noise.
We present Eigen-Reptile (ER), which updates the meta-parameters along the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.