Regroup Median Loss for Combating Label Noise
- URL: http://arxiv.org/abs/2312.06273v1
- Date: Mon, 11 Dec 2023 10:19:55 GMT
- Title: Regroup Median Loss for Combating Label Noise
- Authors: Fengpeng Li, Kemou Li, Jinyu Tian and Jiantao Zhou
- Abstract summary: Deep model training requires large-scale datasets of annotated data.
Due to the difficulty of annotating a large number of samples, label noise caused by incorrect annotations is inevitable.
We propose Regroup Median Loss (RML) to reduce the probability of selecting noisy samples and correct losses of noisy samples.
- Score: 19.51996047333779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The deep model training procedure requires large-scale datasets of annotated
data. Due to the difficulty of annotating a large number of samples, label
noise caused by incorrect annotations is inevitable, resulting in low model
performance and poor model generalization. To combat label noise, current
methods usually select clean samples based on the small-loss criterion and use
these samples for training. Because some noisy samples resemble clean ones,
these small-loss-based methods are still affected by label noise. To
address this issue, in this work, we propose Regroup Median Loss (RML) to
reduce the probability of selecting noisy samples and correct losses of noisy
samples. RML randomly selects samples with the same label as the training
samples based on a new loss processing method. Then, we combine the stable mean
loss and the robust median loss through a proposed regrouping strategy to
obtain robust loss estimation for noisy samples. To further improve the model
performance against label noise, we propose a new sample selection strategy and
build a semi-supervised method based on RML. Compared to state-of-the-art
methods, for both the traditionally trained and semi-supervised models, RML
achieves a significant improvement on synthetic and complex real-world
datasets. The source code of the paper has been released.
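The regrouping strategy described in the abstract, combining a stable mean loss with a robust median loss, can be read as a median-of-means style estimator. The sketch below is an illustrative interpretation of that idea, not the authors' released implementation; the function name and grouping scheme are assumptions:

```python
import numpy as np

def regroup_median_loss(losses, num_groups=5, rng=None):
    """Robust loss estimate in the spirit of RML (illustrative):
    shuffle per-sample losses into groups, average within each group
    (stable mean), then take the median across the group means
    (robust to a group contaminated by noisy-label samples)."""
    rng = np.random.default_rng(rng)
    losses = np.asarray(losses, dtype=float)
    shuffled = rng.permutation(losses)          # random regrouping
    groups = np.array_split(shuffled, num_groups)
    group_means = np.array([g.mean() for g in groups])
    return np.median(group_means)
```

With mostly small clean losses and one very large noisy loss, the noisy sample inflates at most one group's mean, so the median across groups stays near the clean-loss level, whereas a plain mean over all samples is pulled upward.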
Related papers
- Foster Adaptivity and Balance in Learning with Noisy Labels [26.309508654960354]
We propose a novel approach named SED to deal with label noise in a Self-adaptivE and class-balanceD manner.
A mean-teacher model is then employed to correct labels of noisy samples.
We additionally propose a self-adaptive and class-balanced sample re-weighting mechanism to assign different weights to detected noisy samples.
arXiv Detail & Related papers (2024-07-03T03:10:24Z) - Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z) - Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z) - Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning model with cross-entropy loss according to estimated pseudo-labels.
This study reveals that minimizing the classification error of each sample causes the cross-entropy loss's vulnerability to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z) - Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z) - Learning with Noisy Labels over Imbalanced Subpopulations [13.477553187049462]
Learning with noisy labels (LNL) has attracted significant attention from the research community.
We propose a novel LNL method to simultaneously deal with noisy labels and imbalanced subpopulations.
We introduce a feature-based metric that takes the sample correlation into account for estimating samples' clean probabilities.
arXiv Detail & Related papers (2022-11-16T07:25:24Z) - Self-Filtering: A Noise-Aware Sample Selection for Label Noise with Confidence Penalization [39.90342091782778]
We propose a novel selection strategy, Self-Filtering (SFT), that utilizes the fluctuation of noisy examples in historical predictions to filter them.
Specifically, we introduce a memory bank module that stores the historical predictions of each example and dynamically updates to support the selection for the subsequent learning iteration.
By increasing the weight of the misclassified categories with a confidence-penalization term, the loss function is robust to label noise under mild conditions.
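The memory-bank mechanism summarized above can be sketched as follows. This is a minimal illustration of storing each example's recent predictions and flagging fluctuating ones as likely noisy; the class and method names are hypothetical, not the SFT authors' code:

```python
from collections import defaultdict, deque

class PredictionMemoryBank:
    """Illustrative memory bank: keep the last k predicted labels per
    example; an example whose recent predictions disagree is flagged
    as 'fluctuating' and can be filtered from the clean set."""

    def __init__(self, k=3):
        self.k = k
        # deque(maxlen=k) automatically drops the oldest prediction
        self.history = defaultdict(lambda: deque(maxlen=k))

    def update(self, example_id, predicted_label):
        self.history[example_id].append(predicted_label)

    def is_fluctuating(self, example_id):
        h = self.history[example_id]
        return len(h) == self.k and len(set(h)) > 1
```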
arXiv Detail & Related papers (2022-08-24T08:02:36Z) - Sample Prior Guided Robust Model Learning to Suppress Noisy Labels [8.119439844514973]
We propose PGDF, a novel framework to learn a deep model to suppress noise by generating the samples' prior knowledge.
Our framework can save more informative hard clean samples into the cleanly labeled set.
We evaluate our method using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world datasets WebVision and Clothing1M.
arXiv Detail & Related papers (2021-12-02T13:09:12Z) - Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z) - Salvage Reusable Samples from Noisy Data for Robust Learning [70.48919625304]
We propose a reusable sample selection and correction approach, termed as CRSSC, for coping with label noise in training deep FG models with web images.
Our key idea is to additionally identify and correct reusable samples, and then leverage them together with clean examples to update the networks.
arXiv Detail & Related papers (2020-08-06T02:07:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.