PropMix: Hard Sample Filtering and Proportional MixUp for Learning with
Noisy Labels
- URL: http://arxiv.org/abs/2110.11809v1
- Date: Fri, 22 Oct 2021 14:27:37 GMT
- Title: PropMix: Hard Sample Filtering and Proportional MixUp for Learning with
Noisy Labels
- Authors: Filipe R. Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro
- Abstract summary: The most competitive noisy label learning methods rely on an unsupervised classification of clean and noisy samples.
PropMix filters out hard noisy samples, with the goal of increasing the likelihood of correctly re-labelling the easy noisy samples.
PropMix has state-of-the-art (SOTA) results on CIFAR-10/-100 (with symmetric, asymmetric and semantic label noise), Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and WebVision.
- Score: 36.461580348771435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The most competitive noisy label learning methods rely on an unsupervised
classification of clean and noisy samples, where samples classified as noisy
are re-labelled and "MixMatched" with the clean samples. These methods have two
issues in large noise rate problems: 1) the noisy set is more likely to contain
hard samples that are incorrectly re-labelled, and 2) the number of samples
produced by MixMatch tends to be reduced because it is constrained by the small
clean set size. In this paper, we introduce the learning algorithm PropMix to
handle the issues above. PropMix filters out hard noisy samples, with the goal
of increasing the likelihood of correctly re-labelling the easy noisy samples.
Also, PropMix places clean and re-labelled easy noisy samples in a training set
that is augmented with MixUp, removing the clean set size constraint and
including a large proportion of correctly re-labelled easy noisy samples. We
also include self-supervised pre-training to improve robustness to high label
noise scenarios. Our experiments show that PropMix has state-of-the-art (SOTA)
results on CIFAR-10/-100 (with symmetric, asymmetric and semantic label noise),
Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and
WebVision. In severe label noise benchmarks, our results are substantially
better than other methods. The code is available
at https://github.com/filipe-research/PropMix.
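
The abstract describes the pipeline but not its implementation. Below is a minimal, hedged sketch of a PropMix-style data-preparation step, assuming per-sample losses and softmax predictions from a warmed-up network. The two-component GMM split is the standard DivideMix-style device the abstract alludes to; the confidence threshold `tau`, the helper names, and the use of plain (rather than proportional) MixUp coefficients are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch of a PropMix-style data-preparation step (not the authors' code).
import numpy as np
from sklearn.mixture import GaussianMixture

def clean_probability(losses):
    """Fit a two-component GMM to per-sample losses; the low-mean component
    is taken as the 'clean' mode. Returns P(clean) for every sample."""
    gmm = GaussianMixture(n_components=2, reg_covar=5e-4, random_state=0)
    gmm.fit(losses.reshape(-1, 1))
    clean_comp = int(np.argmin(gmm.means_.ravel()))
    return gmm.predict_proba(losses.reshape(-1, 1))[:, clean_comp]

def build_training_set(losses, probs, labels, p_thresh=0.5, tau=0.9):
    """Split clean/noisy, filter out hard noisy samples (low prediction
    confidence), re-label the easy ones, and return the combined set."""
    p_clean = clean_probability(losses)
    clean = np.where(p_clean >= p_thresh)[0]
    noisy = np.where(p_clean < p_thresh)[0]
    easy = noisy[probs[noisy].max(axis=1) >= tau]   # hard noisy samples are dropped
    new_labels = labels.copy()
    new_labels[easy] = probs[easy].argmax(axis=1)   # re-label easy noisy samples
    keep = np.concatenate([clean, easy])
    return keep, new_labels[keep]

def mixup(x, y_onehot, alpha=4.0, rng=None):
    """Plain MixUp over the combined set, so the number of mixed samples is
    no longer constrained by the clean-set size."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1.0 - lam)                       # keep the first sample dominant
    perm = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[perm], lam * y_onehot + (1 - lam) * y_onehot[perm]
```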
Related papers
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
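
As a concrete illustration of the prototype-based pseudo-labelling idea in the entry above, here is a hedged numpy sketch: prototypes are per-class means over each class's lowest-loss samples, pseudo-labels come from nearest-prototype cosine similarity, and a fixed per-class quota enforces balance. The fraction `frac`, the quota, and the loss-based trust criterion are assumptions for illustration, not the paper's actual distribution-matching construction.

```python
# Illustrative prototype-based pseudo-labelling with a per-class quota
# (assumed details; not the paper's probability-measure formulation).
import numpy as np

def prototype_pseudo_labels(feats, labels, losses, n_classes, frac=0.2):
    """Build class prototypes from each class's lowest-loss samples, then
    pseudo-label every sample by its nearest prototype (cosine similarity)."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    protos = np.zeros((n_classes, f.shape[1]))
    for c in range(n_classes):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        trusted = idx[np.argsort(losses[idx])[: max(1, int(frac * len(idx)))]]
        protos[c] = f[trusted].mean(axis=0)
    protos /= np.maximum(np.linalg.norm(protos, axis=1, keepdims=True), 1e-12)
    sims = f @ protos.T
    return sims.argmax(axis=1), sims.max(axis=1)    # pseudo-label, confidence

def balanced_clean_subset(pseudo, conf, per_class):
    """Keep the `per_class` most confident samples of each pseudo-class,
    yielding a class-balanced subset with (hopefully) clean labels."""
    keep = []
    for c in np.unique(pseudo):
        idx = np.where(pseudo == c)[0]
        keep.extend(idx[np.argsort(-conf[idx])[:per_class]])
    return np.sort(np.array(keep))
```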
- Class Prototype-based Cleaner for Label Noise Learning [73.007001454085]
Semi-supervised learning methods are current SOTA solutions to the noisy-label learning problem.
We propose a simple yet effective solution, named Class Prototype-based label noise Cleaner.
arXiv Detail & Related papers (2022-12-21T04:56:41Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
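
The neighborhood idea in the entry above lends itself to a short sketch: re-score each sample by how strongly its feature-space neighbors support its predicted class. The choice of cosine similarity, the value of `k`, and the agreement score are illustrative assumptions, not the paper's exact estimator.

```python
# Sketch of neighborhood-based reliability re-estimation (assumed details).
import numpy as np

def neighborhood_reliability(feats, probs, k=10):
    """For each sample, average the probability its k nearest feature-space
    neighbors assign to the sample's own predicted class; low agreement
    flags a likely-noisy sample. O(n^2) memory -- fine for a sketch."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = f @ f.T
    np.fill_diagonal(sims, -np.inf)               # exclude self from neighbors
    nbrs = np.argsort(-sims, axis=1)[:, :k]       # (n, k) neighbor indices
    pred = probs.argmax(axis=1)                   # each sample's predicted class
    nbr_probs = probs[nbrs]                       # (n, k, n_classes)
    agree = np.take_along_axis(nbr_probs, pred[:, None, None], axis=2)
    return agree.mean(axis=(1, 2))                # (n,) reliability scores
```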
- ProMix: Combating Label Noise via Maximizing Clean Sample Utility [18.305972075220765]
ProMix is a framework that maximizes the utility of clean samples to boost performance.
It achieves an average improvement of 2.48% on the CIFAR-N dataset.
arXiv Detail & Related papers (2022-07-21T03:01:04Z)
- Sample Prior Guided Robust Model Learning to Suppress Noisy Labels [8.119439844514973]
We propose PGDF, a novel framework that learns a deep model to suppress noise by generating prior knowledge about the samples.
Our framework preserves more of the informative hard clean samples in the cleanly labeled set.
We evaluate our method using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world datasets WebVision and Clothing1M.
arXiv Detail & Related papers (2021-12-02T13:09:12Z)
- An Ensemble Noise-Robust K-fold Cross-Validation Selection Method for Noisy Labels [0.9699640804685629]
Large-scale datasets tend to contain mislabeled samples that can be memorized by deep neural networks (DNNs).
We present Ensemble Noise-robust K-fold Cross-Validation Selection (E-NKCVS) to effectively select clean samples from noisy data.
We evaluate our approach on various image and text classification tasks where the labels have been manually corrupted with different noise ratios.
arXiv Detail & Related papers (2021-07-06T02:14:52Z)
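
A rough sketch of the cross-validation selection idea in the entry above, using a linear classifier as a stand-in for the DNN and a simple vote threshold; simulating the ensemble by re-splitting with different seeds is an assumption about the paper's ensembling, not a faithful reproduction.

```python
# Rough sketch of K-fold cross-validation clean-sample selection with an
# ensemble vote, in the spirit of E-NKCVS (details here are assumptions).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def kfold_clean_votes(X, y, k=5, seeds=(0, 1, 2)):
    """For each seed, train on k-1 folds and predict the held-out fold; a
    sample gets one vote whenever the held-out prediction matches its given
    (possibly noisy) label. Returns the vote fraction per sample."""
    votes = np.zeros(len(y))
    for seed in seeds:
        for train_idx, test_idx in KFold(k, shuffle=True, random_state=seed).split(X):
            clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
            votes[test_idx] += clf.predict(X[test_idx]) == y[test_idx]
    return votes / len(seeds)

# Samples with a high vote fraction form the clean set, e.g.:
# clean_idx = np.where(kfold_clean_votes(X, y) >= 2 / 3)[0]
```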
- LongReMix: Robust Learning with High Confidence Samples in a Noisy Label Environment [33.376639002442914]
We propose the new 2-stage noisy-label training algorithm LongReMix.
We test LongReMix on the noisy-label benchmarks CIFAR-10, CIFAR-100, WebVision, Clothing1M, and Food101-N.
Our approach achieves state-of-the-art performance on most datasets.
arXiv Detail & Related papers (2021-03-06T18:48:40Z)
- EvidentialMix: Learning with Combined Open-set and Closed-set Noisy Labels [30.268962418683955]
We study a new variant of the noisy label problem that combines the open-set and closed-set noisy labels.
Our results show that our method produces superior classification results and better feature representations than previous state-of-the-art methods.
arXiv Detail & Related papers (2020-11-11T11:15:32Z)
- Suppressing Mislabeled Data via Grouping and Self-Attention [60.14212694011875]
Deep networks achieve excellent results on large-scale clean data but degrade significantly when learning from noisy labels.
This paper proposes a conceptually simple yet efficient training block, termed Attentive Feature Mixup (AFM).
It allows paying more attention to clean samples and less to mislabeled ones via sample interactions in small groups.
arXiv Detail & Related papers (2020-10-29T13:54:16Z)
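
The group-wise attention idea in the entry above can be sketched in a few lines of PyTorch: a small scoring head produces per-sample attention within each random group, and features and labels are mixed with those weights so that clean samples can dominate. The scoring head, the group size, and the random grouping are illustrative assumptions, not the paper's exact block.

```python
# Illustrative sketch of attention-weighted feature mixup within small groups,
# in the spirit of AFM (assumed details).
import torch
import torch.nn.functional as F

def attentive_feature_mixup(feats, labels_onehot, score_head, group_size=4):
    """Within each random group, softmax attention scores decide how strongly
    each sample contributes to the mixed feature and label, so mislabeled
    samples can be down-weighted by their group-mates."""
    n = feats.size(0) // group_size * group_size
    perm = torch.randperm(feats.size(0))[:n]              # random grouping
    g_feats = feats[perm].view(-1, group_size, feats.size(1))        # (G, k, d)
    g_labels = labels_onehot[perm].view(-1, group_size, labels_onehot.size(1))
    attn = F.softmax(score_head(g_feats).squeeze(-1), dim=1)         # (G, k)
    mixed_feats = (attn.unsqueeze(-1) * g_feats).sum(dim=1)          # (G, d)
    mixed_labels = (attn.unsqueeze(-1) * g_labels).sum(dim=1)        # (G, C)
    return mixed_feats, mixed_labels

# Example scoring head: a single linear layer mapping features to one logit.
# score_head = torch.nn.Linear(feat_dim, 1)
```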
- DivideMix: Learning with Noisy Labels as Semi-supervised Learning [111.03364864022261]
We propose DivideMix, a framework for learning with noisy labels.
Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods.
arXiv Detail & Related papers (2020-02-18T06:20:06Z)