Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for
Severe Label Noise
- URL: http://arxiv.org/abs/2308.06861v1
- Date: Sun, 13 Aug 2023 23:33:33 GMT
- Authors: Fahimeh Fooladgar, Minh Nguyen Nhat To, Parvin Mousavi, Purang
Abolmaesumi
- Abstract summary: Real-world datasets contain noisy-label samples, some of which have no semantic relevance to any class in the dataset.
Most state-of-the-art methods leverage in-distribution (ID) noisy-labeled samples as unlabeled data for semi-supervised learning.
We propose incorporating the information from all the training data by leveraging the benefits of self-supervised training.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have proven to be highly effective when large amounts of
data with clean labels are available. However, their performance degrades when
training data contains noisy labels, leading to poor generalization on the test
set. Real-world datasets contain noisy-label samples that either have similar
visual semantics to other classes (in-distribution, ID) or have no semantic
relevance to any class (out-of-distribution, OOD) in the dataset. Most
state-of-the-art methods leverage ID noisy-labeled samples as unlabeled data
for semi-supervised learning, but OOD noisy-labeled samples cannot be used in
this way because they do not belong to any class within the dataset. Hence, in
this paper, we propose incorporating the information from all the training data
by leveraging the benefits of self-supervised training. Our method aims to
extract a meaningful and generalizable embedding space for each sample
regardless of its label.
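
The abstract does not name the self-supervised objective. A minimal sketch of
one plausible choice, a SimCLR-style NT-Xent contrastive loss over two
augmented views (an assumption, not the authors' confirmed recipe), which
would yield such a label-free embedding space:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss; z1, z2 are (N, D) embeddings of two
    augmentations of the same N images."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # (2N, D)
    sim = z @ z.t() / temperature                            # pairwise cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))               # exclude self-similarity
    # the positive for view i is the other augmentation of the same image
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# smoke test with random features
print(nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128)))
```
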
Then, we employ a simple yet effective K-nearest neighbor method to remove a portion of the out-of-distribution samples.
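
A minimal sketch of such a filter, assuming the OOD score is the mean distance
to the k nearest neighbors in the learned embedding space (the paper's exact
scoring rule and threshold are not given in the abstract):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_ood_filter(embeddings, k=10, keep_ratio=0.9):
    """Keep samples that sit in dense regions of the embedding space;
    isolated samples are treated as likely out-of-distribution.

    embeddings: (N, D) array of self-supervised features.
    Returns a boolean mask over the N samples.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    dists, _ = nn.kneighbors(embeddings)           # column 0 is the sample itself
    mean_dist = dists[:, 1:].mean(axis=1)          # average distance to k neighbors
    cutoff = np.quantile(mean_dist, keep_ratio)    # drop the most isolated fraction
    return mean_dist <= cutoff
```
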
By discarding these samples, we propose an iterative "Manifold DivideMix"
algorithm to find clean and noisy samples, and train our model in a
semi-supervised way.
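
The original DivideMix, which this algorithm builds on, performs the
clean/noisy split by fitting a two-component Gaussian mixture to per-sample
training losses; a minimal sketch of that step (the iterative Manifold
DivideMix variant may differ in its details):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_loss, threshold=0.5):
    """Fit a 2-component GMM to per-sample cross-entropy losses; the
    low-mean component is treated as the clean set (DivideMix-style)."""
    losses = np.asarray(per_sample_loss, dtype=np.float64).reshape(-1, 1)
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, reg_covar=5e-4).fit(losses)
    clean_component = gmm.means_.argmin()                  # lower mean loss = likely clean
    clean_prob = gmm.predict_proba(losses)[:, clean_component]
    return clean_prob, clean_prob > threshold
```
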
In addition, we propose "MixEMatch", a new algorithm for the semi-supervised step that involves mixup augmentation at the input and
final hidden representations of the model. This extracts better
representations by interpolating in both the input and manifold spaces.
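
A hedged sketch of mixup at both levels in PyTorch; the function name, the
choice of a single mixed hidden layer, and the shared mixing coefficient are
illustrative assumptions rather than the authors' exact implementation:

```python
import torch
import torch.nn as nn

def mixematch_forward(encoder, classifier, x, targets, alpha=4.0):
    """Mixup in input space and at the final hidden representation.

    encoder:    backbone mapping images to hidden features.
    classifier: head mapping features to logits.
    targets:    (N, C) soft labels, e.g. refined pseudo-labels.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1 - lam)                 # keep the original sample dominant
    perm = torch.randperm(x.size(0))

    # input-space mixup
    x_mix = lam * x + (1 - lam) * x[perm]
    logits_input = classifier(encoder(x_mix))

    # manifold mixup: interpolate the final hidden representations instead
    h = encoder(x)
    h_mix = lam * h + (1 - lam) * h[perm]
    logits_hidden = classifier(h_mix)

    t_mix = lam * targets + (1 - lam) * targets[perm]
    return logits_input, logits_hidden, t_mix

# smoke test with a toy encoder/head
enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
head = nn.Linear(64, 10)
x = torch.randn(4, 3, 32, 32)
t = torch.eye(10)[torch.randint(0, 10, (4,))]
logits_in, logits_hid, t_mix = mixematch_forward(enc, head, x, t)
```
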
Extensive experiments on multiple synthetic-noise image benchmarks and
real-world web-crawled datasets demonstrate the effectiveness of our proposed
framework. Code is available at https://github.com/Fahim-F/ManifoldDivideMix.
Related papers
- Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels [13.314778587751588]
Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching.
It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training.
We propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels.
arXiv Detail & Related papers (2024-06-22T04:49:39Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Pairwise Similarity Distribution Clustering for Noisy Label Learning [0.0]
Noisy label learning aims to train deep neural networks using a large amount of samples with noisy labels.
We propose a simple yet effective sample selection algorithm to divide the training samples into a clean set and a noisy set.
Experimental results on various benchmark datasets, such as CIFAR-10, CIFAR-100 and Clothing1M, demonstrate significant improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-04-02T11:30:22Z)
- Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z)
- UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning [89.56465237941013]
We propose UNICON, a simple yet effective sample selection method which is robust to high label noise.
We obtain an 11.4% improvement over the current state of the art on the CIFAR-100 dataset with a 90% noise rate.
arXiv Detail & Related papers (2022-03-28T07:36:36Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
- Sample Prior Guided Robust Model Learning to Suppress Noisy Labels [8.119439844514973]
We propose PGDF, a novel framework that learns a deep model to suppress noise by generating prior knowledge about the samples.
Our framework keeps more of the informative hard clean samples in the clean labeled set.
We evaluate our method using synthetic datasets based on CIFAR-10 and CIFAR-100, as well as on the real-world datasets WebVision and Clothing1M.
arXiv Detail & Related papers (2021-12-02T13:09:12Z)
- DivideMix: Learning with Noisy Labels as Semi-supervised Learning [111.03364864022261]
We propose DivideMix, a framework for learning with noisy labels.
Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods.
arXiv Detail & Related papers (2020-02-18T06:20:06Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)