CrossSplit: Mitigating Label Noise Memorization through Data Splitting
- URL: http://arxiv.org/abs/2212.01674v2
- Date: Wed, 26 Apr 2023 15:33:27 GMT
- Title: CrossSplit: Mitigating Label Noise Memorization through Data Splitting
- Authors: Jihye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien
- Abstract summary: We propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit.
Experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios.
- Score: 25.344386272010397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We approach the problem of improving robustness of deep learning algorithms in the presence of label noise. Building upon existing label correction and co-teaching methods, we propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit, which uses a pair of neural networks trained on two disjoint parts of the labelled dataset. CrossSplit combines two main ingredients: (i) Cross-split label correction. The idea is that, since the model trained on one part of the data cannot memorize example-label pairs from the other part, the training labels presented to each network can be smoothly adjusted by using the predictions of its peer network; (ii) Cross-split semi-supervised training. A network trained on one part of the data also uses the unlabeled inputs of the other part. Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios.
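To make ingredient (i) concrete, below is a minimal sketch of cross-split label correction in PyTorch. It is illustrative rather than the authors' implementation: the function name is made up, and the fixed mixing weight `alpha` stands in for the paper's per-sample weighting; the semi-supervised ingredient (ii) is not reproduced here.

```python
import torch
import torch.nn.functional as F

def cross_split_label_correction(labels_onehot, peer_logits, alpha=0.5):
    """Soften the training labels of one split with the predictions of the
    peer network. The peer was trained on the other split, so it cannot
    have memorized these example-label pairs. `alpha` is a hypothetical
    fixed mixing weight standing in for the paper's per-sample weighting.
    """
    peer_probs = F.softmax(peer_logits.detach(), dim=-1)  # no grad to peer
    return alpha * labels_onehot + (1.0 - alpha) * peer_probs

# Toy usage: network A trains on split A with targets softened by network B.
labels = F.one_hot(torch.tensor([0, 2, 1]), num_classes=3).float()
targets = cross_split_label_correction(labels, peer_logits=torch.randn(3, 3))
logits_a = torch.randn(3, 3, requires_grad=True)       # network A's outputs
loss = -(targets * F.log_softmax(logits_a, dim=-1)).sum(dim=-1).mean()
```

Detaching the peer logits keeps the label correction from backpropagating into the peer network, so each network only learns from its own split.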
Related papers
- JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning.
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
- ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-supervised Learning [60.57998388590556]
ProtoCon is a novel method for confidence-based pseudo-labeling.
The online nature of ProtoCon allows it to utilise the label history of the entire dataset in one training cycle.
It delivers significant gains and faster convergence over the state of the art.
arXiv Detail & Related papers (2023-03-22T23:51:54Z)
- Learning from Data with Noisy Labels Using Temporal Self-Ensemble [11.245833546360386]
Deep neural networks (DNNs) have an enormous capacity to memorize noisy labels.
Current state-of-the-art methods employ a co-training scheme that trains dual networks using samples associated with small losses (a generic selection step is sketched after this list).
We propose a simple yet effective robust training scheme that operates by training only a single network.
arXiv Detail & Related papers (2022-07-21T08:16:31Z)
- Synergistic Network Learning and Label Correction for Noise-robust Image Classification [28.27739181560233]
Deep Neural Networks (DNNs) tend to overfit training label noise, resulting in poorer model performance in practice.
We propose a robust label correction framework combining the ideas of small loss selection and noise correction.
We demonstrate our method on both synthetic and real-world datasets with different noise types and rates.
arXiv Detail & Related papers (2022-02-27T23:06:31Z)
- GuidedMix-Net: Semi-supervised Semantic Segmentation by Using Labeled Images as Reference [90.5402652758316]
We propose a novel method for semi-supervised semantic segmentation named GuidedMix-Net.
It uses labeled information to guide the learning of unlabeled instances.
It achieves competitive segmentation accuracy and significantly improves the mIoU by +7% compared to previous approaches.
arXiv Detail & Related papers (2021-12-28T06:48:03Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space (a minimal k-NN version is sketched after this list).
Our method significantly surpasses previous methods on both CIFAR-10/CIFAR-100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching [60.8427677151492]
We propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains.
Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reductions on cross-device and cross-environment ASR, respectively.
arXiv Detail & Related papers (2021-04-15T14:36:54Z)
- Co-Seg: An Image Segmentation Framework Against Label Corruption [8.219887855003648]
Supervised deep learning performance is heavily tied to the availability of high-quality labels for training.
We propose a novel framework, namely Co-Seg, to collaboratively train segmentation networks on datasets which include low-quality noisy labels.
Our framework can be easily implemented in any segmentation algorithm to increase its robustness to noisy labels.
arXiv Detail & Related papers (2021-01-31T20:01:40Z)
- Combating noisy labels by agreement: A joint training method with co-regularization [27.578738673827658]
We propose a robust learning paradigm called JoCoR, which aims to reduce the diversity of the two networks during training (a sketch of the joint loss appears after this list).
We show that JoCoR is superior to many state-of-the-art approaches for learning with noisy labels.
arXiv Detail & Related papers (2020-03-05T16:42:41Z)
- DivideMix: Learning with Noisy Labels as Semi-supervised Learning [111.03364864022261]
We propose DivideMix, a framework for learning with noisy labels.
Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods.
arXiv Detail & Related papers (2020-02-18T06:20:06Z)
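The small-loss co-training scheme referenced in the Temporal Self-Ensemble and Synergistic Network Learning entries can be stated in a few lines. Below is a generic sketch, not any single paper's implementation; the function name and `keep_ratio` are assumptions.

```python
import torch
import torch.nn.functional as F

def small_loss_selection(logits, labels, keep_ratio=0.7):
    """Indices of the `keep_ratio` fraction of the batch with the smallest
    cross-entropy loss, treated as likely-clean samples."""
    per_sample_loss = F.cross_entropy(logits, labels, reduction="none")
    num_keep = max(1, int(keep_ratio * labels.size(0)))
    return torch.argsort(per_sample_loss)[:num_keep]

# In co-training, each network selects small-loss samples for its peer.
labels = torch.randint(0, 10, (8,))
idx_for_b = small_loss_selection(torch.randn(8, 10), labels)  # A picks for B
idx_for_a = small_loss_selection(torch.randn(8, 10), labels)  # B picks for A
```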
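The S3 entry's selection mechanism, agreement between a sample's annotated label and the labels in its feature-space neighbourhood, can be sketched as a k-nearest-neighbour vote. `k` and `threshold` are assumed hyperparameters and cosine similarity is a guess; S3's actual criterion may differ in detail.

```python
import torch
import torch.nn.functional as F

def neighborhood_consistent(features, labels, k=5, threshold=0.5):
    """Boolean mask marking samples whose label is shared by at least a
    `threshold` fraction of their k nearest neighbours (cosine similarity)."""
    feats = F.normalize(features, dim=-1)
    sims = feats @ feats.T
    sims.fill_diagonal_(float("-inf"))    # a sample is not its own neighbour
    knn = sims.topk(k, dim=-1).indices    # (N, k) neighbour indices
    agree = (labels[knn] == labels.unsqueeze(1)).float().mean(dim=1)
    return agree >= threshold

clean_mask = neighborhood_consistent(torch.randn(32, 64),
                                     torch.randint(0, 10, (32,)))
```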
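Finally, JoCoR's "reduce the diversity of the two networks" corresponds to a joint per-sample loss: supervised cross-entropy for both networks plus an agreement term. The symmetric-KL form and the weight `lam` below are an assumed reading of the paper, and JoCoR's subsequent small-loss selection under this joint loss is omitted.

```python
import torch
import torch.nn.functional as F

def jocor_style_loss(logits_a, logits_b, labels, lam=0.3):
    """Cross-entropy for both networks plus a symmetric-KL co-regularization
    term that pushes their predictions to agree."""
    ce = (F.cross_entropy(logits_a, labels, reduction="none")
          + F.cross_entropy(logits_b, labels, reduction="none"))
    log_pa = F.log_softmax(logits_a, dim=-1)
    log_pb = F.log_softmax(logits_b, dim=-1)
    pa, pb = log_pa.exp(), log_pb.exp()
    sym_kl = (pa * (log_pa - log_pb)).sum(-1) + (pb * (log_pb - log_pa)).sum(-1)
    # JoCoR keeps only the small-loss fraction of the batch here (omitted).
    return ((1 - lam) * ce + lam * sym_kl).mean()

loss = jocor_style_loss(torch.randn(8, 10), torch.randn(8, 10),
                        torch.randint(0, 10, (8,)))
```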