DivideMix: Learning with Noisy Labels as Semi-supervised Learning
- URL: http://arxiv.org/abs/2002.07394v1
- Date: Tue, 18 Feb 2020 06:20:06 GMT
- Title: DivideMix: Learning with Noisy Labels as Semi-supervised Learning
- Authors: Junnan Li, Richard Socher, Steven C.H. Hoi
- Abstract summary: We propose DivideMix, a framework for learning with noisy labels.
Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods.
- Score: 111.03364864022261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks are known to be annotation-hungry. Numerous efforts have
been devoted to reducing the annotation cost when learning with deep networks.
Two prominent directions include learning with noisy labels and semi-supervised
learning by exploiting unlabeled data. In this work, we propose DivideMix, a
novel framework for learning with noisy labels by leveraging semi-supervised
learning techniques. In particular, DivideMix models the per-sample loss
distribution with a mixture model to dynamically divide the training data into
a labeled set with clean samples and an unlabeled set with noisy samples, and
trains the model on both the labeled and unlabeled data in a semi-supervised
manner. To avoid confirmation bias, we simultaneously train two diverged
networks where each network uses the dataset division from the other network.
During the semi-supervised training phase, we improve the MixMatch strategy by
performing label co-refinement and label co-guessing on labeled and unlabeled
samples, respectively. Experiments on multiple benchmark datasets demonstrate
substantial improvements over state-of-the-art methods. Code is available at
https://github.com/LiJunnan1992/DivideMix .
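For intuition, below is a minimal Python sketch of the steps the abstract describes: the GMM-based division of training data by per-sample loss, plus MixMatch-style label co-refinement and co-guessing. This is not the authors' implementation (see the linked repository for that); the min-max loss normalization, the 0.5 posterior threshold, and the sharpening temperature are illustrative assumptions. In DivideMix proper, two networks are trained simultaneously and each consumes the split produced by the other.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def divide_by_loss(losses, threshold=0.5):
    """Fit a 2-component GMM to per-sample losses and split the data into
    a (probably clean) labeled set and a (probably noisy) unlabeled set."""
    losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
    # Min-max normalize so the fit is insensitive to the loss scale (assumption).
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=5e-4)
    gmm.fit(losses)
    # The component with the smaller mean models the clean samples.
    clean = int(np.argmin(gmm.means_))
    prob_clean = gmm.predict_proba(losses)[:, clean]  # per-sample clean probability
    labeled_idx = np.flatnonzero(prob_clean > threshold)
    unlabeled_idx = np.flatnonzero(prob_clean <= threshold)
    return labeled_idx, unlabeled_idx, prob_clean

def sharpen(p, T=0.5):
    """MixMatch-style temperature sharpening of probability vectors."""
    p = p ** (1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)

def co_refine(y_onehot, prob_clean, pred_self, T=0.5):
    """Label co-refinement for labeled samples: blend the given label with
    the network's own prediction, weighted by the clean probability."""
    w = prob_clean[:, None]
    return sharpen(w * y_onehot + (1 - w) * pred_self, T)

def co_guess(pred_net_a, pred_net_b, T=0.5):
    """Label co-guessing for unlabeled samples: average both networks'
    predictions, then sharpen."""
    return sharpen((pred_net_a + pred_net_b) / 2.0, T)
```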
Related papers
- Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels [13.314778587751588]
Noisy labels are ubiquitous in real-world datasets, especially large-scale ones derived from crowdsourcing and web search.
It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training.
We propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels.
arXiv Detail & Related papers (2024-06-22T04:49:39Z)
- JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification [65.268245109828]
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data.
Existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
We propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning methods.
arXiv Detail & Related papers (2023-10-23T05:43:35Z)
- Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise [4.90148689564172]
Real-world datasets contain noisy label samples that have no semantic relevance to any class in the dataset.
Most state-of-the-art methods leverage in-distribution (ID) samples with noisy labels as unlabeled data for semi-supervised learning.
We propose incorporating the information from all the training data by leveraging the benefits of self-supervised training.
arXiv Detail & Related papers (2023-08-13T23:33:33Z)
- CrossSplit: Mitigating Label Noise Memorization through Data Splitting [25.344386272010397]
We propose a novel training procedure to mitigate the memorization of noisy labels, called CrossSplit.
Experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios.
arXiv Detail & Related papers (2022-12-03T19:09:56Z)
- Label-Noise Learning with Intrinsically Long-Tailed Data [65.41318436799993]
We propose a learning framework for label-noise learning with intrinsically long-tailed data.
Specifically, we propose two-stage bi-dimensional sample selection (TABASCO) to better separate clean samples from noisy samples.
arXiv Detail & Related papers (2022-08-21T07:47:05Z)
- GuidedMix-Net: Semi-supervised Semantic Segmentation by Using Labeled Images as Reference [90.5402652758316]
We propose a novel method for semi-supervised semantic segmentation named GuidedMix-Net.
It uses labeled information to guide the learning of unlabeled instances.
It achieves competitive segmentation accuracy and significantly improves the mIoU by +7% compared to previous approaches.
arXiv Detail & Related papers (2021-12-28T06:48:03Z)
- OpenCoS: Contrastive Semi-supervised Learning for Handling Open-set Unlabeled Data [65.19205979542305]
Unlabeled data may include out-of-class samples in practice.
OpenCoS is a method for handling this realistic semi-supervised learning scenario.
arXiv Detail & Related papers (2021-06-29T06:10:05Z)
- GuidedMix-Net: Learning to Improve Pseudo Masks Using Labeled Images as Reference [153.354332374204]
We propose a novel method for semi-supervised semantic segmentation named GuidedMix-Net.
We first introduce a feature alignment objective between labeled and unlabeled data to capture potentially similar image pairs.
MITrans is shown to be a powerful knowledge module for further progressively refining the features of unlabeled data.
Along with supervised learning on labeled data, the predictions for unlabeled data are learned jointly with the generated pseudo masks.
arXiv Detail & Related papers (2021-06-29T02:48:45Z)
- Co-Seg: An Image Segmentation Framework Against Label Corruption [8.219887855003648]
Supervised deep learning performance is heavily tied to the availability of high-quality labels for training.
We propose a novel framework, namely Co-Seg, to collaboratively train segmentation networks on datasets which include low-quality noisy labels.
Our framework can be easily implemented in any segmentation algorithm to increase its robustness to noisy labels.
arXiv Detail & Related papers (2021-01-31T20:01:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.