ScanMix: Learning from Severe Label Noise via Semantic Clustering and
Semi-Supervised Learning
- URL: http://arxiv.org/abs/2103.11395v1
- Date: Sun, 21 Mar 2021 13:43:09 GMT
- Title: ScanMix: Learning from Severe Label Noise via Semantic Clustering and
Semi-Supervised Learning
- Authors: Ragav Sachdeva, Filipe R Cordeiro, Vasileios Belagiannis, Ian Reid,
Gustavo Carneiro
- Abstract summary: The proposed training algorithm, ScanMix, combines semantic clustering with semi-supervised learning (SSL) to improve feature representations.
ScanMix is designed based on the expectation maximisation (EM) framework, where the E-step estimates the value of a latent variable to cluster the training images.
We show state-of-the-art results on standard benchmarks for symmetric, asymmetric and semantic label noise on CIFAR-10 and CIFAR-100, as well as large scale real label noise on WebVision.
- Score: 33.376639002442914
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we address the problem of training deep neural networks in the
presence of severe label noise. Our proposed training algorithm, ScanMix,
combines semantic clustering with semi-supervised learning (SSL) to improve the
feature representations and enable an accurate identification of noisy samples,
even in severe label noise scenarios. To be specific, ScanMix is designed based
on the expectation maximisation (EM) framework, where the E-step estimates the
value of a latent variable to cluster the training images based on their
appearance representations and classification results, and the M-step optimises
the SSL classification and learns effective feature representations via
semantic clustering. In our evaluations, we show state-of-the-art results on
standard benchmarks for symmetric, asymmetric and semantic label noise on
CIFAR-10 and CIFAR-100, as well as large scale real label noise on WebVision.
Most notably, for the benchmarks contaminated with large noise rates (80% and
above), our results are up to 27% better than the related work. The code is
available at https://github.com/ragavsachdeva/ScanMix.
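To make the EM description above concrete, the following is a minimal, self-contained sketch of such an alternating loop in PyTorch. The toy encoder, losses, cluster-update rule, and hyperparameters are illustrative assumptions, not the authors' implementation; see the linked repository for the actual method.

```python
# Schematic EM-style loop in the spirit of ScanMix (not the authors' code).
# E-step: assign each sample to a semantic cluster and flag likely-clean
# samples; M-step: optimise an SSL-style classification loss plus a simple
# clustering loss that pulls features towards their assigned centroid.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D, C, K = 512, 32, 10, 10            # samples, feature dim, classes, clusters
x = torch.randn(N, D)                    # stand-in for image features
y_noisy = torch.randint(0, C, (N,))      # possibly corrupted labels

encoder = torch.nn.Linear(D, D)          # toy backbone
classifier = torch.nn.Linear(D, C)
centroids = torch.randn(K, D)            # cluster centres (latent prototypes)
opt = torch.optim.SGD(
    list(encoder.parameters()) + list(classifier.parameters()), lr=0.1
)

for epoch in range(5):
    # ---- E-step: estimate the latent cluster assignment per sample ----
    with torch.no_grad():
        feats = encoder(x)
        z = F.normalize(feats, dim=1)
        sim = z @ F.normalize(centroids, dim=1).T   # cosine similarity
        assign = sim.argmax(dim=1)                  # hard cluster assignment
        # treat samples whose prediction agrees with the label as "clean"
        clean_mask = classifier(feats).argmax(dim=1) == y_noisy

    # ---- M-step: optimise SSL classification + semantic clustering ----
    feats = encoder(x)
    z = F.normalize(feats, dim=1)
    logits = classifier(feats)
    # supervised loss only on samples believed clean
    ce = (F.cross_entropy(logits[clean_mask], y_noisy[clean_mask])
          if clean_mask.any() else logits.new_zeros(()))
    # pull each feature towards its assigned centroid
    cluster = (1 - (z * F.normalize(centroids, dim=1)[assign]).sum(dim=1)).mean()
    loss = ce + cluster
    opt.zero_grad()
    loss.backward()
    opt.step()

    # re-estimate cluster centres from the updated features
    with torch.no_grad():
        z = F.normalize(encoder(x), dim=1)
        for k in range(K):
            if (assign == k).any():
                centroids[k] = z[assign == k].mean(dim=0)
    print(f"epoch {epoch}: loss={loss.item():.3f} "
          f"clean fraction={clean_mask.float().mean().item():.2f}")
```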
Related papers
- Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels [13.314778587751588]
Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching.
It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training.
We propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels.
arXiv Detail & Related papers (2024-06-22T04:49:39Z) - Combating Label Noise With A General Surrogate Model For Sample
Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically (a minimal sketch of this filtering idea appears after this list).
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z) - Class Prototype-based Cleaner for Label Noise Learning [73.007001454085]
Semi-supervised learning methods are current SOTA solutions to the noisy-label learning problem.
We propose a simple yet effective solution, named Class Prototype-based label noise Cleaner.
arXiv Detail & Related papers (2022-12-21T04:56:41Z) - Embedding contrastive unsupervised features to cluster in- and
out-of-distribution noise in corrupted image datasets [18.19216557948184]
Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset.
Their main drawback remains the proportion of incorrect (noisy) samples retrieved.
We propose a two-stage algorithm starting with a detection step where we use unsupervised contrastive feature learning.
We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere.
arXiv Detail & Related papers (2022-07-04T16:51:56Z) - Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z) - S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
At the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space (a toy kNN version of this selection rule is sketched after this list).
Our method significantly surpasses previous methods on both CIFAR-10/CIFAR-100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z) - Training Classifiers that are Universally Robust to All Label Noise
Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise.
We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning.
Our framework generally outperforms existing methods at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z) - LongReMix: Robust Learning with High Confidence Samples in a Noisy Label
Environment [33.376639002442914]
We propose the new 2-stage noisy-label training algorithm LongReMix.
We test LongReMix on the noisy-label benchmarks CIFAR-10, CIFAR-100, WebVision, Clothing1M, and Food101-N.
Our approach achieves state-of-the-art performance in most datasets.
arXiv Detail & Related papers (2021-03-06T18:48:40Z) - Multi-Objective Interpolation Training for Robustness to Label Noise [17.264550056296915]
We show that standard supervised contrastive learning degrades in the presence of label noise.
We propose a novel label noise detection method that exploits the robust feature representations learned via contrastive learning.
Experiments on synthetic and real-world noise benchmarks demonstrate that MOIT/MOIT+ achieves state-of-the-art results.
arXiv Detail & Related papers (2020-12-08T15:01:54Z) - Attention-Aware Noisy Label Learning for Image Classification [97.26664962498887]
Deep convolutional neural networks (CNNs) trained on large-scale labeled samples have achieved remarkable progress in computer vision.
The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr.
This paper proposes the attention-aware noisy label learning approach to improve the discriminative capability of the network trained on datasets with potential label noise.
arXiv Detail & Related papers (2020-09-30T15:45:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.