Semi-Supervised Cascaded Clustering for Classification of Noisy Label
Data
- URL: http://arxiv.org/abs/2205.02209v1
- Date: Wed, 4 May 2022 17:42:22 GMT
- Title: Semi-Supervised Cascaded Clustering for Classification of Noisy Label
Data
- Authors: Ashit Gupta, Anirudh Deodhar, Tathagata Mukherjee and Venkataramana
Runkana
- Abstract summary: The performance of supervised classification techniques often deteriorates when the data has noisy labels.
Most of the approaches addressing the noisy label data rely on deep neural networks (DNN) that require huge datasets for classification tasks.
We propose a semi-supervised cascaded clustering algorithm to extract patterns and generate a cascaded tree of classes in such datasets.
- Score: 0.3441021278275805
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performance of supervised classification techniques often deteriorates
when the data has noisy labels. Even the semi-supervised classification
approaches have largely focused only on the problem of handling missing labels.
Most of the approaches addressing the noisy label data rely on deep neural
networks (DNN) that require huge datasets for classification tasks. This poses
a serious challenge especially in process and manufacturing industries, where
the data is limited and labels are noisy. We propose a semi-supervised cascaded
clustering (SSCC) algorithm to extract patterns and generate a cascaded tree of
classes in such datasets. A novel cluster evaluation matrix (CEM) with
configurable hyperparameters is introduced to localize and eliminate the noisy
labels and invoke a pruning criterion on cascaded clustering. The algorithm
reduces the dependency on expensive human expertise for assessing the accuracy
of labels. A classifier generated based on SSCC is found to be accurate and
consistent even when trained on noisy label datasets. It performed better in
comparison with the support vector machines (SVM) when tested on multiple
noisy-label datasets, including an industrial dataset. The proposed approach
can be effectively used for deriving actionable insights in industrial settings
with minimal human expertise.
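The abstract describes the method only at a high level. As a rough illustration of the cascaded idea, the sketch below recursively splits the data in two and accepts a cluster as a class node only when its known labels are sufficiently pure, using a simple purity threshold as a crude stand-in for the paper's cluster evaluation matrix (CEM). The two-way split, the thresholds, and the tiny k-means routine are all assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def two_means(X, iters=10):
    """Tiny Lloyd-style 2-means with a crude deterministic init
    (the two points extreme along the first feature)."""
    centers = X[[int(X[:, 0].argmin()), int(X[:, 0].argmax())]].astype(float)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        for c in (0, 1):
            if (assign == c).any():
                centers[c] = X[assign == c].mean(axis=0)
    return assign

def cascaded_cluster(X, y, depth=0, max_depth=3, purity=0.8, min_size=10):
    """Return (indices, majority_label) leaf clusters; y uses -1 for
    unlabelled points. Impure clusters are split again (the cascade);
    clusters that stay impure past max_depth or fall below min_size
    are pruned and their points left as noisy/unresolved."""
    leaves = []
    assign = two_means(X)
    for c in (0, 1):
        members = np.flatnonzero(assign == c)
        known = y[members][y[members] >= 0]
        if len(known) and np.bincount(known).max() / len(known) >= purity:
            # pure enough: accept as a class node in the cascaded tree
            leaves.append((members, int(np.bincount(known).argmax())))
        elif depth < max_depth and len(members) >= min_size:
            # impure: recurse, mapping sub-indices back to this level
            for sub, lab in cascaded_cluster(X[members], y[members],
                                             depth + 1, max_depth,
                                             purity, min_size):
                leaves.append((members[sub], lab))
    return leaves
```

Points whose labels disagree with an otherwise-pure cluster are the natural candidates such a scheme would flag as noisy.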
Related papers
- Semi-Supervised Hierarchical Multi-Label Classifier Based on Local Information [1.6574413179773761]
Semi-supervised hierarchical multi-label classifier based on local information (SSHMC-BLI)
SSHMC-BLI builds pseudo-labels for each unlabeled instance from the paths of labels of its labeled neighbors.
Experiments on 12 challenging datasets from functional genomics show that making use of unlabeled along with labeled data can help to improve the performance of a supervised hierarchical classifier trained only on labeled data.
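As a rough illustration of building pseudo-labels from the label paths of labeled neighbors (not the SSHMC-BLI implementation; the intersection rule and the function name are simplifying assumptions):

```python
import numpy as np

def pseudo_label(x_unlabeled, X_labeled, label_paths, k=3):
    """Assign an unlabeled point the hierarchy labels shared by all of
    its k nearest labeled neighbors (a simplified combination rule)."""
    d = np.linalg.norm(X_labeled - x_unlabeled, axis=1)
    nearest = np.argsort(d)[:k]
    paths = [set(label_paths[i]) for i in nearest]
    return set.intersection(*paths)  # labels common to the neighborhood
```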
arXiv Detail & Related papers (2024-04-30T20:16:40Z)
- Noisy Label Processing for Classification: A Survey [2.8821062918162146]
In the long, tedious process of data annotation, annotators are prone to make mistakes, resulting in incorrect labels of images.
It is crucial to combat noisy labels for computer vision tasks, especially for classification tasks.
We propose an algorithm to generate a synthetic label noise pattern guided by real-world data.
arXiv Detail & Related papers (2024-04-05T15:11:09Z)
- Group Benefits Instances Selection for Data Purification [21.977432359384835]
Existing methods for combating label noise are typically designed and tested on synthetic datasets.
We propose a method named GRIP to alleviate the noisy label problem for both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-23T03:06:19Z)
- Transductive CLIP with Class-Conditional Contrastive Learning [68.51078382124331]
We propose Transductive CLIP, a novel framework for learning a classification network with noisy labels from scratch.
A class-conditional contrastive learning mechanism is proposed to mitigate the reliance on pseudo labels.
An ensemble-label strategy is adopted for pseudo-label updating to stabilize the training of deep neural networks with noisy labels.
arXiv Detail & Related papers (2022-06-13T14:04:57Z)
- S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR-10 and CIFAR-100 with artificial noise and on real-world noisy datasets such as WebVision and ANIMAL-10N.
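The neighborhood-consistency selection described above can be sketched as follows (a simplified version: the value of k, the agreement threshold, and the use of raw features instead of learned embeddings are all assumptions):

```python
import numpy as np

def select_clean(X, y, k=5, agree=0.6):
    """Keep samples whose label matches at least `agree` of their
    k nearest neighbors' labels (labels assumed to be 0..C-1)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)            # exclude the point itself
    nn = np.argsort(D, axis=1)[:, :k]      # indices of k nearest neighbors
    frac = (y[nn] == y[:, None]).mean(axis=1)
    return np.flatnonzero(frac >= agree)   # indices treated as clean
```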
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
- Meta Clustering Learning for Large-scale Unsupervised Person Re-identification [124.54749810371986]
We propose a "small data for big task" paradigm dubbed Meta Clustering Learning (MCL).
MCL only pseudo-labels a subset of the entire unlabeled data via clustering to save computing for the first-phase training.
Our method significantly saves computational cost while achieving a comparable or even better performance compared to prior works.
arXiv Detail & Related papers (2021-11-19T04:10:18Z)
- Weakly Supervised Classification Using Group-Level Labels [12.285265254225166]
We propose methods to use group-level binary labels as weak supervision to train instance-level binary classification models.
We model group-level labels as Class Conditional Noisy (CCN) labels for individual instances and use the noisy labels to regularize predictions of the model trained on the strongly-labeled instances.
arXiv Detail & Related papers (2021-08-16T20:01:45Z)
- Boosting Semi-Supervised Face Recognition with Noise Robustness [54.342992887966616]
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise aroused by the auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which is based on the robust training ability empowered by GN.
arXiv Detail & Related papers (2021-05-10T14:43:11Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Improving Face Recognition by Clustering Unlabeled Faces in the Wild [77.48677160252198]
We propose a novel identity separation method based on extreme value theory.
It greatly reduces the problems caused by overlapping-identity label noise.
Experiments on both controlled and real settings demonstrate our method's consistent improvements.
arXiv Detail & Related papers (2020-07-14T12:26:50Z)
- Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts [37.27708297562079]
We show how to minimize the number of labelers while ensuring the reliability of the resulting dataset.
Experiments with the MNIST and CIFAR-10 datasets demonstrate the favorable accuracy of our aggregation scheme.
arXiv Detail & Related papers (2020-05-21T18:07:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.