Identifying Mislabeled Images in Supervised Learning Utilizing
Autoencoder
- URL: http://arxiv.org/abs/2011.03667v2
- Date: Mon, 18 Jan 2021 22:59:44 GMT
- Title: Identifying Mislabeled Images in Supervised Learning Utilizing
Autoencoder
- Authors: Yunhao Yang, Andrew Whinston
- Abstract summary: In image classification, incorrect labels may cause the classification model to be inaccurate as well.
In this paper, we apply unsupervised techniques to the training data before training the classification network.
The algorithm detects and removes more than 67% of the mislabeled data in the experimental dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised learning is based on the assumption that the ground truth
in the training data is accurate. However, this may not be guaranteed in
real-world settings, and inaccurate training data will result in unexpected
predictions. In image classification, incorrect labels may make the
classification model inaccurate as well. In this paper, we apply unsupervised
techniques to the training data before training the classification network. A
convolutional autoencoder is used to encode and reconstruct the images. The
encoder projects the image data onto a latent space in which the image features
are preserved at a lower dimension. The assumption is that data samples with
similar features are likely to have the same label, so mislabeled samples
appear as outliers in the latent space. These outliers can be identified by the
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm
and classified as incorrectly labeled samples. Once detected, the outliers are
treated as mislabeled data samples and removed from the dataset, so the
remaining training data can be used directly to train the supervised learning
network. The algorithm detects and removes more than 67% of the mislabeled
data in the experimental dataset.
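The abstract describes a two-stage pipeline: embed the images with a
convolutional autoencoder, then flag latent-space outliers. Below is a minimal
PyTorch sketch of such an autoencoder; the paper's exact architecture is not
given in this summary, so the 28x28 grayscale input, layer sizes, and latent
dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Illustrative convolutional autoencoder for 28x28 grayscale images.

    Layer sizes and the latent dimension are assumptions for this sketch,
    not the authors' exact network.
    """
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, latent_dim),          # project onto the latent space
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),                                # pixel values in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)                              # latent code
        return self.decoder(z), z                        # reconstruction and code
```

Training would minimize a reconstruction loss such as the MSE between the input
and the decoder output; once trained, only the encoder is needed to embed the
dataset.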
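With the latent codes in hand, the outlier-removal step can be sketched with
scikit-learn's DBSCAN, which assigns the label -1 to points that fall outside
every dense cluster. The helper name `filter_mislabeled` and the `eps` and
`min_samples` defaults are assumptions; the paper's actual hyperparameters are
not reported in this summary.

```python
import torch
from sklearn.cluster import DBSCAN

def filter_mislabeled(model, images, eps=0.5, min_samples=5):
    """Flag likely mislabeled samples as DBSCAN outliers in latent space.

    `model` is a trained autoencoder whose forward pass returns
    (reconstruction, latent_code), as in the sketch above.
    """
    model.eval()
    with torch.no_grad():
        _, latent = model(images)               # encode the whole dataset
    z = latent.cpu().numpy()

    # DBSCAN labels points outside every dense cluster as -1 (noise).
    clusters = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(z)

    # Treat the noise points as mislabeled; keep everything else.
    return torch.from_numpy(clusters != -1)

# Usage: drop the flagged samples, then train the classifier on the rest.
# keep = filter_mislabeled(autoencoder, train_images)
# clean_images, clean_labels = train_images[keep], train_labels[keep]
```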
Related papers
- Are labels informative in semi-supervised learning? -- Estimating and
leveraging the missing-data mechanism [4.675583319625962]
Semi-supervised learning is a powerful technique for leveraging unlabeled data to improve machine learning models.
It can be affected by the presence of "informative" labels, which occur when some classes are more likely to be labeled than others.
We propose a novel approach to address this issue by estimating the missing-data mechanism and using inverse propensity weighting to debias any SSL algorithm.
arXiv Detail & Related papers (2023-02-15T09:18:46Z) - Improving Semi-supervised Deep Learning by using Automatic Thresholding
to Deal with Out of Distribution Data for COVID-19 Detection using Chest
X-ray Images [0.0]
We propose an automatic thresholding method to filter out-of-distribution data in the unlabeled dataset.
We test two simple automatic thresholding methods in the context of training a COVID-19 detector using chest X-ray images.
arXiv Detail & Related papers (2022-11-03T20:56:45Z) - CTRL: Clustering Training Losses for Label Error Detection [4.49681473359251]
In supervised machine learning, the use of correct labels is extremely important to ensure high accuracy.
We propose a novel framework, called CTRL (Clustering TRaining Losses), for label error detection.
It detects label errors in two steps, based on the observation that models learn clean and noisy labels in different ways.
arXiv Detail & Related papers (2022-08-17T18:09:19Z) - Prefix Conditioning Unifies Language and Label Supervision [84.11127588805138]
We show that dataset biases negatively affect pre-training by reducing the generalizability of learned representations.
In experiments, we show that this simple technique improves zero-shot image recognition accuracy and robustness to image-level distribution shift.
arXiv Detail & Related papers (2022-06-02T16:12:26Z) - Self-Supervised Learning as a Means To Reduce the Need for Labeled Data
in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies.
We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z) - Incorporating Semi-Supervised and Positive-Unlabeled Learning for
Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it appealing to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we propose incorporating semi-supervised and positive-unlabeled (PU) learning to exploit unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z) - Instance Correction for Learning with Open-set Noisy Labels [145.06552420999986]
We use the sample selection approach to handle open-set noisy labels.
The discarded data are regarded as mislabeled and do not participate in training.
We modify the instances of the discarded data so that the predictions for them become consistent with the given labels.
arXiv Detail & Related papers (2021-06-01T13:05:55Z) - Sample Selection with Uncertainty of Losses for Learning with Noisy
Labels [145.06552420999986]
In learning with noisy labels, the sample selection approach is very popular; it regards small-loss data as correctly labeled during training (a generic sketch of this small-loss criterion appears after this list).
However, losses are generated on-the-fly by a model that is itself trained on noisy labels, so large-loss data are likely, but not certain, to be incorrectly labeled.
In this paper, we incorporate the uncertainty of losses by adopting interval estimation instead of point estimation of losses.
arXiv Detail & Related papers (2021-06-01T12:53:53Z) - Outlier Detection through Null Space Analysis of Neural Networks [3.220347094114561]
We use the concept of the null space to integrate an outlier detection method directly into a neural network used for classification.
Our method, called Null Space Analysis (NuSA) of neural networks, works by computing and controlling the magnitude of the null space projection as data is passed through a network.
Results indicate that networks trained with NuSA retain their classification performance while also being able to detect outliers at rates similar to commonly used outlier detection algorithms.
arXiv Detail & Related papers (2020-07-02T17:17:21Z) - Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
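Several entries above build on the small-loss criterion used by
sample-selection methods. As a rough, generic illustration (this is the common
heuristic, not the interval-estimation method of "Sample Selection with
Uncertainty of Losses"; `keep_ratio` is an assumption for the sketch):

```python
import torch
import torch.nn.functional as F

def small_loss_selection(logits, labels, keep_ratio=0.7):
    """Generic small-loss criterion: treat the lowest-loss fraction of a
    batch as probably correctly labeled. `keep_ratio` is illustrative.
    """
    # Per-sample cross-entropy losses, computed without reduction.
    losses = F.cross_entropy(logits, labels, reduction="none")
    num_keep = max(1, int(keep_ratio * labels.size(0)))
    # Indices of the `num_keep` smallest losses in the batch.
    return torch.argsort(losses)[:num_keep]

# Usage inside a training step: backpropagate only through the
# selected (small-loss) subset of the batch.
# logits = model(x)
# idx = small_loss_selection(logits, y)
# loss = F.cross_entropy(logits[idx], y[idx])
```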
This list is automatically generated from the titles and abstracts of the papers on this site.