Improving Semi-supervised Deep Learning by using Automatic Thresholding
to Deal with Out of Distribution Data for COVID-19 Detection using Chest
X-ray Images
- URL: http://arxiv.org/abs/2211.02142v1
- Date: Thu, 3 Nov 2022 20:56:45 GMT
- Authors: Isaac Benavides-Mata, Saul Calderon-Ramirez
- Abstract summary: We propose an automatic thresholding method to filter out-of-distribution data in the unlabeled dataset.
We test two simple automatic thresholding methods in the context of training a COVID-19 detector using chest X-ray images.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semi-supervised learning (SSL) leverages both labeled and unlabeled
data to train models when labeled data is limited and unlabeled data is
abundant. Because unlabeled data is frequently more widely available than
labeled data, it is used to improve a model's generalization when labeled data
is scarce. However, in real-world settings the unlabeled data might follow a
different distribution than the labeled dataset, a problem known as
distribution mismatch. It generally arises when the unlabeled data comes from
a different source than the labeled data. For instance, in the medical imaging
domain, when training a COVID-19 detector using chest X-ray images, unlabeled
datasets sampled from different hospitals might be used. In this work, we
propose an automatic thresholding method to filter out-of-distribution data
from the unlabeled dataset. We score each unlabeled observation by its
Mahalanobis distance to the labeled dataset in the feature space built by a
pre-trained ImageNet feature extractor (FE). We test two simple automatic
thresholding methods in the context of training a COVID-19 detector using
chest X-ray images. The tested methods provide an automatic way to decide
which unlabeled data to preserve when training a semi-supervised deep learning
architecture.
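The scoring-and-filtering idea in the abstract is simple to sketch. Below is a
minimal illustration, not the authors' released code: it fits a mean and
covariance to labeled features (assumed to come from a pre-trained ImageNet
feature extractor, e.g. the penultimate layer of a ResNet), scores each
unlabeled image by its Mahalanobis distance to that estimate, and applies one
plausible automatic threshold, Otsu's method on the score histogram. The paper
tests two simple automatic thresholding methods without naming them in the
abstract, so this particular choice and all function names are assumptions.

```python
# A sketch under the assumptions stated above; not the paper's implementation.
import numpy as np


def fit_labeled_statistics(labeled_feats: np.ndarray):
    """Estimate the mean and regularized inverse covariance of labeled features."""
    mu = labeled_feats.mean(axis=0)
    cov = np.cov(labeled_feats, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize so the inverse exists
    return mu, np.linalg.inv(cov)


def mahalanobis_scores(unlabeled_feats: np.ndarray, mu: np.ndarray,
                       cov_inv: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of each unlabeled feature vector to the labeled set."""
    diff = unlabeled_feats - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


def otsu_threshold(scores: np.ndarray, bins: int = 256) -> float:
    """One plausible 'automatic threshold': Otsu's method on the score histogram."""
    hist, edges = np.histogram(scores, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    total = hist.sum()
    sum_all = float((hist * centers).sum())
    w0, sum0 = 0.0, 0.0
    best_t, best_var = float(centers[0]), -1.0
    for i in range(bins - 1):
        w0 += hist[i]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += hist[i] * centers[i]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # between-class variance
        if between > best_var:
            best_var, best_t = between, float(centers[i])
    return best_t


# Usage: keep only unlabeled images scored as in-distribution.
# mu, cov_inv = fit_labeled_statistics(labeled_feats)
# scores = mahalanobis_scores(unlabeled_feats, mu, cov_inv)
# keep_mask = scores <= otsu_threshold(scores)
```

Unlabeled observations scoring above the threshold would be discarded before
SSL training, the intuition being that in-distribution and out-of-distribution
images tend to form separable modes in the distance histogram.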
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art LTSSL approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation [15.815414883505722]
In semi-supervised medical image segmentation, there is an empirical mismatch between the labeled and unlabeled data distributions.
We propose a straightforward method for alleviating the problem - copy-pasting labeled and unlabeled data bidirectionally.
Experiments show solid gains (e.g., over 21% Dice improvement on the ACDC dataset with 5% labeled data) compared with other state-of-the-art methods on various semi-supervised medical image segmentation datasets.
arXiv Detail & Related papers (2023-05-01T06:06:51Z)
- CTRL: Clustering Training Losses for Label Error Detection [4.49681473359251]
In supervised machine learning, the use of correct labels is extremely important for ensuring high accuracy.
We propose a novel framework called CTRL (Clustering TRaining Losses) for label error detection.
It detects label errors in two steps based on the observation that models learn clean and noisy labels in different ways.
arXiv Detail & Related papers (2022-08-17T18:09:19Z)
- Self-Supervised Learning as a Means To Reduce the Need for Labeled Data in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies.
We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z)
- OSSGAN: Open-Set Semi-Supervised Image Generation [26.67298827670573]
We introduce a challenging training scheme of conditional GANs, called open-set semi-supervised image generation.
OSSGAN provides decision clues to the discriminator on the basis of whether an unlabeled image belongs to one or none of the classes of interest.
The results of experiments on Tiny ImageNet and ImageNet show notable improvements over supervised BigGAN and semi-supervised methods.
arXiv Detail & Related papers (2022-04-29T17:26:09Z)
- Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, which makes it appealing to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we suggest to incorporate semi-supervised and positive-unlabeled (PU) learning for exploiting unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z)
- GuidedMix-Net: Semi-supervised Semantic Segmentation by Using Labeled Images as Reference [90.5402652758316]
We propose a novel method for semi-supervised semantic segmentation named GuidedMix-Net.
It uses labeled information to guide the learning of unlabeled instances.
It achieves competitive segmentation accuracy and significantly improves mIoU by +7% compared to previous approaches.
arXiv Detail & Related papers (2021-12-28T06:48:03Z)
- Dealing with Distribution Mismatch in Semi-supervised Deep Learning for Covid-19 Detection Using Chest X-ray Images: A Novel Approach Using Feature Densities [0.6882042556551609]
Semi-supervised deep learning is an attractive alternative to large labelled datasets.
In real-world usage settings, an unlabelled dataset might present a different distribution than the labelled dataset.
This results in a distribution mismatch between the unlabelled and labelled datasets.
arXiv Detail & Related papers (2021-08-17T00:35:43Z)
- Identifying Mislabeled Images in Supervised Learning Utilizing Autoencoder [0.0]
In image classification, incorrect labels can likewise make the classification model inaccurate.
This paper applies unsupervised techniques to the training data before training the classification network.
The algorithm can detect and remove over 67% of the mislabeled data in the experimental dataset.
arXiv Detail & Related papers (2020-11-07T03:09:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.