Improving Semi-supervised Deep Learning by using Automatic Thresholding
to Deal with Out of Distribution Data for COVID-19 Detection using Chest
X-ray Images
- URL: http://arxiv.org/abs/2211.02142v1
- Date: Thu, 3 Nov 2022 20:56:45 GMT
- Authors: Isaac Benavides-Mata, Saul Calderon-Ramirez
- Abstract summary: We propose an automatic thresholding method to filter out-of-distribution data in the unlabeled dataset.
We test two simple automatic thresholding methods in the context of training a COVID-19 detector using chest X-ray images.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semi-supervised learning (SSL) leverages both labeled and unlabeled
data to train models when labeled data is limited and unlabeled data is
abundant. Because unlabeled data is frequently more widely available than
labeled data, it is used to improve a model's generalization when labeled data
is scarce. However, in real-world settings the unlabeled data might follow a
different distribution than the labeled dataset, a problem known as
distribution mismatch. It generally arises when the unlabeled data comes from
a different source than the labeled data. For instance, in the medical imaging
domain, when training a COVID-19 detector using chest X-ray images, unlabeled
datasets sampled from different hospitals might be used. In this work, we
propose an automatic thresholding method to filter out-of-distribution data
from the unlabeled dataset. We score each unlabeled observation by its
Mahalanobis distance to the labeled dataset in the feature space built by a
pre-trained ImageNet feature extractor (FE). We test two simple automatic
thresholding methods in the context of training a COVID-19 detector using
chest X-ray images. The tested methods provide an automatic way to decide
which unlabeled data to preserve when training a semi-supervised deep learning
architecture.
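The scoring-and-filtering idea in the abstract is simple to sketch. Below is a
minimal illustration, not the authors' released code: it fits a mean and
covariance to labeled features (assumed to come from a pre-trained ImageNet
feature extractor, e.g. the penultimate layer of a ResNet), scores each
unlabeled image by its Mahalanobis distance to that estimate, and applies one
plausible automatic threshold, Otsu's method on the score histogram. The paper
tests two simple automatic thresholding methods without naming them in the
abstract, so this particular choice and all function names are assumptions.

```python
# A sketch under the assumptions stated above; not the paper's implementation.
import numpy as np


def fit_labeled_statistics(labeled_feats: np.ndarray):
    """Estimate the mean and regularized inverse covariance of labeled features."""
    mu = labeled_feats.mean(axis=0)
    cov = np.cov(labeled_feats, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize so the inverse exists
    return mu, np.linalg.inv(cov)


def mahalanobis_scores(unlabeled_feats: np.ndarray, mu: np.ndarray,
                       cov_inv: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of each unlabeled feature vector to the labeled set."""
    diff = unlabeled_feats - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


def otsu_threshold(scores: np.ndarray, bins: int = 256) -> float:
    """One plausible 'automatic threshold': Otsu's method on the score histogram."""
    hist, edges = np.histogram(scores, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    total = hist.sum()
    sum_all = float((hist * centers).sum())
    w0, sum0 = 0.0, 0.0
    best_t, best_var = float(centers[0]), -1.0
    for i in range(bins - 1):
        w0 += hist[i]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += hist[i] * centers[i]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # between-class variance
        if between > best_var:
            best_var, best_t = between, float(centers[i])
    return best_t


# Usage: keep only unlabeled images scored as in-distribution.
# mu, cov_inv = fit_labeled_statistics(labeled_feats)
# scores = mahalanobis_scores(unlabeled_feats, mu, cov_inv)
# keep_mask = scores <= otsu_threshold(scores)
```

Unlabeled observations scoring above the threshold would be discarded before
SSL training, the intuition being that in-distribution and out-of-distribution
images tend to form separable modes in the distance histogram.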
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art LTSSL approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are commonly based on instance-wise consistency between different data transformations.
We propose FlatMatch which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets.
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
- Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation [15.815414883505722]
In semi-supervised medical image segmentation, there is an empirical mismatch between the labeled and unlabeled data distributions.
We propose a straightforward method for alleviating the problem - copy-pasting labeled and unlabeled data bidirectionally.
Experiments show solid gains (e.g., over 21% Dice improvement on the ACDC dataset with 5% labeled data) compared with other state-of-the-art methods on various semi-supervised medical image segmentation datasets.
arXiv Detail & Related papers (2023-05-01T06:06:51Z)
- CTRL: Clustering Training Losses for Label Error Detection [4.49681473359251]
In supervised machine learning, the use of correct labels is extremely important for ensuring high accuracy.
We propose a novel framework called CTRL (Clustering TRaining Losses) for label error detection.
It detects label errors in two steps based on the observation that models learn clean and noisy labels in different ways.
arXiv Detail & Related papers (2022-08-17T18:09:19Z)
- Self-Supervised Learning as a Means To Reduce the Need for Labeled Data in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies.
We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z)
- OSSGAN: Open-Set Semi-Supervised Image Generation [26.67298827670573]
We introduce a challenging training scheme of conditional GANs, called open-set semi-supervised image generation.
OSSGAN provides decision clues to the discriminator on the basis of whether an unlabeled image belongs to one or none of the classes of interest.
The results of experiments on Tiny ImageNet and ImageNet show notable improvements over supervised BigGAN and semi-supervised methods.
arXiv Detail & Related papers (2022-04-29T17:26:09Z)
- Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, which makes it appealing to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we suggest to incorporate semi-supervised and positive-unlabeled (PU) learning for exploiting unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z)
- GuidedMix-Net: Semi-supervised Semantic Segmentation by Using Labeled Images as Reference [90.5402652758316]
We propose a novel method for semi-supervised semantic segmentation named GuidedMix-Net.
It uses labeled information to guide the learning of unlabeled instances.
It achieves competitive segmentation accuracy and significantly improves mIoU by +7% compared to previous approaches.
arXiv Detail & Related papers (2021-12-28T06:48:03Z)
- Dealing with Distribution Mismatch in Semi-supervised Deep Learning for Covid-19 Detection Using Chest X-ray Images: A Novel Approach Using Feature Densities [0.6882042556551609]
Semi-supervised deep learning is an attractive alternative to large labelled datasets.
In real-world usage settings, an unlabelled dataset might present a different distribution than the labelled dataset.
This results in a distribution mismatch between the unlabelled and labelled datasets.
arXiv Detail & Related papers (2021-08-17T00:35:43Z)
- Identifying Mislabeled Images in Supervised Learning Utilizing Autoencoder [0.0]
In image classification, incorrect labels can likewise make the classification model inaccurate.
This paper applies unsupervised techniques to the training data before training the classification network.
The algorithm can detect and remove over 67% of the mislabeled data in the experimental dataset.
arXiv Detail & Related papers (2020-11-07T03:09:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.