Dealing with Distribution Mismatch in Semi-supervised Deep Learning for
Covid-19 Detection Using Chest X-ray Images: A Novel Approach Using Feature
Densities
- URL: http://arxiv.org/abs/2109.00889v1
- Date: Tue, 17 Aug 2021 00:35:43 GMT
- Title: Dealing with Distribution Mismatch in Semi-supervised Deep Learning for
Covid-19 Detection Using Chest X-ray Images: A Novel Approach Using Feature
Densities
- Authors: Saul Calderon-Ramirez, Shengxiang Yang, David Elizondo, Armaghan
Moemeni
- Abstract summary: Semi-supervised deep learning is an attractive alternative to large labelled datasets.
In real-world usage settings, an unlabelled dataset might present a different distribution than the labelled dataset.
This results in a distribution mismatch between the unlabelled and labelled datasets.
- Score: 0.6882042556551609
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the context of the global coronavirus pandemic, different deep learning
solutions for infected subject detection using chest X-ray images have been
proposed. However, deep learning models usually need large labelled datasets to
be effective. Semi-supervised deep learning is an attractive alternative, where
unlabelled data is leveraged to improve the overall model's accuracy. However,
in real-world usage settings, an unlabelled dataset might present a different
distribution than the labelled dataset (e.g. the labelled dataset was sampled
from a target clinic and the unlabelled dataset from a source clinic). This
results in a distribution mismatch between the unlabelled and labelled
datasets. In this work, we assess the impact of the distribution mismatch
between the labelled and the unlabelled datasets, for a semi-supervised model
trained with chest X-ray images, for COVID-19 detection. Under strong
distribution mismatch conditions, we found an accuracy hit of almost 30%,
suggesting that the unlabelled dataset distribution has a strong influence on
the behaviour of the model. Therefore, we propose a straightforward approach to
diminish the impact of such distribution mismatch. Our proposed method uses a
density approximation of the feature space. It is built upon the target dataset
to filter out the observations in the source unlabelled dataset that might harm
the accuracy of the semi-supervised model. It assumes that a small labelled
source dataset is available together with a larger source unlabelled dataset.
Our proposed method does not require any model training; it is simple and
computationally cheap. We compare our proposed method against two popular
state-of-the-art out-of-distribution data detectors, which are also cheap and
simple to implement. In our tests, our method yielded accuracy gains of up to
32% when compared to the previous state-of-the-art methods.
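The filtering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the density estimator, so the code below assumes a diagonal Gaussian fitted to target-dataset feature vectors as a stand-in; the function names, the `keep_fraction` parameter, and the assumption that features were already extracted by some network are all hypothetical and not taken from the paper.

```python
import numpy as np


def fit_diag_gaussian(features):
    """Estimate a per-dimension mean and variance from target features."""
    mean = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # small floor avoids division by zero
    return mean, var


def log_density(features, mean, var):
    """Log-density of each row under the fitted diagonal Gaussian."""
    z = (features - mean) ** 2 / var
    return -0.5 * (z + np.log(2.0 * np.pi * var)).sum(axis=1)


def filter_unlabelled(target_feats, unlabelled_feats, keep_fraction=0.5):
    """Return indices of the unlabelled rows that look most like the target.

    Assumption: rows are feature vectors already extracted elsewhere;
    keep_fraction is an illustrative knob, not a value from the paper.
    """
    mean, var = fit_diag_gaussian(target_feats)
    scores = log_density(unlabelled_feats, mean, var)
    n_keep = int(round(keep_fraction * len(scores)))
    # Sort by descending density and keep the highest-scoring rows.
    return np.argsort(scores)[::-1][:n_keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(0.0, 1.0, size=(200, 8))
    in_dist = rng.normal(0.0, 1.0, size=(50, 8))
    out_dist = rng.normal(5.0, 1.0, size=(50, 8))  # shifted, mismatched data
    unlabelled = np.vstack([in_dist, out_dist])
    kept = filter_unlabelled(target, unlabelled, keep_fraction=0.5)
    print(sorted(kept.tolist()))
```

No model is trained in this sketch: the density is a closed-form estimate, which matches the abstract's emphasis on a cheap, training-free filter. The retained indices would then select the unlabelled observations passed to the semi-supervised learner.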
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art LTSSL approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
It is quite beneficial and challenging to detect poisoned samples from a mixed dataset.
We propose an Iterative Filtering approach for UEs identification.
arXiv Detail & Related papers (2024-08-15T13:26:13Z)
- Restricted Generative Projection for One-Class Classification and
Anomaly Detection [31.173234437065464]
We learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution.
The simplicity is to ensure that we can sample from the distribution easily.
The compactness is to ensure that the decision boundary between normal data and abnormal data is clear.
The informativeness is to ensure that the transformed data preserve the important information of the original data.
arXiv Detail & Related papers (2023-07-09T04:59:10Z)
- Improving Semi-supervised Deep Learning by using Automatic Thresholding
to Deal with Out of Distribution Data for COVID-19 Detection using Chest
X-ray Images [0.0]
We propose an automatic thresholding method to filter out-of-distribution data in the unlabeled dataset.
We test two simple automatic thresholding methods in the context of training a COVID-19 detector using chest X-ray images.
arXiv Detail & Related papers (2022-11-03T20:56:45Z)
- Fake It Till You Make It: Near-Distribution Novelty Detection by
Score-Based Generative Models [54.182955830194445]
Existing models either fail or face a dramatic drop under the so-called "near-distribution" setting.
We propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data.
Our method improves near-distribution novelty detection by 6% and surpasses the state of the art by 1% to 5% across nine novelty detection benchmarks.
arXiv Detail & Related papers (2022-05-28T02:02:53Z)
- Semi-supervised Deep Learning for Image Classification with Distribution
Mismatch: A Survey [1.5469452301122175]
Deep learning models rely on the abundance of labelled observations to train a prospective model.
It is expensive to gather labelled observations of data, making the usage of deep learning models not ideal.
In many situations different unlabelled data sources might be available.
This raises the risk of a significant distribution mismatch between the labelled and unlabelled datasets.
arXiv Detail & Related papers (2022-03-01T02:46:00Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Efficient remedies for outlier detection with variational autoencoders [8.80692072928023]
Likelihoods computed by deep generative models are a candidate metric for outlier detection with unlabeled data.
We show that a theoretically-grounded correction readily ameliorates a key bias with VAE likelihood estimates.
We also show that the variance of the likelihoods computed over an ensemble of VAEs also enables robust outlier detection.
arXiv Detail & Related papers (2021-08-19T16:00:58Z)
- Chest x-ray automated triage: a semiologic approach designed for
clinical implementation, exploiting different types of labels through a
combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures.
We built four training datasets combining images from public chest x-ray datasets and our institutional archive.
We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- Correcting Data Imbalance for Semi-Supervised Covid-19 Detection Using
X-ray Chest Images [4.1950566803514935]
We evaluate the performance of the semi-supervised deep learning architecture known as MixMatch.
A new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.
arXiv Detail & Related papers (2020-08-19T15:16:57Z)