Correcting Data Imbalance for Semi-Supervised Covid-19 Detection Using
X-ray Chest Images
- URL: http://arxiv.org/abs/2008.08496v2
- Date: Thu, 20 Aug 2020 20:53:08 GMT
- Title: Correcting Data Imbalance for Semi-Supervised Covid-19 Detection Using
X-ray Chest Images
- Authors: Saul Calderon-Ramirez, Shengxiang-Yang, Armaghan Moemeni, David
Elizondo, Simon Colreavy-Donnelly, Luis Fernando Chavarria-Estrada, Miguel A.
Molina-Cabello
- Abstract summary: We evaluate the performance of the semi-supervised deep learning architecture known as MixMatch.
A new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.
- Score: 4.1950566803514935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Corona Virus (COVID-19) is an international pandemic that has quickly
propagated throughout the world. The application of deep learning for image
classification of chest X-ray images of Covid-19 patients could become a novel
pre-diagnostic detection methodology. However, deep learning architectures
require large labelled datasets. This is often a limitation when the subject of
research is relatively new, as in the case of the virus outbreak, where dealing
with small labelled datasets is a challenge. Moreover, in the context of a new
highly infectious disease, the datasets are also highly imbalanced, with few
observations from positive cases of the new disease. In this work we evaluate
the performance of the semi-supervised deep learning architecture known as
MixMatch using a very limited number of labelled observations and a highly
imbalanced labelled dataset. We propose a simple approach for correcting data
imbalance: re-weighting each observation in the loss function, giving a higher
weight to the observations corresponding to the under-represented class. For
unlabelled observations, we propose the usage of the pseudo and augmented labels
calculated by MixMatch to choose the appropriate weight. The MixMatch method
combined with the proposed pseudo-label based balance correction improved
classification accuracy by up to 10%, with respect to the non-balanced MixMatch
algorithm, with statistical significance. We tested our proposed approach with
several available datasets using 10, 15 and 20 labelled observations.
Additionally, a new dataset is included among the tested datasets, composed of
chest X-ray images of Costa Rican adult patients.
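The re-weighting idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the inverse-frequency weighting, the normalisation of the weights, the squared-error form of the unlabelled term, and all function names (`class_weights`, `observation_weights`, `weighted_cross_entropy`, `weighted_unlabelled_loss`) are assumptions chosen for illustration. The only parts taken from the abstract are that under-represented classes receive a higher weight and that, for unlabelled observations, the weight is chosen from the pseudo-label.

```python
import numpy as np


def class_weights(labels, num_classes):
    """Assumed scheme: weights inversely proportional to class counts, summing to 1."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    counts = np.maximum(counts, 1.0)  # guard against empty classes
    inv = 1.0 / counts
    return inv / inv.sum()


def observation_weights(targets, weights):
    """Pick one weight per observation from its (one-hot or pseudo) label.

    targets: array of shape (n, C); the class is taken as the argmax,
    so soft pseudo-labels select the weight of their most likely class.
    """
    return weights[np.argmax(targets, axis=1)]


def weighted_cross_entropy(probs, targets, weights, eps=1e-12):
    """Labelled term: per-observation cross-entropy scaled by its class weight."""
    per_obs = -np.sum(targets * np.log(probs + eps), axis=1)
    obs_w = observation_weights(targets, weights)
    return float(np.sum(obs_w * per_obs) / np.sum(obs_w))


def weighted_unlabelled_loss(probs, pseudo_labels, weights):
    """Unlabelled term: squared error to the pseudo-label, weighted by pseudo-class."""
    per_obs = np.sum((probs - pseudo_labels) ** 2, axis=1)
    obs_w = observation_weights(pseudo_labels, weights)
    return float(np.sum(obs_w * per_obs) / np.sum(obs_w))


if __name__ == "__main__":
    # Hypothetical imbalanced labelled set: 8 negative and 2 positive cases.
    labels = np.array([0] * 8 + [1] * 2)
    w = class_weights(labels, num_classes=2)

    # Hypothetical pseudo-labels and model outputs for two unlabelled images.
    pseudo = np.array([[0.9, 0.1], [0.3, 0.7]])
    preds = np.array([[0.8, 0.2], [0.5, 0.5]])
    print("class weights:", w)
    print("unlabelled loss:", weighted_unlabelled_loss(preds, pseudo, w))
```

With 8 negative and 2 positive labelled cases, the positive class receives four times the weight of the negative class, so errors on the rare class contribute more to both loss terms.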
Related papers
- A Semi-Supervised Algorithm for Improving the Consistency of
Crowdsourced Datasets: The COVID-19 Case Study on Respiratory Disorder
Classification [4.431270735024064]
Cough audio signal classification is a potentially useful tool in screening for respiratory disorders, such as COVID-19.
Many research teams have turned to crowdsourcing to quickly gather cough sound data, as it was done to generate the COUGHVID dataset.
The COUGHVID dataset enlisted expert physicians to diagnose the underlying diseases present in a limited number of uploaded recordings.
This work uses a semi-supervised learning (SSL) approach to improve the labeling consistency of the COUGHVID dataset and the robustness of COVID-19 versus healthy cough sound classification.
arXiv Detail & Related papers (2022-09-09T15:44:26Z)
- Information Gain Sampling for Active Learning in Medical Image
Classification [3.1619162190378787]
This work presents an information-theoretic active learning framework that guides the optimal selection of images from the unlabelled pool to be labeled.
Experiments are performed on two different medical image classification datasets.
arXiv Detail & Related papers (2022-08-01T16:25:53Z)
- Self-Supervised Learning as a Means To Reduce the Need for Labeled Data
in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies.
We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z)
- Weakly-supervised Generative Adversarial Networks for medical image
classification [1.479639149658596]
We propose a novel medical image classification algorithm called Weakly-Supervised Generative Adversarial Networks (WSGAN).
WSGAN only uses a small number of real images without labels to generate fake images or mask images to enlarge the sample size of the training set.
We show that WSGAN can obtain relatively high learning performance by using few labeled and unlabeled data.
arXiv Detail & Related papers (2021-11-29T15:38:48Z)
- Cross-Site Severity Assessment of COVID-19 from CT Images via Domain
Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images greatly helps in estimating intensive care unit events.
To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites.
This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z)
- Dealing with Distribution Mismatch in Semi-supervised Deep Learning for
Covid-19 Detection Using Chest X-ray Images: A Novel Approach Using Feature
Densities [0.6882042556551609]
Semi-supervised deep learning is an attractive alternative to large labelled datasets.
In real-world usage settings, an unlabelled dataset might present a different distribution than the labelled dataset.
This results in a distribution mismatch between the unlabelled and labelled datasets.
arXiv Detail & Related papers (2021-08-17T00:35:43Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
- RCoNet: Deformable Mutual Information Maximization and High-order
Uncertainty-aware Learning for Robust COVID-19 Detection [12.790651338952005]
The novel 2019 Coronavirus (COVID-19) infection has spread worldwide and is currently a major healthcare challenge around the world.
Due to faster imaging time and considerably lower cost than CT, detecting COVID-19 in chest X-ray (CXR) images is preferred for efficient diagnosis, assessment and treatment.
We propose a novel deep network named RCoNet$k_s$ for robust COVID-19 detection which employs Deformable Mutual Information Maximization (DeIM), Mixed High-order Moment Feature (MHMF) and Multi-
arXiv Detail & Related papers (2021-02-22T15:13:42Z)
- Improved Slice-wise Tumour Detection in Brain MRIs by Computing
Dissimilarities between Latent Representations [68.8204255655161]
Anomaly detection for Magnetic Resonance Images (MRIs) can be solved with unsupervised methods.
We have proposed a slice-wise semi-supervised method for tumour detection based on the computation of a dissimilarity function in the latent space of a Variational AutoEncoder.
We show that by training the models on higher resolution images and by improving the quality of the reconstructions, we obtain results which are comparable with different baselines.
arXiv Detail & Related papers (2020-07-24T14:02:09Z)
- Multi-label Thoracic Disease Image Classification with Cross-Attention
Networks [65.37531731899837]
We propose a novel scheme of Cross-Attention Networks (CAN) for automated thoracic disease classification from chest x-ray images.
We also design a new loss function that goes beyond cross-entropy loss to help the cross-attention process and is able to overcome the imbalance between classes and easy-dominated samples within each class.
arXiv Detail & Related papers (2020-07-21T14:37:00Z)
- Semi-supervised Medical Image Classification with Relation-driven
Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.