Related papers: Correcting Data Imbalance for Semi-Supervised Covid-19 Detection Using X-ray Chest Images

URL: http://arxiv.org/abs/2008.08496v2
Date: Thu, 20 Aug 2020 20:53:08 GMT
Title: Correcting Data Imbalance for Semi-Supervised Covid-19 Detection Using X-ray Chest Images
Authors: Saul Calderon-Ramirez, Shengxiang Yang, Armaghan Moemeni, David Elizondo, Simon Colreavy-Donnelly, Luis Fernando Chavarria-Estrada, Miguel A. Molina-Cabello
Abstract summary: We evaluate the performance of the semi-supervised deep learning architecture known as MixMatch. A new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.
Score: 4.1950566803514935
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Corona Virus (COVID-19) is an international pandemic that has quickly propagated throughout the world. The application of deep learning for image classification of chest X-ray images of Covid-19 patients could become a novel pre-diagnostic detection methodology. However, deep learning architectures require large labelled datasets. This is often a limitation when the subject of research is relatively new, as in the case of the virus outbreak, where dealing with small labelled datasets is a challenge. Moreover, in the context of a new highly infectious disease, the datasets are also highly imbalanced, with few observations from positive cases of the new disease. In this work we evaluate the performance of the semi-supervised deep learning architecture known as MixMatch using a very limited number of labelled observations and a highly imbalanced labelled dataset. We propose a simple approach for correcting data imbalance: re-weighting each observation in the loss function, giving a higher weight to the observations corresponding to the under-represented class. For unlabelled observations, we propose the usage of the pseudo and augmented labels calculated by MixMatch to choose the appropriate weight. The MixMatch method combined with the proposed pseudo-label based balance correction improved classification accuracy by up to 10% with respect to the non-balanced MixMatch algorithm, with statistical significance. We tested our proposed approach with several available datasets using 10, 15 and 20 labelled observations. Additionally, a new dataset is included among the tested datasets, composed of chest X-ray images of Costa Rican adult patients.
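
Illustrative sketch (not from the paper): the re-weighting idea in the abstract can be pictured roughly as follows. The abstract does not give the exact weighting formula, so the inverse-frequency weights, the function names, and the use of a plain cross-entropy term here are all assumptions made for illustration. MixMatch itself and its full loss are not reproduced.

```python
import math

def class_weights(labels, num_classes):
    """Assumed scheme: weights proportional to 1 / class count,
    rescaled so that they sum to num_classes."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    # Guard against empty classes (hypothetical choice, not from the paper).
    inv = [1.0 / max(c, 1) for c in counts]
    total = sum(inv)
    return [num_classes * w / total for w in inv]

def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy where each observation is scaled by the
    weight of its (labelled) class."""
    losses = [-weights[y] * math.log(p[y]) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

def pseudo_label_weights(pseudo_probs, weights):
    """For unlabelled observations, pick the weight of the class with the
    highest pseudo-label probability (argmax), as a stand-in for the
    pseudo-label based weight selection described in the abstract."""
    chosen = []
    for p in pseudo_probs:
        k = max(range(len(p)), key=p.__getitem__)
        chosen.append(weights[k])
    return chosen

# Tiny imbalanced example: four negatives (class 0), one positive (class 1).
labels = [0, 0, 0, 0, 1]
weights = class_weights(labels, num_classes=2)
unlabelled_weights = pseudo_label_weights([[0.9, 0.1], [0.2, 0.8]], weights)
```

With this toy labelled set, the under-represented class receives the larger weight, and each unlabelled observation inherits the weight of its most probable pseudo-label class.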

Related papers

Confident Pseudo-labeled Diffusion Augmentation for Canine Cardiomegaly Detection [7.9471205712560264]
Canine cardiomegaly, marked by an enlarged heart, poses serious health risks if undetected. Current detection models often rely on small, poorly annotated datasets. We propose a Confident Pseudo-labeled Diffusion Augmentation model for identifying canine cardiomegaly.
arXiv Detail & Related papers (2025-01-13T18:10:19Z)
A Semi-Supervised Algorithm for Improving the Consistency of Crowdsourced Datasets: The COVID-19 Case Study on Respiratory Disorder Classification [4.431270735024064]
Cough audio signal classification is a potentially useful tool in screening for respiratory disorders, such as COVID-19. Many research teams have turned to crowdsourcing to quickly gather cough sound data, as was done to generate the COUGHVID dataset. The COUGHVID dataset enlisted expert physicians to diagnose the underlying diseases present in a limited number of uploaded recordings. This work uses a semi-supervised learning (SSL) approach to improve the labeling consistency of the COUGHVID dataset and the robustness of COVID-19 versus healthy cough sound classification.
arXiv Detail & Related papers (2022-09-09T15:44:26Z)
Information Gain Sampling for Active Learning in Medical Image Classification [3.1619162190378787]
This work presents an information-theoretic active learning framework that guides the optimal selection of images from the unlabelled pool to be labeled. Experiments are performed on two different medical image classification datasets.
arXiv Detail & Related papers (2022-08-01T16:25:53Z)
Self-Supervised Learning as a Means To Reduce the Need for Labeled Data in Medical Image Analysis [64.4093648042484]
We use a dataset of chest X-ray images with bounding box labels for 13 different classes of anomalies. We show that it is possible to achieve similar performance to a fully supervised model in terms of mean average precision and accuracy with only 60% of the labeled data.
arXiv Detail & Related papers (2022-06-01T09:20:30Z)
Weakly-supervised Generative Adversarial Networks for medical image classification [1.479639149658596]
We propose a novel medical image classification algorithm called Weakly-Supervised Generative Adversarial Networks (WSGAN). WSGAN only uses a small number of real images without labels to generate fake images or mask images to enlarge the sample size of the training set. We show that WSGAN can obtain relatively high learning performance by using few labeled and unlabeled data.
arXiv Detail & Related papers (2021-11-29T15:38:48Z)
Cross-Site Severity Assessment of COVID-19 from CT Images via Domain Adaptation [64.59521853145368]
Early and accurate severity assessment of Coronavirus disease 2019 (COVID-19) based on computed tomography (CT) images greatly helps in estimating intensive care unit events. To augment the labeled data and improve the generalization ability of the classification model, it is necessary to aggregate data from multiple sites. This task faces several challenges including class imbalance between mild and severe infections, domain distribution discrepancy between sites, and presence of heterogeneous features.
arXiv Detail & Related papers (2021-09-08T07:56:51Z)
Dealing with Distribution Mismatch in Semi-supervised Deep Learning for Covid-19 Detection Using Chest X-ray Images: A Novel Approach Using Feature Densities [0.6882042556551609]
Semi-supervised deep learning is an attractive alternative to large labelled datasets. In real-world usage settings, an unlabelled dataset might present a different distribution than the labelled dataset. This results in a distribution mismatch between the unlabelled and labelled datasets.
arXiv Detail & Related papers (2021-08-17T00:35:43Z)
Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance. For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming. In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
RCoNet: Deformable Mutual Information Maximization and High-order Uncertainty-aware Learning for Robust COVID-19 Detection [12.790651338952005]
The novel 2019 Coronavirus (COVID-19) infection has spread worldwide and is currently a major healthcare challenge around the world. Due to faster imaging time and considerably lower cost than CT, detecting COVID-19 in chest X-ray (CXR) images is preferred for efficient diagnosis, assessment and treatment. We propose a novel deep network named RCoNet$k_s$ for robust COVID-19 detection which employs Deformable Mutual Information Maximization (DeIM), Mixed High-order Moment Feature (MHMF) and Multi-
arXiv Detail & Related papers (2021-02-22T15:13:42Z)
Improved Slice-wise Tumour Detection in Brain MRIs by Computing Dissimilarities between Latent Representations [68.8204255655161]
Anomaly detection for Magnetic Resonance Images (MRIs) can be solved with unsupervised methods. We have proposed a slice-wise semi-supervised method for tumour detection based on the computation of a dissimilarity function in the latent space of a Variational AutoEncoder. We show that by training the models on higher resolution images and by improving the quality of the reconstructions, we obtain results which are comparable with different baselines.
arXiv Detail & Related papers (2020-07-24T14:02:09Z)
Multi-label Thoracic Disease Image Classification with Cross-Attention Networks [65.37531731899837]
We propose a novel scheme of Cross-Attention Networks (CAN) for automated thoracic disease classification from chest x-ray images. We also design a new loss function that goes beyond cross-entropy loss to help the cross-attention process and is able to overcome the imbalance between classes and easy-dominated samples within each class.
arXiv Detail & Related papers (2020-07-21T14:37:00Z)
Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification. It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations. Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.