Highly Efficient Representation and Active Learning Framework for
Imbalanced Data and its Application to COVID-19 X-Ray Classification
- URL: http://arxiv.org/abs/2103.05109v2
- Date: Thu, 18 Mar 2021 21:26:35 GMT
- Title: Highly Efficient Representation and Active Learning Framework for
Imbalanced Data and its Application to COVID-19 X-Ray Classification
- Authors: Heng Hao, Sima Didari, Jae Oh Woo, Hankyu Moon, and Patrick Bangert
- Abstract summary: We propose a highly data-efficient classification and active learning framework for classifying chest X-rays.
It is based on (1) unsupervised representation learning with a Convolutional Neural Network and (2) the Gaussian Process (GP) method.
We demonstrate that only $\sim 10\%$ of the labeled data is needed to reach the accuracy obtained by training on all available labels.
- Score: 0.7829352305480284
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a highly data-efficient classification and active learning
framework for classifying chest X-rays. It is based on (1) unsupervised
representation learning with a Convolutional Neural Network (CNN) and (2) the
Gaussian Process (GP) method. The unsupervised representation learning employs
self-supervision that does not require class labels, and the learned features
are shown to achieve label-efficient classification. GP is a kernel-based
Bayesian approach that also leads to data-efficient predictions with the added
benefit of estimating each decision's uncertainty. Our novel framework combines
these two elements in sequence to achieve highly data and label efficient
classifications. Moreover, both elements are less sensitive to the prevalent
and challenging class imbalance issue, thanks to (1) the features learned
without labels and (2) the Bayesian nature of GP. The GP-provided uncertainty
estimates enable active learning by ranking samples based on the uncertainty
and selectively labeling samples showing higher uncertainty. We apply this
novel combination to the data-deficient and severely imbalanced case of
COVID-19 chest X-ray classification. We demonstrate that only $\sim 10\%$ of
the labeled data is needed to reach the accuracy obtained by training on all
available labels. Applied to the COVID-19 data in a fully supervised
classification scenario, our model, with a generic ResNet backbone,
outperforms the state-of-the-art model with a highly tuned architecture by
4\% on the COVID-19 class. Our model architecture and proposed framework are
general and straightforward to apply to a broader class of datasets, where we
expect similar success.
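To make the pipeline concrete, below is a minimal sketch of the two-stage framework and its uncertainty-ranked active-learning loop. The self-supervised CNN is stood in for by a fixed feature matrix, and the kernel, seed-set size, and per-round query budget are illustrative assumptions rather than the paper's exact settings.

```python
# Hypothetical sketch: self-supervised features -> GP classifier ->
# uncertainty-ranked querying. Random vectors stand in for the CNN
# embeddings the paper learns without labels.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Stage 1 (assumed given): 128-d embeddings for 500 X-rays, 3 classes
# (e.g., normal / pneumonia / COVID-19).
features = rng.normal(size=(500, 128))
labels = rng.integers(0, 3, size=500)

labeled = list(rng.choice(500, size=20, replace=False))   # small seed set
unlabeled = [i for i in range(500) if i not in labeled]

for _ in range(5):                                        # active-learning rounds
    # Stage 2: kernel-based Bayesian classifier on the learned features.
    gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=10.0))
    gp.fit(features[labeled], labels[labeled])

    # Rank unlabeled samples by predictive uncertainty (entropy of the
    # GP's class probabilities) and query the most uncertain ones.
    proba = gp.predict_proba(features[unlabeled])
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    query = [unlabeled[i] for i in np.argsort(entropy)[-10:]]

    labeled += query                                      # oracle labels the queries
    unlabeled = [i for i in unlabeled if i not in query]
```

In the paper's setting, the oracle step corresponds to a radiologist labeling the queried X-rays; the loop stops once accuracy saturates, which the authors report happens at roughly 10% of the labels.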
Related papers
- Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery [56.172872410834664]
Generalized category discovery (GCD) aims at addressing a more realistic and challenging setting of semi-supervised learning.
We propose a Memory Consistency guided Divide-and-conquer Learning framework (MCDL).
Our method outperforms state-of-the-art models by a large margin on both seen and unseen classes in generic image recognition.
arXiv Detail & Related papers (2024-01-24T09:39:45Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is applying self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
- NorMatch: Matching Normalizing Flows with Discriminative Classifiers for Semi-Supervised Learning [8.749830466953584]
Semi-Supervised Learning (SSL) aims to learn a model using a tiny labeled set and massive amounts of unlabeled data.
In this work we introduce a new framework for SSL named NorMatch.
We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets.
arXiv Detail & Related papers (2022-11-17T15:39:18Z)
- 2nd Place Solution for ICCV 2021 VIPriors Image Classification Challenge: An Attract-and-Repulse Learning Approach [41.346232387426944]
Convolutional neural networks (CNNs) have achieved significant success in image classification by utilizing large-scale datasets.
We propose Attract-and-Repulse, which consists of Contrastive Regularization (CR) to enrich the feature representations and Symmetric Cross Entropy (SCE) to balance the fitting of different classes.
Specifically, SCE and CR learn discriminative representations while alleviating over-fitting via an adaptive trade-off between the information of classes (attract) and instances (repulse).
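Symmetric Cross Entropy itself is a published loss (a forward cross-entropy term plus a reverse one); the PyTorch sketch below is a hedged rendition with illustrative alpha/beta weights, not the challenge entry's exact configuration.

```python
# Hedged sketch of Symmetric Cross Entropy: forward CE fits the labels,
# reverse CE (prediction and label swap roles) damps over-fitting.
import torch
import torch.nn.functional as F

def symmetric_cross_entropy(logits, target, alpha=0.1, beta=1.0, eps=1e-4):
    ce = F.cross_entropy(logits, target)
    pred = F.softmax(logits, dim=1)
    # Clamp the one-hot labels away from zero so log() stays finite.
    one_hot = F.one_hot(target, logits.size(1)).float().clamp(min=eps)
    rce = -(pred * one_hot.log()).sum(dim=1).mean()
    return alpha * ce + beta * rce

loss = symmetric_cross_entropy(torch.randn(8, 10), torch.randint(0, 10, (8,)))
```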
arXiv Detail & Related papers (2022-06-13T13:54:33Z)
- Robust Neural Network Classification via Double Regularization [2.41710192205034]
We propose a novel double regularization of the neural network training loss that combines a penalty on the complexity of the classification model and an optimal reweighting of training observations.
We demonstrate DRFit on neural network classification of (i) MNIST and (ii) CIFAR-10, in both cases with simulated mislabeling.
arXiv Detail & Related papers (2021-12-15T13:19:20Z)
- Boosting Active Learning via Improving Test Performance [35.9303900799961]
We show that selecting unlabeled data of higher gradient norm leads to a lower upper bound of test loss.
We propose two schemes, namely expected-gradnorm and entropy-gradnorm.
Our method achieves superior performance over the state-of-the-art.
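For intuition, the expected gradient norm of the cross-entropy loss with respect to the logits has a closed form under the model's own predictive distribution; the sketch below scores unlabeled samples that way. It is a simplified stand-in for, not a reproduction of, the paper's expected-gradnorm and entropy-gradnorm schemes.

```python
# Illustrative acquisition score: E_{y ~ p} || d CE(y) / d logits ||,
# using the identity d CE / d logits = p - onehot(y).
import numpy as np

def expected_gradnorm(logits):
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    # || p - e_y ||^2 = ||p||^2 - 2 p_y + 1, then average over y ~ p.
    sq = (p ** 2).sum(axis=1, keepdims=True) - 2.0 * p + 1.0
    return (p * np.sqrt(sq)).sum(axis=1)

scores = expected_gradnorm(np.random.default_rng(0).normal(size=(100, 10)))
query = np.argsort(scores)[-10:]   # label the highest-gradnorm samples
```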
arXiv Detail & Related papers (2021-12-10T17:25:14Z)
- Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.
When updated on new class data, they suffer from catastrophic forgetting: the model can no longer clearly discern old-class data from the new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z)
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
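The core view construction is easy to sketch: for each sample, replace a random subset of features with values drawn from those features' empirical marginals. The corruption fraction below is an illustrative choice.

```python
# Sketch of SCARF-style corruption: each selected feature is replaced
# by the same feature's value from a random training row, i.e. a draw
# from that feature's empirical marginal distribution.
import numpy as np

def scarf_view(x_batch, x_train, corrupt_frac=0.6, rng=None):
    rng = rng or np.random.default_rng()
    view = x_batch.copy()
    n, d = x_batch.shape
    k = int(corrupt_frac * d)
    for i in range(n):
        cols = rng.choice(d, size=k, replace=False)
        rows = rng.integers(0, x_train.shape[0], size=k)
        view[i, cols] = x_train[rows, cols]
    return view

# Two independently corrupted views of a batch form the positive pairs
# for the contrastive objective.
x = np.random.default_rng(1).normal(size=(32, 20))
v1, v2 = scarf_view(x, x), scarf_view(x, x)
```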
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
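As a rough single-machine sketch of the calibration step: approximate each class's feature distribution (a single Gaussian per class here, simplifying the paper's federated Gaussian-mixture estimate), sample virtual representations, and refit only the classifier head on the class-balanced virtual set. The helper below is hypothetical.

```python
# Hedged sketch of CCVR-style calibration on virtual representations.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ccvr_calibrate(features, labels, n_virtual=200, rng=None):
    rng = rng or np.random.default_rng()
    xs, ys = [], []
    for c in np.unique(labels):
        fc = features[labels == c]
        mu, cov = fc.mean(axis=0), np.cov(fc, rowvar=False)
        # Sample class-balanced "virtual representations" from the
        # per-class Gaussian (jitter keeps the covariance PSD).
        v = rng.multivariate_normal(mu, cov + 1e-6 * np.eye(mu.size), size=n_virtual)
        xs.append(v)
        ys.append(np.full(n_virtual, c))
    head = LogisticRegression(max_iter=1000)   # classifier head only
    head.fit(np.vstack(xs), np.concatenate(ys))
    return head

rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(c, 1.0, size=(50, 16)) for c in range(3)])
head = ccvr_calibrate(feats, np.repeat(np.arange(3), 50))
```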
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- Unified Robust Training for Graph Neural Networks against Label Noise [12.014301020294154]
We propose a new framework, UnionNET, for learning with noisy labels on graphs under a semi-supervised setting.
Our approach provides a unified solution for robustly training GNNs and performing label correction simultaneously.
arXiv Detail & Related papers (2021-03-05T01:17:04Z)
- CoMatch: Semi-supervised Learning with Contrastive Graph Regularization [86.84486065798735]
CoMatch is a new semi-supervised learning method that unifies dominant approaches.
It achieves state-of-the-art performance on multiple datasets.
arXiv Detail & Related papers (2020-11-23T02:54:57Z)