Highly Efficient Representation and Active Learning Framework for
Imbalanced Data and its Application to COVID-19 X-Ray Classification
- URL: http://arxiv.org/abs/2103.05109v2
- Date: Thu, 18 Mar 2021 21:26:35 GMT
- Title: Highly Efficient Representation and Active Learning Framework for
Imbalanced Data and its Application to COVID-19 X-Ray Classification
- Authors: Heng Hao, Sima Didari, Jae Oh Woo, Hankyu Moon, and Patrick Bangert
- Abstract summary: We propose a highly data-efficient classification and active learning framework for classifying chest X-rays.
It is based on (1) unsupervised representation learning with a Convolutional Neural Network and (2) the Gaussian Process (GP) method.
We demonstrate that only $\sim 10\%$ of the labeled data is needed to reach the accuracy obtained by training on all available labels.
- Score: 0.7829352305480284
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a highly data-efficient classification and active learning
framework for classifying chest X-rays. It is based on (1) unsupervised
representation learning with a Convolutional Neural Network (CNN) and (2) the
Gaussian Process (GP) method. The unsupervised representation learning employs
self-supervision that does not require class labels, and the learned features
are shown to achieve label-efficient classification. GP is a kernel-based
Bayesian approach that also leads to data-efficient predictions with the added
benefit of estimating each decision's uncertainty. Our novel framework combines
these two elements in sequence to achieve highly data and label efficient
classifications. Moreover, both elements are less sensitive to the prevalent
and challenging class imbalance issue, thanks to (1) the features learned
without labels and (2) the Bayesian nature of GP. The GP-provided uncertainty
estimates enable active learning by ranking samples based on the uncertainty
and selectively labeling samples showing higher uncertainty. We apply this
novel combination to the data-deficient and severely imbalanced case of
COVID-19 chest X-ray classification. We demonstrate that only $\sim 10\%$ of
the labeled data is needed to reach the accuracy obtained by training on all
available labels. Applied to the COVID-19 data in a fully supervised
classification scenario, our model, with a generic ResNet backbone,
outperforms the state-of-the-art model with a highly tuned architecture by
4\% on the COVID-19 class. Our model architecture and proposed framework are
general and straightforward to apply to a broader class of datasets, where we
expect similar success.
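To make the pipeline concrete, below is a minimal sketch of the two-stage framework and its uncertainty-ranked active-learning loop. The self-supervised CNN is stood in for by a fixed feature matrix, and the kernel, seed-set size, and per-round query budget are illustrative assumptions rather than the paper's exact settings.

```python
# Hypothetical sketch: self-supervised features -> GP classifier ->
# uncertainty-ranked querying. Random vectors stand in for the CNN
# embeddings the paper learns without labels.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Stage 1 (assumed given): 128-d embeddings for 500 X-rays, 3 classes
# (e.g., normal / pneumonia / COVID-19).
features = rng.normal(size=(500, 128))
labels = rng.integers(0, 3, size=500)

labeled = list(rng.choice(500, size=20, replace=False))   # small seed set
unlabeled = [i for i in range(500) if i not in labeled]

for _ in range(5):                                        # active-learning rounds
    # Stage 2: kernel-based Bayesian classifier on the learned features.
    gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=10.0))
    gp.fit(features[labeled], labels[labeled])

    # Rank unlabeled samples by predictive uncertainty (entropy of the
    # GP's class probabilities) and query the most uncertain ones.
    proba = gp.predict_proba(features[unlabeled])
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    query = [unlabeled[i] for i in np.argsort(entropy)[-10:]]

    labeled += query                                      # oracle labels the queries
    unlabeled = [i for i in unlabeled if i not in query]
```

In the paper's setting, the oracle step corresponds to a radiologist labeling the queried X-rays; the loop stops once accuracy saturates, which the authors report happens at roughly 10% of the labels.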
Related papers
- Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery [56.172872410834664]
Generalized category discovery (GCD) aims at addressing a more realistic and challenging setting of semi-supervised learning.
We propose a Memory Consistency guided Divide-and-conquer Learning framework (MCDL).
Our method outperforms state-of-the-art models by a large margin on both seen and unseen classes in generic image recognition.
arXiv Detail & Related papers (2024-01-24T09:39:45Z)
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery [76.82327473338734]
Generalized category discovery (GCD) aims to automatically cluster partially labeled data.
Unlabeled data contain instances that are not only from known categories of the labeled data but also from novel categories.
One effective approach to GCD is applying self-supervised learning to learn discriminative representations for unlabeled data.
We propose a Dynamic Conceptional Contrastive Learning framework, which can effectively improve clustering accuracy.
arXiv Detail & Related papers (2023-03-30T14:04:39Z)
- NorMatch: Matching Normalizing Flows with Discriminative Classifiers for Semi-Supervised Learning [8.749830466953584]
Semi-Supervised Learning (SSL) aims to learn a model using a tiny labeled set and massive amounts of unlabeled data.
In this work we introduce a new framework for SSL named NorMatch.
We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets.
arXiv Detail & Related papers (2022-11-17T15:39:18Z)
- 2nd Place Solution for ICCV 2021 VIPriors Image Classification Challenge: An Attract-and-Repulse Learning Approach [41.346232387426944]
Convolutional neural networks (CNNs) have achieved significant success in image classification by utilizing large-scale datasets.
We propose Attract-and-Repulse, which consists of Contrastive Regularization (CR) to enrich the feature representations and Symmetric Cross Entropy (SCE) to balance the fitting of different classes.
Specifically, SCE and CR learn discriminative representations while alleviating over-fitting via an adaptive trade-off between the information of classes (attract) and instances (repulse).
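Symmetric Cross Entropy itself is a published loss (a forward cross-entropy term plus a reverse one); the PyTorch sketch below is a hedged rendition with illustrative alpha/beta weights, not the challenge entry's exact configuration.

```python
# Hedged sketch of Symmetric Cross Entropy: forward CE fits the labels,
# reverse CE (prediction and label swap roles) damps over-fitting.
import torch
import torch.nn.functional as F

def symmetric_cross_entropy(logits, target, alpha=0.1, beta=1.0, eps=1e-4):
    ce = F.cross_entropy(logits, target)
    pred = F.softmax(logits, dim=1)
    # Clamp the one-hot labels away from zero so log() stays finite.
    one_hot = F.one_hot(target, logits.size(1)).float().clamp(min=eps)
    rce = -(pred * one_hot.log()).sum(dim=1).mean()
    return alpha * ce + beta * rce

loss = symmetric_cross_entropy(torch.randn(8, 10), torch.randint(0, 10, (8,)))
```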
arXiv Detail & Related papers (2022-06-13T13:54:33Z)
- Robust Neural Network Classification via Double Regularization [2.41710192205034]
We propose a novel double regularization of the neural network training loss that combines a penalty on the complexity of the classification model and an optimal reweighting of training observations.
We demonstrate DRFit on neural network classification of (i) MNIST and (ii) CIFAR-10, in both cases with simulated mislabeling.
arXiv Detail & Related papers (2021-12-15T13:19:20Z)
- Boosting Active Learning via Improving Test Performance [35.9303900799961]
We show that selecting unlabeled data of higher gradient norm leads to a lower upper bound of test loss.
We propose two schemes, namely expected-gradnorm and entropy-gradnorm.
Our method achieves superior performance over the state-of-the-art.
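For intuition, the expected gradient norm of the cross-entropy loss with respect to the logits has a closed form under the model's own predictive distribution; the sketch below scores unlabeled samples that way. It is a simplified stand-in for, not a reproduction of, the paper's expected-gradnorm and entropy-gradnorm schemes.

```python
# Illustrative acquisition score: E_{y ~ p} || d CE(y) / d logits ||,
# using the identity d CE / d logits = p - onehot(y).
import numpy as np

def expected_gradnorm(logits):
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    # || p - e_y ||^2 = ||p||^2 - 2 p_y + 1, then average over y ~ p.
    sq = (p ** 2).sum(axis=1, keepdims=True) - 2.0 * p + 1.0
    return (p * np.sqrt(sq)).sum(axis=1)

scores = expected_gradnorm(np.random.default_rng(0).normal(size=(100, 10)))
query = np.argsort(scores)[-10:]   # label the highest-gradnorm samples
```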
arXiv Detail & Related papers (2021-12-10T17:25:14Z)
- Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.
When updated on new class data, they suffer from catastrophic forgetting: the model can no longer clearly discern old-class data from the new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z)
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
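The core view construction is easy to sketch: for each sample, replace a random subset of features with values drawn from those features' empirical marginals. The corruption fraction below is an illustrative choice.

```python
# Sketch of SCARF-style corruption: each selected feature is replaced
# by the same feature's value from a random training row, i.e. a draw
# from that feature's empirical marginal distribution.
import numpy as np

def scarf_view(x_batch, x_train, corrupt_frac=0.6, rng=None):
    rng = rng or np.random.default_rng()
    view = x_batch.copy()
    n, d = x_batch.shape
    k = int(corrupt_frac * d)
    for i in range(n):
        cols = rng.choice(d, size=k, replace=False)
        rows = rng.integers(0, x_train.shape[0], size=k)
        view[i, cols] = x_train[rows, cols]
    return view

# Two independently corrupted views of a batch form the positive pairs
# for the contrastive objective.
x = np.random.default_rng(1).normal(size=(32, 20))
v1, v2 = scarf_view(x, x), scarf_view(x, x)
```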
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
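As a rough single-machine sketch of the calibration step: approximate each class's feature distribution (a single Gaussian per class here, simplifying the paper's federated Gaussian-mixture estimate), sample virtual representations, and refit only the classifier head on the class-balanced virtual set. The helper below is hypothetical.

```python
# Hedged sketch of CCVR-style calibration on virtual representations.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ccvr_calibrate(features, labels, n_virtual=200, rng=None):
    rng = rng or np.random.default_rng()
    xs, ys = [], []
    for c in np.unique(labels):
        fc = features[labels == c]
        mu, cov = fc.mean(axis=0), np.cov(fc, rowvar=False)
        # Sample class-balanced "virtual representations" from the
        # per-class Gaussian (jitter keeps the covariance PSD).
        v = rng.multivariate_normal(mu, cov + 1e-6 * np.eye(mu.size), size=n_virtual)
        xs.append(v)
        ys.append(np.full(n_virtual, c))
    head = LogisticRegression(max_iter=1000)   # classifier head only
    head.fit(np.vstack(xs), np.concatenate(ys))
    return head

rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(c, 1.0, size=(50, 16)) for c in range(3)])
head = ccvr_calibrate(feats, np.repeat(np.arange(3), 50))
```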
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- Unified Robust Training for Graph Neural Networks against Label Noise [12.014301020294154]
We propose a new framework, UnionNET, for learning with noisy labels on graphs under a semi-supervised setting.
Our approach provides a unified solution for robustly training GNNs and performing label correction simultaneously.
arXiv Detail & Related papers (2021-03-05T01:17:04Z)
- CoMatch: Semi-supervised Learning with Contrastive Graph Regularization [86.84486065798735]
CoMatch is a new semi-supervised learning method that unifies dominant approaches.
It achieves state-of-the-art performance on multiple datasets.
arXiv Detail & Related papers (2020-11-23T02:54:57Z)