Single Model Deep Learning on Imbalanced Small Datasets for Skin Lesion
Classification
- URL: http://arxiv.org/abs/2102.01284v1
- Date: Tue, 2 Feb 2021 03:48:55 GMT
- Title: Single Model Deep Learning on Imbalanced Small Datasets for Skin Lesion
Classification
- Authors: Peng Yao, Shuwei Shen, Mengjuan Xu, Peng Liu, Fan Zhang, Jinyu Xing,
Pengfei Shao, Benjamin Kaffenberger, and Ronald X. Xu
- Abstract summary: This paper proposes a novel data augmentation strategy for single model classification of skin lesions based on a small and imbalanced dataset.
Various DCNNs are trained on this dataset to show that the models with moderate complexity outperform the larger models.
By combining Modified RandAugment and Multi-weighted Focal Loss in a single DCNN model, we have achieved the classification accuracy comparable to those of multiple ensembling models on the ISIC 2018 challenge test dataset.
- Score: 5.642359877598896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep convolutional neural network (DCNN) models have been widely explored for
skin disease diagnosis and some of them have achieved the diagnostic outcomes
comparable or even superior to those of dermatologists. However, broad
implementation of DCNN in skin disease detection is hindered by small size and
data imbalance of the publically accessible skin lesion datasets. This paper
proposes a novel data augmentation strategy for single model classification of
skin lesions based on a small and imbalanced dataset. First, various DCNNs are
trained on this dataset to show that the models with moderate complexity
outperform the larger models. Second, regularization DropOut and DropBlock are
added to reduce overfitting and a Modified RandAugment augmentation strategy is
proposed to address the defects of sample underrepresentation in the small
dataset. Finally, a novel Multi-Weighted Focal Loss function is introduced to
overcome the challenge of uneven sample size and classification difficulty. By
combining Modified RandAugment and Multi-weighted Focal Loss in a single DCNN
model, we have achieved the classification accuracy comparable to those of
multiple ensembling models on the ISIC 2018 challenge test dataset. Our study
shows that this method is able to achieve a high classification performance at
a low cost of computational resources and inference time, potentially suitable
to implement in mobile devices for automated screening of skin lesions and many
other malignancies in low resource settings.
Related papers
- An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification [2.2940141855172036]
In molecular biology, there has been an explosion of data generated from multi-omics sequencing.
Traditional statistical methods face challenging tasks when dealing with such high dimensional data.
This study, focused on tackling these challenges in a neural network that incorporates autoencoder to extract latent space of the features.
arXiv Detail & Related papers (2024-05-16T01:45:55Z) - Iterative Online Image Synthesis via Diffusion Model for Imbalanced
Classification [29.730360798234294]
We introduce an Iterative Online Image Synthesis framework to address the class imbalance problem in medical image classification.
Our framework incorporates two key modules, namely Online Image Synthesis (OIS) and Accuracy Adaptive Sampling (AAS)
To evaluate the effectiveness of our proposed method in addressing imbalanced classification, we conduct experiments on the HAM10000 and APTOS datasets.
arXiv Detail & Related papers (2024-03-13T10:51:18Z) - MCRAGE: Synthetic Healthcare Data for Fairness [3.0089659534785853]
We propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE) to augment imbalanced datasets.
MCRAGE involves training a Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples from underrepresented classes.
We use this synthetic data to augment the existing imbalanced dataset, resulting in a more balanced distribution across all classes.
arXiv Detail & Related papers (2023-10-27T19:02:22Z) - Feature robustness and sex differences in medical imaging: a case study
in MRI-based Alzheimer's disease detection [1.7616042687330637]
We compare two classification schemes on the ADNI MRI dataset.
We do not find a strong dependence of model performance for male and female test subjects on the sex composition of the training dataset.
arXiv Detail & Related papers (2022-04-04T17:37:54Z) - Analyzing the Effects of Handling Data Imbalance on Learned Features
from Medical Images by Looking Into the Models [50.537859423741644]
Training a model on an imbalanced dataset can introduce unique challenges to the learning problem.
We look deeper into the internal units of neural networks to observe how handling data imbalance affects the learned features.
arXiv Detail & Related papers (2022-04-04T09:38:38Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z) - A Systematic Approach to Featurization for Cancer Drug Sensitivity
Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z) - CS-AF: A Cost-sensitive Multi-classifier Active Fusion Framework for
Skin Lesion Classification [9.265557367859637]
Convolutional neural networks (CNNs) have achieved the state-of-the-art performance in skin lesion analysis.
We present CS-AF, a cost-sensitive multi-classifier active fusion framework for skin lesion classification.
arXiv Detail & Related papers (2020-04-25T05:48:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.