Effective Class-Imbalance learning based on SMOTE and Convolutional
Neural Networks
- URL: http://arxiv.org/abs/2209.00653v1
- Date: Thu, 1 Sep 2022 07:42:16 GMT
- Title: Effective Class-Imbalance learning based on SMOTE and Convolutional
Neural Networks
- Authors: Javad Hasannataj Joloudari, Abdolreza Marefat and Mohammad Ali
Nematollahi
- Abstract summary: Imbalanced Data (ID) is a problem that prevents Machine Learning (ML) models from achieving satisfactory results.
In this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs).
In order to achieve reliable results, we conducted our experiments 100 times with randomly shuffled data distributions.
- Score: 0.1074267520911262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imbalanced Data (ID) is a problem that prevents Machine Learning (ML) models
from achieving satisfactory results. ID arises when the number of samples
belonging to one class outnumbers that of the other by a wide margin, biasing
such models' learning process towards the majority class. In recent years,
several solutions have been put forward to address this issue, opting either to
synthetically generate new data for the minority class or to reduce the number
of majority-class samples to balance the data. Hence, in this paper, we
investigate the effectiveness of methods based on Deep Neural Networks (DNNs)
and Convolutional Neural Networks (CNNs), combined with a variety of well-known
imbalanced-data solutions, namely oversampling and undersampling. To evaluate
our methods, we have used KEEL, breast cancer, and
Z-Alizadeh Sani datasets. In order to achieve reliable results, we conducted
our experiments 100 times with randomly shuffled data distributions. The
classification results demonstrate that the mixed Synthetic Minority
Oversampling Technique (SMOTE)-Normalization-CNN model outperforms the other
methodologies, achieving 99.08% accuracy across the 24 imbalanced datasets.
Therefore, the proposed mixed model can be applied to imbalanced binary
classification problems on other real datasets.
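The oversampling technique at the heart of the winning pipeline, SMOTE, creates synthetic minority-class points by interpolating between an existing minority sample and one of its nearest minority neighbors. A minimal NumPy sketch of that idea (not the authors' implementation; the function name and parameters are illustrative):

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen sample and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    neighbors = np.argsort(d, axis=1)[:, :k]    # k nearest neighbors per sample
    base = rng.integers(0, n, size=n_new)       # anchor samples
    nb = neighbors[base, rng.integers(0, k, size=n_new)]  # chosen neighbor
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Example: add 20 synthetic points to a 10-sample minority class
X_min = np.random.default_rng(0).normal(size=(10, 3))
X_syn = smote(X_min, n_new=20, k=3, rng=1)
```

Because each synthetic point is a convex combination of two real minority points, the new samples stay inside the minority class's feature region rather than duplicating existing rows.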
Related papers
- Few-shot learning for COVID-19 Chest X-Ray Classification with
Imbalanced Data: An Inter vs. Intra Domain Study [49.5374512525016]
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research.
Some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained from generic images.
We propose a methodology based on Siamese neural networks in which a series of techniques are integrated to mitigate the effects of data scarcity and distribution imbalance.
arXiv Detail & Related papers (2024-01-18T16:59:27Z)
- Skew Probabilistic Neural Networks for Learning from Imbalanced Data [3.7892198600060945]
This paper introduces an imbalanced data-oriented approach using probabilistic neural networks (PNNs) with a skew normal probability kernel.
We show that SkewPNNs substantially outperform state-of-the-art machine learning methods for both balanced and imbalanced datasets in most experimental settings.
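A probabilistic neural network scores a test point by averaging a kernel density over each class's training points and predicting the argmax; this paper's contribution is swapping the usual Gaussian kernel for a skew-normal one. A toy sketch of that mechanism (the hyperparameters `a` and `scale` are illustrative assumptions, not the paper's settings):

```python
import numpy as np
from math import erf, exp, sqrt, pi

def skew_normal_pdf(z, a):
    """Skew-normal density at standardized z: 2 * phi(z) * Phi(a*z)."""
    phi = exp(-0.5 * z * z) / sqrt(2 * pi)
    Phi = 0.5 * (1 + erf(a * z / sqrt(2)))
    return 2 * phi * Phi

def skew_pnn_predict(X_train, y_train, X_test, a=4.0, scale=0.5):
    """Toy PNN: score each class as the average product-kernel density of the
    test point around that class's training points; predict the argmax."""
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        scores = []
        for c in classes:
            Xc = X_train[y_train == c]
            # product of per-feature skew-normal kernels, averaged over the class
            dens = [np.prod([skew_normal_pdf((xi - ti) / scale, a) / scale
                             for xi, ti in zip(x, t)]) for t in Xc]
            scores.append(np.mean(dens))
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)

# Two well-separated 1-D classes
X_tr = np.array([[0.0], [0.1], [5.0], [5.1]])
y_tr = np.array([0, 0, 1, 1])
preds = skew_pnn_predict(X_tr, y_tr, np.array([[0.05], [5.05]]))
```

The skewed kernel lets the density estimate lean toward one side of each training point, which is the property the paper exploits for minority-class regions.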
arXiv Detail & Related papers (2023-12-10T13:12:55Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
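The minority/majority mixing described above can be sketched as a convex combination weighted toward the minority point, so the synthetic sample keeps the minority label. This is a simplification of the paper's iterative method; the `lam_low` threshold and function name are assumed for illustration:

```python
import numpy as np

def mix_synthesize(X_min, X_maj, n_new, lam_low=0.5, rng=None):
    """Create synthetic minority samples as convex combinations of a random
    minority point and a random majority point, weighted toward the minority
    side (lam >= lam_low) so the new point stays near minority territory."""
    rng = np.random.default_rng(rng)
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_maj), size=n_new)
    lam = rng.uniform(lam_low, 1.0, size=(n_new, 1))  # minority weight
    return lam * X_min[i] + (1 - lam) * X_maj[j]

# Example: minority cluster at 0, majority cluster at 1
X_min = np.zeros((5, 2))
X_maj = np.ones((50, 2))
X_syn = mix_synthesize(X_min, X_maj, n_new=10, lam_low=0.7, rng=0)
```

Mixing against majority points (rather than only within the minority class, as SMOTE does) pushes synthetic samples toward the decision boundary.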
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
We benchmark generation results on the CIFAR100/CIFAR100LT datasets and show outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z)
- A Novel Hybrid Sampling Framework for Imbalanced Learning [0.0]
"SMOTE-RUS-NC" has been compared with other state-of-the-art sampling techniques.
Rigorous experimentation has been conducted on 26 imbalanced datasets.
arXiv Detail & Related papers (2022-08-20T07:04:00Z)
- ClusterQ: Semantic Feature Distribution Alignment for Data-Free Quantization [111.12063632743013]
We propose a new and effective data-free quantization method termed ClusterQ.
To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics.
We also incorporate the intra-class variance to solve class-wise mode collapse.
arXiv Detail & Related papers (2022-04-30T06:58:56Z)
- Effect of Balancing Data Using Synthetic Data on the Performance of Machine Learning Classifiers for Intrusion Detection in Computer Networks [3.233545237942899]
Researchers in academia and industry used machine learning (ML) techniques to design and implement Intrusion Detection Systems (IDSes) for computer networks.
In many of the datasets used in such systems, data are imbalanced (i.e., not all classes have an equal number of samples).
We show that training ML models on a dataset balanced with synthetic samples generated by CTGAN increased prediction accuracy by up to 8%.
arXiv Detail & Related papers (2022-04-01T00:25:11Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Imbalanced Data Learning by Minority Class Augmentation using Capsule Adversarial Networks [31.073558420480964]
We propose a method to restore balance in imbalanced image datasets by coalescing two concurrent methods.
In our model, generative and discriminative networks play a novel competitive game.
The coalescing of capsule-GAN is effective at recognizing highly overlapping classes with much fewer parameters compared with the convolutional-GAN.
arXiv Detail & Related papers (2020-04-05T12:36:06Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.