A Novel Hybrid Sampling Framework for Imbalanced Learning
- URL: http://arxiv.org/abs/2208.09619v1
- Date: Sat, 20 Aug 2022 07:04:00 GMT
- Title: A Novel Hybrid Sampling Framework for Imbalanced Learning
- Authors: Asif Newaz, Farhan Shahriyar Haq
- Abstract summary: "SMOTE-RUS-NC" has been compared with other state-of-the-art sampling techniques.
Rigorous experimentation has been conducted on 26 imbalanced datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Class imbalance is a frequently occurring scenario in classification tasks.
Learning from imbalanced data poses a major challenge, which has instigated a
lot of research in this area. Data preprocessing using sampling techniques is a
standard approach to deal with the imbalance present in the data. Since
standard classification algorithms do not perform well on imbalanced data, the
dataset needs to be adequately balanced before training. This can be
accomplished by oversampling the minority class or undersampling the majority
class. In this study, a novel hybrid sampling algorithm has been proposed. To
overcome the limitations of individual sampling techniques while preserving the
quality of the resampled dataset, a framework has been developed that combines
three different sampling techniques. The Neighborhood Cleaning Rule is first
applied to reduce the imbalance. Random undersampling is then strategically
coupled with the SMOTE algorithm to obtain an optimal balance in the dataset.
The proposed hybrid methodology, termed "SMOTE-RUS-NC", has been compared with
other state-of-the-art sampling techniques. The strategy is
further incorporated into the ensemble learning framework to obtain a more
robust classification algorithm, termed "SRN-BRF". Rigorous experimentation has
been conducted on 26 imbalanced datasets with varying degrees of imbalance. On
virtually all datasets, the two proposed algorithms outperformed existing
sampling strategies, in many cases by a substantial margin. Especially in
highly imbalanced datasets where popular sampling techniques failed utterly,
they achieved unparalleled performance. The superior results demonstrate the
efficacy of the proposed models and their potential to be powerful sampling
algorithms in the imbalanced domain.
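The three-step procedure described above maps naturally onto off-the-shelf samplers. Below is a minimal sketch using the imbalanced-learn library; the intermediate undersampling ratio is an illustrative assumption, since the abstract does not specify the exact parameterization of SMOTE-RUS-NC.

```python
# Minimal sketch of a SMOTE-RUS-NC-style resampling chain (imbalanced-learn).
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import NeighbourhoodCleaningRule, RandomUnderSampler
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Step 1: Neighborhood Cleaning Rule removes noisy/borderline majority samples.
X_nc, y_nc = NeighbourhoodCleaningRule().fit_resample(X, y)

# Step 2: random undersampling shrinks the majority class partway
# (the 1:2 minority-to-majority target here is an assumed placeholder).
X_ru, y_ru = RandomUnderSampler(sampling_strategy=0.5, random_state=0).fit_resample(X_nc, y_nc)

# Step 3: SMOTE oversamples the minority class to full balance.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_ru, y_ru)

print(Counter(y), "->", Counter(y_res))
```

For SRN-BRF, one plausible reading of the abstract is a Balanced Random Forest variant in which each bootstrap sample is resampled with SMOTE-RUS-NC before a tree is grown; the exact integration is not detailed in the abstract.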
Related papers
- A Bilevel Optimization Framework for Imbalanced Data Classification [1.6385815610837167]
We propose a new undersampling approach that avoids the pitfalls of noise and overlap caused by synthetic data.
Instead of undersampling majority data randomly, our method undersamples datapoints based on their ability to improve model loss.
Using improved model loss as a proxy measurement for classification performance, our technique assesses a datapoint's impact on loss and rejects those unable to improve it.
arXiv Detail & Related papers (2024-10-15T01:17:23Z)
- iBRF: Improved Balanced Random Forest Classifier [0.0]
Class imbalance poses a major challenge in different classification tasks.
We propose a modification to the Balanced Random Forest (BRF) classifier to enhance the prediction performance.
Our proposed hybrid sampling technique, when incorporated into the framework of the Random Forest classifier, achieves better prediction performance than other sampling techniques used in imbalanced classification tasks.
arXiv Detail & Related papers (2024-03-14T20:59:36Z)
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes (a generic sketch of this mixing idea appears after this list).
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Generalized Oversampling for Learning from Imbalanced datasets and Associated Theory [0.0]
In supervised learning, real imbalanced datasets are frequently encountered.
We propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates (see the KDE oversampling sketch after this list).
We evaluate the performance of the GOLIATH algorithm in imbalanced regression situations.
arXiv Detail & Related papers (2023-08-05T23:08:08Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios [23.257891827728827]
Imbalance in the proportion of training samples belonging to different classes often causes performance degradation in conventional classifiers.
We propose a novel three step technique to address imbalanced data.
arXiv Detail & Related papers (2021-03-24T09:58:02Z)
- Handling Imbalanced Data: A Case Study for Binary Class Problems [0.0]
A major issue in solving classification problems is imbalanced data.
This paper focuses on synthetic oversampling techniques and manually computes synthetic data points to make the algorithms easier to comprehend.
We analyze the application of these synthetic oversampling techniques to binary classification problems with different imbalance ratios and sample sizes.
arXiv Detail & Related papers (2020-10-09T02:04:14Z)
- Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning [126.31716228319902]
We develop Distribution Aligning Refinery of Pseudo-label (DARP) algorithm.
We show that DARP is provably and efficiently compatible with state-of-the-art SSL schemes.
arXiv Detail & Related papers (2020-07-17T09:16:05Z)
- The Simulator: Understanding Adaptive Sampling in the Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
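Two of the techniques summarized in this list are concrete enough to sketch generically. First, in the spirit of the mixing approach from "Tackling Diverse Minorities in Imbalanced Classification": each synthetic point is interpolated between a minority and a majority sample. This is a hedged illustration of the general idea, not the cited paper's iterative algorithm; the function name and mixing weights are hypothetical.

```python
# Hypothetical mixing-based oversampling: each synthetic sample is an
# interpolation between a random minority point and a random majority point.
import numpy as np

def mix_oversample(X_min, X_maj, n_new, alpha=0.8, seed=0):
    rng = np.random.default_rng(seed)
    mins = X_min[rng.integers(len(X_min), size=n_new)]  # random minority rows
    majs = X_maj[rng.integers(len(X_maj), size=n_new)]  # random majority rows
    # Weights drawn near 1 keep synthetic points close to the minority class.
    lam = rng.uniform(alpha, 1.0, size=(n_new, 1))
    return lam * mins + (1.0 - lam) * majs
```

Second, in the spirit of GOLIATH's kernel-density-based augmentation: fit a kernel density estimate on the minority class and draw synthetic samples from it. The Gaussian kernel and fixed bandwidth are illustrative assumptions; the GOLIATH procedure itself is more elaborate.

```python
# Illustrative KDE oversampling (not GOLIATH's exact procedure): fit a
# Gaussian kernel density estimate to the minority class and sample from it.
from sklearn.neighbors import KernelDensity

def kde_oversample(X_min, n_new, bandwidth=0.5, seed=0):
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_min)
    return kde.sample(n_samples=n_new, random_state=seed)
```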