A review of ensemble learning and data augmentation models for class
imbalanced problems: combination, implementation and evaluation
- URL: http://arxiv.org/abs/2304.02858v3
- Date: Mon, 27 Nov 2023 03:15:34 GMT
- Title: A review of ensemble learning and data augmentation models for class
imbalanced problems: combination, implementation and evaluation
- Authors: Azal Ahmad Khan, Omkar Chaudhari, Rohitash Chandra
- Abstract summary: Class imbalance (CI) in classification problems arises when the number of observations belonging to one class is lower than the other.
In this paper, we evaluate data augmentation and ensemble learning methods used to address prominent benchmark CI problems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class imbalance (CI) in classification problems arises when the number of
observations belonging to one class is lower than that of the other classes. Ensemble learning
combines multiple models to obtain a robust model and has been prominently used
with data augmentation methods to address class imbalance problems. In the last
decade, a number of strategies have been added to enhance ensemble learning and
data augmentation methods, along with new methods such as generative
adversarial networks (GANs). A combination of these has been applied in many
studies, and the evaluation of different combinations would enable a better
understanding and guidance for different application domains. In this paper, we
present a computational study to evaluate data augmentation and ensemble
learning methods used to address prominent benchmark CI problems. We present a
general framework that evaluates 9 data augmentation and 9 ensemble learning
methods for CI problems. Our objective is to identify the most effective
combination for improving classification performance on imbalanced datasets.
The results indicate that combinations of data augmentation methods with
ensemble learning can significantly improve classification performance on
imbalanced datasets. We find that traditional data augmentation methods such as
the synthetic minority oversampling technique (SMOTE) and random oversampling
(ROS) not only outperform GANs on the selected CI problems but are also
computationally less expensive. Our study is vital for the
development of novel models for handling imbalanced datasets.
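The traditional augmentation methods highlighted in the abstract are simple enough to sketch directly. The following is a minimal, self-contained illustration of random oversampling (ROS) and a simplified SMOTE-style interpolation; it is a sketch of the general techniques, not the exact implementations evaluated in the paper.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """ROS: duplicate randomly chosen minority samples until classes balance."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    majority = counts.max()
    Xs, ys = [X], [y]
    for c, n in zip(classes, counts):
        if n < majority:
            idx = rng.choice(np.flatnonzero(y == c), size=majority - n, replace=True)
            Xs.append(X[idx])
            ys.append(y[idx])
    return np.vstack(Xs), np.concatenate(ys)

def smote_like(X_min, n_new, k=5, rng=None):
    """Simplified SMOTE: each synthetic point is interpolated between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    neigh = nbrs[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + lam * (X_min[neigh] - X_min[base])
```

A resampled set produced this way would then feed any ensemble learner (e.g. bagging or boosting), matching the augmentation-plus-ensemble pipeline the paper evaluates.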
Related papers
- Ensemble Methods for Sequence Classification with Hidden Markov Models [8.241486511994202]
We present a lightweight approach to sequence classification using Ensemble Methods for Hidden Markov Models (HMMs).
HMMs offer significant advantages in scenarios with imbalanced or smaller datasets due to their simplicity, interpretability, and efficiency.
Our ensemble-based scoring method enables the comparison of sequences of any length and improves performance on imbalanced datasets.
arXiv Detail & Related papers (2024-09-11T20:59:32Z) - Systematic Evaluation of Synthetic Data Augmentation for Multi-class NetFlow Traffic [2.5182419298876857]
Multi-class classification models can identify specific types of attacks, allowing for more targeted and effective incident responses.
Recent advances suggest that generative models can assist in data augmentation, claiming to offer superior solutions for imbalanced datasets.
Our experiments indicate that resampling methods for balancing training data do not reliably improve classification performance.
arXiv Detail & Related papers (2024-08-28T12:44:07Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
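The iterative minority-majority mixing described above resembles a mixup-style convex interpolation. A hypothetical sketch of one mixing step (a generic illustration, not the authors' exact scheme) might look like:

```python
import numpy as np

def mix_minority_majority(X_min, X_maj, n_new, alpha=0.8, rng=None):
    """Generate synthetic minority-leaning samples by convexly mixing a
    minority sample with a majority sample; `alpha` keeps the mix close
    to the minority point so the new sample can be labelled minority."""
    rng = np.random.default_rng(rng)
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_maj), size=n_new)
    # mixing weight toward the minority sample: lam in [alpha, 1)
    lam = alpha + (1 - alpha) * rng.random((n_new, 1))
    return lam * X_min[i] + (1 - lam) * X_maj[j]
```

Mixing with majority-class points, rather than only within the minority class as SMOTE does, pushes synthetic samples toward the decision boundary.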
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC).
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z) - A Hybrid Approach for Binary Classification of Imbalanced Data [0.0]
We propose HADR, a hybrid approach with dimension reduction that consists of data block construction, dimensionality reduction, and ensemble learning.
We evaluate the performance on eight imbalanced public datasets in terms of recall, G-mean, and AUC.
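The G-mean cited above is the geometric mean of per-class recalls; it collapses to zero whenever a classifier ignores any class, which makes it a standard metric for imbalanced data. A small helper (assuming integer class labels) could be:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of recall over each class present in y_true."""
    recalls = []
    for c in np.unique(y_true):
        mask = (y_true == c)
        recalls.append((y_pred[mask] == c).mean())
    return float(np.prod(recalls) ** (1 / len(recalls)))
```

For example, a classifier that predicts only the majority class scores a G-mean of 0 regardless of overall accuracy, whereas accuracy alone would reward it on a heavily imbalanced set.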
arXiv Detail & Related papers (2022-07-06T15:18:41Z) - Federated Learning Aggregation: New Robust Algorithms with Guarantees [63.96013144017572]
Federated learning has been recently proposed for distributed model training at the edge.
This paper presents a complete general mathematical convergence analysis to evaluate aggregation strategies in a federated learning framework.
We derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses.
arXiv Detail & Related papers (2022-05-22T16:37:53Z) - ASE: Anomaly Scoring Based Ensemble Learning for Imbalanced Datasets [3.214208422566496]
We propose a bagging ensemble learning framework based on an anomaly detection scoring system.
We show that our ensemble learning model can dramatically improve the performance of base estimators.
arXiv Detail & Related papers (2022-03-21T07:20:41Z) - Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.
When updating them based on the new class data, they suffer from catastrophic forgetting: the model cannot discern old class data clearly from the new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Hybrid Ensemble optimized algorithm based on Genetic Programming for
imbalanced data classification [0.0]
We propose a hybrid ensemble algorithm based on Genetic Programming (GP) for two classes of imbalanced data classification.
Experimental results show that, depending on the size of the training set, the proposed method achieves 40% and 50% better accuracy in minority-class prediction on the specified data sets.
arXiv Detail & Related papers (2021-06-02T14:14:38Z) - Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.