A review of ensemble learning and data augmentation models for class
imbalanced problems: combination, implementation and evaluation
- URL: http://arxiv.org/abs/2304.02858v3
- Date: Mon, 27 Nov 2023 03:15:34 GMT
- Title: A review of ensemble learning and data augmentation models for class
imbalanced problems: combination, implementation and evaluation
- Authors: Azal Ahmad Khan, Omkar Chaudhari, Rohitash Chandra
- Abstract summary: Class imbalance (CI) in classification problems arises when the number of observations belonging to one class is lower than the other.
In this paper, we evaluate data augmentation and ensemble learning methods used to address prominent benchmark CI problems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class imbalance (CI) in classification problems arises when the number of
observations belonging to one class is lower than that of the other classes. Ensemble learning
combines multiple models to obtain a robust model and has been prominently used
with data augmentation methods to address class imbalance problems. In the last
decade, a number of strategies have been added to enhance ensemble learning and
data augmentation methods, along with new methods such as generative
adversarial networks (GANs). A combination of these has been applied in many
studies, and the evaluation of different combinations would enable a better
understanding and guidance for different application domains. In this paper, we
present a computational study to evaluate data augmentation and ensemble
learning methods used to address prominent benchmark CI problems. We present a
general framework that evaluates 9 data augmentation and 9 ensemble learning
methods for CI problems. Our objective is to identify the most effective
combination for improving classification performance on imbalanced datasets.
The results indicate that combinations of data augmentation methods with
ensemble learning can significantly improve classification performance on
imbalanced datasets. We find that traditional data augmentation methods such as
the synthetic minority oversampling technique (SMOTE) and random oversampling
(ROS) not only outperform GANs on the selected CI problems but are also
computationally less expensive. Our study is vital for the
development of novel models for handling imbalanced datasets.
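The traditional augmentation methods highlighted in the abstract are simple enough to sketch directly. The following is a minimal, self-contained illustration of random oversampling (ROS) and a simplified SMOTE-style interpolation; it is a sketch of the general techniques, not the exact implementations evaluated in the paper.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """ROS: duplicate randomly chosen minority samples until classes balance."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    majority = counts.max()
    Xs, ys = [X], [y]
    for c, n in zip(classes, counts):
        if n < majority:
            idx = rng.choice(np.flatnonzero(y == c), size=majority - n, replace=True)
            Xs.append(X[idx])
            ys.append(y[idx])
    return np.vstack(Xs), np.concatenate(ys)

def smote_like(X_min, n_new, k=5, rng=None):
    """Simplified SMOTE: each synthetic point is interpolated between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    neigh = nbrs[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + lam * (X_min[neigh] - X_min[base])
```

A resampled set produced this way would then feed any ensemble learner (e.g. bagging or boosting), matching the augmentation-plus-ensemble pipeline the paper evaluates.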
Related papers
- Ensemble Methods for Sequence Classification with Hidden Markov Models [8.241486511994202]
We present a lightweight approach to sequence classification using Ensemble Methods for Hidden Markov Models (HMMs).
HMMs offer significant advantages in scenarios with imbalanced or smaller datasets due to their simplicity, interpretability, and efficiency.
Our ensemble-based scoring method enables the comparison of sequences of any length and improves performance on imbalanced datasets.
arXiv Detail & Related papers (2024-09-11T20:59:32Z) - Systematic Evaluation of Synthetic Data Augmentation for Multi-class NetFlow Traffic [2.5182419298876857]
Multi-class classification models can identify specific types of attacks, allowing for more targeted and effective incident responses.
Recent advances suggest that generative models can assist in data augmentation, claiming to offer superior solutions for imbalanced datasets.
Our experiments indicate that resampling methods for balancing training data do not reliably improve classification performance.
arXiv Detail & Related papers (2024-08-28T12:44:07Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
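The iterative minority-majority mixing described above resembles a mixup-style convex interpolation. A hypothetical sketch of one mixing step (a generic illustration, not the authors' exact scheme) might look like:

```python
import numpy as np

def mix_minority_majority(X_min, X_maj, n_new, alpha=0.8, rng=None):
    """Generate synthetic minority-leaning samples by convexly mixing a
    minority sample with a majority sample; `alpha` keeps the mix close
    to the minority point so the new sample can be labelled minority."""
    rng = np.random.default_rng(rng)
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_maj), size=n_new)
    # mixing weight toward the minority sample: lam in [alpha, 1)
    lam = alpha + (1 - alpha) * rng.random((n_new, 1))
    return lam * X_min[i] + (1 - lam) * X_maj[j]
```

Mixing with majority-class points, rather than only within the minority class as SMOTE does, pushes synthetic samples toward the decision boundary.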
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC).
DNCC yields a deep classification ensemble where the individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z) - A Hybrid Approach for Binary Classification of Imbalanced Data [0.0]
We propose HADR, a hybrid approach with dimension reduction that consists of data block construction, dimensionality reduction, and ensemble learning.
We evaluate the performance on eight imbalanced public datasets in terms of recall, G-mean, and AUC.
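The G-mean cited above is the geometric mean of per-class recalls; it collapses to zero whenever a classifier ignores any class, which makes it a standard metric for imbalanced data. A small helper (assuming integer class labels) could be:

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of recall over each class present in y_true."""
    recalls = []
    for c in np.unique(y_true):
        mask = (y_true == c)
        recalls.append((y_pred[mask] == c).mean())
    return float(np.prod(recalls) ** (1 / len(recalls)))
```

For example, a classifier that predicts only the majority class scores a G-mean of 0 regardless of overall accuracy, whereas accuracy alone would reward it on a heavily imbalanced set.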
arXiv Detail & Related papers (2022-07-06T15:18:41Z) - Federated Learning Aggregation: New Robust Algorithms with Guarantees [63.96013144017572]
Federated learning has been recently proposed for distributed model training at the edge.
This paper presents a complete general mathematical convergence analysis to evaluate aggregation strategies in a federated learning framework.
We derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses.
arXiv Detail & Related papers (2022-05-22T16:37:53Z) - ASE: Anomaly Scoring Based Ensemble Learning for Imbalanced Datasets [3.214208422566496]
We propose a bagging ensemble learning framework based on an anomaly detection scoring system.
We show that our ensemble learning model can dramatically improve the performance of base estimators.
arXiv Detail & Related papers (2022-03-21T07:20:41Z) - Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.
When updating them based on the new class data, they suffer from catastrophic forgetting: the model cannot discern old class data clearly from the new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Hybrid Ensemble optimized algorithm based on Genetic Programming for
imbalanced data classification [0.0]
We propose a hybrid ensemble algorithm based on Genetic Programming (GP) for two classes of imbalanced data classification.
Experimental results show that, depending on the size of the training set, the proposed method achieves 40% and 50% better accuracy in minority-class prediction on the specified data sets.
arXiv Detail & Related papers (2021-06-02T14:14:38Z) - Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.