Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance in Binary Classification
- URL: http://arxiv.org/abs/2409.19751v1
- Date: Sun, 29 Sep 2024 16:02:32 GMT
- Title: Balancing the Scales: A Comprehensive Study on Tackling Class Imbalance in Binary Classification
- Authors: Mohamed Abdelhamid, Abhyuday Desai
- Abstract summary: This study comprehensively evaluates three widely-used strategies for handling class imbalance.
We compare these methods against a no-intervention baseline across 15 diverse machine learning models.
Our results indicate that all three strategies generally outperform the baseline, with Decision Threshold Calibration emerging as the most consistently effective technique.
- Score: 0.8287206589886881
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Class imbalance in binary classification tasks remains a significant challenge in machine learning, often resulting in poor performance on minority classes. This study comprehensively evaluates three widely-used strategies for handling class imbalance: Synthetic Minority Over-sampling Technique (SMOTE), Class Weights tuning, and Decision Threshold Calibration. We compare these methods against a baseline scenario of no intervention across 15 diverse machine learning models and 30 datasets from various domains, conducting a total of 9,000 experiments. Performance was primarily assessed using the F1-score, although our study also tracked results on 9 additional metrics, including F2-score, precision, recall, Brier score, PR-AUC, and AUC. Our results indicate that all three strategies generally outperform the baseline, with Decision Threshold Calibration emerging as the most consistently effective technique. However, we observed substantial variability in the best-performing method across datasets, highlighting the importance of testing multiple approaches for specific problems. This study provides valuable insights for practitioners dealing with imbalanced datasets and emphasizes the need for dataset-specific analysis in evaluating class imbalance handling techniques.
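As a quick illustration of what the paper's four arms look like in code, here is a minimal sketch using scikit-learn and imbalanced-learn on synthetic data; the model, split, and threshold grid are illustrative assumptions, not the authors' 9,000-experiment harness.

```python
# Minimal sketch of the three interventions vs. a no-intervention baseline.
# Model choice and data are assumptions, not the paper's exact setup.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 1) Baseline: no intervention.
base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# 2) SMOTE: synthetically oversample the minority class (training set only).
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
smote = LogisticRegression(max_iter=1000).fit(X_sm, y_sm)

# 3) Class weights: reweight the loss inversely to class frequency.
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# 4) Decision threshold calibration: keep the baseline model but pick the
#    probability cutoff that maximizes F1. (For brevity this tunes on the
#    test split; in practice use a separate validation split.)
proba = base.predict_proba(X_te)[:, 1]
thresholds = np.linspace(0.05, 0.95, 19)
best_t = max(thresholds, key=lambda t: f1_score(y_te, (proba >= t).astype(int)))

for name, pred in [
    ("baseline", base.predict(X_te)),
    ("smote", smote.predict(X_te)),
    ("class weights", weighted.predict(X_te)),
    (f"threshold={best_t:.2f}", (proba >= best_t).astype(int)),
]:
    print(f"{name:>16}: F1 = {f1_score(y_te, pred):.3f}")
```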
Related papers
- Electroencephalogram Emotion Recognition via AUC Maximization [0.0]
Imbalanced datasets pose significant challenges in areas including neuroscience, cognitive science, and medical diagnostics.
This study addresses the issue of class imbalance, using the 'Liking' label in the DEAP dataset as an example.
arXiv Detail & Related papers (2024-08-16T19:08:27Z)
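AUC maximization of the kind this paper applies replaces a pointwise classification loss with a pairwise ranking loss over positive-negative pairs. Below is a minimal NumPy sketch of one common surrogate (a squared hinge on score differences); the paper's exact objective, optimizer, and DEAP preprocessing are not reproduced here.

```python
import numpy as np

def pairwise_auc_loss(scores, labels, margin=1.0):
    """Squared-hinge surrogate for 1 - AUC: penalize every positive/negative
    pair whose scores are not separated by at least `margin`."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]   # all positive-minus-negative gaps
    return np.mean(np.maximum(0.0, margin - diffs) ** 2)

rng = np.random.default_rng(0)
labels = (rng.random(200) < 0.1).astype(int)   # imbalanced labels, ~10% positive
scores = rng.normal(size=200) + 0.5 * labels   # slightly informative scores
print(pairwise_auc_loss(scores, labels))
```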
- Bias Mitigating Few-Shot Class-Incremental Learning [17.185744533050116]
Few-shot class-incremental learning aims at recognizing novel classes continually with limited novel class samples.
Recent methods somewhat alleviate the accuracy imbalance between base and incremental classes by fine-tuning the feature extractor in the incremental sessions.
We propose a novel method to mitigate model bias in FSCIL during both the training and inference processes.
arXiv Detail & Related papers (2024-02-01T10:37:41Z)
- A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
arXiv Detail & Related papers (2023-10-07T09:15:08Z)
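Of the two families this analysis unifies, logit adjustment is the easier to sketch: each class logit is shifted by a multiple of the log class prior, so rare classes are no longer systematically under-scored. The snippet below shows the standard post-hoc variant; the temperature tau and the priors are illustrative assumptions, and the paper's contraction-based bounds are not reproduced.

```python
import numpy as np

def logit_adjust(logits, class_priors, tau=1.0):
    """Post-hoc logit adjustment: subtract tau * log(prior) per class,
    which boosts rare classes relative to frequent ones at prediction time."""
    return logits - tau * np.log(np.asarray(class_priors))

logits = np.array([[2.0, 1.5], [0.2, 0.1]])   # raw scores for two classes
priors = [0.95, 0.05]                          # empirical class frequencies
print(logit_adjust(logits, priors).argmax(axis=1))  # rare class wins more often
```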
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Class-Imbalanced Complementary-Label Learning via Weighted Loss [8.934943507699131]
Complementary-label learning (CLL) is widely used in weakly supervised classification.
It faces a significant challenge in real-world datasets when confronted with class-imbalanced training samples.
We propose a novel problem setting that enables learning from class-imbalanced complementary labels for multi-class classification.
arXiv Detail & Related papers (2022-09-28T16:02:42Z)
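In complementary-label learning, each sample carries only a label it does not belong to. A simple surrogate pushes down the predicted probability of that complementary label, and per-class weights are one straightforward way to counter imbalance; the sketch below illustrates that general idea and is not the authors' specific weighted estimator.

```python
import numpy as np

def weighted_complementary_loss(probs, comp_labels, class_weights):
    """Negative-learning-style loss: push down the probability of the label
    each sample is known NOT to have, weighted per (complementary) class."""
    p_comp = probs[np.arange(len(probs)), comp_labels]
    w = np.asarray(class_weights)[comp_labels]
    return np.mean(-w * np.log(1.0 - p_comp + 1e-12))

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])  # softmax outputs
comp_labels = np.array([2, 0])        # "not class 2", "not class 0"
weights = [1.0, 1.0, 3.0]             # upweight the rare class (illustrative)
print(weighted_complementary_loss(probs, comp_labels, weights))
```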
- An Empirical Analysis of the Efficacy of Different Sampling Techniques for Imbalanced Classification [0.0]
The prevalence of imbalance in real-world datasets has led to the creation of a multitude of strategies for the class imbalance issue.
Standard classification algorithms tend to perform poorly when trained on imbalanced data.
We present a comprehensive analysis of 26 popular sampling techniques to understand their effectiveness in dealing with imbalanced data.
arXiv Detail & Related papers (2022-08-25T03:45:34Z)
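Many of the samplers a study like this covers sit behind imbalanced-learn's single `fit_resample` interface, so a comparison largely reduces to a loop. A small sketch over four representatives follows; the survey's full set of 26 techniques and its evaluation protocol are not reproduced.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler, TomekLinks
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

samplers = {
    "random-over": RandomOverSampler(random_state=0),
    "random-under": RandomUnderSampler(random_state=0),
    "smote": SMOTE(random_state=0),
    "tomek-links": TomekLinks(),   # cleaning-style undersampling
}
for name, sampler in samplers.items():
    _, y_res = sampler.fit_resample(X, y)
    print(f"{name:>12}: {dict(Counter(y_res))}")   # resulting class counts
```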
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
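The core idea behind loss-to-weight meta-models of this kind can be sketched as a tiny network that maps each sample's loss to a weight in (0, 1). The PyTorch snippet below shows only that forward mechanic; CMW-Net's class-aware grouping and bilevel meta-update are deliberately omitted.

```python
import torch
import torch.nn as nn

class WeightNet(nn.Module):
    """Tiny MLP mapping a per-sample loss value to a weight in (0, 1) --
    a simplified stand-in for learned sample-weighting schemes."""
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, losses):
        return self.net(losses.unsqueeze(1)).squeeze(1)

criterion = nn.CrossEntropyLoss(reduction="none")   # keep per-sample losses
weight_net = WeightNet()

logits = torch.randn(8, 3)                  # stand-in model outputs
targets = torch.randint(0, 3, (8,))
per_sample = criterion(logits, targets)
weights = weight_net(per_sample.detach())   # weights predicted from losses
weighted_loss = (weights * per_sample).mean()
weighted_loss.backward()
```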
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we present an early attempt at learning multiclass scoring functions by optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z)
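On the evaluation side, multiclass AUC is conventionally computed by averaging binary AUCs either one-vs-one or one-vs-rest, both of which scikit-learn exposes. The sketch below uses that standard API and does not implement the paper's optimization algorithms.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)

# 'ovo' averages AUC over all class pairs; 'ovr' over one-vs-rest splits.
print("OvO AUC:", roc_auc_score(y_te, proba, multi_class="ovo"))
print("OvR AUC:", roc_auc_score(y_te, proba, multi_class="ovr"))
```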
- Combined Cleaning and Resampling Algorithm for Multi-Class Imbalanced Data with Label Noise [11.868507571027626]
In this paper, we propose a novel oversampling technique, the Multi-Class Combined Cleaning and Resampling algorithm.
The proposed method uses an energy-based approach to model the regions suitable for oversampling, making it less affected by small disjuncts and outliers than SMOTE.
It pairs this with a simultaneous cleaning operation that aims to reduce the effect of overlapping class distributions on the performance of the learning algorithms.
arXiv Detail & Related papers (2020-04-07T13:59:35Z)
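imbalanced-learn ships simpler instances of the same "oversample, then clean" pattern, pairing SMOTE with Tomek-link or edited-nearest-neighbour cleaning. The sketch below uses those as an analogue; it is not the paper's energy-based algorithm.

```python
from collections import Counter

from imblearn.combine import SMOTEENN, SMOTETomek
from sklearn.datasets import make_classification

# Three-class imbalanced data with overlapping class distributions.
X, y = make_classification(n_samples=3000, n_classes=3, n_informative=6,
                           weights=[0.8, 0.15, 0.05], class_sep=0.8,
                           random_state=0)

for combo in (SMOTETomek(random_state=0), SMOTEENN(random_state=0)):
    X_res, y_res = combo.fit_resample(X, y)   # oversample, then clean
    print(type(combo).__name__, dict(Counter(y_res)))
```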
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.