Investigating Group Distributionally Robust Optimization for Deep
Imbalanced Learning: A Case Study of Binary Tabular Data Classification
- URL: http://arxiv.org/abs/2303.02505v1
- Date: Sat, 4 Mar 2023 21:20:58 GMT
- Title: Investigating Group Distributionally Robust Optimization for Deep
Imbalanced Learning: A Case Study of Binary Tabular Data Classification
- Authors: Ismail B. Mustapha, Shafaatunnur Hasan, Hatem S Y Nabbus, Mohamed
Mostafa Ali Montaser, Sunday Olusanya Olatunji, Siti Maryam Shamsuddin
- Abstract summary: This study investigates group distributionally robust optimization (gDRO) for imbalanced learning.
Experiments comparing gDRO with empirical risk minimization (ERM) and classical imbalance methods reveal impressive performance of gDRO.
- Score: 0.44040106718326594
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class imbalance is one of the most studied machine learning
challenges, and recent studies have shown that deep neural networks are
susceptible to it. Although concerted research efforts in this direction
have been notable in recent years, findings show that the canonical learning
objective, empirical risk minimization (ERM), cannot achieve optimal
imbalanced learning in deep neural networks because of its bias toward the
majority class. This study investigates an alternative learning objective,
group distributionally robust optimization (gDRO), for imbalanced learning,
focusing on tabular imbalanced data rather than the image data that has
dominated deep imbalanced learning research. In contrast to ERM, which
minimizes the average per-instance loss, gDRO minimizes the worst group loss
over the training data. Experiments comparing gDRO with ERM and classical
imbalance methods, using four widely used evaluation metrics across several
benchmark imbalanced binary tabular datasets of varying imbalance ratios,
reveal impressive performance of gDRO, which outperforms the other methods
in terms of G-mean and ROC-AUC.
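To make the objective concrete, below is a minimal sketch of a worst-group loss in PyTorch. The function name, the batch-level grouping, and the use of cross-entropy are illustrative assumptions; the paper's exact gDRO implementation (e.g., its group weighting or optimization schedule) may differ.

```python
import torch
import torch.nn.functional as F

def group_dro_loss(logits, labels, groups, num_groups):
    """Worst-group objective: return the largest per-group mean loss."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    group_losses = []
    for g in range(num_groups):
        mask = groups == g
        if mask.any():  # skip groups absent from this batch
            group_losses.append(per_sample[mask].mean())
    # ERM would return per_sample.mean(); gDRO minimizes the worst group instead.
    return torch.stack(group_losses).max()
```

In the binary tabular setting studied here, one natural choice is to treat the two classes as the groups, e.g. loss = group_dro_loss(model(x), y, y, num_groups=2), so training focuses on the minority class whenever its mean loss is the worst.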
Related papers
- Synthetic Tabular Data Generation for Class Imbalance and Fairness: A Comparative Study [4.420073761023326]
Due to their data-driven nature, Machine Learning (ML) models are susceptible to bias inherited from data.
Class imbalance (in the classification target) and group imbalance (in protected attributes like sex or race) can undermine both ML utility and fairness.
This paper conducts a comparative analysis to address class and group imbalances using state-of-the-art models.
arXiv Detail & Related papers (2024-09-08T20:08:09Z)
- How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance [64.1656365676171]
Group imbalance has been a known problem in empirical risk minimization.
This paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance.
arXiv Detail & Related papers (2024-03-12T04:38:05Z)
- Simplifying Neural Network Training Under Class Imbalance [77.39968702907817]
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.
The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures.
We demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods.
arXiv Detail & Related papers (2023-12-05T05:52:44Z)
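As a hedged illustration of that claim, standard PyTorch already exposes such knobs directly; the smoothing value and the dummy batch below are placeholders, not the paper's tuned settings.

```python
import torch
import torch.nn as nn

# Label smoothing via the standard loss; 0.1 is an illustrative value.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 2)           # dummy batch of binary logits
labels = torch.randint(0, 2, (8,))   # dummy labels
loss = criterion(logits, labels)
# Batch size and data augmentation are likewise set through the usual
# DataLoader(batch_size=...) and transform pipeline, with no specialized
# imbalance-specific loss or sampler required.
```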
- Bias Amplification Enhances Minority Group Performance [10.380812738348899]
We propose BAM, a novel two-stage training algorithm.
In the first stage, the model is trained using a bias amplification scheme that introduces a learnable auxiliary variable for each training sample.
In the second stage, we upweight the samples that the bias-amplified model misclassifies, and then continue training the same model on the reweighted dataset.
arXiv Detail & Related papers (2023-09-13T04:40:08Z)
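A rough sketch of that two-stage idea, assuming PyTorch: the auxiliary-variable parameterization below (a learnable per-sample logit offset) and all names are assumptions inferred from the summary, not BAM's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_samples, n_classes = 1000, 2
# Stage 1: one learnable auxiliary variable per training sample (assumed here
# to be an offset added to the model's logits).
aux = nn.Parameter(torch.zeros(n_samples, n_classes))

def stage1_loss(model, x, y, idx, lam=0.5):
    # The offset can absorb hard examples, amplifying the model's own bias.
    logits = model(x) + lam * aux[idx]
    return F.cross_entropy(logits, y)

def stage2_weights(model, x, y, upweight=5.0):
    # Upweight samples the bias-amplified model misclassifies, then continue
    # training the same model on the reweighted data.
    with torch.no_grad():
        wrong = model(x).argmax(dim=1) != y
    return 1.0 + (upweight - 1.0) * wrong.float()
```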
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks [0.1074267520911262]
Imbalanced Data (ID) is a problem that prevents Machine Learning (ML) models from achieving satisfactory results.
In this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs).
In order to achieve reliable results, we conducted our experiments 100 times with randomly shuffled data distributions.
arXiv Detail & Related papers (2022-09-01T07:42:16Z)
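For orientation, a minimal SMOTE oversampling sketch with the imbalanced-learn package; the synthetic dataset and its 9:1 ratio are placeholders rather than the paper's experimental setup.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# A 9:1 imbalanced toy dataset standing in for the paper's benchmarks.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print((y_res == 0).sum(), (y_res == 1).sum())  # both classes now equal in size
```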
- Phased Progressive Learning with Coupling-Regulation-Imbalance Loss for Imbalanced Classification [11.673344551762822]
Deep neural networks generally perform poorly with datasets that suffer from quantity imbalance and classification difficulty imbalance between different classes.
A phased progressive learning schedule is proposed to smoothly transfer the training emphasis from representation learning to upper classifier training.
Our code will be open source soon.
arXiv Detail & Related papers (2022-05-24T14:46:39Z)
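Until the authors' code is released, here is a generic sketch of such a phase schedule; the linear ramp and all names are assumptions, not the paper's coupling-regulation-imbalance formulation.

```python
def phase_weight(epoch: int, start: int, end: int) -> float:
    """Linearly shift emphasis from representation learning to the classifier."""
    if epoch <= start:
        return 0.0
    if epoch >= end:
        return 1.0
    return (epoch - start) / (end - start)

# Illustrative use inside a training loop:
# w = phase_weight(epoch, start=10, end=50)
# loss = (1 - w) * representation_loss + w * classifier_loss
```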
- Quadruplet Deep Metric Learning Model for Imbalanced Time-series Fault Diagnosis [0.2538209532048866]
This paper analyzes how to improve the performance of imbalanced classification by adjusting the distance between classes and the distribution within a class.
A novel quadruplet data-pair design that accounts for class imbalance is proposed, with reference to traditional deep metric learning.
A reasonable combination of the quadruplet loss and the softmax loss can reduce the impact of imbalance.
arXiv Detail & Related papers (2021-07-08T11:56:41Z)
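As a generic reference point, a standard quadruplet loss is sketched below in PyTorch; the paper's imbalance-aware quadruplet construction is not reproduced here, and the margin values are assumptions.

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, pos, neg1, neg2, m1=1.0, m2=0.5):
    d_ap = F.pairwise_distance(anchor, pos)
    d_an = F.pairwise_distance(anchor, neg1)
    d_nn = F.pairwise_distance(neg1, neg2)
    # Pull anchor-positive pairs closer than anchor-negative pairs (margin m1)
    # and closer than negative-negative pairs (margin m2).
    return (F.relu(d_ap - d_an + m1) + F.relu(d_ap - d_nn + m2)).mean()

# The combination described above would add a softmax term, e.g.:
# total = quadruplet_loss(a, p, n1, n2) + F.cross_entropy(logits, labels)
```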
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
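A bare-bones sketch of combining class-balanced experts by averaging logits, assuming PyTorch; how the experts are trained on balanced subsets and fused in the paper is more involved than this.

```python
import torch

def ensemble_logits(experts, x):
    # Each expert is assumed to have been trained on a (roughly)
    # class-balanced subset of the long-tailed data.
    return torch.stack([expert(x) for expert in experts]).mean(dim=0)
```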