Improving GBDT Performance on Imbalanced Datasets: An Empirical Study of Class-Balanced Loss Functions
- URL: http://arxiv.org/abs/2407.14381v1
- Date: Fri, 19 Jul 2024 15:10:46 GMT
- Title: Improving GBDT Performance on Imbalanced Datasets: An Empirical Study of Class-Balanced Loss Functions
- Authors: Jiaqi Luo, Yuan Yuan, Shixin Xu
- Abstract summary: This paper presents the first comprehensive study on adapting class-balanced loss functions to three Gradient Boosting Decision Trees (GBDT) algorithms.
We conduct extensive experiments on multiple datasets to evaluate the impact of class-balanced losses on different GBDT models.
Our results demonstrate the potential of class-balanced loss functions to enhance GBDT performance on imbalanced datasets.
- Score: 3.559225731091162
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Class imbalance remains a significant challenge in machine learning, particularly for tabular data classification tasks. While Gradient Boosting Decision Trees (GBDT) models have proven highly effective for such tasks, their performance can be compromised when dealing with imbalanced datasets. This paper presents the first comprehensive study on adapting class-balanced loss functions to three GBDT algorithms across various tabular classification tasks, including binary, multi-class, and multi-label classification. We conduct extensive experiments on multiple datasets to evaluate the impact of class-balanced losses on different GBDT models, establishing a valuable benchmark. Our results demonstrate the potential of class-balanced loss functions to enhance GBDT performance on imbalanced datasets, offering a robust approach for practitioners facing class imbalance challenges in real-world applications. Additionally, we introduce a Python package that facilitates the integration of class-balanced loss functions into GBDT workflows, making these advanced techniques accessible to a wider audience.
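The paper's Python package is not reproduced here, but the core mechanism is straightforward to sketch. Below is a minimal illustration, not the authors' implementation: one common class-balanced weighting (the effective-number-of-samples scheme of Cui et al., with the hyperparameter beta assumed here) plugged into XGBoost's custom-objective hook for binary classification. The function names are hypothetical.

```python
import numpy as np
import xgboost as xgb

def class_balanced_weights(y, beta=0.999):
    """Per-class weights w_c = (1 - beta) / (1 - beta**n_c),
    normalized to sum to the number of classes (effective-number scheme)."""
    counts = np.bincount(y.astype(int))
    weights = (1.0 - beta) / (1.0 - np.power(beta, counts))
    return weights / weights.sum() * len(counts)

def class_balanced_logloss(preds, dtrain):
    """Hypothetical custom XGBoost objective: logistic loss with each
    sample's gradient and hessian scaled by its label's class weight."""
    y = dtrain.get_label()
    w = class_balanced_weights(y)[y.astype(int)]
    p = 1.0 / (1.0 + np.exp(-preds))   # sigmoid of raw margin scores
    grad = w * (p - y)                 # weighted first derivative
    hess = w * p * (1.0 - p)           # weighted second derivative
    return grad, hess

# Illustrative use on a synthetic imbalanced binary problem:
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (rng.random(1000) < 0.05).astype(float)   # ~5% positive class
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=50, obj=class_balanced_logloss)
```

With beta = 0 every class gets weight 1 (plain log loss), while beta close to 1 approaches inverse-frequency weighting, so beta controls how strongly the minority class is up-weighted.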
Related papers
- Gradient Reweighting: Towards Imbalanced Class-Incremental Learning [8.438092346233054]
Class-Incremental Learning (CIL) trains a model to continually recognize new classes from non-stationary data.
A major challenge of CIL arises when it is applied to real-world data with non-uniform distributions, where imbalance occurs both between old and new classes and within the incoming data.
We show that this dual imbalance causes skewed gradient updates with biased weights in FC layers, inducing over-/under-fitting and catastrophic forgetting in CIL; a generic gradient-reweighting sketch follows this entry.
arXiv Detail & Related papers (2024-02-28T18:08:03Z)
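As a rough, generic illustration of gradient reweighting (not this paper's specific scheme, which the summary above does not detail), the sketch below scales each sample's softmax cross-entropy gradient at the classifier layer by an assumed inverse-frequency class weight:

```python
import numpy as np

def reweighted_softmax_grad(logits, labels, num_classes):
    """Softmax cross-entropy gradient w.r.t. logits, with each sample's
    gradient scaled by an inverse-frequency weight for its class
    (a common baseline, not this paper's exact scheme)."""
    counts = np.bincount(labels, minlength=num_classes)
    w = counts.sum() / (num_classes * np.maximum(counts, 1))  # inverse-frequency weights
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)                 # row-wise softmax
    grad = probs.copy()
    grad[np.arange(len(labels)), labels] -= 1.0               # dL/dlogits = p - onehot(y)
    return grad * w[labels][:, None]                          # per-class rescaling

# Example: two samples, three classes.
logits = np.array([[2.0, 0.1, -1.0], [0.3, 0.2, 0.1]])
labels = np.array([0, 2])
grad = reweighted_softmax_grad(logits, labels, num_classes=3)
```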
- Optimizing for ROC Curves on Class-Imbalanced Data by Training over a Family of Loss Functions [3.06506506650274]
Training reliable classifiers under severe class imbalance is a challenging problem in computer vision.
Recent work has proposed techniques that mitigate the effects of training under imbalance by modifying the loss functions or optimization methods.
We propose training over a family of loss functions instead of a single loss function (an illustrative sketch follows this entry).
arXiv Detail & Related papers (2024-02-08T04:31:21Z)
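The summary above does not specify which loss family the authors use; purely as an assumed illustration, training can sweep the focusing parameter gamma of a binary focal loss so the model optimizes a family of losses rather than a single fixed member:

```python
import numpy as np

def focal_loss(p, y, gamma):
    """Binary focal loss on predicted probabilities p and labels y in {0,1};
    gamma = 0 reduces to standard cross-entropy."""
    p_t = np.where(y == 1, p, 1.0 - p)   # probability of the true class
    return -((1.0 - p_t) ** gamma) * np.log(np.clip(p_t, 1e-12, None))

p = np.array([0.9, 0.2, 0.05])
y = np.array([1, 0, 1])
print(focal_loss(p, y, gamma=2.0))       # the hard positive (p=0.05) dominates

# Assumed scheme: cycle gamma across epochs so training covers the
# whole family {focal_loss(., ., g) : g in gammas} instead of one loss.
gammas = [0.0, 0.5, 1.0, 2.0, 5.0]
for epoch in range(20):
    gamma = gammas[epoch % len(gammas)]
    # ... run one training epoch minimizing focal_loss(..., gamma) ...
```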
- Simplifying Neural Network Training Under Class Imbalance [77.39968702907817]
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.
The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures.
We demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, and label smoothing, can achieve state-of-the-art performance without any such specialized class-imbalance methods (see the sketch after this entry).
arXiv Detail & Related papers (2023-12-05T05:52:44Z)
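In PyTorch, for example, two of the components named above are single settings; the values here are illustrative placeholders, not the paper's tuned choices, and data augmentation is omitted because it is dataset-specific:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy imbalanced data stands in for a real dataset.
X = torch.randn(1000, 32)
y = (torch.rand(1000) < 0.05).long()                      # ~5% minority class
loader = DataLoader(TensorDataset(X, y),
                    batch_size=128, shuffle=True)         # batch-size knob

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)      # label-smoothing knob
model = nn.Linear(32, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for xb, yb in loader:                                     # one illustrative epoch
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)
    loss.backward()
    optimizer.step()
```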
- Robust-GBDT: GBDT with Nonconvex Loss for Tabular Classification in the Presence of Label Noise and Class Imbalance [0.40964539027092917]
Robust-GBDT combines GBDT with nonconvex loss functions that are resilient to label noise.
It significantly enhances generalization capabilities, particularly on noisy and imbalanced datasets.
This paves the way for more robust and precise classification across diverse real-world applications.
arXiv Detail & Related papers (2023-10-08T08:28:40Z)
- Class-Imbalanced Graph Learning without Class Rebalancing [62.1368829847041]
Class imbalance is prevalent in real-world node classification tasks and poses great challenges for graph learning models.
In this work, we approach the root cause of class-imbalance bias from a topological perspective.
We devise a lightweight topological augmentation framework BAT to mitigate the class-imbalance bias without class rebalancing.
arXiv Detail & Related papers (2023-08-27T19:01:29Z)
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z)
- On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes from balancing intra- and inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data with corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Mitigating Dataset Imbalance via Joint Generation and Classification [17.57577266707809]
Supervised deep learning methods are enjoying enormous success in many practical applications of computer vision.
However, their marked performance degradation on biased and imbalanced data calls the reliability of these methods into question.
We introduce a joint dataset repairment strategy that combines a neural network classifier with Generative Adversarial Networks (GANs).
We show that the combined training helps to improve the robustness of both the classifier and the GAN against severe class imbalance.
arXiv Detail & Related papers (2020-08-12T18:40:38Z)
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to the state of the art, and an extended ensemble establishes a new state of the art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)