Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection
- URL: http://arxiv.org/abs/2109.05468v1
- Date: Sun, 12 Sep 2021 09:32:43 GMT
- Title: Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection
- Authors: Afek Ilay Adler and Amichai Painsky
- Abstract summary: We study the effect of biased base learners on Gradient Boosting Machines (GBM) feature importance (FI) measures.
By utilizing cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost.
We demonstrate the suggested framework in a variety of synthetic and real-world setups, showing a significant improvement in all GBM FI measures while maintaining relatively the same level of prediction accuracy.
- Score: 11.295032417617454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gradient Boosting Machines (GBM) are among the go-to algorithms on tabular
data, which produce state of the art results in many prediction tasks. Despite
its popularity, the GBM framework suffers from a fundamental flaw in its base
learners. Specifically, most implementations utilize decision trees that are
typically biased towards categorical variables with large cardinalities. The
effect of this bias was extensively studied over the years, mostly in terms of
predictive performance. In this work, we extend the scope and study the effect
of biased base learners on GBM feature importance (FI) measures. We show that
although these implementations demonstrate highly competitive predictive
performance, they still, surprisingly, suffer from bias in FI. By utilizing
cross-validated (CV) unbiased base learners, we fix this flaw at a relatively
low computational cost. We demonstrate the suggested framework in a variety of
synthetic and real-world setups, showing a significant improvement in all GBM
FI measures while maintaining relatively the same level of prediction accuracy.
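To make the abstract's claim concrete, here is a minimal, self-contained sketch of the kind of bias it describes: a gradient boosted model assigns non-trivial impurity-based importance to a high-cardinality feature that is pure noise, while importance measured on held-out data largely does not. The sketch uses scikit-learn's GradientBoostingClassifier and permutation importance as a generic stand-in; it is not the authors' CV-based framework, and the data and feature names are illustrative assumptions.

```python
# Minimal illustration (not the authors' CV framework): impurity-based GBM
# feature importance can inflate a high-cardinality noise feature, whereas
# permutation importance computed on held-out data largely does not.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# One informative binary feature and one uninformative feature with 100
# distinct values (a stand-in for a high-cardinality categorical variable).
x_informative = rng.integers(0, 2, size=n)
x_noise_highcard = rng.integers(0, 100, size=n)
y = (x_informative ^ (rng.random(n) < 0.1)).astype(int)  # label depends only on x_informative

X = np.column_stack([x_informative, x_noise_highcard])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Built-in (impurity/gain-based) importances, computed on the training data.
print("gain-based FI          :", gbm.feature_importances_)

# Permutation importance on held-out data tends to send the noise feature toward zero.
perm = permutation_importance(gbm, X_test, y_test, n_repeats=20, random_state=0)
print("held-out permutation FI:", perm.importances_mean)
```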
Related papers
- Federated Class-Incremental Learning with Hierarchical Generative Prototypes [10.532838477096055]
Federated Learning (FL) aims at unburdening the training of deep models by distributing computation across multiple devices (clients).
Our proposal constrains both biases in the last layer by efficiently finetuning a pre-trained backbone using learnable prompts.
Our method significantly improves the current State Of The Art, providing an average increase of +7.8% in accuracy.
arXiv Detail & Related papers (2024-06-04T16:12:27Z) - GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z) - Understanding the Detrimental Class-level Effects of Data Augmentation [63.1733767714073]
Data augmentation (DA) that achieves optimal average accuracy can come at the cost of significantly hurting individual class accuracy, by as much as 20% on ImageNet.
We present a framework for understanding how DA interacts with class-level learning dynamics.
We show that simple class-conditional augmentation strategies improve performance on the negatively affected classes.
arXiv Detail & Related papers (2023-12-07T18:37:43Z) - Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance [6.700461065769045]
The split finding algorithm of Gradient Boosting Decision Tree (GBDT) has been criticized for its bias towards features with a large number of potential splits.
We provide a fine-grained analysis of bias in GBDT and demonstrate that the bias originates, in part, from the systematic bias in the gain estimation of each split.
We propose unbiased gain, a new unbiased measurement of gain importance computed on out-of-bag samples; a generic sketch of OOB-evaluated gain appears after this list.
arXiv Detail & Related papers (2023-05-18T04:17:46Z) - Variational Boosted Soft Trees [13.956254007901675]
Gradient boosting machines (GBMs) based on decision trees consistently demonstrate state-of-the-art results on regression and classification tasks.
We propose to implement Bayesian GBMs using variational inference with soft decision trees.
arXiv Detail & Related papers (2023-02-21T14:51:08Z) - Relieving Long-tailed Instance Segmentation via Pairwise Class Balance [85.53585498649252]
Long-tailed instance segmentation is a challenging task due to the extreme imbalance of training samples among classes.
It causes severe biases of the head classes (with majority samples) against the tailed ones.
We propose a novel Pairwise Class Balance (PCB) method, built upon a confusion matrix which is updated during training to accumulate the ongoing prediction preferences.
arXiv Detail & Related papers (2022-01-08T07:48:36Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a simple technique, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
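The "unbiased gain" idea referenced in the Unbiased Gradient Boosting Decision Tree entry above can be illustrated with a short, generic sketch: choose a split threshold greedily on a bootstrap (in-bag) sample, then evaluate its gain on the out-of-bag samples while keeping the fitted leaf means. The data, helper names, and simple squared-error gain below are illustrative assumptions, not that paper's exact algorithm.

```python
# Generic sketch (illustrative only, not the cited paper's exact algorithm):
# measure the gain of a split on out-of-bag (OOB) samples, keeping the split
# threshold and the leaf means that were fitted on the in-bag sample.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)   # candidate split feature, pure noise
y = rng.normal(size=n)   # target, unrelated to x

# Bootstrap resample: drawn indices are in-bag, everything else is out-of-bag.
in_bag = rng.integers(0, n, size=n)
oob = np.setdiff1d(np.arange(n), in_bag)
x_in, y_in, x_out, y_out = x[in_bag], y[in_bag], x[oob], y[oob]

def sse(y_true, pred):
    """Sum of squared errors of a constant prediction."""
    return float(((y_true - pred) ** 2).sum())

def gain(x_eval, y_eval, threshold, left_mean, right_mean, root_mean):
    """SSE reduction of the split on (x_eval, y_eval), using fixed leaf means."""
    left = x_eval <= threshold
    return sse(y_eval, root_mean) - sse(y_eval[left], left_mean) - sse(y_eval[~left], right_mean)

def in_bag_gain(t):
    """Gain a tree would see when fitting and evaluating on the in-bag sample."""
    left = x_in <= t
    return gain(x_in, y_in, t, y_in[left].mean(), y_in[~left].mean(), y_in.mean())

# Greedily choose the threshold that maximizes in-bag gain (what split finding does).
thresholds = np.quantile(x_in, np.linspace(0.05, 0.95, 19))
best = max(thresholds, key=in_bag_gain)

left = x_in <= best
print("in-bag gain:", in_bag_gain(best))  # positive even though x is noise (selection bias)
print("OOB gain   :", gain(x_out, y_out, best, y_in[left].mean(), y_in[~left].mean(), y_in.mean()))
# The OOB-evaluated gain is approximately zero on average for an uninformative feature.
```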