Enhancing Robustness of Gradient-Boosted Decision Trees through One-Hot
Encoding and Regularization
- URL: http://arxiv.org/abs/2304.13761v3
- Date: Thu, 11 May 2023 15:47:17 GMT
- Title: Enhancing Robustness of Gradient-Boosted Decision Trees through One-Hot
Encoding and Regularization
- Authors: Shijie Cui, Agus Sudjianto, Aijun Zhang, Runze Li
- Abstract summary: We apply one-hot encoding to convert a GBDT model into a linear framework by encoding each tree leaf as a dummy variable.
This allows the use of linear regression techniques, as well as a novel risk decomposition, for assessing the robustness of a GBDT model.
It is demonstrated through numerical experiments that the proposed regularization approach can enhance the robustness of the one-hot-encoded GBDT models.
- Score: 10.942447664917061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient-boosted decision trees (GBDT) are a widely used and highly
effective machine learning approach for tabular data modeling. However, their
complex structure may lead to low robustness against small covariate
perturbations in unseen data. In this study, we apply one-hot encoding to
convert a GBDT model into a linear framework by encoding each tree leaf as a
dummy variable. This allows the use of linear regression techniques, as well as
a novel risk decomposition, for assessing the robustness of a GBDT model
against covariate perturbations. We propose to enhance the robustness of GBDT
models by refitting their linear regression forms with $L_1$ or $L_2$
regularization. Theoretical results are obtained on the effect of
regularization on model performance and robustness. Numerical experiments
demonstrate that the proposed regularization approach can enhance the
robustness of the one-hot-encoded GBDT models.
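The basic recipe can be illustrated concretely. Below is a minimal sketch of the leaf-encoding-plus-refit idea using scikit-learn: fit a GBDT, one-hot encode each sample's leaf membership so that every leaf becomes a dummy variable, and refit the resulting linear form by minimizing $\|y - Z\beta\|_2^2 + \lambda\|\beta\|_1$ (lasso) or $\|y - Z\beta\|_2^2 + \lambda\|\beta\|_2^2$ (ridge) over the leaf-dummy design matrix $Z$. The library choice, toy dataset, and hyperparameter values here are illustrative assumptions, not the paper's settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import Ridge, Lasso

# Toy tabular data standing in for the covariates (illustrative only).
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

# 1. Fit a standard GBDT model.
gbdt = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=0)
gbdt.fit(X, y)

# 2. One-hot encode leaf membership: each leaf of each tree becomes one dummy
#    variable, so the ensemble becomes linear in the dummies.
leaves = gbdt.apply(X).reshape(X.shape[0], -1)   # leaf index per tree
encoder = OneHotEncoder(handle_unknown="ignore")
Z = encoder.fit_transform(leaves)                # sparse dummy design matrix

# 3. Refit the linear form with L2 (ridge) or L1 (lasso) regularization,
#    shrinking leaf coefficients to trade a little accuracy for robustness.
ridge_refit = Ridge(alpha=1.0).fit(Z, y)
lasso_refit = Lasso(alpha=0.01).fit(Z, y)

# Predictions on new data reuse the same leaf assignment and encoding.
Z_new = encoder.transform(gbdt.apply(X[:5]).reshape(5, -1))
print(ridge_refit.predict(Z_new))
```

In this linear form, standard linear-model tools apply to the leaf coefficients, which is what makes the paper's risk decomposition and regularized refit tractable.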
Related papers
- Bilinear Convolution Decomposition for Causal RL Interpretability [0.0]
Efforts to interpret reinforcement learning (RL) models often rely on high-level techniques such as attribution or probing.
This work proposes replacing nonlinearities in convolutional neural networks (ConvNets) with bilinear variants, to produce a class of models for which these limitations can be addressed.
We show that bilinear model variants perform comparably in model-free reinforcement learning settings and give a side-by-side comparison on ProcGen environments.
arXiv Detail & Related papers (2024-12-01T19:32:04Z) - Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z) - TRAWL: Tensor Reduced and Approximated Weights for Large Language Models [11.064868044313855]
We introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a technique that applies tensor decomposition across multiple weight matrices to effectively denoise LLMs by capturing global structural patterns.
Our experiments show that TRAWL improves model performance by up to 16% over baseline models on benchmark datasets, without requiring additional data, training, or fine-tuning.
arXiv Detail & Related papers (2024-06-25T04:01:32Z) - Solving Inverse Problems with Model Mismatch using Untrained Neural Networks within Model-based Architectures [14.551812310439004]
We introduce an untrained forward model residual block within the model-based architecture to match the data consistency in the measurement domain for each instance.
Our approach offers a unified solution that is less parameter-sensitive, requires no additional data, and enables simultaneous fitting of the forward model and reconstruction in a single pass.
arXiv Detail & Related papers (2024-03-07T19:02:13Z) - Understanding the robustness difference between stochastic gradient
descent and adaptive gradient methods [11.895321856533934]
Stochastic gradient descent (SGD) and adaptive gradient methods have been widely used in training deep neural networks.
We empirically show that while the difference between the standard generalization performance of models trained using these methods is small, those trained using SGD exhibit far greater robustness under input perturbations.
arXiv Detail & Related papers (2023-08-13T07:03:22Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
The bounds we then derive reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Estimation of Bivariate Structural Causal Models by Variational Gaussian
Process Regression Under Likelihoods Parametrised by Normalising Flows [74.85071867225533]
Causal mechanisms can be described by structural causal models.
One major drawback of state-of-the-art artificial intelligence is its lack of explainability.
arXiv Detail & Related papers (2021-09-06T14:52:58Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable has a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Surrogate-based variational data assimilation for tidal modelling [0.0]
Data assimilation (DA) is widely used to combine physical knowledge and observations.
In the context of climate change, old calibrations cannot necessarily be used for new scenarios.
This raises the question of DA computational cost.
Two methods are proposed to replace the complex model by a surrogate.
arXiv Detail & Related papers (2021-06-08T07:39:38Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM), where we parameterize the joint distribution in terms of the derivatives of univariate log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - CASTLE: Regularization via Auxiliary Causal Graph Discovery [89.74800176981842]
We introduce Causal Structure Learning (CASTLE) regularization and propose to regularize a neural network by jointly learning the causal relationships between variables.
CASTLE efficiently reconstructs only the features in the causal DAG that have a causal neighbor, whereas reconstruction-based regularizers suboptimally reconstruct all input features.
arXiv Detail & Related papers (2020-09-28T09:49:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.