Learning Multi-Layered GBDT Via Back Propagation
- URL: http://arxiv.org/abs/2109.11863v2
- Date: Mon, 27 Sep 2021 03:05:50 GMT
- Title: Learning Multi-Layered GBDT Via Back Propagation
- Authors: Zhendong Zhang
- Abstract summary: We propose a framework for learning multi-layered GBDT via back propagation (BP).
We approximate the gradient of GBDT based on linear regression.
Experiments show the effectiveness of the proposed method in terms of performance and representation ability.
- Score: 9.249235534786072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks are able to learn multi-layered representations via back propagation (BP). Although the gradient boosting decision tree (GBDT) is effective for modeling tabular data, it is non-differentiable with respect to its input, so it cannot directly learn multi-layered representations. In this paper, we propose a framework for learning multi-layered GBDT via BP. We approximate the gradient of GBDT based on linear regression. Specifically, we use linear regression to replace the constant value at each leaf, ignoring the contribution of individual samples to the tree structure. In this way, we estimate the gradient for intermediate representations, which facilitates BP for multi-layered GBDT. Experiments show the effectiveness of the proposed method in terms of performance and representation ability. To the best of our knowledge, this is the first work on optimizing multi-layered GBDT via BP. This work opens a new possibility for exploring deep tree-based learning and for combining GBDT with neural networks.
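As a concrete illustration of the gradient approximation described above, below is a minimal sketch for a single regression tree (a GBDT output is a sum of trees, so the per-tree gradients add up); the function names are illustrative, not the authors' code. The routing structure of the tree is held fixed, and each leaf's constant is replaced by a per-leaf linear fit whose slope acts as the gradient with respect to the input:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_tree_with_linear_leaves(X, y, max_depth=3):
    """Fit a CART tree, then refit each leaf with a linear regression."""
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, y)
    leaf_ids = tree.apply(X)
    leaf_models = {}
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        # Least-squares fit of (w, b) on the samples routed to this leaf.
        A = np.hstack([X[mask], np.ones((mask.sum(), 1))])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        leaf_models[leaf] = coef[:-1]  # keep slope w; b is not needed for grads
    return tree, leaf_models

def tree_input_grad(tree, leaf_models, X):
    """d(tree output)/d(input): the slope of the leaf each sample falls in.
    The tree structure (routing) is treated as fixed, as in the paper."""
    return np.stack([leaf_models[l] for l in tree.apply(X)])

# Toy usage: per-sample gradients that BP can pass to a preceding layer.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
tree, leaves = fit_tree_with_linear_leaves(X, y)
dy_dx = tree_input_grad(tree, leaves, X)  # shape (200, 4)
```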
Related papers
- Vanilla Gradient Descent for Oblique Decision Trees [7.236325471627686]
We propose a novel encoding for (hard, oblique) DTs as Neural Networks (NNs).
Experiments show oblique DTs learned using DTSemNet are more accurate than oblique DTs of similar size learned using state-of-the-art techniques.
We also experimentally demonstrate that DTSemNet can learn DT policies as efficiently as NN policies in the Reinforcement Learning (RL) setup with physical inputs.
arXiv Detail & Related papers (2024-08-17T08:18:40Z)
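The paper defines the exact encoding; as a rough sketch of the general idea only, and assuming (not stated in the abstract) a straight-through surrogate for the hard decision, one oblique split can be made trainable by plain gradient descent like this:

```python
import torch

class HardObliqueSplit(torch.nn.Module):
    """One hard oblique split trained by vanilla gradient descent via a
    straight-through sign; illustrative, not the exact DTSemNet encoding."""
    def __init__(self, dim):
        super().__init__()
        self.hyperplane = torch.nn.Linear(dim, 1)   # split test: w^T x + b > 0
        self.leaf_values = torch.nn.Parameter(torch.zeros(2))

    def forward(self, x):
        s = self.hyperplane(x).squeeze(-1)
        hard = (s > 0).float()                      # hard routing decision
        soft = torch.sigmoid(s)
        # Straight-through: forward uses the hard 0/1 route, backward
        # flows through the sigmoid surrogate.
        route = soft + (hard - soft).detach()
        return route * self.leaf_values[1] + (1 - route) * self.leaf_values[0]
```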
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
- WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure and jointly trained with gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
arXiv Detail & Related papers (2023-01-03T20:57:22Z)
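The paper's exact regularizer may differ; the following is a hedged sketch of one within-layer diversity penalty, decorrelating the batch activations of units in the same layer and added to the task loss with a weight lam:

```python
import torch

def within_layer_diversity_penalty(acts, eps=1e-8):
    """acts: (batch, units) activations of one layer. Penalize squared
    pairwise cosine similarity between units so they stay diverse."""
    a = acts - acts.mean(dim=0, keepdim=True)
    a = a / (a.norm(dim=0, keepdim=True) + eps)
    gram = a.T @ a                                   # (units, units)
    off_diag = gram - torch.diag(torch.diag(gram))
    return (off_diag ** 2).mean()

# Used as: loss = task_loss + lam * within_layer_diversity_penalty(h)
```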
- An In-depth Study of Stochastic Backpropagation [44.953669040828345]
We study Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks.
During backward propagation, SBP calculates gradients by only using a subset of feature maps to save the GPU memory and computational cost.
Experiments on image classification and object detection show that SBP can save up to 40% of GPU memory with less than 1% accuracy degradation.
arXiv Detail & Related papers (2022-09-30T23:05:06Z)
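As a hedged sketch of this idea (not the authors' implementation, which also restructures layers so the dropped activations are never stored, which is where the memory saving comes from), a custom autograd function can propagate gradients through only a random subset of channels, rescaled to keep the gradient unbiased in expectation:

```python
import torch

class StochasticBackprop(torch.autograd.Function):
    """Backward pass through a random subset of channels; illustrative toy."""
    @staticmethod
    def forward(ctx, x, keep_ratio=0.5):
        c = x.shape[1]
        k = max(1, int(c * keep_ratio))
        idx = torch.randperm(c, device=x.device)[:k]
        ctx.save_for_backward(idx)
        ctx.scale = c / k            # rescale so E[grad] matches full backprop
        return x                     # the forward pass itself is exact

    @staticmethod
    def backward(ctx, grad_out):
        (idx,) = ctx.saved_tensors
        grad = torch.zeros_like(grad_out)
        grad[:, idx] = grad_out[:, idx] * ctx.scale
        return grad, None

# Usage inside a model: h = StochasticBackprop.apply(h, 0.5)
```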
- Transfer Learning with Deep Tabular Models [66.67017691983182]
We show that upstream data gives tabular neural networks a decisive advantage over GBDT models.
We propose a realistic medical diagnosis benchmark for tabular transfer learning.
We propose a pseudo-feature method for cases where the upstream and downstream feature sets differ.
arXiv Detail & Related papers (2022-06-30T14:24:32Z)
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
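The RDP family is broader than this, but a minimal sketch of the randomization idea on one classical DP, the forward recursion of a chain model, looks as follows; the uniform sampling scheme here is an assumption for illustration:

```python
import numpy as np

def randomized_forward(emissions, transitions, k, rng):
    """emissions: (T, N) state scores; transitions: (N, N); k: sample size.
    Each step sums over k uniformly sampled previous states, rescaled by
    N/k so the step's sum is unbiased in expectation."""
    T, N = emissions.shape
    alpha = emissions[0]
    for t in range(1, T):
        idx = rng.choice(N, size=k, replace=False)
        alpha = emissions[t] * ((N / k) * (alpha[idx] @ transitions[idx]))
    return alpha.sum()

# Toy usage with 1000 latent states but only 50 sampled per step:
rng = np.random.default_rng(0)
score = randomized_forward(rng.random((20, 1000)), rng.random((1000, 1000)), 50, rng)
```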
- Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning [7.906608953906889]
We introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning to increase performance without additional computation by the neural network.
We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model.
arXiv Detail & Related papers (2021-05-08T22:31:51Z)
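A hedged sketch of the recipe: the logged feature arrays stand in for the classifier-head inputs that the fine-tuning loop computes anyway, so the GBDT head costs no extra neural-network forward passes (the toy data below is a placeholder):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def fit_free_gbdt_head(logged_features, logged_labels, n_estimators=200):
    """logged_features: list of (batch, dim) arrays saved at each
    fine-tuning step; logged_labels: the matching label batches."""
    X = np.concatenate(logged_features)
    y = np.concatenate(logged_labels)
    return GradientBoostingClassifier(n_estimators=n_estimators).fit(X, y)

# Toy stand-in for features collected over fine-tuning steps:
rng = np.random.default_rng(0)
feats = [rng.normal(size=(32, 16)) for _ in range(10)]
labels = [rng.integers(0, 2, size=32) for _ in range(10)]
gbdt_head = fit_free_gbdt_head(feats, labels)
```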
- Towards Evaluating and Training Verifiably Robust Neural Networks [81.39994285743555]
We study the relationship between IBP and CROWN, and prove that CROWN is always tighter than IBP when choosing appropriate bounding lines.
We propose a relaxed version of CROWN, linear bound propagation (LBP), that can be used to verify large networks to obtain lower verified errors.
arXiv Detail & Related papers (2021-04-01T13:03:48Z)
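For context, a minimal sketch of interval bound propagation (IBP) through one affine layer, the basic step that CROWN and LBP tighten by replacing the box with linear bounding functions:

```python
import numpy as np

def ibp_affine(lo, hi, W, b):
    """Propagate an input box [lo, hi] through y = W x + b as a
    center and a radius; |W| @ radius is the worst case over the box."""
    center = (lo + hi) / 2
    radius = (hi - lo) / 2
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius
    return new_center - new_radius, new_center + new_radius

# Toy usage: output bounds of a 2x3 affine layer on a small input box.
W = np.array([[1.0, -2.0, 0.5], [0.3, 0.0, -1.0]])
b = np.zeros(2)
lo, hi = ibp_affine(np.array([-0.1] * 3), np.array([0.1] * 3), W, b)
```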
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
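As a hedged sketch of the core operation (the paper's BP-Layer adds learned costs and a proper backward pass on top), one truncated max-product update on a chain can be written as a single min-sum sweep:

```python
import numpy as np

def min_sum_sweep(unary, pairwise):
    """unary: (T, K) per-pixel label costs; pairwise: (K, K) smoothness
    costs. One left-to-right min-sum (max-product in log space) sweep."""
    T, K = unary.shape
    cost = unary.copy()
    for t in range(1, T):
        # Message to pixel t: cheapest previous label for each current label.
        msg = np.min(cost[t - 1][:, None] + pairwise, axis=0)
        cost[t] = unary[t] + (msg - msg.min())  # normalize for stability
    return cost.argmin(axis=1)                  # approximate MAP labeling

# Toy usage: 8 pixels, 3 labels, Potts smoothness.
rng = np.random.default_rng(0)
labels = min_sum_sweep(rng.random((8, 3)), 0.5 * (1 - np.eye(3)))
```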
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.