Vanilla Gradient Descent for Oblique Decision Trees
- URL: http://arxiv.org/abs/2408.09135v3
- Date: Tue, 15 Oct 2024 12:58:35 GMT
- Title: Vanilla Gradient Descent for Oblique Decision Trees
- Authors: Subrat Prasad Panda, Blaise Genest, Arvind Easwaran, Ponnuthurai Nagaratnam Suganthan,
- Abstract summary: We propose a novel encoding for (hard, oblique) DTs as Neural Networks (NNs)
Experiments show oblique DTs learned using DTSemNet are more accurate than oblique DTs of similar size learned using state-of-the-art techniques.
We also experimentally demonstrate that DTSemNet can learn DT policies as efficiently as NN policies in the Reinforcement Learning (RL) setup with physical inputs.
- Score: 7.236325471627686
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Decision Trees (DTs) constitute one of the major highly non-linear AI models, valued, e.g., for their efficiency on tabular data. Learning accurate DTs is, however, complicated, especially for oblique DTs, and does take a significant training time. Further, DTs suffer from overfitting, e.g., they proverbially "do not generalize" in regression tasks. Recently, some works proposed ways to make (oblique) DTs differentiable. This enables highly efficient gradient-descent algorithms to be used to learn DTs. It also enables generalizing capabilities by learning regressors at the leaves simultaneously with the decisions in the tree. Prior approaches to making DTs differentiable rely either on probabilistic approximations at the tree's internal nodes (soft DTs) or on approximations in gradient computation at the internal node (quantized gradient descent). In this work, we propose DTSemNet, a novel semantically equivalent and invertible encoding for (hard, oblique) DTs as Neural Networks (NNs), that uses standard vanilla gradient descent. Experiments across various classification and regression benchmarks show that oblique DTs learned using DTSemNet are more accurate than oblique DTs of similar size learned using state-of-the-art techniques. Further, DT training time is significantly reduced. We also experimentally demonstrate that DTSemNet can learn DT policies as efficiently as NN policies in the Reinforcement Learning (RL) setup with physical inputs (dimensions $\leq32$). The code is available at https://github.com/CPS-research-group/dtsemnet.
Related papers
- Estimating the Hessian Matrix of Ranking Objectives for Stochastic Learning to Rank with Gradient Boosted Trees [63.18324983384337]
We introduce the first learning to rank method for Gradient Boosted Decision Trees (GBDTs)
Our main contribution is a novel estimator for the second-order derivatives, i.e., the Hessian matrix.
We incorporate our estimator into the existing PL-Rank framework, which was originally designed for first-order derivatives only.
arXiv Detail & Related papers (2024-04-18T13:53:32Z) - Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continuous offline reinforcement learning (CORL) combines continuous and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z) - GradTree: Learning Axis-Aligned Decision Trees with Gradient Descent [10.27211960475599]
Decision Trees (DTs) are commonly used for many machine learning tasks.
In this paper, we propose a novel approach to learn DTs using a greedy algorithm.
We propose backpropagation with a straight-through operator on a dense DT representation, to jointly optimize all tree parameters.
arXiv Detail & Related papers (2023-05-05T13:24:35Z) - Optimal Interpretability-Performance Trade-off of Classification Trees
with Black-Box Reinforcement Learning [0.0]
Interpretability of AI models allows for user safety checks to build trust in these models.
Decision trees (DTs) provide a global view on the learned model and clearly outlines the role of the features that are critical to classify a given data.
To learn compact trees, a Reinforcement Learning framework has been recently proposed to explore the space of DTs.
arXiv Detail & Related papers (2023-04-11T09:43:23Z) - Towards Memory- and Time-Efficient Backpropagation for Training Spiking
Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z) - Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z) - Quantized Training of Gradient Boosting Decision Trees [84.97123593657584]
We propose to quantize all the high-precision gradients in a very simple yet effective way in the GBDT's training algorithm.
With low-precision gradients, most arithmetic operations in GBDT training can be replaced by integer operations of 8, 16, or 32 bits.
We observe up to 2$times$ speedup of our simple quantization strategy compared with SOTA GBDT systems on extensive datasets.
arXiv Detail & Related papers (2022-07-20T06:27:06Z) - Learning Multi-Layered GBDT Via Back Propagation [9.249235534786072]
We propose a framework of learning multi-layered GBDT via back propagation (BP)
We approximate the gradient of GBDT based on linear regression.
Experiments show the effectiveness of the proposed method in terms of performance and representation ability.
arXiv Detail & Related papers (2021-09-24T10:10:25Z) - Enhancing Transformers with Gradient Boosted Decision Trees for NLI
Fine-Tuning [7.906608953906889]
We introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning to increase performance without additional computation by the neural network.
We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model.
arXiv Detail & Related papers (2021-05-08T22:31:51Z) - Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality
Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.