Related papers: Vanilla Gradient Descent for Oblique Decision Trees

URL: http://arxiv.org/abs/2408.09135v3
Date: Tue, 15 Oct 2024 12:58:35 GMT
Title: Vanilla Gradient Descent for Oblique Decision Trees
Authors: Subrat Prasad Panda, Blaise Genest, Arvind Easwaran, Ponnuthurai Nagaratnam Suganthan
Abstract summary: We propose a novel encoding for (hard, oblique) decision trees (DTs) as Neural Networks (NNs). Experiments show that oblique DTs learned using DTSemNet are more accurate than oblique DTs of similar size learned using state-of-the-art techniques. We also experimentally demonstrate that DTSemNet can learn DT policies as efficiently as NN policies in the Reinforcement Learning (RL) setup with physical inputs.
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Decision Trees (DTs) constitute one of the major highly non-linear AI models, valued, e.g., for their efficiency on tabular data. Learning accurate DTs is, however, complicated, especially for oblique DTs, and does take a significant training time. Further, DTs suffer from overfitting, e.g., they proverbially "do not generalize" in regression tasks. Recently, some works proposed ways to make (oblique) DTs differentiable. This enables highly efficient gradient-descent algorithms to be used to learn DTs. It also enables generalizing capabilities by learning regressors at the leaves simultaneously with the decisions in the tree. Prior approaches to making DTs differentiable rely either on probabilistic approximations at the tree's internal nodes (soft DTs) or on approximations in gradient computation at the internal node (quantized gradient descent). In this work, we propose DTSemNet, a novel semantically equivalent and invertible encoding for (hard, oblique) DTs as Neural Networks (NNs), that uses standard vanilla gradient descent. Experiments across various classification and regression benchmarks show that oblique DTs learned using DTSemNet are more accurate than oblique DTs of similar size learned using state-of-the-art techniques. Further, DT training time is significantly reduced. We also experimentally demonstrate that DTSemNet can learn DT policies as efficiently as NN policies in the Reinforcement Learning (RL) setup with physical inputs (dimensions $\leq32$). The code is available at https://github.com/CPS-research-group/dtsemnet.
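The abstract contrasts soft DTs, which route an input probabilistically at every internal node, with hard oblique DTs, where each internal node applies a hyperplane test w·x + b > 0 and the input follows exactly one path. A minimal sketch of the two inference modes follows, assuming a hypothetical depth-2 tree with made-up weights `W`, biases `B`, and `LEAF_VALUES`. It is not DTSemNet's encoding, which the abstract does not spell out; it only illustrates why hard trees are hard to train: `hard_predict` is piecewise constant in `W` and `B`, so its gradient is zero almost everywhere, while `soft_predict` is differentiable.

```python
import math
from typing import List, Sequence

# Hypothetical depth-2 oblique tree in heap order: internal nodes 0, 1, 2,
# where node i has children 2*i + 1 (left) and 2*i + 2 (right).
DEPTH = 2
W: List[List[float]] = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]  # hyperplane weights
B: List[float] = [0.0, -1.0, 0.5]                                # hyperplane biases
LEAF_VALUES: List[float] = [10.0, 20.0, 30.0, 40.0]              # leaves, left to right


def split_score(node: int, x: Sequence[float]) -> float:
    """Oblique test w_node . x + b_node; a positive score routes right."""
    return sum(w * xi for w, xi in zip(W[node], x)) + B[node]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hard_predict(x: Sequence[float]) -> float:
    """Hard DT: follow exactly one root-to-leaf path."""
    node = 0
    for _ in range(DEPTH):
        node = 2 * node + 2 if split_score(node, x) > 0 else 2 * node + 1
    return LEAF_VALUES[node - (2 ** DEPTH - 1)]  # heap index -> leaf index


def soft_predict(x: Sequence[float], temperature: float = 1.0) -> float:
    """Soft DT: every leaf contributes, weighted by its path probability."""
    total = 0.0
    for leaf in range(2 ** DEPTH):
        prob, node = 1.0, 0
        for level in range(DEPTH):
            go_right = (leaf >> (DEPTH - 1 - level)) & 1  # bits of leaf = path
            p_right = sigmoid(split_score(node, x) / temperature)
            prob *= p_right if go_right else 1.0 - p_right
            node = 2 * node + 1 + go_right
        total += prob * LEAF_VALUES[leaf]
    return total
```

For `x = [2.0, 1.0]`, the tests at the root and at node 2 are both positive, so `hard_predict` returns the rightmost leaf value, 40.0. `soft_predict` blends all four leaves at `temperature=1.0` and approaches the hard answer as the temperature shrinks, which is the sense in which soft DTs approximate hard ones.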

Related papers

Decision Trees That Remember: Gradient-Based Learning of Recurrent Decision Trees with Memory
We introduce ReMeDe Trees, a novel recurrent DT architecture that integrates an internal memory mechanism, similar to RNNs, to learn long-term dependencies in sequential data. Our model learns hard, axis-aligned decision rules for both output generation and state updates, optimizing them efficiently via gradient descent.
arXiv (2025-02-06T13:11:50Z)
Enhance Learning Efficiency of Oblique Decision Tree via Feature Concatenation
We propose an enhanced oblique decision tree (ODT) method with Feature Concatenation (FC-ODT). FC-ODT enables in-model feature transformation to transmit the projections along the decision paths. Experiments show that FC-ODT can outperform other state-of-the-art decision trees at limited tree depth.
arXiv (2025-02-01T15:49:18Z)
Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later
We revisit Neighbourhood Components Analysis (NCA), a differentiable version of $K$-nearest neighbors (KNN) originally designed to learn a linear projection that captures semantic similarities between instances. Surprisingly, our implementation of NCA, trained with SGD and without dimensionality reduction, already achieves decent performance on tabular data. We conclude the paper by analyzing the factors behind these improvements, including loss functions, prediction strategies, and deep architectures.
arXiv (2024-07-03T16:38:57Z)
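In classic NCA, each training point x_j acts as a "soft neighbour" of a query x with weight proportional to exp(-||Ax - Ax_j||^2), and the probability of a class is the total weight of the training points carrying that label; training learns the projection A by gradient ascent. A minimal sketch of that prediction rule only, assuming A is fixed and supplied by the caller; it does not reproduce the paper's training setup (SGD, loss, architecture), which the summary only names.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def project(A: List[List[float]], v: Sequence[float]) -> List[float]:
    """Apply the linear projection A (given as a list of rows) to vector v."""
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


def nca_predict_proba(
    x: Sequence[float],
    train_x: List[Sequence[float]],
    train_y: List[Hashable],
    A: List[List[float]],
) -> Dict[Hashable, float]:
    """Soft nearest-neighbour class probabilities in the projected space."""
    z = project(A, x)
    # Logits are negative squared distances between projected points.
    logits = [
        -sum((p - q) ** 2 for p, q in zip(z, project(A, xj))) for xj in train_x
    ]
    m = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in logits]
    norm = sum(weights)
    proba: Dict[Hashable, float] = defaultdict(float)
    for w, y in zip(weights, train_y):
        proba[y] += w / norm
    return dict(proba)
```

Because every training point contributes a smooth weight, the output is differentiable with respect to A, which is what lets NCA be trained by gradient methods, unlike hard KNN voting.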
Estimating the Hessian Matrix of Ranking Objectives for Stochastic Learning to Rank with Gradient Boosted Trees
We introduce the first learning to rank method for Gradient Boosted Decision Trees (GBDTs). Our main contribution is a novel estimator for the second-order derivatives, i.e., the Hessian matrix. We incorporate our estimator into the existing PL-Rank framework, which was originally designed for first-order derivatives only.
arXiv (2024-04-18T13:53:32Z)
Solving Continual Offline Reinforcement Learning with Decision Transformer
Continual offline reinforcement learning (CORL) combines continual and offline reinforcement learning. Existing methods, which employ Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge sharing. We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT), where DT denotes Decision Transformer, to mitigate its forgetting problem.
arXiv (2024-01-16T16:28:32Z)
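Low-rank adaptation (LoRA) keeps a pretrained weight matrix W0 frozen and learns only a rank-r update B·A, so a layer computes W0·x + (alpha / r)·B·(A·x). A minimal sketch of that generic forward pass, assuming the common alpha / r scaling and plain nested lists as matrices; how LoRA-DT places these adapters inside the Decision Transformer is not described in the summary and is not shown here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def lora_forward(
    x: Sequence[float], W0: Matrix, A: Matrix, B: Matrix, alpha: float
) -> List[float]:
    """y = W0 x + (alpha / r) * B (A x); W0 is frozen, only A and B are trained.

    Shapes: W0 is d_out x d_in, A is r x d_in, B is d_out x r.
    """
    r = len(A)
    base = matvec(W0, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))   # low-rank trainable path
    return [b + (alpha / r) * d for b, d in zip(base, delta)]
```

A usual design choice is to initialise B to zeros, so the adapted layer starts out identical to the pretrained one and only the r·(d_in + d_out) adapter parameters need to be stored per task.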
GradTree: Learning Axis-Aligned Decision Trees with Gradient Descent
Decision Trees (DTs) are commonly used for many machine learning tasks. Common approaches learn DTs with a greedy growth algorithm, which can lead to inaccurate trees; in this paper, we propose a novel approach to learn hard, axis-aligned DTs with gradient descent. We propose backpropagation with a straight-through operator on a dense DT representation to jointly optimize all tree parameters.
arXiv (2023-05-05T13:24:35Z)
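A straight-through operator keeps the hard, non-differentiable decision in the forward pass but substitutes a surrogate derivative in the backward pass. A minimal sketch on a one-split tree (a stump with a threshold and two leaf values) trained on squared error, assuming the sigmoid derivative as the surrogate; that surrogate, the stump, and the names `ste_split` and `train_stump` are illustrative choices, not GradTree's dense representation or exact operator.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ste_split(score: float) -> Tuple[float, float]:
    """Forward value: hard 0/1 routing. Backward value: sigmoid'(score) surrogate."""
    hard = 1.0 if score > 0 else 0.0
    s = sigmoid(score)
    return hard, s * (1.0 - s)


def train_stump(
    xs: Sequence[float], ys: Sequence[float], lr: float = 0.5, steps: int = 200
) -> Tuple[float, float, float]:
    """Fit (threshold, left leaf, right leaf) by gradient descent on squared error."""
    threshold, left, right = 0.0, 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g_t = g_l = g_r = 0.0
        for x, y in zip(xs, ys):
            h, dh = ste_split(x - threshold)  # h = 1 routes to the right leaf
            pred = left + h * (right - left)
            err = pred - y                    # d(0.5 * err**2) / d(pred)
            g_l += err * (1.0 - h)
            g_r += err * h
            # Chain rule through the surrogate; d(score)/d(threshold) = -1.
            g_t += err * (right - left) * dh * -1.0
        threshold -= lr * g_t / n
        left -= lr * g_l / n
        right -= lr * g_r / n
    return threshold, left, right
```

On `xs = [-2.0, -1.0, 1.0, 2.0]` with `ys = [0.0, 0.0, 1.0, 1.0]`, the leaves converge to about 0 and 1 while the threshold stays between the two groups; predictions always come from a single hard path, yet the threshold still receives a nonzero gradient.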
Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning
Interpretability of AI models allows users to perform safety checks that build trust in these models. Decision trees (DTs) provide a global view of the learned model and clearly outline the role of the features that are critical to classifying given data. To learn compact trees, a Reinforcement Learning framework has recently been proposed to explore the space of DTs.
arXiv (2023-04-11T09:43:23Z)
Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing. We propose the Spatial Learning Through Time (SLTT) method, which achieves high performance while greatly improving training efficiency. Our method achieves state-of-the-art accuracy on ImageNet, while reducing memory cost and training time by more than 70% and 50%, respectively, compared with backpropagation through time (BPTT).
arXiv (2023-02-28T05:01:01Z)
Online Training Through Time for Spiking Neural Networks
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models. Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency. We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv (2022-10-09T07:47:56Z)
Quantized Training of Gradient Boosting Decision Trees
We propose to quantize all the high-precision gradients in the GBDT training algorithm in a very simple yet effective way. With low-precision gradients, most arithmetic operations in GBDT training can be replaced by integer operations of 8, 16, or 32 bits. We observe up to a 2× speedup from our simple quantization strategy compared with state-of-the-art GBDT systems on a wide range of datasets.
arXiv (2022-07-20T06:27:06Z)
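The idea of replacing floating-point gradient arithmetic with integer arithmetic can be sketched as: pick one float scale per batch, map each gradient to a signed low-bit integer, and then accumulate the per-bin gradient histograms (the core of GBDT split finding) with integer additions only. A minimal sketch, assuming symmetric max-abs scaling and stochastic rounding; the function names are hypothetical and this is not the paper's implementation.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def quantize_gradients(
    grads: Sequence[float], bits: int = 8, rng: Optional[random.Random] = None
) -> Tuple[List[int], float]:
    """Map float gradients to signed integers in [-qmax, qmax] plus one float scale."""
    rng = rng or random.Random(0)
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(g) for g in grads), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = []
    for g in grads:
        v = g / scale
        low = math.floor(v)
        # Stochastic rounding: round up with probability equal to the fractional
        # part, so before clamping the rounded value equals v in expectation.
        rounded = low + (1 if rng.random() < v - low else 0)
        q.append(max(-qmax, min(qmax, rounded)))
    return q, scale


def integer_histogram(bin_ids: Sequence[int], q: Sequence[int], n_bins: int) -> List[int]:
    """Accumulate quantized gradients per feature bin using integer additions only."""
    hist = [0] * n_bins
    for b, g in zip(bin_ids, q):
        hist[b] += g
    return hist
```

Multiplying a histogram entry by `scale` recovers an approximate float gradient sum when a split gain has to be evaluated; each quantized gradient is within one `scale` step of the original value.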
Learning Multi-Layered GBDT Via Back Propagation
We propose a framework for learning multi-layered GBDT via back propagation (BP). We approximate the gradient of GBDT based on linear regression. Experiments show the effectiveness of the proposed method in terms of performance and representation ability.
arXiv (2021-09-24T10:10:25Z)
Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning
We introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning to increase performance without additional computation by the neural network. We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model.
arXiv (2021-05-08T22:31:51Z)
Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
arXiv (2020-04-20T02:40:43Z)
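The title names the two ingredients of SVD training: keep each layer in decomposed form W = U·diag(s)·Vᵀ, push U and V toward orthonormal columns so that they stay valid singular vectors without recomputing an SVD, and sparsify s so that small singular values can be pruned to lower the rank. A minimal sketch of such a regularizer, assuming a squared Frobenius orthogonality penalty and a plain L1 sparsifier; the weights `lam_orth` and `lam_sparse` are hypothetical, and the paper may use other sparsity-inducing terms.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def orthogonality_penalty(M: Matrix) -> float:
    """Squared Frobenius norm ||M^T M - I||_F^2 for an n x r matrix given as rows."""
    r = len(M[0])
    total = 0.0
    for i in range(r):
        for j in range(r):
            dot = sum(row[i] * row[j] for row in M)  # (M^T M)[i][j]
            total += (dot - (1.0 if i == j else 0.0)) ** 2
    return total


def svd_regularizer(
    U: Matrix, s: Sequence[float], V: Matrix, lam_orth: float, lam_sparse: float
) -> float:
    """Orthogonality of U and V plus an L1 push of singular values toward zero."""
    orth = orthogonality_penalty(U) + orthogonality_penalty(V)
    return lam_orth * orth + lam_sparse * sum(abs(x) for x in s)


def reconstruct(U: Matrix, s: Sequence[float], V: Matrix) -> Matrix:
    """Dense weight W = U diag(s) V^T, e.g. for inspecting a trained layer."""
    return [
        [sum(U[i][k] * s[k] * V[j][k] for k in range(len(s))) for j in range(len(V))]
        for i in range(len(U))
    ]
```

This term would be added to the task loss; after training, dropping the columns of U and V whose entries in s are near zero yields a lower-rank layer.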

This list is automatically generated from the titles and abstracts of the papers on this site.