Learning Binary Decision Trees by Argmin Differentiation
- URL: http://arxiv.org/abs/2010.04627v2
- Date: Mon, 14 Jun 2021 14:35:17 GMT
- Title: Learning Binary Decision Trees by Argmin Differentiation
- Authors: Valentina Zantedeschi, Matt J. Kusner, Vlad Niculae
- Abstract summary: We learn binary decision trees that partition data for some downstream task.
We do so by relaxing a mixed-integer program for the discrete parameters.
We derive customized algorithms to efficiently compute the forward and backward passes.
- Score: 34.9154848754842
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of learning binary decision trees that partition data
for some downstream task. We propose to learn discrete parameters (i.e., for
tree traversals and node pruning) and continuous parameters (i.e., for tree
split functions and prediction functions) simultaneously using argmin
differentiation. We do so by sparsely relaxing a mixed-integer program for the
discrete parameters, to allow gradients to pass through the program to
continuous parameters. We derive customized algorithms to efficiently compute
the forward and backward passes. This means that our tree learning procedure
can be used as an (implicit) layer in arbitrary deep networks, and can be
optimized with arbitrary loss functions. We demonstrate that our approach
produces binary trees that are competitive with existing single tree and
ensemble approaches, in both supervised and unsupervised settings. Further,
apart from greedy approaches (which do not have competitive accuracies), our
method is faster to train than all other tree-learning baselines we compare
with. The code for reproducing the results is available at
https://github.com/vzantedeschi/LatentTrees.
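To make the mechanism concrete, here is a minimal sketch, not the paper's algorithm or the LatentTrees API: the paper sparsely relaxes a mixed-integer program and derives customized forward/backward passes, whereas this toy substitutes a plain sigmoid relaxation. All class and variable names below are hypothetical; the sketch only illustrates how relaxed traversal turns a binary tree into a differentiable layer trainable with an arbitrary loss.

```python
# Toy illustration (hypothetical, NOT the paper's method): a binary tree whose
# routing decisions are relaxed with sigmoids so that gradients from any
# downstream loss reach the split hyperplanes and leaf predictors.
import torch
import torch.nn as nn

class SoftBinaryTree(nn.Module):
    """Complete binary tree of a given depth with relaxed (soft) routing."""
    def __init__(self, in_dim: int, out_dim: int, depth: int = 3):
        super().__init__()
        self.depth = depth
        n_inner, n_leaves = 2 ** depth - 1, 2 ** depth
        self.splits = nn.Linear(in_dim, n_inner)       # one split hyperplane per inner node
        self.leaf_values = nn.Parameter(torch.randn(n_leaves, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = torch.sigmoid(self.splits(x))          # P(route right) at each inner node
        prob = x.new_ones(x.shape[0], 1)               # P(reach root) = 1
        for d in range(self.depth):                    # nodes stored in breadth-first order
            start = 2 ** d - 1
            g = gates[:, start:start + 2 ** d]         # gates of the nodes at level d
            # probability of reaching each child, interleaved left/right
            prob = torch.stack([prob * (1 - g), prob * g], dim=2).flatten(1)
        return prob @ self.leaf_values                 # expected prediction over leaves

# Because traversal is differentiable, the tree can serve as an (implicit)
# layer in any network and be optimized with any loss:
tree = SoftBinaryTree(in_dim=10, out_dim=1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
nn.functional.mse_loss(tree(x), y).backward()          # gradients reach splits and leaves
```

Note that this dense relaxation routes every example through every node; per the abstract, the paper instead uses a sparse relaxation with customized forward and backward passes to keep the computation efficient.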
Related papers
- Terminating Differentiable Tree Experts [77.2443883991608]
We propose a neuro-symbolic Differentiable Tree Machine that learns tree operations using a combination of transformers and Representation Products.
We first replace the series of different transformer layers used in every step with a mixture of experts.
We additionally propose a new termination algorithm that lets the model automatically choose how many steps to take.
arXiv Detail & Related papers (2024-07-02T08:45:38Z) - Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z) - TreeDQN: Learning to minimize Branch-and-Bound tree [78.52895577861327]
Branch-and-Bound is a convenient approach to solving optimization tasks posed as Mixed Integer Linear Programs.
The efficiency of the solver depends on the branching heuristic used to select a variable for splitting.
We propose a reinforcement learning method that can efficiently learn the branching heuristic.
arXiv Detail & Related papers (2023-06-09T14:01:26Z) - Parallel Tree Kernel Computation [0.0]
We devise a parallel implementation of the sequential algorithm for the computation of some tree kernels of two finite sets of trees.
Results show that the proposed parallel algorithm outperforms the sequential one in terms of latency.
arXiv Detail & Related papers (2023-05-12T18:16:45Z) - Discrete Tree Flows via Tree-Structured Permutations [5.929956715430168]
Discrete flow-based models cannot be straightforwardly optimized with conventional deep learning methods because gradients of discrete functions are undefined or zero.
Our approach seeks to reduce computational burden and remove the need for pseudo-gradients by developing a discrete flow based on decision trees.
arXiv Detail & Related papers (2022-07-04T23:11:04Z) - Active-LATHE: An Active Learning Algorithm for Boosting the Error Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z) - Dive into Decision Trees and Forests: A Theoretical Demonstration [0.0]
Decision trees use a "divide-and-conquer" strategy to break the complex problem of modeling the dependency between input features and labels into smaller subproblems.
Recent advances have greatly improved their performance in computational advertising, recommender systems, information retrieval, etc.
arXiv Detail & Related papers (2021-01-20T16:47:59Z) - Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions (see the sketch after this list).
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - The Tree Ensemble Layer: Differentiability meets Conditional Computation [8.40843862024745]
We introduce a new layer for neural networks composed of an ensemble of differentiable decision trees (a.k.a. soft trees).
Differentiable trees demonstrate promising results in the literature, but are typically slow in training and inference as they do not support conditional computation.
We develop specialized forward and backward propagation algorithms that exploit sparsity.
arXiv Detail & Related papers (2020-02-18T18:05:31Z)
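The soft-vs-hard routing distinction recurring in this list (the zero gradients noted in the Discrete Tree Flows entry, and the soft routing of the deep forest and tree ensemble layer entries) can be shown in a few lines. This is an illustrative toy with made-up values, not code from any of the papers above:

```python
# Hard vs. soft routing at a single tree node (illustrative toy example).
import torch

w = torch.tensor([0.5, -1.0], requires_grad=True)  # split hyperplane (hypothetical values)
x = torch.tensor([1.0, 2.0])                       # a single input example

hard = (x @ w > 0).float()      # hard routing: step function, zero gradient almost everywhere
soft = torch.sigmoid(x @ w)     # soft routing: P(route right), differentiable

soft.backward()                 # gradients flow through the soft decision
print(w.grad)                   # nonzero, so the split can be trained by gradient descent
```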
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.