Bayesian Decision Trees via Tractable Priors and Probabilistic
Context-Free Grammars
- URL: http://arxiv.org/abs/2302.07407v1
- Date: Wed, 15 Feb 2023 00:17:41 GMT
- Title: Bayesian Decision Trees via Tractable Priors and Probabilistic
Context-Free Grammars
- Authors: Colin Sullivan, Mo Tiwari, Sebastian Thrun, Chris Piech
- Abstract summary: We propose a new criterion for training Bayesian Decision Trees.
BCART-PCFG can efficiently sample decision trees from a posterior distribution across trees given the data.
We find that trees sampled via BCART-PCFG perform comparably to or better than greedily-constructed Decision Trees.
- Score: 7.259767735431625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision Trees are some of the most popular machine learning models today due
to their out-of-the-box performance and interpretability. Often, Decision Tree
models are constructed greedily in a top-down fashion via heuristic search
criteria, such as Gini impurity or entropy. However, trees constructed in this
manner are sensitive to minor fluctuations in training data and are prone to
overfitting. In contrast, Bayesian approaches to tree construction formulate
the selection process as a posterior inference problem; such approaches are
more stable and provide greater theoretical guarantees. However, generating
Bayesian Decision Trees usually requires sampling from complex, multimodal
posterior distributions. Current Markov Chain Monte Carlo-based approaches for
sampling Bayesian Decision Trees are prone to mode collapse and long mixing
times, which makes them impractical. In this paper, we propose a new criterion
for training Bayesian Decision Trees. Our criterion gives rise to BCART-PCFG,
which can efficiently sample decision trees from a posterior distribution
across trees given the data and find the maximum a posteriori (MAP) tree.
Learning the posterior and training the sampler can be done in time that is
polynomial in the dataset size. Once the posterior has been learned, trees can
be sampled efficiently (linearly in the number of nodes). At the core of our
method is a reduction of sampling the posterior to sampling a derivation from a
probabilistic context-free grammar. We find that trees sampled via BCART-PCFG
perform comparably to or better than greedily-constructed Decision Trees in
classification accuracy on several datasets. Additionally, the trees sampled
via BCART-PCFG are significantly smaller -- sometimes by as much as 20x.
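To make the reduction concrete, here is a minimal illustrative sketch (not the authors' implementation) of sampling a decision tree as a PCFG derivation: each nonterminal is the subset of training points reaching a node, and each production either closes the node as a leaf or splits it on a (feature, threshold) pair. The `rule_probabilities` weighting below is a toy stand-in; in BCART-PCFG the rule probabilities are derived from the posterior, so that sampling a derivation samples a tree from the posterior over trees.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # toy features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # toy binary labels


def rule_probabilities(indices, depth, max_depth=3, grid=5):
    """Enumerate the productions available at one nonterminal and weight them.

    A nonterminal is the set of training indices reaching a node.  Each
    production is either ('leaf',) or ('split', feature, threshold).  The
    weights here are a toy purity score; in BCART-PCFG they would come from
    the learned posterior.
    """
    rules, weights = [("leaf",)], [1.0]
    if depth < max_depth and len(indices) > 1:
        for f in range(X.shape[1]):
            for t in np.quantile(X[indices, f], np.linspace(0.2, 0.8, grid)):
                left = indices[X[indices, f] <= t]
                right = indices[X[indices, f] > t]
                if len(left) and len(right):
                    rules.append(("split", f, float(t)))
                    weights.append(abs(y[left].mean() - y[right].mean()) + 1e-6)
    weights = np.asarray(weights)
    return rules, weights / weights.sum()


def sample_tree(indices, depth=0):
    """Expand nonterminals top-down; cost is linear in the sampled tree's size."""
    rules, probs = rule_probabilities(indices, depth)
    rule = rules[rng.choice(len(rules), p=probs)]
    if rule[0] == "leaf":
        return {"leaf": int(round(y[indices].mean()))}
    _, f, t = rule
    left, right = indices[X[indices, f] <= t], indices[X[indices, f] > t]
    return {"feature": f, "threshold": t,
            "left": sample_tree(left, depth + 1),
            "right": sample_tree(right, depth + 1)}


tree = sample_tree(np.arange(len(y)))            # one posterior-style draw
print(tree)
```

In this toy version the weights merely favor purer splits; the point of the paper's criterion is that the posterior can be folded into exactly such per-rule probabilities, after which each draw costs time linear in the number of nodes of the sampled tree.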
Related papers
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- MAPTree: Beating "Optimal" Decision Trees with Bayesian Decision Trees [2.421336072915701]
We present a Bayesian approach to decision tree induction via maximum a posteriori inference of a posterior distribution over trees.
We propose an AND/OR search algorithm, dubbed MAPTree, which is able to recover the maximum a posteriori tree.
arXiv Detail & Related papers (2023-09-26T23:43:37Z)
- Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on GitHub.
arXiv Detail & Related papers (2022-02-02T02:43:23Z)
- Probability Distribution on Rooted Trees [1.3955252961896318]
The hierarchical expressive capability of rooted trees makes them applicable to representing statistical models in various areas, such as data compression, image processing, and machine learning.
One unified approach to this is a Bayesian one, in which the rooted tree is regarded as a random variable and a direct loss function can be assumed on the selected model or on the predicted value for a new data point.
In this paper, we propose a generalized probability distribution for any rooted tree in which only the maximum number of child nodes and the maximum depth are fixed.
arXiv Detail & Related papers (2022-01-24T05:13:58Z)
- Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG).
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z)
- Spectral Top-Down Recovery of Latent Tree Models [13.681975313065477]
Spectral Top-Down Recovery (STDR) is a divide-and-conquer approach for inference of large latent tree models.
STDR's partitioning step is non-random: it is based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes (see the sketch after this list).
We prove that STDR is statistically consistent, and bound the number of samples required to accurately recover the tree with high probability.
arXiv Detail & Related papers (2021-02-26T02:47:42Z)
- SGA: A Robust Algorithm for Partial Recovery of Tree-Structured Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise.
Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure.
We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z)
- Dive into Decision Trees and Forests: A Theoretical Demonstration [0.0]
Decision trees use a "divide-and-conquer" strategy to break a complex problem, the dependency between input features and labels, into smaller subproblems.
Recent advances have greatly improved their performance in computational advertising, recommender systems, information retrieval, etc.
arXiv Detail & Related papers (2021-01-20T16:47:59Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- Convex Polytope Trees [57.56078843831244]
Convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
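The Spectral Top-Down Recovery entry above refers to partitioning via the Fiedler vector. As a rough illustration (an assumed sketch, not STDR's code), the snippet below builds the unnormalized Laplacian of a small weighted similarity graph over observed nodes and splits them by the sign of the eigenvector associated with the second-smallest eigenvalue; a divide-and-conquer method can then recurse on each group.

```python
import numpy as np


def fiedler_partition(W):
    """Split the nodes of a weighted similarity graph into two groups.

    Uses the Fiedler vector: the eigenvector of the unnormalized graph
    Laplacian associated with its second-smallest eigenvalue; nodes are
    grouped by the sign of their entry.
    """
    laplacian = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return np.where(fiedler >= 0)[0], np.where(fiedler < 0)[0]


# Toy similarity matrix: two tight clusters {0,1,2} and {3,4,5}, weakly linked.
W = np.array([
    [0.0,  1.0, 1.0, 0.05, 0.0, 0.0],
    [1.0,  0.0, 1.0, 0.0,  0.0, 0.0],
    [1.0,  1.0, 0.0, 0.0,  0.0, 0.0],
    [0.05, 0.0, 0.0, 0.0,  1.0, 1.0],
    [0.0,  0.0, 0.0, 1.0,  0.0, 1.0],
    [0.0,  0.0, 0.0, 1.0,  1.0, 0.0],
])
group_a, group_b = fiedler_partition(W)
print(group_a, group_b)   # the two groups recover the two clusters
```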