Decision trees as partitioning machines to characterize their
generalization properties
- URL: http://arxiv.org/abs/2010.07374v1
- Date: Wed, 14 Oct 2020 19:25:58 GMT
- Title: Decision trees as partitioning machines to characterize their
generalization properties
- Authors: Jean-Samuel Leboeuf, Frédéric LeBlanc and Mario Marchand
- Abstract summary: We revisit binary decision trees on real-valued features from the perspective of partitions of the data.
We show that the VC dimension of a binary tree structure with $N$ internal nodes is of order $N \log(N\ell)$.
We develop a pruning algorithm based on these results that performs better than the CART algorithm on a number of datasets.
- Score: 2.370481325034443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision trees are popular machine learning models that are simple to build
and easy to interpret. Even though algorithms to learn decision trees date back
almost 50 years, key properties affecting their generalization error are
still weakly bounded. Hence, we revisit binary decision trees on real-valued
features from the perspective of partitions of the data. We introduce the
notion of partitioning function, and we relate it to the growth function and to
the VC dimension. Using this new concept, we are able to find the exact VC
dimension of decision stumps, which is given by the largest integer $d$ such
that $2\ell \ge \binom{d}{\left\lfloor\frac{d}{2}\right\rfloor}$, where $\ell$
is the number of real-valued features. We provide a recursive expression to
bound the partitioning functions, resulting in an upper bound on the growth
function of any decision tree structure. This allows us to show that the VC
dimension of a binary tree structure with $N$ internal nodes is of order $N
\log(N\ell)$. Finally, we develop a pruning algorithm based on these results
that performs better than the CART algorithm on a number of datasets, with the
advantage that no cross-validation is required.
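As a quick illustration of the decision-stump result quoted above, here is a minimal sketch (not from the paper; the function name and the sample values of $\ell$ are illustrative) of how the exact VC dimension could be computed for a given number of real-valued features:

```python
from math import comb


def stump_vc_dimension(num_features: int) -> int:
    """Largest integer d such that 2*ell >= C(d, floor(d/2)),
    i.e. the exact VC dimension of a decision stump on `num_features`
    real-valued features, per the formula quoted in the abstract."""
    if num_features < 1:
        raise ValueError("need at least one real-valued feature")
    d = 1  # d = 1 always qualifies, since C(1, 0) = 1 <= 2*ell
    # The central binomial coefficient grows without bound, so the loop terminates.
    while 2 * num_features >= comb(d + 1, (d + 1) // 2):
        d += 1
    return d


if __name__ == "__main__":
    # Illustrative values of ell (number of real-valued features).
    for ell in (1, 2, 5, 10, 100):
        print(f"ell = {ell:>3}  ->  stump VC dimension = {stump_vc_dimension(ell)}")
```

For full tree structures, the abstract only gives the order $N \log(N\ell)$ of the VC dimension rather than an exact value, so no analogous computation is attempted here.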
Related papers
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z) - On the Trade-off between the Number of Nodes and the Number of Trees in
a Random Forest [8.340919965932443]
We show that the majority function of $n$ variables can be represented by a bag of $T$ ($< n$) decision trees, each of polynomial size, if $n-T$ is a constant.
A related result on the $k$-out-of-$n$ functions is presented too.
arXiv Detail & Related papers (2023-12-16T02:09:34Z) - On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures, such as size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z) - Generalization Properties of Decision Trees on Real-valued and
Categorical Features [2.370481325034443]
We revisit binary decision trees from the perspective of partitions of the data.
We consider three types of features: real-valued, categorical ordinal and categorical nominal, with different split rules for each.
arXiv Detail & Related papers (2022-10-18T21:50:24Z) - Active-LATHE: An Active Learning Algorithm for Boosting the Error
Exponent for Learning Homogeneous Ising Trees [75.93186954061943]
We design and analyze an algorithm that boosts the error exponent by at least 40% when $\rho$ is at least $0.8$.
Our analysis hinges on judiciously exploiting the minute but detectable statistical variation of the samples to allocate more data to parts of the graph.
arXiv Detail & Related papers (2021-10-27T10:45:21Z) - Tree in Tree: from Decision Trees to Decision Graphs [2.2336243882030025]
Tree in Tree decision graph (TnT) is a framework that extends the conventional decision tree to a more generic and powerful directed acyclic graph.
Our proposed model is a novel, more efficient, and accurate alternative to the widely-used decision trees.
arXiv Detail & Related papers (2021-10-01T13:20:05Z) - Properly learning decision trees in almost polynomial time [25.763690981846125]
We give an $n^{O(\log\log n)}$-time membership query algorithm for properly and agnostically learning decision trees.
Our algorithm shares similarities with practical heuristics for learning decision trees.
We show how every decision tree can be "pruned" so that every variable in the resulting tree is influential.
arXiv Detail & Related papers (2021-09-01T22:12:47Z) - Robustifying Algorithms of Learning Latent Trees with Vector Variables [92.18777020401484]
We present the sample complexities of Recursive Grouping (RG) and Chow-Liu Recursive Grouping (CLRG)
We robustify RG, CLRG, Neighbor Joining (NJ) and Spectral NJ (SNJ) by using the truncated inner product.
We derive the first known instance-dependent impossibility result for structure learning of latent trees.
arXiv Detail & Related papers (2021-06-02T01:37:52Z) - SGA: A Robust Algorithm for Partial Recovery of Tree-Structured
Graphical Models with Noisy Samples [75.32013242448151]
We consider learning Ising tree models when the observations from the nodes are corrupted by independent but non-identically distributed noise.
Katiyar et al. (2020) showed that although the exact tree structure cannot be recovered, one can recover a partial tree structure.
We propose Symmetrized Geometric Averaging (SGA), a more statistically robust algorithm for partial tree recovery.
arXiv Detail & Related papers (2021-01-22T01:57:35Z) - Growing Deep Forests Efficiently with Soft Routing and Learned
Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests achieve performance better than or comparable to [1], [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z) - Convex Polytope Trees [57.56078843831244]
Convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.