Generalization Properties of Decision Trees on Real-valued and
Categorical Features
- URL: http://arxiv.org/abs/2210.10781v1
- Date: Tue, 18 Oct 2022 21:50:24 GMT
- Title: Generalization Properties of Decision Trees on Real-valued and
Categorical Features
- Authors: Jean-Samuel Leboeuf, Frédéric LeBlanc and Mario Marchand
- Abstract summary: We revisit binary decision trees from the perspective of partitions of the data.
We consider three types of features: real-valued, categorical ordinal and categorical nominal, with different split rules for each.
- Score: 2.370481325034443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We revisit binary decision trees from the perspective of partitions of the
data. We introduce the notion of partitioning function, and we relate it to the
growth function and to the VC dimension. We consider three types of features:
real-valued, categorical ordinal and categorical nominal, with different split
rules for each. For each feature type, we upper bound the partitioning function
of the class of decision stumps before extending the bounds to the class of
general decision tree (of any fixed structure) using a recursive approach.
Using these new results, we are able to find the exact VC dimension of decision
stumps on examples of $\ell$ real-valued features, which is given by the
largest integer $d$ such that $2\ell \ge \binom{d}{\lfloor\frac{d}{2}\rfloor}$.
Furthermore, we show that the VC dimension of a binary tree structure with
$L_T$ leaves on examples of $\ell$ real-valued features is in $O(L_T
\log(L_T\ell))$. Finally, we elaborate a pruning algorithm based on these
results that performs better than the cost-complexity and reduced-error pruning
algorithms on a number of data sets, with the advantage that no
cross-validation is required.
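The stump result lends itself to a quick numerical check. Below is a minimal sketch (the function name is ours, not the authors') that computes this VC dimension directly from the stated condition:

```python
from math import comb

def stump_vc_dim(num_features: int) -> int:
    """Exact VC dimension of decision stumps on examples with `num_features`
    real-valued features: the largest integer d such that
    2 * num_features >= C(d, floor(d/2)), per the abstract."""
    d = 1
    while 2 * num_features >= comb(d + 1, (d + 1) // 2):
        d += 1
    return d

# With ell = 3: C(4, 2) = 6 <= 2*3, but C(5, 2) = 10 > 2*3, so the VC dim is 4.
print(stump_vc_dim(3))  # -> 4
```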
Related papers
- Polynomial Width is Sufficient for Set Representation with
High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE)
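The two embedding layers are described here only by their activations; a minimal numpy sketch of one plausible parameterization (the paper's exact form may differ):

```python
import numpy as np

def lp_embed(X, W, p=2):
    """Linear + power activation (LP): elementwise power of a linear map.
    One plausible parameterization; an assumption, not the paper's exact form."""
    return np.power(X @ W, p)

def le_embed(X, W):
    """Linear + exponential activation (LE): elementwise exp of a linear map."""
    return np.exp(X @ W)

# A set of n=5 elements with d=3 features, embedded into k=8 dimensions and
# pooled by summation, as in DeepSets-style set representations.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 8))
set_repr_lp = lp_embed(X, W).sum(axis=0)  # permutation-invariant pooling
set_repr_le = le_embed(X, W).sum(axis=0)
print(set_repr_lp.shape, set_repr_le.shape)  # (8,) (8,)
```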
arXiv Detail & Related papers (2023-07-08T16:00:59Z) - On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures, such as their size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z) - On marginal feature attributions of tree-based models [0.11184789007828977]
Local feature attributions based on marginal expectations, e.g. marginal Shapley, Owen or Banzhaf values, may be employed.
We present two (statistically similar) decision trees that compute the exact same function for which the "path-dependent" TreeSHAP yields different rankings of features.
We exploit the symmetry to derive an explicit formula, with improved complexity and only in terms of the internal model parameters, for marginal Shapley (and Banzhaf and Owen) values of CatBoost models.
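For reference, the marginal Shapley value referred to here is the standard Shapley value with the coalition value taken as a marginal expectation over the remaining features; the paper's CatBoost-specific closed form is not reproduced:

```latex
% Marginal Shapley value of feature i for model f at point x, where N is the
% feature set and v(S) marginalizes over the features outside S.
\phi_i(f, x) = \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,
  \bigl( v(S \cup \{i\}) - v(S) \bigr),
\qquad
v(S) = \mathbb{E}\bigl[ f(x_S, X_{\bar{S}}) \bigr].
```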
arXiv Detail & Related papers (2023-02-16T17:18:03Z) - Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function
Approximation [92.3161051419884]
We study reinforcement learning for finite-horizon episodic Markov decision processes with adversarial rewards and full-information feedback.
We show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode.
We also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors.
arXiv Detail & Related papers (2021-02-17T18:54:08Z) - Rethinking Learnable Tree Filter for Generic Feature Transform [71.77463476808585]
Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.
To relax the geometric constraint, we reformulate it as a Markov Random Field and introduce a learnable unary term.
For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells-and-whistles.
arXiv Detail & Related papers (2020-12-07T07:16:47Z) - Measure Inducing Classification and Regression Trees for Functional Data [0.0]
We propose a tree-based algorithm for classification and regression problems in the context of functional data analysis.
This is achieved by learning a weighted functional $L^2$ space by means of constrained convex optimization.
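The split criterion in such a tree relies on distances in a weighted functional $L^2$ space; a minimal sketch of that distance on a discretized grid (the constrained convex optimization that learns the weight $w$ is omitted):

```python
import numpy as np

def weighted_l2_distance(f, g, w, t):
    """Weighted L2 distance between two functions discretized on grid t:
    ||f - g||_w = sqrt( integral of w(t) (f(t) - g(t))^2 dt ), approximated
    here by a Riemann sum on a uniform grid."""
    dt = t[1] - t[0]
    return np.sqrt(np.sum(w * (f - g) ** 2) * dt)

t = np.linspace(0.0, 1.0, 200)
f, g = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
w = np.ones_like(t)  # a learned, data-dependent weight would replace this
print(weighted_l2_distance(f, g, w, t))
```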
arXiv Detail & Related papers (2020-10-30T18:49:53Z) - Convex Polytope Trees [57.56078843831244]
Convex polytope trees (CPT) are proposed to expand the family of decision trees via an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
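A node of such a tree routes an example by membership in a convex polytope, i.e. an intersection of half-spaces, rather than by a single axis-aligned threshold; a minimal sketch of the routing test (parameter names are ours):

```python
import numpy as np

def cpt_route(x, A, b):
    """Route an example at a convex-polytope node: go left iff x lies in the
    polytope {x : A @ x <= b}, i.e. satisfies every half-space constraint.
    A single row of A recovers an ordinary oblique split."""
    return "left" if np.all(A @ x <= b) else "right"

# Polytope with two half-space constraints in 2D: x0 <= 1 and x0 + x1 <= 1.5.
A = np.array([[1.0, 0.0], [1.0, 1.0]])
b = np.array([1.0, 1.5])
print(cpt_route(np.array([0.5, 0.5]), A, b))  # left (inside the polytope)
print(cpt_route(np.array([2.0, 0.0]), A, b))  # right (outside)
```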
arXiv Detail & Related papers (2020-10-21T19:38:57Z) - Universal guarantees for decision tree induction via a higher-order
splitting criterion [16.832966312395126]
Our algorithm achieves provable guarantees for all target functions $f: \{-1,1\}^n \to \{-1,1\}$ with respect to the uniform distribution.
The crux of our extension is a new splitting criterion that takes into account the correlations between $f$ and small subsets of its attributes.
Our algorithm satisfies the following guarantee: for all target functions $f: \{-1,1\}^n \to \{-1,1\}$, sizes $s \in \mathbb{N}$, and error parameters $\epsilon$, it constructs a decision...
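One way to operationalize a criterion "that takes into account the correlations between $f$ and small subsets of its attributes" is to score each attribute by the empirical Fourier mass on small subsets containing it; this is a hedged sketch in that spirit, not the paper's exact criterion:

```python
from itertools import combinations
import numpy as np

def split_scores(X, y, max_order=2):
    """Score each attribute by the squared empirical correlations between f
    and parities over attribute subsets of size <= max_order containing it.
    A sketch of a higher-order criterion; the paper's exact criterion may
    differ. Entries of X and y are in {-1, +1}."""
    n = X.shape[1]
    scores = np.zeros(n)
    for k in range(1, max_order + 1):
        for S in combinations(range(n), k):
            # Empirical correlation of y with the parity chi_S(x).
            corr = np.mean(y * np.prod(X[:, list(S)], axis=1))
            for i in S:
                scores[i] += corr ** 2
    return scores  # split on the argmax

rng = np.random.default_rng(0)
X = rng.choice([-1, 1], size=(1000, 4))
y = X[:, 0] * X[:, 1]  # parity of attributes 0 and 1: no single attribute
                       # correlates with f, but the pair does
print(split_scores(X, y).round(2))  # attributes 0 and 1 score highest
```

Note that a purely first-order criterion (e.g. plain correlation or Gini gain on single attributes) scores every attribute near zero on this parity example, which is exactly the failure mode a higher-order criterion addresses.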
arXiv Detail & Related papers (2020-10-16T21:20:45Z) - Decision trees as partitioning machines to characterize their
generalization properties [2.370481325034443]
We revisit binary decision trees on real-valued features from the perspective of partitions of the data.
We show that the VC dimension of a binary tree structure with $N$ internal nodes is of order $N \log(N\ell)$.
We elaborate a pruning algorithm based on these results that performs better than the CART algorithm on a number of datasets.
arXiv Detail & Related papers (2020-10-14T19:25:58Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - Infinite Feature Selection: A Graph-based Feature Filtering Approach [78.63188057505012]
We propose a filtering feature selection framework that considers subsets of features as paths in a graph.
Letting the path length go to infinity makes it possible to constrain the computational complexity of the selection process.
We show that Inf-FS behaves better in almost any situation, that is, when the number of features to keep is fixed a priori.
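The "infinity" step alluded to above has a closed form: summing the weighted feature-adjacency matrix over paths of all lengths is a geometric series. A minimal sketch, under the assumption that features are scored by the total energy of paths through them (the construction of the adjacency matrix $A$ from feature statistics is simplified away):

```python
import numpy as np

def inf_fs_scores(A, alpha=0.5):
    """Inf-FS-style scoring: sum the costs of paths of every length in the
    feature graph. The series sum_{l>=1} a^l A^l has the closed form
    (I - a*A)^{-1} - I whenever a < 1/spectral_radius(A); each feature is
    then scored by the total energy of paths through it (row sums)."""
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    a = alpha / rho  # rescale to guarantee convergence of the series
    S = np.linalg.inv(np.eye(n) - a * A) - np.eye(n)
    return S.sum(axis=1)

# Toy symmetric adjacency between 4 features (in practice built from
# feature statistics such as correlations and variances).
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(4, 4)))
A = (A + A.T) / 2
print(np.argsort(-inf_fs_scores(A)))  # feature ranking, best first
```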
arXiv Detail & Related papers (2020-06-15T07:20:40Z)