Generalization Properties of Decision Trees on Real-valued and
Categorical Features
- URL: http://arxiv.org/abs/2210.10781v1
- Date: Tue, 18 Oct 2022 21:50:24 GMT
- Title: Generalization Properties of Decision Trees on Real-valued and
Categorical Features
- Authors: Jean-Samuel Leboeuf, Frédéric LeBlanc and Mario Marchand
- Abstract summary: We revisit binary decision trees from the perspective of partitions of the data.
We consider three types of features: real-valued, categorical ordinal and categorical nominal, with different split rules for each.
- Score: 2.370481325034443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We revisit binary decision trees from the perspective of partitions of the
data. We introduce the notion of partitioning function, and we relate it to the
growth function and to the VC dimension. We consider three types of features:
real-valued, categorical ordinal and categorical nominal, with different split
rules for each. For each feature type, we upper bound the partitioning function
of the class of decision stumps before extending the bounds to the class of
general decision tree (of any fixed structure) using a recursive approach.
Using these new results, we are able to find the exact VC dimension of decision
stumps on examples of $\ell$ real-valued features, which is given by the
largest integer $d$ such that $2\ell \ge \binom{d}{\lfloor\frac{d}{2}\rfloor}$.
Furthermore, we show that the VC dimension of a binary tree structure with
$L_T$ leaves on examples of $\ell$ real-valued features is in $O(L_T
\log(L_T\ell))$. Finally, we elaborate a pruning algorithm based on these
results that performs better than the cost-complexity and reduced-error pruning
algorithms on a number of data sets, with the advantage that no
cross-validation is required.
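The stump result lends itself to a quick numerical check. Below is a minimal sketch (the function name is ours, not the authors') that computes this VC dimension directly from the stated condition:

```python
from math import comb

def stump_vc_dim(num_features: int) -> int:
    """Exact VC dimension of decision stumps on examples with `num_features`
    real-valued features: the largest integer d such that
    2 * num_features >= C(d, floor(d/2)), per the abstract."""
    d = 1
    while 2 * num_features >= comb(d + 1, (d + 1) // 2):
        d += 1
    return d

# With ell = 3: C(4, 2) = 6 <= 2*3, but C(5, 2) = 10 > 2*3, so the VC dim is 4.
print(stump_vc_dim(3))  # -> 4
```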
Related papers
- Polynomial Width is Sufficient for Set Representation with
High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE)
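The two embedding layers are described here only by their activations; a minimal numpy sketch of one plausible parameterization (the paper's exact form may differ):

```python
import numpy as np

def lp_embed(X, W, p=2):
    """Linear + power activation (LP): elementwise power of a linear map.
    One plausible parameterization; an assumption, not the paper's exact form."""
    return np.power(X @ W, p)

def le_embed(X, W):
    """Linear + exponential activation (LE): elementwise exp of a linear map."""
    return np.exp(X @ W)

# A set of n=5 elements with d=3 features, embedded into k=8 dimensions and
# pooled by summation, as in DeepSets-style set representations.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 8))
set_repr_lp = lp_embed(X, W).sum(axis=0)  # permutation-invariant pooling
set_repr_le = le_embed(X, W).sum(axis=0)
print(set_repr_lp.shape, set_repr_le.shape)  # (8,) (8,)
```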
arXiv Detail & Related papers (2023-07-08T16:00:59Z) - On Computing Optimal Tree Ensembles [7.424944196676223]
Random forests and, more generally, (decision-)tree ensembles are widely used methods for classification and regression.
Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures, such as their size or depth.
We provide two novel algorithms and corresponding lower bounds.
arXiv Detail & Related papers (2023-06-07T13:30:43Z) - On marginal feature attributions of tree-based models [0.11184789007828977]
Local feature attributions based on marginal expectations, e.g. marginal Shapley, Owen or Banzhaf values, may be employed.
We present two (statistically similar) decision trees that compute the exact same function for which the "path-dependent" TreeSHAP yields different rankings of features.
We exploit the symmetry to derive an explicit formula, with improved complexity and only in terms of the internal model parameters, for marginal Shapley (and Banzhaf and Owen) values of CatBoost models.
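For reference, the marginal Shapley value referred to here is the standard Shapley value with the coalition value taken as a marginal expectation over the remaining features; the paper's CatBoost-specific closed form is not reproduced:

```latex
% Marginal Shapley value of feature i for model f at point x, where N is the
% feature set and v(S) marginalizes over the features outside S.
\phi_i(f, x) = \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,
  \bigl( v(S \cup \{i\}) - v(S) \bigr),
\qquad
v(S) = \mathbb{E}\bigl[ f(x_S, X_{\bar{S}}) \bigr].
```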
arXiv Detail & Related papers (2023-02-16T17:18:03Z) - Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function
Approximation [92.3161051419884]
We study reinforcement learning for finite-horizon episodic Markov decision processes with adversarial rewards and full-information feedback.
We show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode.
We also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors.
arXiv Detail & Related papers (2021-02-17T18:54:08Z) - Rethinking Learnable Tree Filter for Generic Feature Transform [71.77463476808585]
Learnable Tree Filter presents a remarkable approach to model structure-preserving relations for semantic segmentation.
To relax the geometric constraint, we reformulate it as a Markov Random Field and introduce a learnable unary term.
For semantic segmentation, we achieve leading performance (82.1% mIoU) on the Cityscapes benchmark without bells-and-whistles.
arXiv Detail & Related papers (2020-12-07T07:16:47Z) - Measure Inducing Classification and Regression Trees for Functional Data [0.0]
We propose a tree-based algorithm for classification and regression problems in the context of functional data analysis.
This is achieved by learning a weighted functional $L^2$ space by means of constrained convex optimization.
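The split criterion in such a tree relies on distances in a weighted functional $L^2$ space; a minimal sketch of that distance on a discretized grid (the constrained convex optimization that learns the weight $w$ is omitted):

```python
import numpy as np

def weighted_l2_distance(f, g, w, t):
    """Weighted L2 distance between two functions discretized on grid t:
    ||f - g||_w = sqrt( integral of w(t) (f(t) - g(t))^2 dt ), approximated
    here by a Riemann sum on a uniform grid."""
    dt = t[1] - t[0]
    return np.sqrt(np.sum(w * (f - g) ** 2) * dt)

t = np.linspace(0.0, 1.0, 200)
f, g = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
w = np.ones_like(t)  # a learned, data-dependent weight would replace this
print(weighted_l2_distance(f, g, w, t))
```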
arXiv Detail & Related papers (2020-10-30T18:49:53Z) - Convex Polytope Trees [57.56078843831244]
Convex polytope trees (CPT) are proposed to expand the family of decision trees via an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
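A node of such a tree routes an example by membership in a convex polytope, i.e. an intersection of half-spaces, rather than by a single axis-aligned threshold; a minimal sketch of the routing test (parameter names are ours):

```python
import numpy as np

def cpt_route(x, A, b):
    """Route an example at a convex-polytope node: go left iff x lies in the
    polytope {x : A @ x <= b}, i.e. satisfies every half-space constraint.
    A single row of A recovers an ordinary oblique split."""
    return "left" if np.all(A @ x <= b) else "right"

# Polytope with two half-space constraints in 2D: x0 <= 1 and x0 + x1 <= 1.5.
A = np.array([[1.0, 0.0], [1.0, 1.0]])
b = np.array([1.0, 1.5])
print(cpt_route(np.array([0.5, 0.5]), A, b))  # left (inside the polytope)
print(cpt_route(np.array([2.0, 0.0]), A, b))  # right (outside)
```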
arXiv Detail & Related papers (2020-10-21T19:38:57Z) - Universal guarantees for decision tree induction via a higher-order
splitting criterion [16.832966312395126]
Our algorithm achieves provable guarantees for all target functions $f: \{-1,1\}^n \to \{-1,1\}$ with respect to the uniform distribution.
The crux of our extension is a new splitting criterion that takes into account the correlations between $f$ and small subsets of its attributes.
Our algorithm satisfies the following guarantee: for all target functions $f: \{-1,1\}^n \to \{-1,1\}$, sizes $s \in \mathbb{N}$, and error parameters $\epsilon$, it constructs a decision...
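One way to operationalize a criterion "that takes into account the correlations between $f$ and small subsets of its attributes" is to score each attribute by the empirical Fourier mass on small subsets containing it; this is a hedged sketch in that spirit, not the paper's exact criterion:

```python
from itertools import combinations
import numpy as np

def split_scores(X, y, max_order=2):
    """Score each attribute by the squared empirical correlations between f
    and parities over attribute subsets of size <= max_order containing it.
    A sketch of a higher-order criterion; the paper's exact criterion may
    differ. Entries of X and y are in {-1, +1}."""
    n = X.shape[1]
    scores = np.zeros(n)
    for k in range(1, max_order + 1):
        for S in combinations(range(n), k):
            # Empirical correlation of y with the parity chi_S(x).
            corr = np.mean(y * np.prod(X[:, list(S)], axis=1))
            for i in S:
                scores[i] += corr ** 2
    return scores  # split on the argmax

rng = np.random.default_rng(0)
X = rng.choice([-1, 1], size=(1000, 4))
y = X[:, 0] * X[:, 1]  # parity of attributes 0 and 1: no single attribute
                       # correlates with f, but the pair does
print(split_scores(X, y).round(2))  # attributes 0 and 1 score highest
```

Note that a purely first-order criterion (e.g. plain correlation or Gini gain on single attributes) scores every attribute near zero on this parity example, which is exactly the failure mode a higher-order criterion addresses.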
arXiv Detail & Related papers (2020-10-16T21:20:45Z) - Decision trees as partitioning machines to characterize their
generalization properties [2.370481325034443]
We revisit binary decision trees on real-valued features from the perspective of partitions of the data.
We show that the VC dimension of a binary tree structure with $N$ internal nodes is of order $N \log(N\ell)$.
We elaborate a pruning algorithm based on these results that performs better than the CART algorithm on a number of datasets.
arXiv Detail & Related papers (2020-10-14T19:25:58Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - Infinite Feature Selection: A Graph-based Feature Filtering Approach [78.63188057505012]
We propose a filtering feature selection framework that considers subsets of features as paths in a graph.
Letting the path length go to infinity makes it possible to constrain the computational complexity of the selection process.
We show that Inf-FS behaves better in almost any situation, that is, when the number of features to keep is fixed a priori.
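The "infinity" step alluded to above has a closed form: summing the weighted feature-adjacency matrix over paths of all lengths is a geometric series. A minimal sketch, under the assumption that features are scored by the total energy of paths through them (the construction of the adjacency matrix $A$ from feature statistics is simplified away):

```python
import numpy as np

def inf_fs_scores(A, alpha=0.5):
    """Inf-FS-style scoring: sum the costs of paths of every length in the
    feature graph. The series sum_{l>=1} a^l A^l has the closed form
    (I - a*A)^{-1} - I whenever a < 1/spectral_radius(A); each feature is
    then scored by the total energy of paths through it (row sums)."""
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    a = alpha / rho  # rescale to guarantee convergence of the series
    S = np.linalg.inv(np.eye(n) - a * A) - np.eye(n)
    return S.sum(axis=1)

# Toy symmetric adjacency between 4 features (in practice built from
# feature statistics such as correlations and variances).
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(4, 4)))
A = (A + A.T) / 2
print(np.argsort(-inf_fs_scores(A)))  # feature ranking, best first
```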
arXiv Detail & Related papers (2020-06-15T07:20:40Z)