Sparsity in Optimal Randomized Classification Trees
- URL: http://arxiv.org/abs/2002.09191v1
- Date: Fri, 21 Feb 2020 09:09:59 GMT
- Title: Sparsity in Optimal Randomized Classification Trees
- Authors: Rafael Blanquero, Emilio Carrizosa, Cristina Molero-Río, Dolores Romero Morales
- Abstract summary: We propose a continuous optimization approach to build sparse optimal classification trees, based on oblique cuts.
Both types of sparsity, namely local and global, are modeled by means of regularizations with polyhedral norms.
Unlike greedy approaches, our method can easily trade off some classification accuracy for a gain in global sparsity.
- Score: 3.441021278275805
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision trees are popular Classification and Regression tools and, when
small-sized, easy to interpret. Traditionally, a greedy approach has been used
to build the trees, yielding a very fast training process; however, controlling
sparsity (a proxy for interpretability) is challenging. In recent studies,
optimal decision trees, where all decisions are optimized simultaneously, have
shown a better learning performance, especially when oblique cuts are
implemented. In this paper, we propose a continuous optimization approach to
build sparse optimal classification trees, based on oblique cuts, with the aim
of using fewer predictor variables in the cuts as well as along the whole tree.
Both types of sparsity, namely local and global, are modeled by means of
regularizations with polyhedral norms. The computational experience reported
supports the usefulness of our methodology. In all our data sets, local and
global sparsity can be improved without harming classification accuracy. Unlike
greedy approaches, we show that our method can easily trade off some
classification accuracy for a gain in global sparsity.
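As a sketch of what such a regularized objective can look like, consider the following form; the notation (branch nodes $\tau_B$, cut coefficient $a_{jt}$ of predictor $j$ at node $t$, parameters $\lambda^{L}$ and $\lambda^{G}$) is illustrative, not taken verbatim from the paper:

```latex
% Hedged sketch: expected misclassification cost plus two polyhedral-norm
% regularizers; all symbols here are illustrative assumptions.
\min_{a}\;
  \underbrace{\frac{1}{N}\sum_{i=1}^{N} g_i(a)}_{\text{misclassification cost}}
  \;+\; \lambda^{L}\sum_{t\in\tau_B}\sum_{j=1}^{p}\lvert a_{jt}\rvert
  \;+\; \lambda^{G}\sum_{j=1}^{p}\max_{t\in\tau_B}\lvert a_{jt}\rvert
```

Both regularizers are polyhedral norms: the $\ell_1$ term promotes local sparsity (few predictors per cut), while the $\ell_\infty$-across-nodes term promotes global sparsity, since driving $\max_{t}\lvert a_{jt}\rvert$ to zero removes predictor $j$ from the whole tree.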
Related papers
- Can a Single Tree Outperform an Entire Forest? [5.448070998907116]
The prevailing mindset is that a single decision tree underperforms classic random forests in testing accuracy.
This study challenges such a mindset by significantly improving the testing accuracy of an oblique regression tree.
Our approach reformulates tree training as a differentiable unconstrained optimization task.
arXiv Detail & Related papers (2024-11-26T00:18:18Z)
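To illustrate the differentiable reformulation mentioned above, here is a minimal sketch in which a hard oblique split $\operatorname{sign}(w^\top x + b)$ is relaxed to a sigmoid and trained by plain gradient descent; this is a toy depth-1 example, not the paper's actual formulation:

```python
# Toy sketch: relax a hard oblique split sign(w.x + b) into a sigmoid so
# the loss becomes smooth, then train with unconstrained gradient descent.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_soft_stump(X, y, lr=0.1, steps=500):
    """Depth-1 'soft' oblique tree: p(class 1) = sigmoid(w.x + b)."""
    rng = np.random.default_rng(0)
    w, b = rng.normal(size=X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        g = (p - y) / len(y)      # gradient of the cross-entropy loss
        w -= lr * (X.T @ g)       # unconstrained gradient step
        b -= lr * g.sum()
    return w, b
```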
- Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level.
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z)
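A minimal sketch of the level-wise softmax idea described above (the embedding shapes and names are assumptions, not DTR's actual interface):

```python
# Hedged sketch: treat the nodes at one tree level as the classes of a
# softmax classifier and minimize cross-entropy against the target node.
import numpy as np

def level_softmax_loss(user_vec, node_embs, target_idx):
    """user_vec: (d,); node_embs: (n_nodes_at_level, d)."""
    scores = node_embs @ user_vec
    scores -= scores.max()                     # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[target_idx] + 1e-12)  # cross-entropy
```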
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- A Mathematical Programming Approach to Optimal Classification Forests [1.0705399532413618]
We propose a novel mathematical optimization-based methodology in which a given number of trees are simultaneously constructed.
The classification rule is derived by assigning to each observation its most frequently predicted class among the trees in the forest.
We show that our proposed method has equal or superior performance compared with state-of-the-art tree-based classification methods.
arXiv Detail & Related papers (2022-11-18T20:33:08Z)
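The classification rule stated above has a direct implementation; a toy version, with stand-in "trees" as plain functions:

```python
# Toy illustration of the forest's classification rule: each observation
# gets its most frequently predicted class among the trees. The 'trees'
# below are stand-ins, not the paper's optimized models.
from collections import Counter

def forest_predict(trees, x):
    """Majority vote among the trees' predictions for observation x."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Usage with three dummy 'trees' (plain functions):
trees = [lambda x: int(x[0] > 0), lambda x: int(x[1] > 0), lambda x: 1]
print(forest_predict(trees, [0.5, -1.0]))  # -> 1
```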
- On multivariate randomized classification trees: $l_0$-based sparsity, VC dimension and decomposition methods [0.9346127431927981]
We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al.
We first consider alternative methods to sparsify such trees based on concave approximations of the $l_0$ norm.
We propose a general decomposition scheme and an efficient version of it. Experiments on larger datasets show that the proposed decomposition method is able to significantly reduce the training times without compromising the accuracy.
arXiv Detail & Related papers (2021-12-09T22:49:08Z)
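For concreteness, one standard concave surrogate of the kind mentioned above (our example; the paper may use a different family) replaces the count of nonzero coefficients with a smooth concave sum:

```latex
% For alpha > 0, the number of nonzero coefficients
%   ||a||_0 = #{ j : a_j != 0 }
% is approximated by the concave-in-|a_j| expression
\lVert a \rVert_0 \;\approx\; \sum_{j=1}^{p}\bigl(1 - e^{-\alpha \lvert a_j \rvert}\bigr),
% which recovers ||a||_0 in the limit alpha -> infinity.
```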
- Optimal randomized classification trees [0.0]
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning.
CARTs are built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and the associated threshold.
This greedy approach trains trees very fast, but, by its nature, their classification accuracy may not be competitive against other state-of-the-art procedures.
arXiv Detail & Related papers (2021-10-19T11:41:12Z)
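The greedy step described above, sequentially choosing a splitting predictor and threshold, reduces to a simple search; a toy sketch using Gini impurity (real CART adds stopping rules, pruning, and recursion):

```python
# Toy sketch of one greedy CART step: pick the (feature, threshold) pair
# that minimizes the weighted Gini impurity of the two children.
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            left = X[:, j] <= thr
            if left.all() or not left.any():
                continue  # degenerate split, skip
            score = (left.mean() * gini(y[left])
                     + (~left).mean() * gini(y[~left]))
            if score < best[2]:
                best = (j, thr, score)
    return best  # (feature index, threshold, weighted impurity)
```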
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
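A hedged sketch of what splitting on downstream decision quality can mean: score each candidate split by the cost of the decision each child would prescribe, letting every child re-optimize its own decision. The cost function and names below are illustrative assumptions, not the paper's objective:

```python
# Hedged sketch: evaluate a split by downstream decision cost, not by
# prediction error. The newsvendor-style cost is an illustrative choice.
import numpy as np

def decision_cost(y, z):
    """Average cost of taking decision z under observed outcomes y."""
    return np.mean(3.0 * np.maximum(y - z, 0) + 1.0 * np.maximum(z - y, 0))

def split_score(y_left, y_right):
    """Weighted cost when each child re-optimizes its own decision."""
    n = len(y_left) + len(y_right)
    total = 0.0
    for y in (y_left, y_right):
        best = min(decision_cost(y, z) for z in np.unique(y))
        total += len(y) / n * best
    return total  # lower means a better split
```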
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
- Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier [68.38233199030908]
Long-tail recognition tackles the naturally non-uniformly distributed data in real-world scenarios.
While modern models perform well on populated classes, their performance degrades significantly on tail classes.
Deep-RTC is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.
arXiv Detail & Related papers (2020-07-20T05:57:42Z)
- Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z)