Simple is better: Making Decision Trees faster using random sampling
- URL: http://arxiv.org/abs/2108.08790v1
- Date: Thu, 19 Aug 2021 17:00:21 GMT
- Title: Simple is better: Making Decision Trees faster using random sampling
- Authors: Vignesh Nanda Kumar and Narayanan U Edakunni
- Abstract summary: In recent years, gradient boosted decision trees have become popular in building robust machine learning models on big data.
We show theoretically and empirically that choosing the split points uniformly at random provides the same or even better performance in terms of accuracy and computational efficiency.
- Score: 4.284674689172996
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, gradient boosted decision trees have become popular in
building robust machine learning models on big data. The primary technique that
has enabled these algorithms' success has been distributing the computation
while building the decision trees. Distributed decision tree building, in
turn, has been enabled by computing quantiles of the big datasets and choosing
the candidate split points from these quantile sets. In XGBoost, for instance,
a sophisticated quantile building algorithm is employed to identify the
candidate split points for the decision trees. This method is often projected
to yield better results when the computation is distributed. In this paper, we
dispel the notion that these methods provide more accurate and scalable methods
for building decision trees in a distributed manner. In a significant
contribution, we show theoretically and empirically that choosing the split
points uniformly at random provides the same or even better performance in
terms of accuracy and computational efficiency. Hence, a simple random
selection of split points suffices for decision tree building, making the more
sophisticated quantile-based methods unnecessary.
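To make the contrast concrete, below is a minimal sketch (assuming NumPy, a regression-style variance impurity, and illustrative function names; none of this is the paper's exact construction) comparing the two candidate-generation strategies for a single numeric feature: evenly spaced quantiles as a simplified stand-in for XGBoost's weighted quantile sketch, and uniform random sampling in the spirit of the paper's proposal.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantile_candidates(x, k):
    """Candidate splits at k evenly spaced quantiles of the feature.

    A simplified stand-in for sketch-based methods such as XGBoost's
    weighted quantile sketch (the real sketch is streaming and weighted;
    this version is for illustration only)."""
    probs = np.linspace(0.0, 1.0, k + 2)[1:-1]  # drop the 0% and 100% quantiles
    return np.unique(np.quantile(x, probs))

def random_candidates(x, k):
    """One plausible reading of the paper's uniform random selection:
    sample k thresholds uniformly from the feature's observed range
    (sampling k data points at random is an equally simple variant)."""
    return rng.uniform(x.min(), x.max(), size=k)

def best_split(x, y, candidates):
    """Pick the candidate threshold minimizing the weighted variance of
    the two child nodes (a standard regression impurity; the paper's
    trees may use a different criterion)."""
    best_t, best_score = None, np.inf
    for t in candidates:
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        score = len(left) * left.var() + len(right) * right.var()
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Tiny comparison on synthetic data: both strategies typically find a
# threshold near the true changepoint at x = 0.5.
x = rng.uniform(0, 1, 10_000)
y = (x > 0.5).astype(float) + 0.1 * rng.standard_normal(10_000)
print(best_split(x, y, quantile_candidates(x, 32)))
print(best_split(x, y, random_candidates(x, 32)))
```

With a moderate candidate budget, both strategies land near the best threshold, which is the intuition the paper formalizes; the random variant skips the quantile computation entirely.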
Related papers
- Learning accurate and interpretable decision trees [27.203303726977616]
We develop approaches to design decision tree learning algorithms given repeated access to data from the same domain.
We study the sample complexity of tuning prior parameters in Bayesian decision tree learning, and extend our results to decision tree regression.
We also study the interpretability of the learned decision trees and introduce a data-driven approach for optimizing the explainability versus accuracy trade-off using decision trees.
arXiv Detail & Related papers (2024-05-24T20:10:10Z)
- Learning a Decision Tree Algorithm with Transformers [75.96920867382859]
We introduce MetaTree, a transformer-based model trained via meta-learning to directly produce strong decision trees.
We fit both greedy decision trees and globally optimized decision trees on a large number of datasets, and train MetaTree to produce only the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z)
- TreeDQN: Learning to minimize Branch-and-Bound tree [78.52895577861327]
Branch-and-Bound is a convenient approach to solving optimization tasks posed as Mixed Integer Linear Programs.
The efficiency of the solver depends on the branching rule used to select a variable for splitting.
We propose a reinforcement learning method that can efficiently learn the branching rule.
arXiv Detail & Related papers (2023-06-09T14:01:26Z)
- Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees with Continuous Features [5.663538370244174]
We present a new discrete optimization method based on branch-and-bound (BnB) to obtain optimal decision trees.
Our proposed algorithm Quant-BnB shows significant speedups compared to existing approaches for shallow optimal trees on various real datasets.
arXiv Detail & Related papers (2022-06-23T17:19:29Z)
- Optimal randomized classification trees [0.0]
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning.
CARTs are built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and the associated threshold.
This greedy approach trains trees very fast, but, by its nature, the resulting classification accuracy may not be competitive with other state-of-the-art procedures (a minimal sketch of this greedy split search appears after the list below).
arXiv Detail & Related papers (2021-10-19T11:41:12Z)
- Sample Complexity of Tree Search Configuration: Cutting Planes and Beyond [98.92725321081994]
State-of-the-art solvers integrate a myriad of cutting-plane techniques to speed up the underlying tree-search algorithm.
We prove the first guarantees for learning high-performing cut-selection policies tailored to the instance distribution.
arXiv Detail & Related papers (2021-06-08T00:57:59Z)
- Dive into Decision Trees and Forests: A Theoretical Demonstration [0.0]
Decision trees use the strategy of "divide-and-conquer" to break the complex problem of learning the dependency between input features and labels into smaller subproblems.
Recent advances have greatly improved their performance in computational advertising, recommender systems, information retrieval, etc.
arXiv Detail & Related papers (2021-01-20T16:47:59Z)
- Convex Polytope Trees [57.56078843831244]
Convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
- Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z)
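Referring back to the "Optimal randomized classification trees" entry above, the following is a minimal sketch, under an assumed Gini impurity and with illustrative function names, of the exhaustive greedy split search that CART-style trees perform at each node: every feature and every threshold between consecutive observed values is scored, and the best (feature, threshold) pair is chosen. The cost of this exhaustive scan is what motivates both the quantile-sketch and random-sampling shortcuts discussed in the main abstract.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def greedy_split(X, y):
    """Exhaustive CART-style search: scan every feature and every midpoint
    between consecutive sorted feature values, and return the (feature,
    threshold) pair with the lowest weighted child impurity."""
    n, d = X.shape
    best = (None, None, np.inf)  # (feature index, threshold, weighted impurity)
    for j in range(d):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:  # midpoints between values
            mask = X[:, j] <= t
            score = (mask.sum() * gini(y[mask])
                     + (~mask).sum() * gini(y[~mask])) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# Example: the search recovers a split on feature 0 near the true
# decision boundary at 0.5.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 3))
y = (X[:, 0] > 0.5).astype(int)
print(greedy_split(X, y))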