Generalized and Scalable Optimal Sparse Decision Trees
- URL: http://arxiv.org/abs/2006.08690v3
- Date: Tue, 11 Aug 2020 03:51:29 GMT
- Title: Generalized and Scalable Optimal Sparse Decision Trees
- Authors: Jimmy Lin, Chudi Zhong, Diane Hu, Cynthia Rudin, Margo Seltzer
- Abstract summary: We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
- Score: 56.35541305670828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision tree optimization is notoriously difficult from a computational
perspective but essential for the field of interpretable machine learning.
Despite efforts over the past 40 years, only recently have optimization
breakthroughs been made that have allowed practical algorithms to find optimal
decision trees. These new techniques have the potential to trigger a paradigm
shift where it is possible to construct sparse decision trees to efficiently
optimize a variety of objective functions without relying on greedy splitting
and pruning heuristics that often lead to suboptimal solutions. The
contribution in this work is to provide a general framework for decision tree
optimization that addresses the two significant open problems in the area:
treatment of imbalanced data and fully optimizing over continuous variables. We
present techniques that produce optimal decision trees over a variety of
objectives including F-score, AUC, and partial area under the ROC convex hull.
We also introduce a scalable algorithm that produces provably optimal results
in the presence of continuous variables and speeds up decision tree
construction by several orders of magnitude relative to the state-of-the-art.
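To make the abstract's central idea concrete, here is a toy sketch (not the paper's algorithm) of optimizing a non-accuracy objective directly: an exhaustive search over single-feature threshold splits that maximizes F1 rather than a greedy impurity criterion. The data and the `best_stump` helper are made up for illustration.

```python
# Illustrative sketch: directly optimizing F-score when choosing a
# single split ("stump"), instead of a greedy impurity heuristic.

def f1(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0 if the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_stump(X, y, objective):
    """Exhaustively search thresholds on every feature and return the
    split maximizing objective(tp, fp, fn), with positives on the right."""
    best = (None, None, -1.0)  # (feature index, threshold, score)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            tp = sum(1 for row, lab in zip(X, y) if row[j] > t and lab == 1)
            fp = sum(1 for row, lab in zip(X, y) if row[j] > t and lab == 0)
            fn = sum(1 for row, lab in zip(X, y) if row[j] <= t and lab == 1)
            score = objective(tp, fp, fn)
            if score > best[2]:
                best = (j, t, score)
    return best

# Made-up imbalanced data: a single positive that accuracy-driven
# splitting has little incentive to isolate.
X = [[0.1], [0.2], [0.3], [0.9], [1.0], [1.1], [2.0]]
y = [0, 0, 0, 0, 0, 0, 1]
feature, threshold, score = best_stump(X, y, f1)
```

Because the objective is evaluated exactly at every candidate split, the F1-optimal threshold (here, the one isolating the lone positive) is found even when accuracy would barely distinguish between candidates.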
Related papers
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z) - Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees with Continuous Features [5.663538370244174]
We present a new discrete optimization method based on branch-and-bound (BnB) to obtain optimal decision trees.
Our proposed algorithm Quant-BnB shows significant speedups compared to existing approaches for shallow optimal trees on various real datasets.
arXiv Detail & Related papers (2022-06-23T17:19:29Z) - bsnsing: A decision tree induction method based on recursive optimal boolean rule composition [2.28438857884398]
This paper proposes a new mixed-integer programming (MIP) formulation to optimize split rule selection in the decision tree induction process.
It develops an efficient search solver that is able to solve practical instances faster than commercial solvers.
arXiv Detail & Related papers (2022-05-30T17:13:57Z) - How Smart Guessing Strategies Can Yield Massive Scalability Improvements for Sparse Decision Tree Optimization [18.294573939199438]
Current algorithms often require impractical amounts of time and memory to find optimal or near-optimal trees for some real-world datasets.
We address this problem via smart guessing strategies that can be applied to any optimal branch-and-bound-based decision tree algorithm.
Our approach enables guesses about how to bin continuous features, the size of the tree, and lower bounds on the error for the optimal decision tree.
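The binning guess above can be illustrated with a deliberately simplified stand-in: rather than deriving thresholds from a reference model as the paper does, this toy version keeps only a handful of quantile-based thresholds instead of every midpoint between consecutive values. The helper names and data are hypothetical.

```python
# Hedged sketch of the "binning guess" idea: shrink the candidate
# threshold set for a continuous feature before the (expensive) optimal
# search runs. The paper guesses thresholds from a reference model;
# this toy version just uses evenly spaced quantiles.

def all_midpoints(values):
    """Every midpoint between consecutive distinct values: the full
    candidate set an exact method would otherwise have to consider."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]

def quantile_thresholds(values, k):
    """Guess k thresholds at evenly spaced quantiles of the data."""
    vs = sorted(values)
    n = len(vs)
    idx = {min(n - 1, round((q + 1) * n / (k + 1))) for q in range(k)}
    return sorted({vs[i] for i in idx})

feature = [0.11 * i for i in range(100)]   # synthetic continuous feature
full = all_midpoints(feature)              # 99 candidate splits
guessed = quantile_thresholds(feature, 5)  # only 5 candidate splits
```

Shrinking 99 candidates to 5 per feature multiplies through the whole branch-and-bound search tree, which is why threshold guessing yields such large speedups, at the cost of optimality being relative to the guessed binning.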
arXiv Detail & Related papers (2021-12-01T19:39:28Z) - Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
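The flavor of such dynamic-programming approaches can be shown with a minimal memoized recursion (a sketch in the spirit of these methods, not MurTree itself): for binary features, the optimal error of a subproblem depends only on the set of instances that reach it and the remaining depth, so subproblems can be cached. Data and function names are made up.

```python
# Toy dynamic program for depth-limited optimal trees over binary
# features, minimizing misclassifications; memoized on the instance
# subset reached. Not MurTree's actual algorithm.
from functools import lru_cache

X = ((1, 0), (1, 1), (0, 1), (0, 0))  # binary features (XOR-like data)
y = (1, 0, 1, 0)

def leaf_error(idxs):
    """Misclassifications if this instance set becomes a single leaf."""
    pos = sum(y[i] for i in idxs)
    return min(pos, len(idxs) - pos)

@lru_cache(maxsize=None)
def best_error(idxs, depth):
    """Minimum misclassifications on instances `idxs` using at most
    `depth` further levels of splits."""
    if depth == 0 or not idxs:
        return leaf_error(idxs)
    best = leaf_error(idxs)  # splitting is optional
    for j in range(len(X[0])):
        left = tuple(i for i in idxs if X[i][j] == 0)
        right = tuple(i for i in idxs if X[i][j] == 1)
        if left and right:
            best = min(best, best_error(left, depth - 1)
                             + best_error(right, depth - 1))
    return best

root = tuple(range(len(X)))
```

On this XOR-like data every depth-1 split leaves 2 errors, so a greedy criterion sees no useful first split, yet the DP correctly finds a zero-error depth-2 tree.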
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - ENTMOOT: A Framework for Optimization over Ensemble Tree Models [57.98561336670884]
ENTMOOT is a framework for integrating tree models into larger optimization problems.
We show how ENTMOOT allows a simple integration of tree models into decision-making and black-box optimization.
arXiv Detail & Related papers (2020-03-10T14:34:07Z) - Self-Directed Online Machine Learning for Topology Optimization [58.920693413667216]
Self-directed Online Learning Optimization integrates Deep Neural Network (DNN) with Finite Element Method (FEM) calculations.
Our algorithm was tested by four types of problems including compliance minimization, fluid-structure optimization, heat transfer enhancement and truss optimization.
It reduced the computational time by 2 to 5 orders of magnitude compared with direct methods, and outperformed all state-of-the-art algorithms tested in our experiments.
arXiv Detail & Related papers (2020-02-04T20:00:28Z) - Evolutionary algorithms for constructing an ensemble of decision trees [0.0]
We propose several methods for induction of decision trees and their ensembles based on evolutionary algorithms.
The main difference of our approach is the use of a real-valued vector representation of decision trees.
We test the predictive performance of these methods on several public UCI data sets.
arXiv Detail & Related papers (2020-02-03T13:38:50Z) - Optimal Sparse Decision Trees [25.043477914272046]
This work introduces the first practical algorithm for optimal decision trees for binary variables.
The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques.
Our experiments highlight advantages in scalability, speed, and proof of optimality.
arXiv Detail & Related papers (2019-04-29T17:56:34Z)
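One idea behind analytical search-space bounds of this kind can be sketched as follows (a hedged illustration, not OSDT's actual bounds): with a per-leaf sparsity penalty, the error already committed by a partial tree gives a lower bound on any completion's objective, so partial trees that cannot beat the incumbent are discarded unexpanded. All names and numbers are made up.

```python
# Illustrative bound-based pruning for sparsity-penalized trees:
# objective = error rate + lam * (number of leaves). A partial tree's
# committed error and leaf count lower-bound every completion.

lam = 0.05  # hypothetical sparsity penalty per leaf

def objective(errors, n_samples, n_leaves, lam):
    """Regularized objective: misclassification rate plus leaf penalty."""
    return errors / n_samples + lam * n_leaves

def can_prune(partial_errors, n_samples, n_leaves, incumbent, lam):
    """Completing the tree can neither undo errors fixed in finalized
    leaves nor remove leaves already created, so this is a lower bound."""
    lower_bound = partial_errors / n_samples + lam * n_leaves
    return lower_bound >= incumbent

# Incumbent: some already-found complete tree with 2 errors and 3 leaves.
incumbent = objective(errors=2, n_samples=100, n_leaves=3, lam=lam)
```

A partial tree with 3 leaves and 5 committed errors is pruned (bound 0.20 >= 0.17), while one with 3 leaves and no committed errors must still be explored; bounds like these are what shrink the search space enough to make optimality proofs practical.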
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.