MurTree: Optimal Classification Trees via Dynamic Programming and Search
- URL: http://arxiv.org/abs/2007.12652v4
- Date: Tue, 28 Jun 2022 21:14:36 GMT
- Title: MurTree: Optimal Classification Trees via Dynamic Programming and Search
- Authors: Emir Demirovi\'c, Anna Lukina, Emmanuel Hebrard, Jeffrey Chan, James
Bailey, Christopher Leckie, Kotagiri Ramamohanarao, Peter J. Stuckey
- Abstract summary: We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
- Score: 61.817059565926336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision tree learning is a widely used approach in machine learning,
favoured in applications that require concise and interpretable models.
Heuristic methods are traditionally used to quickly produce models with
reasonably high accuracy. A commonly criticised point, however, is that the
resulting trees may not necessarily be the best representation of the data in
terms of accuracy and size. In recent years, this motivated the development of
optimal classification tree algorithms that globally optimise the decision tree
in contrast to heuristic methods that perform a sequence of locally optimal
decisions. We follow this line of work and provide a novel algorithm for
learning optimal classification trees based on dynamic programming and search.
Our algorithm supports constraints on the depth of the tree and number of
nodes. The success of our approach is attributed to a series of specialised
techniques that exploit properties unique to classification trees. Whereas
algorithms for optimal classification trees have traditionally been plagued by
high runtimes and limited scalability, we show in a detailed experimental study
that our approach uses only a fraction of the time required by the
state-of-the-art and can handle datasets with tens of thousands of instances,
providing several orders of magnitude improvements and notably contributing
towards the practical realisation of optimal decision trees.
Related papers
- LiteSearch: Efficacious Tree Search for LLM [70.29796112457662]
This study introduces a novel guided tree search algorithm with dynamic node selection and node-level exploration budget.
Experiments conducted on the GSM8K and TabMWP datasets demonstrate that our approach enjoys significantly lower computational costs compared to baseline methods.
arXiv Detail & Related papers (2024-06-29T05:14:04Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a new framework based on large language models (LLMs) and decision Tree reasoning (OCTree)
Our key idea is to leverage LLMs' reasoning capabilities to find good feature generation rules without manually specifying the search space.
Our empirical results demonstrate that this simple framework consistently enhances the performance of various prediction models.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - Online Learning of Decision Trees with Thompson Sampling [12.403737756721467]
Decision Trees are prominent prediction models for interpretable Machine Learning.
We devise a new Monte Carlo Tree Search algorithm, able to produce optimal Decision Trees in an online setting.
arXiv Detail & Related papers (2024-04-09T15:53:02Z) - Learning a Decision Tree Algorithm with Transformers [80.49817544396379]
We introduce MetaTree, which trains a transformer-based model on filtered outputs from classical algorithms to produce strong decision trees for classification.
We then train MetaTree to produce the trees that achieve strong generalization performance.
arXiv Detail & Related papers (2024-02-06T07:40:53Z) - A Mathematical Programming Approach to Optimal Classification Forests [1.0705399532413618]
We propose a novel mathematical optimization-based methodology in which a given number of trees are simultaneously constructed.
The classification rule is derived by assigning to each observation its most frequently predicted class among the trees in the forest.
We show that our proposed method has equal or superior performance compared with state-of-the-art tree-based classification methods.
arXiv Detail & Related papers (2022-11-18T20:33:08Z) - Quant-BnB: A Scalable Branch-and-Bound Method for Optimal Decision Trees
with Continuous Features [5.663538370244174]
We present a new discrete optimization method based on branch-and-bound (BnB) to obtain optimal decision trees.
Our proposed algorithm Quant-BnB shows significant speedups compared to existing approaches for shallow optimal trees on various real datasets.
arXiv Detail & Related papers (2022-06-23T17:19:29Z) - Optimal randomized classification trees [0.0]
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning.
CARTs are built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and the associated threshold.
This greedy approach trains trees very fast, but, by its nature, their classification accuracy may not be competitive against other state-of-the-art procedures.
arXiv Detail & Related papers (2021-10-19T11:41:12Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem which estimates signals integer from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm by employing a deep neural network to estimate the optimal estimation for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z) - Sparsity in Optimal Randomized Classification Trees [3.441021278275805]
We propose a continuous optimization approach to build sparse optimal classification trees, based on oblique cuts.
Both types of sparsity, namely local and global, are modeled by means of regularizations with polyhedral norms.
Unlike greedy approaches, our ability to easily trade in some of our classification accuracy for a gain in global sparsity is shown.
arXiv Detail & Related papers (2020-02-21T09:09:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.