Optimal Decision Trees for Nonlinear Metrics
- URL: http://arxiv.org/abs/2009.06921v2
- Date: Fri, 15 Oct 2021 06:01:22 GMT
- Title: Optimal Decision Trees for Nonlinear Metrics
- Authors: Emir Demirović, Peter J. Stuckey
- Abstract summary: We present a novel algorithm for producing optimal decision trees for nonlinear metrics.
To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics.
Our approach leads to a trade-off when compared to optimising linear metrics.
- Score: 42.18286681448184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nonlinear metrics, such as the F1-score, Matthews correlation coefficient,
and Fowlkes-Mallows index, are often used to evaluate the performance of
machine learning models, in particular, when facing imbalanced datasets that
contain more samples of one class than the other. Recent optimal decision tree
algorithms have shown remarkable progress in producing trees that are optimal
with respect to linear criteria, such as accuracy, but unfortunately nonlinear
metrics remain a challenge. To address this gap, we propose a novel algorithm
based on bi-objective optimisation, which treats misclassifications of each
binary class as a separate objective. We show that, for a large class of
metrics, the optimal tree lies on the Pareto frontier. Consequently, we obtain
the optimal tree by using our method to generate the set of all nondominated
trees. To the best of our knowledge, this is the first method to compute
provably optimal decision trees for nonlinear metrics. Our approach leads to a
trade-off when compared to optimising linear metrics: the resulting trees may
be more desirable according to the given nonlinear metric at the expense of
higher runtimes. Nevertheless, the experiments illustrate that runtimes are
reasonable for the majority of the tested datasets.
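Since, for the class of metrics covered by the result above, the optimal tree lies on the Pareto frontier, the final selection step reduces to scanning the nondominated set. The sketch below is a minimal illustration of that selection step only, not of the paper's bi-objective search that generates the front; all names and example numbers are hypothetical, and each candidate tree is assumed to be summarised by its false-positive and false-negative counts.

```python
import math

def f1_score(fp, fn, pos, neg):
    # F1 = 2*TP / (2*TP + FP + FN), with TP = pos - fn; neg is unused here.
    tp = pos - fn
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(fp, fn, pos, neg):
    # Matthews correlation coefficient from the confusion matrix.
    tp, tn = pos - fn, neg - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def best_tree(pareto_front, pos, neg, metric):
    # pareto_front: (fp, fn, tree) triples, one per nondominated tree.
    # For the class of metrics the paper identifies, the optimum lies on
    # this front, so a linear scan over it suffices.
    return max(pareto_front, key=lambda t: metric(t[0], t[1], pos, neg))

# Hypothetical front for an imbalanced set: 30 positives, 970 negatives.
front = [(0, 12, "tree_a"), (5, 6, "tree_b"), (40, 1, "tree_c")]
print(best_tree(front, 30, 970, f1_score))
print(best_tree(front, 30, 970, mcc))
```

The heavy lifting in the paper is producing the nondominated set itself; once all Pareto-optimal (FP, FN) pairs are enumerated, this scan returns a provably optimal tree for the chosen metric.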
Related papers
- Learning Deep Tree-based Retriever for Efficient Recommendation: Theory and Method [76.31185707649227]
We propose a Deep Tree-based Retriever (DTR) for efficient recommendation.
DTR frames the training task as a softmax-based multi-class classification over tree nodes at the same level (the per-level softmax is written out after this list).
To mitigate the suboptimality induced by the labeling of non-leaf nodes, we propose a rectification method for the loss function.
arXiv Detail & Related papers (2024-08-21T05:09:53Z)
- BooleanOCT: Optimal Classification Trees based on multivariate Boolean Rules [14.788278997556606]
We introduce a new mixed-integer programming (MIP) formulation to derive the optimal classification tree.
Our methodology integrates both linear metrics, including accuracy, balanced accuracy, and cost-sensitive cost, and nonlinear metrics such as the F1-score.
The proposed models demonstrate practical solvability on real-world datasets, effectively handling sizes in the tens of thousands.
arXiv Detail & Related papers (2024-01-29T12:58:44Z)
- Sample-and-Bound for Non-Convex Optimization [18.30858789210194]
We propose new sampling methods for non-convex optimization that adapt Monte Carlo Tree Search to improve efficiency.
We evaluate the proposed algorithms on non-convex optimization benchmarks against competitive baselines.
arXiv Detail & Related papers (2024-01-09T20:45:47Z)
- Tree ensemble kernels for Bayesian optimization with known constraints over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z)
- Mixed integer linear optimization formulations for learning optimal binary classification trees [0.0]
We propose four mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees.
We conduct experiments on 13 publicly available datasets to show the models' ability to scale.
arXiv Detail & Related papers (2022-06-10T03:10:14Z)
- On multivariate randomized classification trees: $l_0$-based sparsity, VC dimension and decomposition methods [0.9346127431927981]
We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al.
We first consider alternative methods to sparsify such trees based on concave approximations of the $l_0$ norm (a generic example of such a surrogate appears after this list).
We propose a general decomposition scheme and an efficient version of it. Experiments on larger datasets show that the proposed decomposition method is able to significantly reduce the training times without compromising the accuracy.
arXiv Detail & Related papers (2021-12-09T22:49:08Z)
- Stochastic Optimization Forests [60.523606291705214]
We show how to train forest decision policies by growing trees that choose splits to directly optimize the downstream decision quality, rather than splitting to improve prediction accuracy as in the standard random forest algorithm.
We show that our approximate splitting criteria can reduce running time hundredfold, while achieving performance close to forest algorithms that exactly re-optimize for every candidate split.
arXiv Detail & Related papers (2020-08-17T16:56:06Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search (a simplified sketch of the recurrence appears after this list).
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
- Generalized and Scalable Optimal Sparse Decision Trees [56.35541305670828]
We present techniques that produce optimal decision trees over a variety of objectives.
We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables.
arXiv Detail & Related papers (2020-06-15T19:00:11Z)
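To make the DTR entry above concrete: a per-level softmax, as that summary describes it, would take roughly the following form (the score function $s$ and node set $\mathcal{N}_l$ are generic notation, not taken from the paper):

$$p(n \mid u) \;=\; \frac{\exp\bigl(s(u, n)\bigr)}{\sum_{n' \in \mathcal{N}_l} \exp\bigl(s(u, n')\bigr)}, \qquad n \in \mathcal{N}_l,$$

where $\mathcal{N}_l$ is the set of tree nodes at level $l$ and $s(u, n)$ scores node $n$ for user $u$; training then maximises the likelihood of the labelled node at each level.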
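For the $l_0$-based sparsity entry, a standard concave surrogate (a generic example; the paper may use a different approximation) replaces the exact count of nonzero weights:

$$\|w\|_0 \;=\; \#\{\, j : w_j \neq 0 \,\} \;\approx\; \sum_j \bigl(1 - e^{-\alpha |w_j|}\bigr), \qquad \alpha > 0,$$

which is concave in each $|w_j|$, recovers the exact count as $\alpha \to \infty$, and is amenable to continuous optimisation.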
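Finally, for the MurTree entry, the dynamic-programming idea can be stated as a recurrence over splits: the cost of an optimal depth-$d$ tree on a dataset $D$ is $T(D, d) = \min\bigl(\mathrm{leaf}(D),\ \min_f\, T(D_{f=0}, d-1) + T(D_{f=1}, d-1)\bigr)$. The memoised search below is a minimal illustration of that recurrence for binary features, not the paper's optimised algorithm (which adds specialised depth-two solvers, bounds, and caching).

```python
from functools import lru_cache

def leaf_cost(subset):
    # A leaf predicts the majority class and misclassifies the rest.
    ones = sum(label for _, label in subset)
    return min(ones, len(subset) - ones)

def optimal_tree_cost(data, depth, num_features):
    # data: list of (features, label) pairs; features is a tuple of 0/1 values.
    @lru_cache(maxsize=None)
    def solve(subset, d):
        # T(D, d) = min(leaf(D), min_f T(D[f=0], d-1) + T(D[f=1], d-1))
        best = leaf_cost(subset)
        if d == 0 or best == 0:
            return best
        for f in range(num_features):
            left = tuple(p for p in subset if p[0][f] == 0)
            right = tuple(p for p in subset if p[0][f] == 1)
            if left and right:  # skip splits that separate nothing
                best = min(best, solve(left, d - 1) + solve(right, d - 1))
        return best
    return solve(tuple(data), depth)

# Hypothetical check: XOR needs depth 2 for zero misclassifications.
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(optimal_tree_cost(xor, 1, 2))  # 2: no depth-1 tree fits XOR
print(optimal_tree_cost(xor, 2, 2))  # 0: depth 2 is sufficient
```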