Related papers: Optimal Mixed Integer Linear Optimization Trained Multivariate Classification Trees

URL: http://arxiv.org/abs/2408.01297v1
Date: Fri, 2 Aug 2024 14:37:28 GMT
Title: Optimal Mixed Integer Linear Optimization Trained Multivariate Classification Trees
Authors: Brandon Alston, Illya V. Hicks
Abstract summary: We propose two cut-based mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees. Our models leverage on-the-fly identification of minimal infeasible subsystems (MISs) from which we derive cutting planes that hold the form of packing constraints.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multivariate decision trees are powerful machine learning tools for classification and regression that attract many researchers and industry professionals. An optimal binary tree has two types of vertices, (i) branching vertices which have exactly two children and where datapoints are assessed on a set of discrete features and (ii) leaf vertices at which datapoints are given a prediction, and can be obtained by solving a biobjective optimization problem that seeks to (i) maximize the number of correctly classified datapoints and (ii) minimize the number of branching vertices. Branching vertices are linear combinations of training features and therefore can be thought of as hyperplanes. In this paper, we propose two cut-based mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees (leaf vertices assign discrete classes). Our models leverage on-the-fly identification of minimal infeasible subsystems (MISs) from which we derive cutting planes that hold the form of packing constraints. We show theoretical improvements on the strongest flow-based MILO formulation currently in the literature and conduct experiments on publicly available datasets to show our models' ability to scale, strength against traditional branch and bound approaches, and robustness in out-of-sample test performance. Our code and data are available on GitHub.

Related papers

Data Selection for ERMs [67.57726352698933]
We study how well can $\mathcal{A}$ perform when trained on at most $n \ll N$ data points selected from a population of $N$ points. Our results include optimal data-selection bounds for mean estimation, linear classification, and linear regression.
arXiv Detail & Related papers (2025-04-20T11:26:01Z)
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs). Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU). Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used for programming purposes. In this paper we examine the use of convex neural recovery models. We show that all the stationary non-dimensional objective can be characterized as the standard a global subsampled convex solvers program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
Solving a Class of Cut-Generating Linear Programs via Machine Learning [0.0]
Cut-generating linear programs (CGLPs) play a key role as a separation oracle to produce valid inequalities for the feasible region of mixed-integer programs. Running the CGLPs at the nodes of the branch-and-bound tree is computationally cumbersome due to the number of node candidates and the lack of a priori knowledge on which nodes admit useful cutting planes. We propose a novel framework based on learning to approximate the optimal value of a CGLP class that determines whether a cutting plane can be generated at a node of the branch-and-bound tree.
arXiv Detail & Related papers (2023-10-30T18:31:52Z)
An improved column-generation-based matheuristic for learning classification trees [9.07661731728456]
Decision trees are highly interpretable models for solving classification problems in machine learning (ML). Standard ML algorithms for training decision trees are fast but generate suboptimal trees in terms of accuracy. Firat et al. (2020) proposed a column-generation-based approach for learning decision trees.
arXiv Detail & Related papers (2023-08-22T14:43:36Z)
Efficient Graph Field Integrators Meet Point Clouds [59.27295475120132]
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization (SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion (RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds.
arXiv Detail & Related papers (2023-02-02T08:33:36Z)
Margin Optimal Classification Trees [0.0]
We present a novel mixed-integer formulation for the Optimal Classification Tree (OCT) problem. Our model, denoted as Margin Optimal Classification Tree (MARGOT), exploits the generalization capabilities of Support Vector Machines for binary classification. To enhance the interpretability of our approach, we analyse two alternative versions of MARGOT, which include feature selection constraints inducing local sparsity of the hyperplanes.
arXiv Detail & Related papers (2022-10-19T14:08:56Z)
Mixed integer linear optimization formulations for learning optimal binary classification trees [0.0]
We propose four mixed integer linear optimization (MILO) formulations for designing optimal binary classification trees. We conduct experiments on 13 publicly available datasets to show the models' ability to scale.
arXiv Detail & Related papers (2022-06-10T03:10:14Z)
Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer. In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for semi-definite programming relaxation (SDR) of a binary graph classifier. Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning. The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned. Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z)
Optimal Decision Trees for Nonlinear Metrics [42.18286681448184]
We show a novel algorithm for producing optimal trees for nonlinear metrics. To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics. Our approach leads to a trade-off when compared to optimising linear metrics.
arXiv Detail & Related papers (2020-09-15T08:30:56Z)
Learning to Cluster Faces via Confidence and Connectivity Estimation [136.5291151775236]
We propose a fully learnable clustering framework without requiring a large number of overlapped subgraphs. Our method significantly improves clustering accuracy and thus performance of the recognition models trained on top, yet it is an order of magnitude more efficient than existing supervised methods.
arXiv Detail & Related papers (2020-04-01T13:39:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.