Scalable Optimal Multiway-Split Decision Trees with Constraints
- URL: http://arxiv.org/abs/2302.06812v1
- Date: Tue, 14 Feb 2023 03:48:48 GMT
- Title: Scalable Optimal Multiway-Split Decision Trees with Constraints
- Authors: Shivaram Subramanian, Wei Sun
- Abstract summary: We present a novel path-based MIP formulation where the number of decision variables is independent of $N$.
Our framework produces a multiway-split tree which is more interpretable than the typical binary-split trees due to its shorter rules.
We present results on datasets containing up to 1,008,372 samples while existing MIP-based decision tree models do not scale well on data beyond a few thousand points.
- Score: 3.092691764363848
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: There has been a surge of interest in learning optimal decision trees using
mixed-integer programs (MIP) in recent years, as heuristic-based methods do not
guarantee optimality and find it challenging to incorporate constraints that
are critical for many practical applications. However, existing MIP methods
that build on an arc-based formulation do not scale well as the number of
binary variables is in the order of $\mathcal{O}(2^dN)$, where $d$ and $N$
refer to the depth of the tree and the size of the dataset. Moreover, they can
only handle sample-level constraints and linear metrics. In this paper, we
propose a novel path-based MIP formulation where the number of decision
variables is independent of $N$. We present a scalable column generation
framework to solve the MIP optimally. Our framework produces a multiway-split
tree which is more interpretable than the typical binary-split trees due to its
shorter rules. Our method can handle nonlinear metrics such as F1 score and
incorporate a broader class of constraints. We demonstrate its efficacy with
extensive experiments. We present results on datasets containing up to
1,008,372 samples while existing MIP-based decision tree models do not scale
well on data beyond a few thousand points. We report superior or competitive
results compared to the state-of-art MIP-based methods with up to a 24X
reduction in runtime.
Related papers
- Fast unsupervised ground metric learning with tree-Wasserstein distance [14.235762519615175]
unsupervised ground metric learning approaches have been introduced.
We propose to augment the WSV method by embedding samples and features on trees, on which we compute the tree-Wasserstein distance (TWD)
We demonstrate theoretically and empirically that the algorithm converges to a better approximation of the full WSV approach than the best known alternatives, and does so with $mathcalO(n3)$ complexity.
arXiv Detail & Related papers (2024-11-11T23:21:01Z) - Constrained Prescriptive Trees via Column Generation [5.273277327802614]
We introduce a novel path-based mixed-integer program (MIP) formulation which identifies a (near) optimal policy efficiently via column generation.
We demonstrate the efficacy of our method with extensive experiments on both synthetic and real datasets.
arXiv Detail & Related papers (2022-07-20T19:30:22Z) - Reinforcement Learning for Branch-and-Bound Optimisation using
Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching.
We propose retro branching; a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z) - On multivariate randomized classification trees: $l_0$-based sparsity,
VC~dimension and decomposition methods [0.9346127431927981]
We investigate the nonlinear continuous optimization formulation proposed in Blanquero et al.
We first consider alternative methods to sparsify such trees based on concave approximations of the $l_0$ norm"
We propose a general decomposition scheme and an efficient version of it. Experiments on larger datasets show that the proposed decomposition method is able to significantly reduce the training times without compromising the accuracy.
arXiv Detail & Related papers (2021-12-09T22:49:08Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - MLPruning: A Multilevel Structured Pruning Framework for
Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
arXiv Detail & Related papers (2021-05-30T22:00:44Z) - Efficient semidefinite-programming-based inference for binary and
multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $tildeO(L5/6 T2/3)$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - Convex Polytope Trees [57.56078843831244]
convex polytope trees (CPT) are proposed to expand the family of decision trees by an interpretable generalization of their decision boundary.
We develop a greedy method to efficiently construct CPT and scalable end-to-end training algorithms for the tree parameters when the tree structure is given.
arXiv Detail & Related papers (2020-10-21T19:38:57Z) - Optimal Decision Trees for Nonlinear Metrics [42.18286681448184]
We show a novel algorithm for producing optimal trees for nonlinear metrics.
To the best of our knowledge, this is the first method to compute provably optimal decision trees for nonlinear metrics.
Our approach leads to a trade-off when compared to optimising linear metrics.
arXiv Detail & Related papers (2020-09-15T08:30:56Z) - Sparse Regression at Scale: Branch-and-Bound rooted in First-Order
Optimization [6.037383467521294]
We present a new exact MIP framework for $ell_0$ regularized regression.
Our framework can scale to $p sim 107$, achieving speedups of at least $5000$x.
We open source the implementation through our toolkit L0BnB.
arXiv Detail & Related papers (2020-04-13T18:45:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.