A Scalable MIP-based Method for Learning Optimal Multivariate Decision
Trees
- URL: http://arxiv.org/abs/2011.03375v1
- Date: Fri, 6 Nov 2020 14:17:41 GMT
- Title: A Scalable MIP-based Method for Learning Optimal Multivariate Decision
Trees
- Authors: Haoran Zhu, Pavankumar Murali, Dzung T. Phan, Lam M. Nguyen, Jayant R.
Kalagnanam
- Abstract summary: We propose a novel MIP formulation, based on a 1-norm support vector machine model, to train a multivariate ODT for classification problems.
We provide cutting plane techniques that tighten the linear relaxation of the MIP formulation, in order to improve run times to reach optimality.
We demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy.
- Score: 17.152864798265455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several recent publications report advances in training optimal decision
trees (ODT) using mixed-integer programs (MIP), due to algorithmic advances in
integer programming and a growing interest in addressing the inherent
suboptimality of heuristic approaches such as CART. In this paper, we propose a
novel MIP formulation, based on a 1-norm support vector machine model, to train
a multivariate ODT for classification problems. We provide cutting plane
techniques that tighten the linear relaxation of the MIP formulation, in order
to improve run times to reach optimality. Using 36 data-sets from the
University of California Irvine Machine Learning Repository, we demonstrate
that our formulation outperforms its counterparts in the literature by an
average of about 10% in terms of mean out-of-sample testing accuracy across the
data-sets. We provide a scalable framework to train multivariate ODT on large
data-sets by introducing a novel linear programming (LP) based data selection
method to choose a subset of the data for training. Our method is able to
routinely handle large data-sets with more than 7,000 sample points and
outperform heuristic methods and other MIP-based techniques. We present
results on data-sets containing up to 245,000 samples. Existing MIP-based
methods do not scale well on training data-sets beyond 5,500 samples.
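The core building block named in the abstract, a 1-norm (soft-margin) support vector machine, is itself solvable as a plain linear program once the weight vector is split into nonnegative parts, w = p − q. The sketch below (not the paper's full MIP formulation for trees; the `l1_svm` helper and toy data are illustrative assumptions) shows that LP with `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm(X, y, C=1.0):
    """1-norm SVM as an LP: min ||w||_1 + C * sum(slack)
    s.t. y_i (w . x_i + b) >= 1 - slack_i, slack >= 0.
    Linearized by splitting w = p - q with p, q >= 0."""
    n, d = X.shape
    # variable order: p (d), q (d), b (1, free), slack (n)
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    Y = y[:, None] * X                              # rows y_i * x_i
    # -(y_i (p - q).x_i + y_i b + slack_i) <= -1
    A_ub = np.hstack([-Y, Y, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    z = res.x
    return z[:d] - z[d:2 * d], z[2 * d]             # w, b

# toy linearly separable data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = l1_svm(X, y)
preds = np.sign(X @ w + b)
```

The 1-norm objective tends to produce sparse weight vectors, which is what makes the hyperplanes at each tree node interpretable and the formulation amenable to the cutting-plane tightening described above.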
Related papers
- Data curation via joint example selection further accelerates multimodal learning [3.329535792151987]
We show that jointly selecting batches of data is more effective for learning than selecting examples independently.
We derive a simple and tractable algorithm for selecting such batches, which significantly accelerate training beyond individually-prioritized data points.
arXiv Detail & Related papers (2024-06-25T16:52:37Z)
- Align Your Steps: Optimizing Sampling Schedules in Diffusion Models [63.927438959502226]
Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond.
A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks.
We propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs.
arXiv Detail & Related papers (2024-04-22T18:18:41Z)
- How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training large language models (LLMs).
In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density sampling are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
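The BMU-based idea above can be illustrated with a from-scratch toy: train a small SOM on unlabeled data, attach labels to the BMUs of a handful of labeled points, and classify everything by the nearest labeled unit. This is a minimal sketch, not the paper's method; the grid size, decay schedule, and helper names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(X, grid=(4, 4), epochs=150, lr=0.5, sigma=1.0):
    """Minimal SOM: pull the best-matching unit (BMU) and its grid
    neighbors toward each sample, with decaying rate and radius."""
    gx, gy = grid
    W = rng.normal(size=(gx * gy, X.shape[1]))
    coords = np.array([(i, j) for i in range(gx) for j in range(gy)], float)
    for t in range(epochs):
        decay = np.exp(-t / epochs)
        for x in rng.permutation(X):
            u = np.argmin(((W - x) ** 2).sum(axis=1))
            dist2 = ((coords - coords[u]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2 * (sigma * decay) ** 2))
            W += (lr * decay) * h[:, None] * (x - W)
    return W

def bmu(W, x):
    return int(np.argmin(((W - x) ** 2).sum(axis=1)))

# two well-separated clusters, treated as unlabeled
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
W = train_som(X)
# one labeled point per cluster, assigned to its BMU
labeled = {bmu(W, X[0]): 0, bmu(W, X[50]): 1}
units = np.array(list(labeled)); labels = np.array(list(labeled.values()))
# classify each point by the closest labeled unit's weight vector
pred = np.array([labels[np.argmin(((W[units] - x) ** 2).sum(1))] for x in X])
```

With only two labeled points, the SOM's unsupervised topology does most of the work: labels propagate through the map's weight vectors rather than through the raw data.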
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Efficient Online Data Mixing For Language Model Pre-Training [101.45242332613944]
Existing data selection methods suffer from slow and computationally expensive processes.
Data mixing, on the other hand, reduces the complexity of data selection by grouping data points together.
We develop an efficient algorithm for Online Data Mixing (ODM) that combines elements from both data selection and data mixing.
arXiv Detail & Related papers (2023-12-05T00:42:35Z)
- Machine Learning for Cutting Planes in Integer Programming: A Survey [21.567191691588643]
We present recent work on machine learning (ML) techniques for selecting cutting planes (or cuts) in mixed-integer linear programming (MILP).
ML offers a promising approach for improving the cut selection process by using data to identify promising cuts that accelerate the solution of MILP instances.
We analyze the empirical results in the literature in an attempt to quantify the progress that has been made and conclude by suggesting avenues for future research.
arXiv Detail & Related papers (2023-02-17T22:26:49Z)
- MILO: Model-Agnostic Subset Selection Framework for Efficient Model Training and Tuning [68.12870241637636]
We propose MILO, a model-agnostic subset selection framework that decouples the subset selection from model training.
Our empirical results indicate that MILO can train models $3\times$-$10\times$ faster and tune hyperparameters $20\times$-$75\times$ faster than full-dataset training or tuning, without compromising performance.
arXiv Detail & Related papers (2023-01-30T20:59:30Z)
- Variational Factorization Machines for Preference Elicitation in Large-Scale Recommender Systems [17.050774091903552]
We propose a variational formulation of factorization machines (FMs) that can be easily optimized using standard mini-batch gradient descent.
Our algorithm learns an approximate posterior distribution over the user and item parameters, which leads to confidence intervals over the predictions.
We show, using several datasets, that it has comparable or better performance than existing methods in terms of prediction accuracy.
arXiv Detail & Related papers (2022-12-20T00:06:28Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
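AutoSMOTE itself learns its over-sampling decisions with hierarchical reinforcement learning, but the interpolation primitive it builds on is classic SMOTE: synthesize a minority sample on the segment between a minority point and one of its nearest minority neighbors. A minimal sketch (the `smote` helper and toy data are illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)

def smote(X_min, n_new, k=3):
    """Basic SMOTE: pick a random minority point, pick one of its
    k nearest minority neighbors, and interpolate at a random
    position on the segment between them."""
    n = len(X_min)
    # pairwise squared distances within the minority class
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-matches
    neighbors = np.argsort(d, axis=1)[:, :k]    # k nearest per point
    synth = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)
        b = neighbors[a, rng.integers(k)]
        gap = rng.random()                      # interpolation factor in [0, 1)
        synth[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synth

# imbalanced toy set: 40 majority vs 5 minority samples
X_maj = rng.normal(0, 1, (40, 2))
X_min = rng.normal(4, 0.5, (5, 2))
X_new = smote(X_min, n_new=35)  # synthesize enough to balance the classes
```

Because synthetic points are convex combinations of real minority points, they stay inside the minority class's region rather than duplicating existing samples.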
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- Deep Learning with Multiple Data Set: A Weighted Goal Programming Approach [2.7393821783237184]
Large-scale data analysis is growing at an exponential rate as data proliferates in our societies.
Deep learning models require substantial computational resources, so distributed training is needed.
This paper presents a Multicriteria approach for distributed learning.
arXiv Detail & Related papers (2021-11-27T07:10:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.