A Scalable MIP-based Method for Learning Optimal Multivariate Decision
Trees
- URL: http://arxiv.org/abs/2011.03375v1
- Date: Fri, 6 Nov 2020 14:17:41 GMT
- Title: A Scalable MIP-based Method for Learning Optimal Multivariate Decision
Trees
- Authors: Haoran Zhu, Pavankumar Murali, Dzung T. Phan, Lam M. Nguyen, Jayant R.
Kalagnanam
- Abstract summary: We propose a novel MIP formulation, based on a 1-norm support vector machine model, to train a multivariate ODT for classification problems.
We provide cutting plane techniques that tighten the linear relaxation of the MIP formulation, in order to improve run times to reach optimality.
We demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy.
- Score: 17.152864798265455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several recent publications report advances in training optimal decision
trees (ODT) using mixed-integer programs (MIP), due to algorithmic advances in
integer programming and a growing interest in addressing the inherent
suboptimality of heuristic approaches such as CART. In this paper, we propose a
novel MIP formulation, based on a 1-norm support vector machine model, to train
a multivariate ODT for classification problems. We provide cutting plane
techniques that tighten the linear relaxation of the MIP formulation, in order
to improve run times to reach optimality. Using 36 data-sets from the
University of California Irvine Machine Learning Repository, we demonstrate
that our formulation outperforms its counterparts in the literature by an
average of about 10% in terms of mean out-of-sample testing accuracy across the
data-sets. We provide a scalable framework to train multivariate ODT on large
data-sets by introducing a novel linear programming (LP) based data selection
method to choose a subset of the data for training. Our method is able to
routinely handle large data-sets with more than 7,000 sample points and
outperform heuristic methods and other MIP-based techniques. We present
results on data-sets containing up to 245,000 samples. Existing MIP-based
methods do not scale well on training data-sets beyond 5,500 samples.
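The formulation above builds on a 1-norm support vector machine model. For orientation, here is a minimal sketch of the classical 1-norm (L1-regularized) soft-margin SVM written as a linear program and solved with SciPy. The function name l1_svm_lp, the variable split w = w_plus - w_minus, and the penalty parameter C are illustrative; this is not the paper's MIP formulation, which couples such hyperplane splits with the tree structure.

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm_lp(X, y, C=1.0):
    """Fit a 1-norm soft-margin SVM, min ||w||_1 + C*sum(xi), as a linear program.

    The weight vector is split as w = w_plus - w_minus (both nonnegative) so
    that the L1 norm becomes the linear objective sum(w_plus + w_minus).
    Margin constraints: y_i * (w @ x_i + b) >= 1 - xi_i, with slacks xi >= 0.
    Labels y are expected in {-1, +1}.
    """
    n, p = X.shape
    # Decision variable layout: [w_plus (p), w_minus (p), b (1), xi (n)]
    c = np.concatenate([np.ones(2 * p), [0.0], C * np.ones(n)])

    # Rewrite the margin constraints in A_ub @ z <= b_ub form:
    #   -y_i * (w_plus - w_minus) @ x_i - y_i * b - xi_i <= -1
    Y = y.reshape(-1, 1)
    A_ub = np.hstack([-Y * X, Y * X, -Y, -np.eye(n)])
    b_ub = -np.ones(n)

    bounds = [(0, None)] * (2 * p) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

    w = res.x[:p] - res.x[p:2 * p]
    b = res.x[2 * p]
    return w, b

# Toy usage on a linearly separable 2-D problem.
X = np.array([[0.0, 0.0], [1.0, 0.5], [3.0, 3.0], [4.0, 3.5]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b = l1_svm_lp(X, y, C=1.0)
print("w =", w, "b =", b)
```

The 1-norm penalty tends to produce sparse hyperplane coefficients, a standard property of L1 regularization, which keeps each multivariate split easy to interpret.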
Related papers
- Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z)
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, which degrades training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z)
- Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining [40.21546440726592]
We propose a novel multi-agent collaborative data selection mechanism for large language models (LLMs) pretraining.
In this framework, each data selection method serves as an independent agent, and an agent console is designed to dynamically integrate the information from all agents.
arXiv Detail & Related papers (2024-10-10T16:45:28Z)
- Align Your Steps: Optimizing Sampling Schedules in Diffusion Models [63.927438959502226]
Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond.
A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks.
We propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs.
arXiv Detail & Related papers (2024-04-22T18:18:41Z)
- How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training large language models (LLMs).
In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density sampling are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z)
- Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data; a minimal number of available labeled data points are then assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
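As an illustration of the general idea summarized above (not that paper's exact procedure), the sketch below trains a SOM on unlabeled data with the third-party minisom package, attaches a handful of labeled points to their best matching units, and classifies new points by the closest labeled BMU on the grid. The grid size, training length, and nearest-BMU rule are illustrative assumptions.

```python
import numpy as np
from minisom import MiniSom  # third-party package: pip install minisom

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(500, 4))      # abundant unlabeled data
X_labeled = rng.normal(size=(10, 4))         # a minimal set of labeled points
y_labeled = rng.integers(0, 2, size=10)

# Unsupervised stage: train the SOM on the unlabeled data only.
som = MiniSom(8, 8, 4, sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(X_unlabeled)
som.train_random(X_unlabeled, 2000)

# Supervised stage: attach each labeled point's class to its best matching unit.
# (If two labeled points share a BMU, the later label wins; fine for a sketch.)
bmu_labels = {som.winner(x): int(c) for x, c in zip(X_labeled, y_labeled)}

def predict(x):
    """Classify x by the label of the nearest labeled BMU on the SOM grid."""
    i, j = som.winner(x)
    return min(bmu_labels.items(),
               key=lambda kv: (kv[0][0] - i) ** 2 + (kv[0][1] - j) ** 2)[1]

print(predict(rng.normal(size=4)))
```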
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Efficient Online Data Mixing For Language Model Pre-Training [101.45242332613944]
Existing data selection methods suffer from slow and computationally expensive processes.
Data mixing, on the other hand, reduces the complexity of data selection by grouping data points together.
We develop an efficient algorithm for Online Data Mixing (ODM) that combines elements from both data selection and data mixing.
arXiv Detail & Related papers (2023-12-05T00:42:35Z)
- Machine Learning for Cutting Planes in Integer Programming: A Survey [21.567191691588643]
We present recent work on machine learning (ML) techniques for selecting cutting planes (or cuts) in mixed-integer linear programming (MILP).
ML offers a promising approach for improving the cut selection process by using data to identify promising cuts that accelerate the solution of MILP instances.
We analyze the empirical results in the literature in an attempt to quantify the progress that has been made and conclude by suggesting avenues for future research.
arXiv Detail & Related papers (2023-02-17T22:26:49Z)
- Variational Factorization Machines for Preference Elicitation in Large-Scale Recommender Systems [17.050774091903552]
We propose a variational formulation of factorization machines (FMs) that can be easily optimized using standard mini-batch gradient descent.
Our algorithm learns an approximate posterior distribution over the user and item parameters, which leads to confidence intervals over the predictions.
We show, using several datasets, that it has comparable or better performance than existing methods in terms of prediction accuracy.
arXiv Detail & Related papers (2022-12-20T00:06:28Z)
- Deep Learning with Multiple Data Set: A Weighted Goal Programming Approach [2.7393821783237184]
Large-scale data analysis is growing at an exponential rate as data proliferates in our societies.
Deep learning models require substantial computational resources, so distributed training is needed.
This paper presents a multicriteria approach for distributed learning.
arXiv Detail & Related papers (2021-11-27T07:10:25Z)