A Scalable MIP-based Method for Learning Optimal Multivariate Decision
Trees
- URL: http://arxiv.org/abs/2011.03375v1
- Date: Fri, 6 Nov 2020 14:17:41 GMT
- Title: A Scalable MIP-based Method for Learning Optimal Multivariate Decision
Trees
- Authors: Haoran Zhu, Pavankumar Murali, Dzung T. Phan, Lam M. Nguyen, Jayant R.
Kalagnanam
- Abstract summary: We propose a novel MIP formulation, based on a 1-norm support vector machine model, to train a multivariate ODT for classification problems.
We provide cutting plane techniques that tighten the linear relaxation of the MIP formulation, in order to improve run times to reach optimality.
We demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy.
- Score: 17.152864798265455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several recent publications report advances in training optimal decision
trees (ODT) using mixed-integer programs (MIP), due to algorithmic advances in
integer programming and a growing interest in addressing the inherent
suboptimality of heuristic approaches such as CART. In this paper, we propose a
novel MIP formulation, based on a 1-norm support vector machine model, to train
a multivariate ODT for classification problems. We provide cutting plane
techniques that tighten the linear relaxation of the MIP formulation, in order
to improve run times to reach optimality. Using 36 data-sets from the
University of California Irvine Machine Learning Repository, we demonstrate
that our formulation outperforms its counterparts in the literature by an
average of about 10% in terms of mean out-of-sample testing accuracy across the
data-sets. We provide a scalable framework to train multivariate ODT on large
data-sets by introducing a novel linear programming (LP) based data selection
method to choose a subset of the data for training. Our method is able to
routinely handle large data-sets with more than 7,000 sample points and
outperform heuristic methods and other MIP-based techniques. We present
results on data-sets containing up to 245,000 samples. Existing MIP-based
methods do not scale well on training data-sets beyond 5,500 samples.
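As a rough illustration of the 1-norm support vector machine building block mentioned in the abstract (not the authors' full MIP formulation, which also requires binary variables to route samples through the tree), the sketch below solves the LP form of a 1-norm SVM with scipy.optimize.linprog. The helper name l1_svm, the penalty C, and the toy data are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

def l1_svm(X, y, C=1.0):
    # Sketch only: min ||w||_1 + C*sum(xi)  s.t.  y_i*(w.x_i + b) >= 1 - xi_i, xi >= 0.
    # Splitting w = w_plus - w_minus makes the 1-norm objective linear.
    n, d = X.shape
    # Variable vector: [w_plus (d), w_minus (d), b (1), xi (n)]
    c = np.concatenate([np.ones(d), np.ones(d), [0.0], C * np.ones(n)])
    # Margin constraints rewritten as A_ub @ z <= b_ub:
    #   -y_i * (x_i.(w_plus - w_minus) + b) - xi_i <= -1
    A_ub = np.hstack([-y[:, None] * X, y[:, None] * X, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    z = res.x
    return z[:d] - z[d:2 * d], z[2 * d]  # (w, b)

# Toy usage on a separable 2-D problem with labels in {-1, +1}; in the paper's
# setting one such hyperplane would define the multivariate split at a tree node.
X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = l1_svm(X, y)
print("split hyperplane:", w, "intercept:", b)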
Related papers
- PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection [28.442470930703337]
PRISM is a training-free approach for efficient multimodal data selection.
It uses Pearson correlation analysis to quantify the intrinsic visual encoding properties of MLLMs.
It reduces the overall time required for visual instruction tuning and data selection to just 30% of conventional methods.
arXiv Detail & Related papers (2025-02-17T18:43:41Z) - Optimizing Pretraining Data Mixtures with LLM-Estimated Utility [52.08428597962423]
Large Language Models improve with increasing amounts of high-quality training data.
We find token-counts outperform manual and learned mixes, indicating that simple approaches for dataset size and diversity are surprisingly effective.
We propose two complementary approaches: UtiliMax, which extends token-based heuristics by incorporating utility estimates from reduced-scale ablations, achieving up to a 10.6x speedup over manual baselines; and Model Estimated Data Utility (MEDU), which leverages LLMs to estimate data utility from small samples, matching ablation-based performance while reducing computational requirements by ~200x.
arXiv Detail & Related papers (2025-01-20T21:10:22Z) - Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference [55.150117654242706]
We show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU.
As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty.
arXiv Detail & Related papers (2024-11-01T21:11:48Z) - Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining [40.21546440726592]
We propose a novel multi-agent collaborative data selection mechanism for large language model (LLM) pretraining.
In this framework, each data selection method serves as an independent agent, and an agent console is designed to dynamically integrate the information from all agents.
arXiv Detail & Related papers (2024-10-10T16:45:28Z) - Align Your Steps: Optimizing Sampling Schedules in Diffusion Models [63.927438959502226]
Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond.
A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks.
We propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs.
arXiv Detail & Related papers (2024-04-22T18:18:41Z) - How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training large language models (LLMs).
In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density sampling are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data, and then a minimal number of available labeled data points are assigned to key best matching units (BMUs).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - Efficient Online Data Mixing For Language Model Pre-Training [101.45242332613944]
Existing data selection methods suffer from slow and computationally expensive processes.
Data mixing, on the other hand, reduces the complexity of data selection by grouping data points together.
We develop an efficient algorithm for Online Data Mixing (ODM) that combines elements from both data selection and data mixing.
arXiv Detail & Related papers (2023-12-05T00:42:35Z) - Deep Learning with Multiple Data Set: A Weighted Goal Programming
Approach [2.7393821783237184]
Large-scale data analysis is growing at an exponential rate as data proliferates in our societies.
Deep Learning models require plenty of resources, and distributed training is needed.
This paper presents a Multicriteria approach for distributed learning.
arXiv Detail & Related papers (2021-11-27T07:10:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.