MABSplit: Faster Forest Training Using Multi-Armed Bandits
- URL: http://arxiv.org/abs/2212.07473v1
- Date: Wed, 14 Dec 2022 19:43:43 GMT
- Title: MABSplit: Faster Forest Training Using Multi-Armed Bandits
- Authors: Mo Tiwari, Ryan Kang, Je-Yong Lee, Sebastian Thrun, Chris Piech, Ilan
Shomorony, Martin Jinye Zhang
- Abstract summary: We present an algorithm that accelerates the training of random forests and other tree-based learning methods.
At the core of our algorithm is a novel node-splitting subroutine, dubbed MABSplit, used to efficiently find split points.
Our algorithm borrows techniques from the multi-armed bandit literature to judiciously determine how to allocate samples.
- Score: 14.650858573237977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random forests are some of the most widely used machine learning models
today, especially in domains that necessitate interpretability. We present an
algorithm that accelerates the training of random forests and other popular
tree-based learning methods. At the core of our algorithm is a novel
node-splitting subroutine, dubbed MABSplit, used to efficiently find split
points when constructing decision trees. Our algorithm borrows techniques from
the multi-armed bandit literature to judiciously determine how to allocate
samples and computational power across candidate split points. We provide
theoretical guarantees that MABSplit improves the sample complexity of each
node split from linear to logarithmic in the number of data points. In some
settings, MABSplit leads to 100x faster training (a 99% reduction in training
time) without any decrease in generalization performance. We demonstrate
similar speedups when MABSplit is used across a variety of forest-based
variants, such as Extremely Random Forests and Random Patches. We also show our
algorithm can be used in both classification and regression tasks. Finally, we
show that MABSplit outperforms existing methods in generalization performance
and feature importance calculations under a fixed computational budget. All of
our experimental results are reproducible via a one-line script at
https://github.com/ThrunGroup/FastForest.
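The core idea in the abstract — treating candidate split points as bandit arms and allocating samples adaptively instead of evaluating every candidate on all n points — can be illustrated with a small sketch. This is not the authors' implementation (see the FastForest repository for that); it is a minimal successive-elimination illustration under assumed details: Gini impurity as the arm reward, fixed-size random batches, and a simple Hoeffding-style confidence radius. The names `mab_split` and `split_impurity` are hypothetical.

```python
import numpy as np

def split_impurity(xs, ys, t):
    """Weighted Gini impurity of splitting integer labels ys at threshold t."""
    left, right = ys[xs <= t], ys[xs > t]
    def gini(lbl):
        if lbl.size == 0:
            return 0.0
        p = np.bincount(lbl) / lbl.size
        return 1.0 - np.sum(p * p)
    return (left.size * gini(left) + right.size * gini(right)) / ys.size

def mab_split(x, y, thresholds, batch=32, delta=0.01, seed=0):
    """Successive elimination over candidate thresholds ("arms"):
    estimate each arm's impurity on random batches and discard arms whose
    optimistic (lower) bound is already worse than the best pessimistic
    (upper) bound, so clearly bad splits never see all n points."""
    rng = np.random.default_rng(seed)
    n = x.size
    alive = list(range(len(thresholds)))
    means = np.zeros(len(thresholds))
    pulls = np.zeros(len(thresholds))  # batches drawn per arm so far
    while len(alive) > 1 and pulls[alive[0]] * batch < n:
        for a in alive:
            idx = rng.integers(0, n, size=batch)
            est = split_impurity(x[idx], y[idx], thresholds[a])
            pulls[a] += 1
            means[a] += (est - means[a]) / pulls[a]  # running mean
        # Hoeffding-style radius; impurity estimates lie in [0, 1].
        rad = np.sqrt(np.log(2.0 / delta) / (2.0 * pulls[alive] * batch))
        lo, hi = means[alive] - rad, means[alive] + rad
        best_hi = hi.min()
        alive = [a for a, l in zip(alive, lo) if l <= best_hi]
    # Resolve the surviving candidates exactly on the full node data.
    return min((thresholds[a] for a in alive),
               key=lambda t: split_impurity(x, y, t))
```

When one split is clearly best, most arms are eliminated after a few batches, which is the source of the linear-to-logarithmic sample-complexity improvement the abstract claims; the real MABSplit subroutine handles this with proper guarantees rather than this fixed-batch heuristic.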
Related papers
- Distributed Collapsed Gibbs Sampler for Dirichlet Process Mixture Models
in Federated Learning [0.22499166814992444]
This paper proposes a new distributed Markov Chain Monte Carlo (MCMC) inference method for DPMMs (DisCGS) using sufficient statistics.
Our approach uses the collapsed Gibbs sampler and is specifically designed to work on distributed data across independent and heterogeneous machines.
For instance, with a dataset of 100K data points, the centralized algorithm requires approximately 12 hours to complete 100 iterations while our approach achieves the same number of iterations in just 3 minutes.
arXiv Detail & Related papers (2023-12-18T13:16:18Z) - An Optimal Algorithm for the Real-Valued Combinatorial Pure Exploration
of Multi-Armed Bandit [65.268245109828]
We study the real-valued combinatorial pure exploration problem in multi-armed bandits (R-CPE-MAB).
Existing methods in the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
We propose an algorithm named the gap-based exploration (CombGapE) algorithm, whose sample complexity matches the lower bound.
arXiv Detail & Related papers (2023-06-15T15:37:31Z) - Parallel Tree Kernel Computation [0.0]
We devise a parallel implementation of the sequential algorithm for the computation of some tree kernels of two finite sets of trees.
Results show that the proposed parallel algorithm outperforms the sequential one in terms of latency.
arXiv Detail & Related papers (2023-05-12T18:16:45Z) - FastHebb: Scaling Hebbian Training of Deep Neural Networks to ImageNet
Level [7.410940271545853]
We present FastHebb, an efficient and scalable solution for Hebbian learning.
FastHebb outperforms previous solutions by up to 50 times in terms of training speed.
For the first time, we are able to bring Hebbian algorithms to ImageNet scale.
arXiv Detail & Related papers (2022-07-07T09:04:55Z) - Reinforcement Learning for Branch-and-Bound Optimisation using
Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching.
We propose retro branching; a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z) - Parallel Sampling for Efficient High-dimensional Bayesian Network
Structure Learning [6.85316573653194]
This paper describes an approximate algorithm that performs parallel sampling on Candidate Parent Sets (CPSs).
The modified algorithm, which we call Parallel Sampling MINOBS (PS-MINOBS), constructs the graph by sampling CPSs for each variable.
arXiv Detail & Related papers (2022-02-19T22:35:59Z) - WildWood: a new Random Forest algorithm [0.0]
WildWood is a new ensemble algorithm for supervised learning of Random Forest (RF) type.
WW uses bootstrap out-of-bag samples to compute out-of-bag scores.
WW is fast and competitive compared with other well-established ensemble methods.
arXiv Detail & Related papers (2021-09-16T14:36:56Z) - COPS: Controlled Pruning Before Training Starts [68.8204255655161]
State-of-the-art deep neural network (DNN) pruning techniques, applied one-shot before training starts, evaluate sparse architectures with the help of a single criterion -- called pruning score.
In this work we do not concentrate on a single pruning criterion, but provide a framework for combining arbitrary GSSs to create more powerful pruning strategies.
arXiv Detail & Related papers (2021-07-27T08:48:01Z) - Fast and Complete: Enabling Complete Neural Network Verification with
Rapid and Massively Parallel Incomplete Verifiers [112.23981192818721]
We propose to use backward mode linear relaxation based analysis (LiRPA) to replace Linear Programming (LP) during the BaB process.
Unlike LP, naively applied LiRPA can produce much weaker bounds and cannot even check certain conflicts between sub-domains during splitting.
We demonstrate an order of magnitude speedup compared to existing LP-based approaches.
arXiv Detail & Related papers (2020-11-27T16:42:12Z) - Resource Allocation in Multi-armed Bandit Exploration: Overcoming
Sublinear Scaling with Adaptive Parallelism [107.48538091418412]
We study exploration in multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls.
We focus in particular on the allocation of distributed computing resources, where we may obtain results faster by allocating more resources per pull.
arXiv Detail & Related papers (2020-10-31T18:19:29Z) - Efficient Computation of Expectations under Spanning Tree Distributions [67.71280539312536]
We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.
Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms.
arXiv Detail & Related papers (2020-08-29T14:58:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.