Asynchronous Decentralized Bayesian Optimization for Large Scale
Hyperparameter Optimization
- URL: http://arxiv.org/abs/2207.00479v4
- Date: Tue, 26 Sep 2023 07:02:28 GMT
- Title: Asynchronous Decentralized Bayesian Optimization for Large Scale
Hyperparameter Optimization
- Authors: Romain Egele, Isabelle Guyon, Venkatram Vishwanath, Prasanna
Balaprakash
- Abstract summary: In BO, a computationally cheap surrogate model is employed to learn the relationship between parameter configurations and their performance.
We present an asynchronous-decentralized BO, wherein each worker runs a sequential BO and asynchronously communicates its results through shared storage.
We scale our method to 1,920 parallel workers without loss of computational efficiency, keeping worker utilization above 95%.
- Score: 13.89136187674851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian optimization (BO) is a promising approach for hyperparameter
optimization of deep neural networks (DNNs), where each model training can take
minutes to hours. In BO, a computationally cheap surrogate model is employed to
learn the relationship between parameter configurations and their performance
such as accuracy. Parallel BO methods often adopt single manager/multiple
workers strategies to evaluate multiple hyperparameter configurations
simultaneously. Despite significant hyperparameter evaluation time, the
overhead in such centralized schemes prevents these methods from scaling to a large
number of workers. We present an asynchronous-decentralized BO, wherein each
worker runs a sequential BO and asynchronously communicates its results through
shared storage. We scale our method to 1,920 parallel workers (the full
production queue of the Polaris supercomputer) without loss of computational
efficiency, keeping worker utilization above 95%, and demonstrate improved
model accuracy as well as faster convergence on the CANDLE benchmark from the
Exascale Computing Project.
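The decentralized scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Threads stand in for workers, an in-memory list guarded by a lock stands in for shared storage, and a toy inverse-distance surrogate with a lower-confidence-bound rule stands in for the real surrogate model and acquisition function. The names `objective`, `SharedStorage`, `surrogate`, `suggest`, and the per-worker `kappa` values are hypothetical and introduced here only for illustration.

```python
import random
import threading


def objective(x):
    # Hypothetical stand-in for an expensive training run; returns a loss to minimize.
    return (x - 0.3) ** 2


class SharedStorage:
    """Thread-safe list of (x, loss) pairs, standing in for shared storage."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = []

    def push(self, x, y):
        with self._lock:
            self._results.append((x, y))

    def snapshot(self):
        with self._lock:
            return list(self._results)


def surrogate(x, history):
    # Toy surrogate (an assumption, not a Gaussian process or random forest):
    # inverse-distance-weighted mean, with distance to the nearest point as uncertainty.
    weights = [1.0 / (abs(x - xi) + 1e-9) for xi, _ in history]
    mean = sum(w * yi for w, (_, yi) in zip(weights, history)) / sum(weights)
    spread = min(abs(x - xi) for xi, _ in history)
    return mean, spread


def suggest(history, kappa, rng, n_candidates=64):
    # With no observations yet, fall back to a random configuration.
    if not history:
        return rng.random()
    candidates = [rng.random() for _ in range(n_candidates)]

    def lower_confidence_bound(x):
        mean, spread = surrogate(x, history)
        return mean - kappa * spread

    return min(candidates, key=lower_confidence_bound)


def worker(worker_id, storage, n_evals):
    # Each worker runs its own sequential BO loop; there is no manager process.
    rng = random.Random(worker_id)
    kappa = 0.5 + worker_id  # hypothetical per-worker exploration weight
    for _ in range(n_evals):
        history = storage.snapshot()      # read whatever other workers have published
        x = suggest(history, kappa, rng)  # local surrogate fit and proposal
        y = objective(x)                  # the expensive evaluation
        storage.push(x, y)                # publish without waiting for anyone


def run(n_workers=4, n_evals=10):
    storage = SharedStorage()
    threads = [
        threading.Thread(target=worker, args=(i, storage, n_evals))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return storage.snapshot()


results = run()
best_x, best_y = min(results, key=lambda r: r[1])
print(f"{len(results)} evaluations, best x = {best_x:.3f}, loss = {best_y:.5f}")
```

In this sketch each worker waits only on its own evaluation and on brief reads and writes to the shared store, so no single process has to serve every worker. The exact results vary between runs because the thread interleaving is not deterministic.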
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices [0.0]
Ravnest facilitates decentralized training by efficiently organizing compute nodes into clusters.
We have framed our asynchronous SGD loss function as a block structured optimization problem with delayed updates.
arXiv Detail & Related papers (2024-01-03T13:07:07Z)
- Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates [28.813671194939225]
Fully decentralized optimization methods have been advocated as alternatives to the popular parameter server framework.
We propose a fully decentralized algorithm with adaptive asynchronous updates via adaptively determining the number of neighbor workers for each worker to communicate with.
We show that DSGD-AAU achieves a linear speedup for convergence and demonstrate its effectiveness via extensive experiments.
arXiv Detail & Related papers (2023-06-11T02:08:59Z)
- Massively Parallel Genetic Optimization through Asynchronous Propagation
of Populations [50.591267188664666]
Propulate is an evolutionary optimization algorithm and software package for global optimization.
We provide an MPI-based implementation of our algorithm, which features variants of selection, mutation, crossover, and migration.
We find that Propulate is up to three orders of magnitude faster without sacrificing solution accuracy.
arXiv Detail & Related papers (2023-01-20T18:17:34Z)
- Asynchronous Parallel Incremental Block-Coordinate Descent for
Decentralized Machine Learning [55.198301429316125]
Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing.
For fast-increasing applications and data amounts, distributed learning is a promising emerging paradigm since it is often impractical or inefficient to share/aggregate data.
This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices.
arXiv Detail & Related papers (2022-02-07T15:04:15Z)
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
- Restructuring, Pruning, and Adjustment of Deep Models for Parallel
Distributed Inference [15.720414948573753]
We consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers).
We propose RePurpose, a layer-wise model restructuring and pruning technique that guarantees the performance of the overall parallelized model.
We show that, compared to the existing methods, RePurpose significantly improves the efficiency of the distributed inference via parallel implementation.
arXiv Detail & Related papers (2020-08-19T06:44:41Z)
- Simple and Scalable Parallelized Bayesian Optimization [2.512827436728378]
We propose a simple and scalable BO method for asynchronous parallel settings.
Experiments are carried out with a benchmark function and hyperparameter optimization of multi-layer perceptrons.
arXiv Detail & Related papers (2020-06-24T10:25:27Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds and still achieves a linear speedup in theory.
Our experiments on several datasets show the effectiveness of our algorithm and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of
Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)