GPU Accelerated Exhaustive Search for Optimal Ensemble of Black-Box
Optimization Algorithms
- URL: http://arxiv.org/abs/2012.04201v2
- Date: Sat, 12 Dec 2020 08:58:46 GMT
- Title: GPU Accelerated Exhaustive Search for Optimal Ensemble of Black-Box
Optimization Algorithms
- Authors: Jiwei Liu, Bojan Tunguz, Gilberto Titericz
- Abstract summary: We show that a simple ensemble of black-box optimization algorithms can outperform any single one of them.
We propose a Multi-GPU-optimized framework to accelerate a brute force search for the optimal ensemble.
We evaluate 15 optimizers by training 2.7 million models and running 541,440 optimizations.
- Score: 1.246150324257064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Black-box optimization is essential for tuning complex machine learning
algorithms which are easier to experiment with than to understand. In this
paper, we show that a simple ensemble of black-box optimization algorithms can
outperform any single one of them. However, searching for such an optimal
ensemble requires a large number of experiments. We propose a
Multi-GPU-optimized framework to accelerate a brute force search for the
optimal ensemble of black-box optimization algorithms by running many
experiments in parallel. The lightweight optimizations are performed by CPU
while expensive model training and evaluations are assigned to GPUs. We
evaluate 15 optimizers by training 2.7 million models and running 541,440
optimizations. On a DGX-1, the search time is reduced from more than 10 days on
two 20-core CPUs to less than 24 hours on 8 GPUs. With the optimal ensemble
found by GPU-accelerated exhaustive search, we won 2nd place in the NeurIPS
2020 black-box optimization challenge.
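The CPU/GPU split described above can be illustrated with a small sketch. The following Python snippet is a minimal illustration of the idea only, not the authors' framework: toy random-search optimizers stand in for the 15 real ones, train_and_score is a placeholder for the GPU-side model training and evaluation, and the exhaustive loop enumerates every candidate ensemble.

import itertools
import random

def train_and_score(params):
    # Placeholder objective; in the paper this step is the expensive model
    # training and evaluation that gets dispatched to the GPUs.
    x, y = params
    return (x - 0.3) ** 2 + (y + 0.5) ** 2

def make_random_search(seed):
    # Toy stand-in for one black-box optimizer (cheap, CPU-side suggest step).
    rng = random.Random(seed)
    def suggest(_history):
        return (rng.uniform(-1, 1), rng.uniform(-1, 1))
    return suggest

def run_ensemble(members, budget=32):
    # An ensemble simply rotates suggestions among its member optimizers.
    history, best = [], float("inf")
    for i in range(budget):
        params = members[i % len(members)](history)
        score = train_and_score(params)
        history.append((params, score))
        best = min(best, score)
    return best

optimizers = {f"rs{i}": make_random_search(i) for i in range(6)}  # stand-ins for the 15 optimizers
results = {pair: run_ensemble([optimizers[n] for n in pair])
           for pair in itertools.combinations(optimizers, 2)}     # exhaustive over all pairs
best_pair = min(results, key=results.get)
print(best_pair, results[best_pair])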
Related papers
- A GPU Implementation of Multi-Guiding Spark Fireworks Algorithm for Efficient Black-Box Neural Network Optimization [2.9608128305931825]
This paper presents a GPU-accelerated version of the Multi-Guiding Spark Fireworks Algorithm (MGFWA).
We demonstrate its superior performance in terms of both speed and solution quality.
The proposed implementation offers a promising approach to accelerate swarm intelligence algorithms.
arXiv Detail & Related papers (2025-01-07T17:09:07Z) - Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
Gradient-based algorithms are widely used in bilevel optimization.
We introduce a without-replacement sampling based algorithm which achieves a faster convergence rate.
We validate our algorithms over both synthetic and real-world applications.
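A minimal Python sketch of the sampling distinction named in the title (illustrative only, not the paper's bilevel algorithm): with-replacement sampling draws each index independently, while without-replacement sampling shuffles the data once per epoch.

import random

data = list(range(10))  # stand-in for the indices of a training set

def with_replacement(num_draws):
    # Each index is drawn independently, so some samples repeat and others
    # may never be visited within a pass.
    return [random.choice(data) for _ in range(num_draws)]

def without_replacement(epochs):
    # Shuffle once per epoch so every sample is visited exactly once per pass.
    order = []
    for _ in range(epochs):
        one_pass = data[:]
        random.shuffle(one_pass)
        order.extend(one_pass)
    return order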
arXiv Detail & Related papers (2024-11-07T17:05:31Z) - Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters [0.0]
opt-UM and opt-Brc introduce significant enhancements to Hartree-Fock calculations up to $f$-type angular momentum functions.
Opt-Brc excels for smaller systems and for highly contracted triple-$\zeta$ basis sets, while opt-UM is advantageous for large molecular systems.
arXiv Detail & Related papers (2024-07-31T08:49:06Z) - GPU Based Differential Evolution: New Insights and Comparative Study [7.5961910202572644]
This work reviews the main architectural choices made in the literature for GPU based Differential Evolution algorithms.
It introduces a new GPU based numerical optimisation benchmark to evaluate and compare GPU based DE algorithms.
arXiv Detail & Related papers (2024-05-26T12:40:39Z) - SIP: Autotuning GPU Native Schedules via Stochastic Instruction Perturbation [0.0]
Large language models (LLMs) have become a significant workload since their appearance.
They are also computationally expensive as they have billions of parameters and are trained with massive amounts of data.
Recent works have developed dedicated kernels for LLM training and inference instead of relying on compiler-generated ones, so that hardware resources are as fully utilized as possible.
arXiv Detail & Related papers (2024-03-25T15:26:50Z) - VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatile optimizers.
We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive benchmark suite with baselines at velo-code.io.
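A minimal NumPy sketch of the learned-optimizer idea (assumed shapes and features, not VeLO's actual architecture): a tiny per-parameter network ingests the gradient plus a momentum feature and outputs an update. In VeLO the network weights come from large-scale meta-training; here they are random placeholders.

import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(0, 0.1, (2, 8)), rng.normal(0, 0.1, (8, 1))  # would be meta-trained
momentum = None

def learned_update(grad):
    # Ingest per-parameter features (gradient + momentum) and emit an update.
    global momentum
    momentum = grad if momentum is None else 0.9 * momentum + 0.1 * grad
    feats = np.stack([grad, momentum], axis=-1)   # shape: (num_params, 2)
    hidden = np.tanh(feats @ W1)
    return 0.01 * (hidden @ W2)[..., 0]

# Usage on a toy quadratic loss f(p) = ||p||^2 with gradient 2p.
p = rng.normal(size=5)
for _ in range(100):
    p = p - learned_update(2 * p)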
arXiv Detail & Related papers (2022-11-17T18:39:07Z) - Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous
Multi-GPU Servers [65.60007071024629]
We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy.
arXiv Detail & Related papers (2021-10-13T20:58:15Z) - Implementation of Parallel Simplified Swarm Optimization in CUDA [2.322689362836168]
In optimization computing, swarm intelligence algorithms (SIAs) are well suited to parallelization.
This paper proposes a GPU-based Parallel Simplified Swarm Optimization (PSSO) built on the CUDA platform, considering both computational ability and versatility.
As the results show, the time complexity is successfully reduced by an order of magnitude of N, and the problem of resource preemption is avoided entirely.
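A minimal NumPy sketch of the Simplified Swarm Optimization update (the Cg/Cp/Cw thresholds and sphere objective are illustrative, not taken from the paper): each particle and dimension is updated independently, which is what makes the method map naturally onto GPU threads in a CUDA implementation.

import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 10                       # particles, dimensions
Cg, Cp, Cw = 0.4, 0.7, 0.9           # stepwise probability thresholds (assumed values)

def sphere(X):                       # toy objective
    return (X ** 2).sum(axis=1)

X = rng.uniform(-5, 5, (n, d))
pbest, pbest_val = X.copy(), sphere(X)
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(100):
    r = rng.random((n, d))
    new = rng.uniform(-5, 5, (n, d))           # default: random reset
    new = np.where(r < Cw, X, new)             # keep current value
    new = np.where(r < Cp, pbest, new)         # copy from personal best
    new = np.where(r < Cg, gbest, new)         # copy from global best
    X = new
    val = sphere(X)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = X[improved], val[improved]
    gbest = pbest[pbest_val.argmin()].copy()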
arXiv Detail & Related papers (2021-10-01T00:15:45Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve the complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
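A minimal NumPy sketch of exact kernel ridge regression (illustrative background, not the paper's solver): the dense n-by-n kernel matrix and cubic-cost solve are the bottlenecks that motivate the GPU and approximation techniques described above.

import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 3)), rng.normal(size=500)
lam, gamma = 1e-2, 0.5

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)    # pairwise squared distances
K = np.exp(-gamma * sq)                                 # Gaussian (RBF) kernel matrix, O(n^2) memory
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)    # dense O(n^3) solve: the bottleneck

def predict(Xnew):
    sq_new = ((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_new) @ alpha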
arXiv Detail & Related papers (2020-06-18T08:16:25Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)