Scalable Hyperparameter Optimization with Lazy Gaussian Processes
- URL: http://arxiv.org/abs/2001.05726v1
- Date: Thu, 16 Jan 2020 10:15:55 GMT
- Title: Scalable Hyperparameter Optimization with Lazy Gaussian Processes
- Authors: Raju Ram, Sabine Müller, Franz-Josef Pfreundt, Nicolas R. Gauger,
Janis Keuper
- Abstract summary: We present a novel, highly accurate approximation of the underlying Gaussian Process.
First experiments show a speedup by a factor of 162 on a single node and a further speedup by a factor of 5 in a parallel environment.
- Score: 1.3999481573773074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most machine learning methods require careful selection of hyper-parameters
in order to train a high performing model with good generalization abilities.
Hence, several automatic selection algorithms have been introduced to overcome
tedious manual (trial and error) tuning of these parameters. Due to its very high
sample efficiency, Bayesian Optimization over a Gaussian Processes modeling of
the parameter space has become the method of choice. Unfortunately, this
approach suffers from a cubic compute complexity due to underlying Cholesky
factorization, which makes it very hard to be scaled beyond a small number of
sampling steps. In this paper, we present a novel, highly accurate
approximation of the underlying Gaussian Process. Reducing its computational
complexity from cubic to quadratic allows an efficient strong scaling of
Bayesian Optimization while outperforming the previous approach regarding
optimization accuracy. First experiments show a speedup by a factor of 162 on
a single node and a further speedup by a factor of 5 in a parallel environment.
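To make the complexity claim concrete, the following is a minimal, generic sketch of exact GP regression via a Cholesky factorization, marking where the cubic cost arises. It is not the paper's lazy approximation; the RBF kernel, jitter value, and toy data are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the rows of A and B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale ** 2)

def gp_posterior(X, y, X_star, noise=1e-3):
    """Exact GP regression posterior mean and variance at the query points X_star."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = cholesky(K, lower=True)                    # O(n^3): the per-step bottleneck
    alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True))  # O(n^2) solves
    K_s = rbf_kernel(X, X_star)
    mean = K_s.T @ alpha
    V = solve_triangular(L, K_s, lower=True)
    var = rbf_kernel(X_star, X_star).diagonal() - (V * V).sum(axis=0)
    return mean, var

# In Bayesian optimization the model is refit after every new sample, so with an
# exact Cholesky the cumulative cost over t samples grows roughly like O(t^4).
X = np.random.rand(50, 3)
y = np.sin(X).sum(axis=1)
mu, var = gp_posterior(X, y, np.random.rand(5, 3))
```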
Related papers
- Enhancing Gaussian Process Surrogates for Optimization and Posterior Approximation via Random Exploration [2.984929040246293]
The paper proposes novel noise-free Bayesian optimization strategies that rely on a random exploration step to enhance the accuracy of Gaussian process surrogate models.
The new algorithms retain the ease of implementation of the classical GP-UCB, but the additional exploration step facilitates their convergence.
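As a point of reference for the idea above, here is a hedged sketch of a GP-UCB loop in which an occasional query is drawn uniformly at random; the toy objective, the UCB coefficient, and the exploration probability are assumptions, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                                     # toy 1-D objective (assumption)
    return -(x - 0.3) ** 2

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201).reshape(-1, 1)      # candidate set
X = rng.uniform(0, 1, size=(3, 1))                    # initial design
y = objective(X).ravel()

for t in range(20):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6).fit(X, y)
    if rng.random() < 0.2:                            # random exploration step
        x_next = rng.uniform(0, 1, size=(1, 1))
    else:                                             # classical GP-UCB step
        mu, sigma = gp.predict(grid, return_std=True)
        ucb = mu + 2.0 * sigma
        x_next = grid[[np.argmax(ucb)]]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print("best x:", X[np.argmax(y)], "best value:", y.max())
```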
arXiv Detail & Related papers (2024-01-30T14:16:06Z)
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
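The paper's extragradient scheme is not reproduced here; as a familiar baseline for the same entropy-regularized objective, the sketch below runs the standard Sinkhorn iteration (cost matrix and regularization strength are illustrative assumptions).

```python
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iter=500):
    """Approximate OT plan between histograms a and b for cost matrix C."""
    K = np.exp(-C / reg)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # alternating scaling updates
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]      # transport plan
    return P, (P * C).sum()              # plan and its transport cost

# Toy example: two uniform histograms over points on a line.
x = np.linspace(0, 1, 50)
C = (x[:, None] - x[None, :]) ** 2
a = np.full(50, 1 / 50)
b = np.full(50, 1 / 50)
P, cost = sinkhorn(a, b, C)
```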
arXiv Detail & Related papers (2023-01-30T15:46:39Z)
- Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional benefit that has been previously unexplored.
It can simultaneously reduce variance at essentially negligible cost.
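As a hedged illustration of where preconditioning enters, the sketch below solves the kernel system (K + sigma^2 I) x = y with preconditioned conjugate gradients, using a simple Nystrom-style low-rank preconditioner applied through the Woodbury identity; the landmark selection and sizes are assumptions, not the paper's construction.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
n, m, sigma2 = 2000, 100, 1e-2
X = rng.uniform(0, 1, size=(n, 2))

def rbf(A, B, ls=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

K = rbf(X, X) + sigma2 * np.eye(n)
y = rng.standard_normal(n)

# Nystrom factor U from m randomly chosen landmark columns (an assumption; the
# paper analyses more careful pivoted-Cholesky-type preconditioners).
idx = rng.choice(n, m, replace=False)
Kmm = rbf(X[idx], X[idx]) + 1e-8 * np.eye(m)
U = rbf(X, X[idx]) @ np.linalg.inv(np.linalg.cholesky(Kmm)).T   # K approx U U^T

M_inner = np.linalg.inv(sigma2 * np.eye(m) + U.T @ U)
def apply_precond(v):
    # Woodbury identity: (U U^T + sigma^2 I)^{-1} v
    return (v - U @ (M_inner @ (U.T @ v))) / sigma2

M = LinearOperator((n, n), matvec=apply_precond)
x, info = cg(K, y, M=M, maxiter=500)
```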
arXiv Detail & Related papers (2021-07-01T06:43:11Z)
- Implicit differentiation for fast hyperparameter selection in non-smooth convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian.
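A minimal sketch of this forward-mode idea for the Lasso: propagate the Jacobian of the iterates with respect to the regularization parameter alongside proximal gradient descent (ISTA). Step size, data, and the chosen lambda are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_with_hypergradient(X, y, lam, n_iter=500):
    n, p = X.shape
    gamma = 1.0 / np.linalg.norm(X, 2) ** 2       # step size from the Lipschitz constant
    beta = np.zeros(p)
    jac = np.zeros(p)                             # d(beta)/d(lambda), forward mode
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        z = beta - gamma * grad
        dz = jac - gamma * (X.T @ (X @ jac))      # chain rule through the gradient step
        active = np.abs(z) > gamma * lam          # where the soft-threshold is non-zero
        beta = soft_threshold(z, gamma * lam)
        jac = active * (dz - gamma * np.sign(z))  # d/dz and d/d(threshold) contributions
    return beta, jac

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = X @ (rng.standard_normal(20) * (rng.random(20) < 0.3)) + 0.1 * rng.standard_normal(100)
beta, dbeta_dlam = ista_with_hypergradient(X, y, lam=0.5)
# dbeta_dlam can be chained with an outer validation loss to obtain a hypergradient.
```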
arXiv Detail & Related papers (2021-05-04T17:31:28Z)
- Hyper-optimization with Gaussian Process and Differential Evolution Algorithm [0.0]
This paper presents specific modifications of Gaussian Process optimization components from available scientific libraries.
The presented modifications were submitted to the BlackBox 2020 challenge, where they outperformed some conventionally available optimization libraries.
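One plausible way to combine the two components, sketched here under stated assumptions (toy objective, Matern kernel, lower-confidence-bound acquisition), is to let SciPy's differential evolution optimize the acquisition function of a Gaussian Process surrogate; this is not necessarily the authors' exact pipeline.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def black_box(x):                           # expensive function to be tuned (toy)
    return np.sin(3 * x[0]) + 0.5 * (x[1] - 0.2) ** 2

bounds = [(0.0, 1.0), (0.0, 1.0)]
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5, 2))
y = np.array([black_box(x) for x in X])

for _ in range(15):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True).fit(X, y)

    def lcb(x):                             # lower confidence bound, minimized by DE
        mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
        return float(mu[0] - 2.0 * sigma[0])

    result = differential_evolution(lcb, bounds, seed=0, maxiter=50, tol=1e-6)
    X = np.vstack([X, result.x])
    y = np.append(y, black_box(result.x))

print("best point:", X[np.argmin(y)], "best value:", y.min())
```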
arXiv Detail & Related papers (2021-01-26T08:33:00Z)
- Efficient hyperparameter optimization by way of PAC-Bayes bound minimization [4.191847852775072]
We present an alternative objective that is equivalent to a Probably Approximately Correct-Bayes (PAC-Bayes) bound on the expected out-of-sample error.
We then devise an efficient gradient-based algorithm to minimize this objective.
arXiv Detail & Related papers (2020-08-14T15:54:51Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, albeit only up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
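An illustrative sketch of the general idea, not the paper's exact rule: run SGD with a constant step-size and grow the minibatch whenever the gradient noise dominates the gradient signal. The problem, step size, and growth test below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 20))
b = A @ rng.standard_normal(20) + 0.1 * rng.standard_normal(5000)

w = np.zeros(20)
step, batch = 0.1, 16
for it in range(300):
    idx = rng.choice(len(A), batch, replace=False)
    residual = A[idx] @ w - b[idx]
    grads = A[idx] * residual[:, None]            # per-sample gradients
    g = grads.mean(axis=0)
    # Norm-test style check (an assumption): grow the batch when the estimated
    # variance of the mean gradient exceeds the squared gradient norm.
    var = grads.var(axis=0).sum() / batch
    if var > np.dot(g, g) and batch < len(A):
        batch = min(2 * batch, len(A))
    w -= step * g

print("final batch size:", batch, "loss:", 0.5 * np.mean((A @ w - b) ** 2))
```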
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to a significantly smaller and computationally cheaper sub-solver for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
- Distributed Averaging Methods for Randomized Second Order Optimization [54.51566432934556]
We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a bottleneck.
We develop unbiased parameter averaging methods for randomized second order optimization that employ sampling and sketching of the Hessian.
We also extend the framework of second order averaging methods to introduce an unbiased distributed optimization framework for heterogeneous computing systems.
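A simplified sketch of the setting, not the paper's estimator: several workers solve a ridge regression on independently sketched data and the driver averages the local solutions; the sketch size, worker count, and plain (non-debiased) average are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, q, m, lam = 10000, 50, 8, 500, 1.0
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

def local_solution(seed):
    # Each worker applies an independent Gaussian sketch S (m x n) and solves
    # the sketched regularized normal equations.
    S = np.random.default_rng(seed).standard_normal((m, n)) / np.sqrt(m)
    SA, Sb = S @ A, S @ b
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)

x_avg = np.mean([local_solution(s) for s in range(q)], axis=0)
x_exact = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
print("relative error of the averaged estimate:",
      np.linalg.norm(x_avg - x_exact) / np.linalg.norm(x_exact))
```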
arXiv Detail & Related papers (2020-02-16T09:01:18Z)
- Accelerating Quantum Approximate Optimization Algorithm using Machine Learning [6.735657356113614]
We propose a machine learning based approach to accelerate quantum approximate optimization algorithm (QAOA) implementation.
QAOA is a quantum-classical hybrid algorithm intended to demonstrate so-called quantum supremacy.
We show that the proposed approach can curtail the number of optimization iterations by up to 65.7%, based on an analysis performed with 264 flavors of graphs.
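A very rough sketch of the warm-start idea, with the QAOA circuit itself omitted: a regressor maps simple graph features to initial angles that seed the classical optimizer. The features, the placeholder training pairs, and the model choice are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def graph_features(adj):
    # Toy features of a graph given its adjacency matrix (an assumption).
    degrees = adj.sum(axis=1)
    return np.array([adj.sum() / 2, degrees.mean(), degrees.std(), len(adj)])

# Placeholder training set: in practice these would be (graph, tuned-angle) pairs
# collected from previously optimized QAOA instances.
rng = np.random.default_rng(0)
train_graphs = [(rng.random((12, 12)) < 0.3).astype(float) for _ in range(50)]
train_graphs = [np.triu(g, 1) + np.triu(g, 1).T for g in train_graphs]
train_angles = rng.uniform(0, np.pi, size=(50, 2))     # placeholder (gamma, beta)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(np.stack([graph_features(g) for g in train_graphs]), train_angles)

new_graph = np.triu((rng.random((12, 12)) < 0.3).astype(float), 1)
new_graph = new_graph + new_graph.T
init_angles = model.predict(graph_features(new_graph).reshape(1, -1))[0]
# init_angles would seed the classical optimizer inside the QAOA loop, which is
# where the reported reduction in iterations comes from.
```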
arXiv Detail & Related papers (2020-02-04T02:21:00Z)