Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning
- URL: http://arxiv.org/abs/2107.00243v1
- Date: Thu, 1 Jul 2021 06:43:11 GMT
- Title: Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning
- Authors: Jonathan Wenger and Geoff Pleiss and Philipp Hennig and John P.
Cunningham and Jacob R. Gardner
- Abstract summary: Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional benefit that has been previously unexplored.
It can simultaneously reduce variance at essentially negligible cost.
- Score: 54.01682318834995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian processes remain popular as a flexible and expressive model class,
but the computational cost of kernel hyperparameter optimization stands as a
major limiting factor to their scaling and broader adoption. Recent work has
made great strides combining stochastic estimation with iterative numerical
techniques, essentially boiling down GP inference to the cost of (many)
matrix-vector multiplies. Preconditioning -- a highly effective step for any
iterative method involving matrix-vector multiplication -- can be used to
accelerate convergence and thus reduce bias in hyperparameter optimization.
Here, we prove that preconditioning has an additional benefit that has been
previously unexplored. It not only reduces the bias of the $\log$-marginal
likelihood estimator and its derivatives, but it also simultaneously can reduce
variance at essentially negligible cost. We leverage this result to derive
sample-efficient algorithms for GP hyperparameter optimization requiring as few
as $\mathcal{O}(\log(\varepsilon^{-1}))$ instead of
$\mathcal{O}(\varepsilon^{-2})$ samples to achieve error $\varepsilon$. Our
theoretical results enable provably efficient and scalable optimization of
kernel hyperparameters, which we validate empirically on a set of large-scale
benchmark problems. There, variance reduction via preconditioning results in an
order of magnitude speedup in hyperparameter optimization of exact GPs.
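The computational pattern described in the abstract, reducing GP inference to (preconditioned) matrix-vector multiplies with the kernel matrix, can be made concrete with a short sketch. The NumPy snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it solves $(K + \sigma^2 I) v = y$ with preconditioned conjugate gradients, using a low-rank pivoted-Cholesky-plus-noise preconditioner applied through the Woodbury identity. The kernel choice, rank, and tolerances are illustrative, and the stochastic log-determinant and derivative estimators whose bias and variance the paper analyzes are omitted for brevity.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, outputscale=1.0):
    # Squared-exponential kernel matrix (illustrative hyperparameter values).
    sqdist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return outputscale * np.exp(-0.5 * sqdist / lengthscale**2)

def pivoted_cholesky(K, rank):
    # Partial pivoted Cholesky factor L (n x rank) with K ~= L @ L.T.
    n = K.shape[0]
    d = np.diag(K).copy()
    L = np.zeros((n, rank))
    for k in range(rank):
        i = int(np.argmax(d))
        L[:, k] = (K[:, i] - L @ L[i, :]) / np.sqrt(d[i])
        d = np.maximum(d - L[:, k] ** 2, 0.0)
    return L

def preconditioned_cg(matvec, b, precond_solve, tol=1e-8, max_iter=1000):
    # Standard preconditioned conjugate gradients for A x = b.
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = precond_solve(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = precond_solve(r)
        rz_next = r @ z
        p = z + (rz_next / rz) * p
        rz = rz_next
    return x

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)

noise = 0.1
K = rbf_kernel(X)
K_hat = K + noise * np.eye(len(X))          # K + sigma^2 I

# Preconditioner P = L L^T + sigma^2 I, inverted cheaply via the Woodbury identity.
L = pivoted_cholesky(K, rank=20)
inner = np.linalg.inv(np.eye(L.shape[1]) + (L.T @ L) / noise)
precond_solve = lambda r: r / noise - L @ (inner @ (L.T @ r)) / noise**2

v = preconditioned_cg(lambda w: K_hat @ w, y, precond_solve)   # v ~= (K + sigma^2 I)^{-1} y
```

In a full GP training loop this solve is typically paired with stochastic trace or log-determinant estimation to obtain the $\log$-marginal likelihood and its gradients; the abstract's claim is that the same preconditioner provably shrinks both the bias and the variance of those stochastic estimates.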
Related papers
- A Multi-objective Newton Optimization Algorithm for Hyper-Parameter
Search [0.0]
The algorithm is applied to search for the optimal probability threshold (a vector of eight parameters) in a multiclass object detection problem based on a convolutional neural network.
The algorithm produces overall higher true positive (TP) and lower false positive (FP) rates compared to using the default threshold value of 0.5.
arXiv Detail & Related papers (2024-01-07T21:12:34Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously tolerate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Ada-BKB: Scalable Gaussian Process Optimization on Continuous Domain by
Adaptive Discretization [21.859940486704264]
Algorithms such as GP-UCB have prohibitive computational complexity on continuous domains.
A no-regret algorithm based on adaptive discretization addresses this problem for functions on continuous domains.
arXiv Detail & Related papers (2021-06-16T07:55:45Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian; a minimal sketch of this forward-mode scheme for the Lasso appears after the list below.
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - Hyper-parameter estimation method with particle swarm optimization [0.8883733362171032]
The PSO method cannot be directly used for the problem of hyperparameter estimation.
The proposed method uses the swarm method to optimize the acquisition function.
Results on several test problems are improved.
arXiv Detail & Related papers (2020-11-24T07:51:51Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the non-strongly-concave setting of gradient-free minimax optimization.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z) - Scalable Hyperparameter Optimization with Lazy Gaussian Processes [1.3999481573773074]
We present a novel, highly accurate approximation of the underlying Gaussian Process.
First experiments show a speedup by a factor of 162 on a single node and a further speedup by a factor of 5 in a parallel environment.
arXiv Detail & Related papers (2020-01-16T10:15:55Z)
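To complement the implicit/iterative differentiation entries in the list above (as referenced in the proximal gradient item), here is a minimal, self-contained sketch of forward-mode (iterative) differentiation of proximal gradient descent (ISTA) for the Lasso: the derivative of the iterate with respect to the regularization parameter is propagated alongside the iterate itself, and the resulting Jacobian can be chained into a held-out loss for hyperparameter selection. Everything below (data, step-size rule, iteration count) is an illustrative assumption, not code from either paper.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_with_forward_diff(X, y, lam, n_iter=500):
    """ISTA for the Lasso, propagating d(beta)/d(lambda) alongside the iterates."""
    n, p = X.shape
    lipschitz = np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the smooth part
    step = 1.0 / lipschitz
    beta = np.zeros(p)
    dbeta = np.zeros(p)                          # Jacobian d(beta)/d(lambda), a vector here
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad                   # gradient step
        beta = soft_threshold(z, step * lam)
        # Forward-mode sweep: differentiate the same update w.r.t. lambda.
        dz = dbeta - step * (X.T @ (X @ dbeta)) / n
        active = np.abs(z) > step * lam          # soft-threshold is differentiable off the kink
        dbeta = active * (dz - step * np.sign(z))
    return beta, dbeta

# Hypothetical usage: gradient of a held-out loss w.r.t. lambda via the chain rule.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 20)), rng.standard_normal(100)
X_val, y_val = rng.standard_normal((50, 20)), rng.standard_normal(50)
lam = 0.1
beta, dbeta = ista_with_forward_diff(X, y, lam)
val_grad = (X_val @ beta - y_val) @ (X_val @ dbeta) / len(y_val)  # d/dlam of 0.5*MSE on validation data
```

This is the mechanism the two implicit-differentiation entries analyze: the per-iteration Jacobians converge toward the exact Jacobian of the regularization path, so `val_grad` can drive first-order hyperparameter optimization.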