Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning
- URL: http://arxiv.org/abs/2107.00243v1
- Date: Thu, 1 Jul 2021 06:43:11 GMT
- Title: Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning
- Authors: Jonathan Wenger and Geoff Pleiss and Philipp Hennig and John P.
Cunningham and Jacob R. Gardner
- Abstract summary: Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional, previously unexplored benefit.
It can simultaneously reduce variance at essentially negligible cost.
- Score: 54.01682318834995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian processes remain popular as a flexible and expressive model class,
but the computational cost of kernel hyperparameter optimization stands as a
major limiting factor to their scaling and broader adoption. Recent work has
made great strides combining stochastic estimation with iterative numerical
techniques, essentially boiling down GP inference to the cost of (many)
matrix-vector multiplies. Preconditioning -- a highly effective step for any
iterative method involving matrix-vector multiplication -- can be used to
accelerate convergence and thus reduce bias in hyperparameter optimization.
Here, we prove that preconditioning has an additional benefit that has been
previously unexplored. It not only reduces the bias of the $\log$-marginal
likelihood estimator and its derivatives, but it also simultaneously can reduce
variance at essentially negligible cost. We leverage this result to derive
sample-efficient algorithms for GP hyperparameter optimization requiring as few
as $\mathcal{O}(\log(\varepsilon^{-1}))$ instead of
$\mathcal{O}(\varepsilon^{-2})$ samples to achieve error $\varepsilon$. Our
theoretical results enable provably efficient and scalable optimization of
kernel hyperparameters, which we validate empirically on a set of large-scale
benchmark problems. There, variance reduction via preconditioning results in an
order of magnitude speedup in hyperparameter optimization of exact GPs.
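To make the abstract's recipe concrete, the following is a minimal NumPy sketch of a stochastic gradient of the GP log-marginal likelihood computed purely from matrix-vector products: the trace term in the derivative is estimated with Hutchinson (Rademacher) probes, and every linear solve uses preconditioned conjugate gradients. The RBF kernel, the simple Jacobi (diagonal) preconditioner, and all function names below are illustrative assumptions, not the authors' implementation or the preconditioners analyzed in the paper.

```python
import numpy as np

def rbf_kernel(X, lengthscale, noise):
    """Dense RBF kernel matrix plus observation noise, and the squared distances."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq / lengthscale**2) + noise * np.eye(len(X))
    return K, sq

def drbf_dlengthscale(K, sq, lengthscale, noise):
    """Elementwise derivative of the (noise-free) RBF kernel w.r.t. the lengthscale."""
    K_no_noise = K - noise * np.eye(len(K))
    return K_no_noise * sq / lengthscale**3

def pcg(A, b, M_inv_diag, tol=1e-8, maxiter=1000):
    """Conjugate gradients with a diagonal (Jacobi) preconditioner."""
    x = np.zeros_like(b)
    r = b.copy()                       # residual for the initial guess x = 0
    z = M_inv_diag * r
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

def stochastic_grad_lengthscale(X, y, lengthscale, noise, n_probes=16, seed=0):
    """Stochastic estimate of d(-log p(y)) / d(lengthscale).

    d(-log p)/dl = 0.5 * tr(K^{-1} dK/dl) - 0.5 * alpha^T (dK/dl) alpha,
    with alpha = K^{-1} y and the trace estimated by Hutchinson probes:
    tr(K^{-1} dK/dl) ~= mean_i (K^{-1} z_i)^T (dK/dl) z_i,  z_i ~ Rademacher.
    """
    rng = np.random.default_rng(seed)
    K, sq = rbf_kernel(X, lengthscale, noise)
    dK = drbf_dlengthscale(K, sq, lengthscale, noise)
    M_inv_diag = 1.0 / np.diag(K)            # Jacobi preconditioner (illustrative)
    alpha = pcg(K, y, M_inv_diag)            # K^{-1} y via preconditioned CG
    trace_est = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=len(y))
        trace_est += pcg(K, z, M_inv_diag) @ (dK @ z)
    trace_est /= n_probes
    return 0.5 * trace_est - 0.5 * alpha @ (dK @ alpha)

# Toy usage: 200 noisy sine observations; in an optimizer this estimate would
# drive gradient steps on the lengthscale.
X = np.linspace(0.0, 5.0, 200)[:, None]
y = np.sin(X[:, 0]) + 0.1 * np.random.default_rng(1).standard_normal(200)
print(stochastic_grad_lengthscale(X, y, lengthscale=1.0, noise=0.1))
```

In the paper's setting, a stronger preconditioner both accelerates the CG solves (reducing bias) and, as the authors prove, reduces the variance of such probe-based estimates at essentially negligible cost, which is what permits far fewer probe vectors per gradient evaluation.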
Related papers
- A Multi-objective Newton Optimization Algorithm for Hyper-Parameter
Search [0.0]
The algorithm is applied to search for the optimal probability threshold (a vector of eight parameters) in a multiclass object detection problem with a convolutional neural network.
The algorithm produces overall higher true positive (TP) and lower false positive (FP) rates compared to using the default value of 0.5.
arXiv Detail & Related papers (2024-01-07T21:12:34Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Ada-BKB: Scalable Gaussian Process Optimization on Continuous Domain by
Adaptive Discretization [21.859940486704264]
An algorithm such as GPUCB has prohibitive computational complexity.
A no-regret Gaussian process optimization algorithm for functions on continuous domains is obtained via adaptive discretization.
arXiv Detail & Related papers (2021-06-16T07:55:45Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-05-04T17:31:28Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used stochastic estimator based on moving average (SEMA) on a range of non-convex optimization problems.
Building on this estimator, we also present state-of-the-art results for these problems.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Hyper-parameter estimation method with particle swarm optimization [0.8883733362171032]
The PSO method cannot be directly used for the problem of hyperparameter estimation.
The proposed method uses the swarm method to optimize the acquisition function.
Results on several test problems are improved.
arXiv Detail & Related papers (2020-11-24T07:51:51Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z) - Scalable Hyperparameter Optimization with Lazy Gaussian Processes [1.3999481573773074]
We present a novel, highly accurate approximation of the underlying Gaussian Process.
First experiments show a speedup by a factor of 162 on a single node and a further speedup by a factor of 5 in a parallel environment.
arXiv Detail & Related papers (2020-01-16T10:15:55Z)
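As referenced in the "Implicit differentiation for fast hyperparameter selection in non-smooth convex learning" entry above, here is a minimal sketch of forward-mode differentiation of proximal gradient descent (ISTA) for the Lasso: the derivative of the iterates with respect to the regularization parameter is propagated alongside the iterates themselves. The problem setup, step size, and names are illustrative assumptions, not the authors' algorithm or code.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_forward_diff(X, y, lam, n_iter=500):
    """ISTA for the Lasso (1/2n)||y - X beta||^2 + lam*||beta||_1, jointly
    iterating beta and its derivative dbeta/dlam (forward-mode differentiation
    of each update)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n       # Lipschitz constant of the smooth part
    step = 1.0 / L
    beta = np.zeros(p)
    dbeta = np.zeros(p)                     # running derivative d beta / d lam
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad              # gradient step
        dz = dbeta - step * (X.T @ (X @ dbeta)) / n
        beta = soft_threshold(z, step * lam)
        # Forward-mode derivative of soft-thresholding: on the active set
        # (|z| > step*lam) the map is z -> z - sign(z)*step*lam, elsewhere 0.
        active = np.abs(z) > step * lam
        dbeta = np.where(active, dz - step * np.sign(z), 0.0)
    return beta, dbeta

# Toy usage: dbeta/dlam could drive gradient-based selection of lam.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(100)
beta, dbeta = ista_forward_diff(X, y, lam=0.1)
print(beta[:3], dbeta[:3])
```

The resulting dbeta/dlam is the kind of Jacobian sequence that the paper shows converges toward the exact Jacobian, which can then be used for gradient-based hyperparameter selection.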
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.