Tighter Bounds on the Log Marginal Likelihood of Gaussian Process
Regression Using Conjugate Gradients
- URL: http://arxiv.org/abs/2102.08314v1
- Date: Tue, 16 Feb 2021 17:54:59 GMT
- Title: Tighter Bounds on the Log Marginal Likelihood of Gaussian Process
Regression Using Conjugate Gradients
- Authors: Artem Artemev, David R. Burt and Mark van der Wilk
- Abstract summary: We show that approximate maximum likelihood learning of model parameters by maximising our lower bound retains many of the benefits of the sparse variational approach.
In experiments, we show improved predictive performance with our model for a comparable amount of training time compared to other conjugate gradient based approaches.
- Score: 19.772149500352945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a lower bound on the log marginal likelihood of Gaussian process
regression models that can be computed without matrix factorisation of the full
kernel matrix. We show that approximate maximum likelihood learning of model
parameters by maximising our lower bound retains many of the benefits of the
sparse variational approach while reducing the bias introduced into parameter
learning.
The basis of our bound is a more careful analysis of the log-determinant term
appearing in the log marginal likelihood, as well as using the method of
conjugate gradients to derive tight lower bounds on the term involving a
quadratic form. Our approach is a step forward in unifying methods relying on
lower bound maximisation (e.g. variational methods) and iterative approaches
based on conjugate gradients for training Gaussian processes. In experiments,
we show improved predictive performance with our model for a comparable amount
of training time compared to other conjugate gradient based approaches.
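As a rough illustration of the mechanism the abstract describes, the sketch below shows one standard way conjugate gradient (CG) iterates can certify a lower bound on the quadratic-form term -1/2 y^T K^{-1} y when K = K_ff + sigma^2 I: any candidate vector v with residual r = y - K v yields a bound whose gap is controlled by ||r||^2 / sigma^2, and the bound tightens as CG drives the residual down. This is only a sketch of the general idea under these assumptions, not the paper's exact bound, preconditioning, or stopping rule; all function and variable names are illustrative.

    import numpy as np

    def quadratic_term_lower_bound(K, y, sigma2, num_cg_steps=20):
        """Lower bound on -0.5 * y^T K^{-1} y for K = K_ff + sigma2 * I (SPD).

        Generic illustration, not the paper's exact construction. For any v,
        with residual r = y - K v,
            y^T K^{-1} y = 2 v^T y - v^T K v + r^T K^{-1} r
                        <= 2 v^T y - v^T K v + ||r||^2 / sigma2,
        since K >= sigma2 * I. Running CG on K v = y shrinks the residual,
        so the bound tightens towards the exact value.
        """
        n = y.shape[0]
        v = np.zeros(n)
        r = y - K @ v          # residual of the linear system K v = y
        p = r.copy()           # initial search direction
        for _ in range(num_cg_steps):
            Kp = K @ p
            alpha = (r @ r) / (p @ Kp)
            v = v + alpha * p
            r_new = r - alpha * Kp
            if r_new @ r_new < 1e-12:   # system essentially solved; bound is tight
                r = r_new
                break
            beta = (r_new @ r_new) / (r @ r)
            p = r_new + beta * p
            r = r_new
        # Upper bound on y^T K^{-1} y, hence a lower bound on the negative half.
        quad_upper = 2 * v @ y - v @ (K @ v) + (r @ r) / sigma2
        return -0.5 * quad_upper

    # Toy usage: the bound approaches the exact value as CG converges.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 1))
    sigma2 = 0.1
    K_ff = np.exp(-0.5 * (X - X.T) ** 2)            # RBF kernel, unit lengthscale
    K = K_ff + sigma2 * np.eye(200)
    y = rng.normal(size=200)
    exact = -0.5 * y @ np.linalg.solve(K, y)
    print(quadratic_term_lower_bound(K, y, sigma2, num_cg_steps=5), exact)

The trade-off mirrors the abstract's claim: additional CG iterations cost extra matrix-vector products but shrink the residual and hence tighten the bound, without ever factorising the full kernel matrix.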
Related papers
- Towards Provable Log Density Policy Gradient [6.0891236991406945]
Policy gradient methods are a vital ingredient behind the success of modern reinforcement learning.
In this work, we argue that a residual error term in standard policy gradient estimates is significant, and that correcting for it could improve the sample complexity of reinforcement learning methods.
We propose the log density gradient to estimate the policy gradient, which corrects for this residual error term.
arXiv Detail & Related papers (2024-03-03T20:09:09Z)
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
- Noise Estimation in Gaussian Process Regression [1.5002438468152661]
The presented method can be used to estimate the variance of the correlated error and the variance of the noise by maximizing a marginal likelihood function.
We demonstrate the computational advantages and robustness of the presented approach compared to traditional parameter optimization.
arXiv Detail & Related papers (2022-06-20T19:36:03Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply to both the approximated kernel and the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
- Sparse Orthogonal Variational Inference for Gaussian Processes [34.476453597078894]
We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points.
We show that this formulation recovers existing approximations and at the same time allows us to obtain tighter lower bounds on the marginal likelihood and new variational inference algorithms.
arXiv Detail & Related papers (2019-10-23T15:01:28Z)