New Bounds for Sparse Variational Gaussian Processes
- URL: http://arxiv.org/abs/2502.08730v1
- Date: Wed, 12 Feb 2025 19:04:26 GMT
- Title: New Bounds for Sparse Variational Gaussian Processes
- Authors: Michalis K. Titsias,
- Abstract summary: Sparse variational Gaussian processes (GPs) construct tractable posterior approximations to GP models.
At the core of these methods is the assumption that the true posterior distribution over training function values $bf f$ and inducing variables $bf u$ is approximated by a variational distribution that incorporates the conditional GP prior $p(bf f | bf u)$ in its factorization.
We show that for model training we can relax it through the use of a more general variational distribution $q(bf f | bf u)$
- Score: 8.122270502556374
- License:
- Abstract: Sparse variational Gaussian processes (GPs) construct tractable posterior approximations to GP models. At the core of these methods is the assumption that the true posterior distribution over training function values ${\bf f}$ and inducing variables ${\bf u}$ is approximated by a variational distribution that incorporates the conditional GP prior $p({\bf f} | {\bf u})$ in its factorization. While this assumption is considered as fundamental, we show that for model training we can relax it through the use of a more general variational distribution $q({\bf f} | {\bf u})$ that depends on $N$ extra parameters, where $N$ is the number of training examples. In GP regression, we can analytically optimize the evidence lower bound over the extra parameters and express a tractable collapsed bound that is tighter than the previous bound. The new bound is also amenable to stochastic optimization and its implementation requires minor modifications to existing sparse GP code. Further, we also describe extensions to non-Gaussian likelihoods. On several datasets we demonstrate that our method can reduce bias when learning the hyperpaparameters and can lead to better predictive performance.
Related papers
- Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models [65.71506381302815]
We propose amortize the cost of sampling from a posterior distribution of the form $p(mathbfxmidmathbfy) propto p_theta(mathbfx)$.
For many models and constraints of interest, the posterior in the noise space is smoother than the posterior in the data space, making it more amenable to such amortized inference.
arXiv Detail & Related papers (2025-02-10T19:49:54Z) - A Coreset-based, Tempered Variational Posterior for Accurate and
Scalable Stochastic Gaussian Process Inference [2.7855886538423187]
We present a novel variational Gaussian process ($mathcalGP$) inference method, based on a posterior over a learnable set of weighted pseudo input-output points (coresets)
We deriveGP's lower bound for the log-marginal likelihood via marginalization of latent $mathcalGP$ coreset variables.
arXiv Detail & Related papers (2023-11-02T17:22:22Z) - Shallow and Deep Nonparametric Convolutions for Gaussian Processes [0.0]
We introduce a nonparametric process convolution formulation for GPs that alleviates weaknesses by using a functional sampling approach.
We propose a composition of these nonparametric convolutions that serves as an alternative to classic deep GP models.
arXiv Detail & Related papers (2022-06-17T19:03:04Z) - Non-Gaussian Gaussian Processes for Few-Shot Regression [71.33730039795921]
We propose an invertible ODE-based mapping that operates on each component of the random variable vectors and shares the parameters across all of them.
NGGPs outperform the competing state-of-the-art approaches on a diversified set of benchmarks and applications.
arXiv Detail & Related papers (2021-10-26T10:45:25Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Input Dependent Sparse Gaussian Processes [1.1470070927586014]
We use a neural network that receives the observed data as an input and outputs the inducing points locations and the parameters of $q$.
We evaluate our method in several experiments, showing that it performs similar or better than other state-of-the-art sparse variational GP approaches.
arXiv Detail & Related papers (2021-07-15T12:19:10Z) - Reducing the Variance of Gaussian Process Hyperparameter Optimization
with Preconditioning [54.01682318834995]
Preconditioning is a highly effective step for any iterative method involving matrix-vector multiplication.
We prove that preconditioning has an additional benefit that has been previously unexplored.
It simultaneously can reduce variance at essentially negligible cost.
arXiv Detail & Related papers (2021-07-01T06:43:11Z) - Transforming Gaussian Processes With Normalizing Flows [15.886048234706633]
We show that a parametric invertible transformation can be made input-dependent and encode interpretable prior knowledge.
We derive a variational approximation to the resulting inference problem, which is as fast as variational GP regression.
The resulting algorithm's computational and inferential performance is excellent, and we demonstrate this on a range of data sets.
arXiv Detail & Related papers (2020-11-03T09:52:37Z) - Likelihood-Free Inference with Deep Gaussian Processes [70.74203794847344]
Surrogate models have been successfully used in likelihood-free inference to decrease the number of simulator evaluations.
We propose a Deep Gaussian Process (DGP) surrogate model that can handle more irregularly behaved target distributions.
Our experiments show how DGPs can outperform GPs on objective functions with multimodal distributions and maintain a comparable performance in unimodal cases.
arXiv Detail & Related papers (2020-06-18T14:24:05Z) - Quadruply Stochastic Gaussian Processes [10.152838128195466]
We introduce a variational inference procedure for training scalable Gaussian process (GP) models whose per-iteration complexity is independent of both the number of training points, $n$, and the number basis functions used in the kernel approximation, $m$.
We demonstrate accurate inference on large classification and regression datasets using GPs and relevance vector machines with up to $m = 107$ basis functions.
arXiv Detail & Related papers (2020-06-04T17:06:25Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.