Marginalised Gaussian Processes with Nested Sampling
- URL: http://arxiv.org/abs/2010.16344v2
- Date: Fri, 19 Nov 2021 17:58:35 GMT
- Title: Marginalised Gaussian Processes with Nested Sampling
- Authors: Fergus Simpson, Vidhi Lalchand, Carl Edward Rasmussen
- Abstract summary: Gaussian process (GP) models are a rich distribution over functions with inductive biases controlled by a kernel function.
This work presents an alternative learning procedure where the hyperparameters of the kernel function are marginalised using Nested Sampling (NS).
- Score: 10.495114898741203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian process (GP) models are a rich distribution over functions with
inductive biases controlled by a kernel function. Learning occurs through the
optimisation of kernel hyperparameters using the marginal likelihood as the
objective. This classical approach, known as Type-II maximum likelihood (ML-II),
yields point estimates of the hyperparameters, and continues to be the default
method for training GPs. However, this approach risks underestimating
predictive uncertainty and is prone to overfitting, especially when there are
many hyperparameters. Furthermore, gradient-based optimisation makes ML-II
point estimates highly susceptible to the presence of local minima. This work
presents an alternative learning procedure where the hyperparameters of the
kernel function are marginalised using Nested Sampling (NS), a technique that
is well suited to sample from complex, multi-modal distributions. We focus on
regression tasks with the spectral mixture (SM) class of kernels and find that
a principled approach to quantifying model uncertainty leads to substantial
gains in predictive performance across a range of synthetic and benchmark data
sets. In this context, nested sampling is also found to offer a speed advantage
over Hamiltonian Monte Carlo (HMC), widely considered to be the gold standard
in MCMC-based inference.
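The contrast the abstract draws between a point estimate of the hyperparameters and marginalising over them can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy data, the squared-exponential kernel (the paper uses spectral mixture kernels), the log-normal prior, and the use of simple self-normalised importance sampling from the prior as a stand-in for nested sampling. The point estimate is simply the best of the sampled hyperparameters, a crude proxy for gradient-based ML-II.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale, variance):
    """Squared-exponential kernel between two 1-D input arrays."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def log_marginal_likelihood(x, y, lengthscale, variance, noise):
    """log p(y | x, theta) for a zero-mean GP with Gaussian noise variance `noise`."""
    n = len(x)
    K = rbf_kernel(x, x, lengthscale, variance) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

def predictive_mean(x, y, x_new, lengthscale, variance, noise):
    """GP posterior mean at x_new for fixed hyperparameters."""
    K = rbf_kernel(x, x, lengthscale, variance) + noise * np.eye(len(x))
    return rbf_kernel(x_new, x, lengthscale, variance) @ np.linalg.solve(K, y)

# Toy regression data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
y = np.sin(x) + 0.1 * rng.standard_normal(20)
x_new = np.array([2.5])

# Hyperparameters (lengthscale, variance, noise) drawn from an assumed log-normal prior.
thetas = np.exp(rng.normal(0.0, 1.0, size=(500, 3)))
log_w = np.array([log_marginal_likelihood(x, y, *t) for t in thetas])

# Point estimate: the sampled theta with the highest marginal likelihood.
best = thetas[np.argmax(log_w)]
mean_point = predictive_mean(x, y, x_new, *best)

# Marginalised prediction: weight each sample by its marginal likelihood.
w = np.exp(log_w - log_w.max())
w /= w.sum()
preds = np.array([predictive_mean(x, y, x_new, *t) for t in thetas])
mean_marginal = w @ preds

print("point-estimate mean:", mean_point[0])
print("marginalised mean:  ", mean_marginal[0])
```

Sampling from the prior scales poorly as the number of hyperparameters grows, which is one reason a dedicated sampler such as nested sampling or HMC is used in practice.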
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Sparse Gaussian Process Hyperparameters: Optimize or Integrate? [5.949779668853556]
We propose an algorithm for sparse Gaussian process regression which leverages MCMC to sample from the hyperparameter posterior.
We compare this scheme against natural baselines in the literature and against stochastic variational GPs (SVGPs), and provide an extensive computational analysis.
arXiv Detail & Related papers (2022-11-04T14:06:59Z)
- Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
- Hybrid Random Features [60.116392415715275]
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs).
HRFs automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest.
arXiv Detail & Related papers (2021-10-08T20:22:59Z)
- Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications [71.23286211775084]
We introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters.
Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error.
Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.
arXiv Detail & Related papers (2021-09-06T17:10:01Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Learning Nonparametric Volterra Kernels with Gaussian Processes [0.0]
This paper introduces a method for the nonparametric Bayesian learning of nonlinear operators, through the use of the Volterra series with kernels represented using Gaussian processes (GPs).
When the input function to the operator is unobserved and has a GP prior, the NVKM constitutes a powerful method for both single and multiple output regression, and can be viewed as a nonlinear and nonparametric latent force model.
arXiv Detail & Related papers (2021-06-10T08:21:00Z)
- On MCMC for variationally sparse Gaussian processes: A pseudo-marginal approach [0.76146285961466]
Gaussian processes (GPs) are frequently used in machine learning and statistics to construct powerful models.
We propose a pseudo-marginal (PM) scheme that offers exact inference as well as computational gains through doubly stochastic estimators for the likelihood and large datasets.
arXiv Detail & Related papers (2021-03-04T20:48:29Z)
- Gaussian Process Latent Class Choice Models [7.992550355579791]
We present a non-parametric class of probabilistic machine learning within discrete choice models (DCMs).
The proposed model would assign individuals probabilistically to behaviorally homogeneous clusters (latent classes) using GPs.
The model is tested on two different mode choice applications and compared against different LCCM benchmarks.
arXiv Detail & Related papers (2021-01-28T19:56:42Z)
- Approximate Inference for Fully Bayesian Gaussian Process Regression [11.47317712333228]
Learning in Gaussian Process models occurs through the adaptation of hyperparameters of the mean and the covariance function.
An alternative learning procedure is to infer the posterior over hyperparameters in a hierarchical specification of GPs we call Fully Bayesian Gaussian Process Regression (GPR).
We analyze the predictive performance for fully Bayesian GPR on a range of benchmark data sets.
arXiv Detail & Related papers (2019-12-31T17:18:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.