Posterior and Computational Uncertainty in Gaussian Processes
- URL: http://arxiv.org/abs/2205.15449v5
- Date: Mon, 9 Oct 2023 20:14:51 GMT
- Title: Posterior and Computational Uncertainty in Gaussian Processes
- Authors: Jonathan Wenger, Geoff Pleiss, Marvin Pförtner, Philipp Hennig, John P. Cunningham
- Abstract summary: Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
- Score: 52.26904059556759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian processes scale prohibitively with the size of the dataset. In
response, many approximation methods have been developed, which inevitably
introduce approximation error. This additional source of uncertainty, due to
limited computation, is entirely ignored when using the approximate posterior.
Therefore in practice, GP models are often as much about the approximation
method as they are about the data. Here, we develop a new class of methods that
provides consistent estimation of the combined uncertainty arising from both
the finite number of data observed and the finite amount of computation
expended. The most common GP approximations map to an instance in this class,
such as methods based on the Cholesky factorization, conjugate gradients, and
inducing points. For any method in this class, we prove (i) convergence of its
posterior mean in the associated RKHS, (ii) decomposability of its combined
posterior covariance into mathematical and computational covariances, and (iii)
that the combined variance is a tight worst-case bound for the squared error
between the method's posterior mean and the latent function. Finally, we
empirically demonstrate the consequences of ignoring computational uncertainty
and show how implicitly modeling it improves generalization performance on
benchmark datasets.
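As a concrete illustration of point (ii), the sketch below shows one way a combined posterior covariance can split into a mathematical and a computational part when the exact solve with the kernel matrix is replaced by a low-rank approximation C_i of (K + sigma^2 I)^{-1} built from a small number of actions. The kernel, data, action choice, and all variable names are illustrative assumptions for this sketch, not code from the paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

# Toy regression problem (sizes, noise level, and kernel are assumptions).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
Xs = np.linspace(-3, 3, 100)[:, None]
noise = 0.1**2

Khat = rbf_kernel(X, X) + noise * np.eye(len(X))  # K + sigma^2 I
Kxs = rbf_kernel(X, Xs)                           # cross-covariance k(X, x*)
Kss = rbf_kernel(Xs, Xs)                          # prior covariance at test inputs

# Low-rank approximation C_i of Khat^{-1} from i actions (columns of S).
# Unit-vector actions give a partial-Cholesky-style scheme; other choices
# (conjugate-gradient directions, inducing points) fit the same template.
i = 25
S = np.eye(len(X))[:, :i]
C_i = S @ np.linalg.solve(S.T @ Khat @ S, S.T)    # C_i = S (S^T Khat S)^{-1} S^T

mean = Kxs.T @ (C_i @ y)                          # approximate posterior mean
mathematical_cov = Kss - Kxs.T @ np.linalg.solve(Khat, Kxs)           # exact-GP posterior covariance
computational_cov = Kxs.T @ (np.linalg.solve(Khat, Kxs) - C_i @ Kxs)  # uncertainty from limited compute
combined_cov = Kss - Kxs.T @ C_i @ Kxs            # covariance reported by the approximate method

# The combined covariance is exactly the sum of the two components.
print(np.allclose(combined_cov, mathematical_cov + computational_cov))
```

As more actions are processed (i approaching n), C_i approaches (K + sigma^2 I)^{-1}, the computational covariance shrinks to zero, and the combined posterior recovers the exact mathematical posterior.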
Related papers
- Iterative Methods for Full-Scale Gaussian Process Approximations for Large Spatial Data [9.913418444556486]
We show how iterative methods can be used to reduce the computational cost of calculating likelihoods, gradients, and predictive distributions with full-scale approximations (FSAs).
We also present a novel, accurate, and fast way to calculate predictive variances relying on estimation techniques and iterative methods.
All methods are implemented in a free C++ software library with high-level Python and R packages.
arXiv Detail & Related papers (2024-05-23T12:25:22Z)
- Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models [11.141688859736805]
We introduce and analyze several preconditioners, derive new convergence results, and propose novel methods for accurately approximating predictive variances.
In particular, we obtain a speed-up of an order of magnitude compared to Cholesky-based calculations.
All methods are implemented in a free C++ software library with high-level Python and R packages.
arXiv Detail & Related papers (2023-10-18T14:31:16Z)
- Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning [48.08663378234329]
Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications.
Existing sparse approximation methods can yield a significant reduction in the computational cost.
We provide novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation method (a minimal sketch of the Nyström construction follows this entry).
arXiv Detail & Related papers (2022-02-08T17:22:09Z)
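For context, here is a minimal sketch of the standard Nyström construction referenced in the entry above, which approximates an n-by-n kernel matrix with a rank-m surrogate built from m landmark points. The landmark choice, kernel, and sizes are assumptions made for illustration; the paper's confidence intervals are not reproduced here.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(1000, 1))                 # n training inputs
Z = X[rng.choice(len(X), size=50, replace=False)]      # m landmark (inducing) points

K_nm = rbf_kernel(X, Z)                                # (n, m) cross-kernel
K_mm = rbf_kernel(Z, Z) + 1e-8 * np.eye(len(Z))        # small jitter for stability

# Nystrom approximation: K is replaced by K_nm K_mm^{-1} K_mn, a rank-m
# surrogate that avoids forming and inverting the full n x n kernel matrix.
K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)

K_full = rbf_kernel(X, X)
print("max abs approximation error:", np.abs(K_full - K_approx).max())
```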
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints.
A second motivation for the resulting bias-constrained estimator (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation (a minimal sketch of that setting follows this entry).
arXiv Detail & Related papers (2020-12-09T20:19:32Z)
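As a hedged illustration of the setting those oracle inequalities apply to (not the paper's method itself), the sketch below runs TD(0) policy evaluation with linear function approximation on a small synthetic Markov chain. The chain, features, and step size are assumptions made up for this example.

```python
import numpy as np

# Policy evaluation with linear function approximation: estimate the value
# function as V(s) = phi(s)^T w via stochastic TD(0) updates.
rng = np.random.default_rng(0)
n_states, dim, gamma, alpha = 10, 4, 0.9, 0.01
Phi = rng.standard_normal((n_states, dim))             # feature vectors phi(s) as rows
P = rng.dirichlet(np.ones(n_states), size=n_states)    # transition matrix under a fixed policy
r = rng.uniform(0.0, 1.0, size=n_states)               # expected one-step reward per state

w = np.zeros(dim)
s = 0
for _ in range(100_000):
    s_next = rng.choice(n_states, p=P[s])
    td_error = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += alpha * td_error * Phi[s]                      # TD(0) step toward the projected fixed point
    s = s_next

# With suitable step sizes, TD(0) tracks the solution of the projected Bellman
# (fixed-point) equation, i.e. the best value-function approximation in span(Phi).
print("approximate state values:", Phi @ w)
```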
- The Shooting Regressor; Randomized Gradient-Based Ensembles [0.0]
An ensemble method is introduced that utilizes randomization and loss function gradients to compute a prediction.
Multiple weakly-correlated estimators approximate the gradient at randomly sampled points on the error surface and are aggregated into a final solution.
arXiv Detail & Related papers (2020-09-14T03:20:59Z)
- Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation [23.38076756988258]
We propose a new single-model based approach to quantify uncertainty in deep neural networks.
We use a mean-field approximation formula to compute an analytically intractable integral.
Empirically, the proposed approach performs competitively when compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-13T07:32:38Z)
- Consistent Online Gaussian Process Regression Without the Sample Complexity Bottleneck [14.309243378538012]
We propose an online compression scheme that fixes an error neighborhood with respect to the Hellinger metric centered at the current posterior.
For a constant error radius, the proposed parsimonious online GP (POG) converges to a neighborhood of the population posterior (Theorem 1(ii)), but with finite memory that is at worst determined by the metric entropy of the feature space.
arXiv Detail & Related papers (2020-04-23T11:52:06Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.