Related papers: How Good are Low-Rank Approximations in Gaussian Process Regression?

URL: http://arxiv.org/abs/2004.01584v5
Date: Tue, 14 Dec 2021 11:28:12 GMT
Title: How Good are Low-Rank Approximations in Gaussian Process Regression?
Authors: Constantinos Daskalakis, Petros Dellaportas, Aristeidis Panos
Abstract summary: We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.
Score: 24.09582049403961
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations: based on random Fourier features, and based on truncating the kernel's Mercer expansion. In particular, we bound the Kullback-Leibler divergence between an exact GP and one resulting from one of the afore-described low-rank approximations to its kernel, as well as between their corresponding predictive densities, and we also bound the error between predictive mean vectors and between predictive covariance matrices computed using the exact versus using the approximate GP. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.
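For readers unfamiliar with the first of the two approximations named in the abstract, the following is a minimal sketch of the standard random Fourier feature construction for a squared-exponential kernel. It is not taken from the paper: the choice of kernel, the function names (`rbf_kernel`, `sample_rff`, `rff_features`, `rff_kernel`), the lengthscale, and the test points are all illustrative assumptions. It only shows how the inner product of the low-rank feature maps approximates the exact kernel value.

```python
import math
import random


def rbf_kernel(x, y, lengthscale=1.0):
    """Exact squared-exponential kernel k(x, y) = exp(-||x - y||^2 / (2 l^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def sample_rff(dim, num_features, lengthscale=1.0, seed=0):
    """Draw frequencies w ~ N(0, I / l^2) and phases b ~ U[0, 2*pi).

    N(0, I / l^2) is the spectral density of the kernel above (assumed setup).
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def rff_features(x, freqs, phases):
    """Map x to z(x) = sqrt(2 / D) * cos(W x + b), a D-dimensional feature vector."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


def rff_kernel(x, y, freqs, phases):
    """Low-rank kernel estimate: the inner product z(x) . z(y)."""
    zx = rff_features(x, freqs, phases)
    zy = rff_features(y, freqs, phases)
    return sum(a * b for a, b in zip(zx, zy))


if __name__ == "__main__":
    # Illustrative points; the absolute error typically shrinks as D grows.
    x, y = [0.3, -1.2], [1.0, 0.4]
    for num_features in (10, 100, 4000):
        freqs, phases = sample_rff(2, num_features, seed=0)
        err = abs(rbf_kernel(x, y) - rff_kernel(x, y, freqs, phases))
        print(num_features, err)
```

With D features, the approximate kernel matrix has rank at most D, which is what makes the resulting GP regression cheaper than the exact one. The paper's bounds concern how far such an approximate GP is from the exact GP; this sketch only illustrates the kernel-level approximation.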

Related papers

A general technique for approximating high-dimensional empirical kernel matrices [16.583173656638806]
We present user-friendly bounds for the expected operator norm of a random kernel matrix on the kernel function $k(\cdot,\cdot)$. We then apply our method to provide new, tighter approximations for inner-product kernel matrices on general high-dimensional data.
arXiv Detail & Related papers (2025-11-05T22:36:52Z)
Likelihood Ratio Tests by Kernel Gaussian Embedding [0.0]
We propose a novel kernel-based nonparametric two-sample test, employing the combined use of kernel mean and kernel covariance embedding. Our test builds on recent results showing how such combined embeddings map distinct probability measures to mutually singular Gaussian measures on the kernel's RKHS. We construct a test statistic based on the relative entropy between the Gaussian embeddings, in effect the likelihood ratio.
arXiv Detail & Related papers (2025-08-11T13:41:38Z)
Semiparametric conformal prediction [79.6147286161434]
We construct a conformal prediction set accounting for the joint correlation structure of the vector-valued non-conformity scores. We flexibly estimate the joint cumulative distribution function (CDF) of the scores. Our method yields desired coverage and competitive efficiency on a range of real-world regression problems.
arXiv Detail & Related papers (2024-11-04T14:29:02Z)
Variance-Reducing Couplings for Random Features [57.73648780299374]
Random features (RFs) are a popular technique to scale up kernel methods in machine learning. We find couplings to improve RFs defined on both Euclidean and discrete input spaces. We reach surprising conclusions about the benefits and limitations of variance reduction as a paradigm.
arXiv Detail & Related papers (2024-05-26T12:25:09Z)
On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates [5.13323375365494]
We provide theoretical guarantees for the convergence behaviour of diffusion-based generative models under strongly log-concave data. Our class of functions used for score estimation is made of Lipschitz continuous functions, avoiding any Lipschitzness assumption on the score function. This approach yields the best known convergence rate for our sampling algorithm.
arXiv Detail & Related papers (2023-11-22T18:40:45Z)
Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective. We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices. Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
Variational sparse inverse Cholesky approximation for latent Gaussian processes via double Kullback-Leibler minimization [6.012173616364571]
We combine a variational approximation of the posterior with a similar and efficient SIC-restricted Kullback-Leibler-optimal approximation of the prior. For this setting, our variational approximation can be computed via gradient descent in polylogarithmic time per iteration. We provide numerical comparisons showing that the proposed double-Kullback-Leibler-optimal Gaussian-process approximation (DKLGP) can sometimes be vastly more accurate for stationary kernels than alternative approaches.
arXiv Detail & Related papers (2023-01-30T21:50:08Z)
Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset. Many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z)
How Good are Low-Rank Approximations in Gaussian Process Regression? [28.392890577684657]
We provide guarantees for approximate Gaussian Process (GP) regression resulting from two common low-rank kernel approximations. We provide experiments on both simulated data and standard benchmarks to evaluate the effectiveness of our theoretical bounds.
arXiv Detail & Related papers (2021-12-13T04:04:08Z)
A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective. We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z)
Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability. We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections. Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
Towards Unbiased Random Features with Lower Variance For Stationary Indefinite Kernels [26.57122949130266]
Our algorithm achieves lower variance and approximation error compared with existing kernel approximation methods. With a better approximation to the originally selected kernels, improved classification accuracy and regression ability are obtained.
arXiv Detail & Related papers (2021-04-13T13:56:50Z)
SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features. We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.