Contraction rates for conjugate gradient and Lanczos approximate posteriors in Gaussian process regression
- URL: http://arxiv.org/abs/2406.12678v1
- Date: Tue, 18 Jun 2024 14:50:42 GMT
- Title: Contraction rates for conjugate gradient and Lanczos approximate posteriors in Gaussian process regression
- Authors: Bernhard Stankewitz, Botond Szabo
- Abstract summary: We analyze a class of recently proposed approximation algorithms from the field of Probabilistic numerics.
We combine results from the numerical analysis literature with state-of-the-art concentration results for spectra of kernel matrices to obtain minimax contraction rates.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to their flexibility and theoretical tractability, Gaussian process (GP) regression models have become a central topic in modern statistics and machine learning. While the true posterior in these models is given explicitly, numerical evaluations depend on the inversion of the augmented kernel matrix $ K + \sigma^2 I $, which requires up to $ O(n^3) $ operations. For the large sample sizes n typical of modern applications, this is computationally infeasible and necessitates the use of an approximate version of the posterior. Although such methods are widely used in practice, they typically have very limited theoretical underpinning. In this context, we analyze a class of recently proposed approximation algorithms from the field of probabilistic numerics. They can be interpreted in terms of Lanczos approximate eigenvectors of the kernel matrix or a conjugate gradient approximation of the posterior mean, which are particularly advantageous in truly large-scale applications, as they are fundamentally based only on matrix-vector multiplications amenable to the GPU acceleration of modern software frameworks. We combine results from the numerical analysis literature with state-of-the-art concentration results for spectra of kernel matrices to obtain minimax contraction rates. Our theoretical findings are illustrated by numerical experiments.
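To make the two approximation schemes named in the abstract concrete, below is a minimal, self-contained sketch in Python (NumPy/SciPy): a hand-rolled conjugate gradient solve for the posterior mean and a Lanczos-based (via scipy.sparse.linalg.eigsh) rank-r plug-in for the posterior covariance. The squared-exponential kernel, the synthetic data, the iteration count, the truncation rank, and the particular low-rank plug-in for $(K + \sigma^2 I)^{-1}$ are illustrative assumptions, not the paper's exact construction; the point is that both steps touch the kernel matrix only through matrix-vector products.

```python
# Minimal sketch (not the authors' implementation) of CG / Lanczos
# approximate GP posteriors. Kernel, data, and ranks are illustrative.
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh


def rbf_kernel(x, y, lengthscale=0.5):
    """Squared-exponential kernel matrix k(x, y) for 1-d inputs."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def conjugate_gradient(matvec, b, max_iter=50, tol=1e-8):
    """Plain CG for (K + sigma^2 I) alpha = b, using only matrix-vector products."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        step = rs_old / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


rng = np.random.default_rng(0)
n, sigma2 = 500, 0.1
x_train = np.sort(rng.uniform(0, 1, n))
y_train = np.sin(2 * np.pi * x_train) + np.sqrt(sigma2) * rng.standard_normal(n)
x_test = np.linspace(0, 1, 200)

K = rbf_kernel(x_train, x_train)        # n x n kernel matrix
K_star = rbf_kernel(x_test, x_train)    # test-train cross covariances
matvec = lambda v: K @ v + sigma2 * v   # action of K + sigma^2 I

# Conjugate gradient approximation of the posterior mean after m iterations:
# m(x*) ~= K_* alpha_m, where alpha_m approximately solves (K + sigma^2 I) alpha = y.
alpha_m = conjugate_gradient(matvec, y_train, max_iter=30)
mean_approx = K_star @ alpha_m

# Lanczos approximate posterior covariance: use the top-r Lanczos/Ritz eigenpairs
# of K as a low-rank plug-in for (K + sigma^2 I)^{-1} (one simple choice, for
# illustration only).
r = 20
eigvals, eigvecs = eigsh(LinearOperator((n, n), matvec=lambda v: K @ v), k=r)
K_r_inv_noisy = eigvecs @ np.diag(1.0 / (eigvals + sigma2)) @ eigvecs.T
cov_approx = rbf_kernel(x_test, x_test) - K_star @ K_r_inv_noisy @ K_star.T

print(mean_approx[:5], np.diag(cov_approx)[:5])
```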
Related papers
- Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms [28.046728466038022]
We study theoretical properties of a broad class of regularized algorithms with vector-valued output.
We rigorously confirm the so-called saturation effect for ridge regression with vector-valued output.
We present an upper bound on the finite-sample risk of general vector-valued spectral algorithms.
arXiv Detail & Related papers (2024-05-23T16:45:52Z) - Gaussian Process Regression under Computational and Epistemic Misspecification [4.5656369638728656]
In large data applications, computational costs can be reduced using low-rank or sparse approximations of the kernel.
This paper investigates the effect of such kernel approximations on the element error.
arXiv Detail & Related papers (2023-12-14T18:53:32Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - Rigorous dynamical mean field theory for stochastic gradient descent
methods [17.90683687731009]
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods.
This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration.
arXiv Detail & Related papers (2022-10-12T21:10:55Z) - More Than a Toy: Random Matrix Models Predict How Real-World Neural
Representations Generalize [94.70343385404203]
We find that most theoretical analyses fall short of capturing qualitative phenomena even for kernel regression.
We prove that the classical GCV estimator converges to the generalization risk whenever a local random matrix law holds.
Our findings suggest that random matrix theory may be central to understanding the properties of neural representations in practice.
arXiv Detail & Related papers (2022-03-11T18:59:01Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2 D + (N^2)^3)$.
We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
arXiv Detail & Related papers (2021-02-15T13:24:41Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.