Unbiased Stochastic Optimization for Gaussian Processes on Finite Dimensional RKHS
- URL: http://arxiv.org/abs/2508.20588v1
- Date: Thu, 28 Aug 2025 09:27:20 GMT
- Title: Unbiased Stochastic Optimization for Gaussian Processes on Finite Dimensional RKHS
- Authors: Neta Shoham, Haim Avron,
- Abstract summary: We propose algorithms for exact inference of GPs with kernels that induce a Reproducing Hilbert Space (RKHS) of moderate finite dimension.<n>Our approach can also be extended to infinite dimensional RKHSs at the cost of forgoing exactness.
- Score: 7.841948280267278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current methods for stochastic hyperparameter learning in Gaussian Processes (GPs) rely on approximations, such as computing biased stochastic gradients or using inducing points in stochastic variational inference. However, when using such methods we are not guaranteed to converge to a stationary point of the true marginal likelihood. In this work, we propose algorithms for exact stochastic inference of GPs with kernels that induce a Reproducing Kernel Hilbert Space (RKHS) of moderate finite dimension. Our approach can also be extended to infinite dimensional RKHSs at the cost of forgoing exactness. Both for finite and infinite dimensional RKHSs, our method achieves better experimental results than existing methods when memory resources limit the feasible batch size and the possible number of inducing points.
Related papers
- Gradient-Based Non-Linear Inverse Learning [2.6149030745627644]
We study statistical inverse learning in the context of nonlinear inverse problems under random design.<n>We employ gradient descent (GD) and descent gradient (SGD) with mini-batching, both using constant step sizes.<n>Our analysis derives convergence rates for both algorithms under classical a priori assumptions on the smoothness of the target function.
arXiv Detail & Related papers (2024-12-21T22:38:17Z) - Compactly-supported nonstationary kernels for computing exact Gaussian processes on big data [2.8377382540923004]
The Gaussian process (GP) is a widely used machine learning method with implicit uncertainty characterization.<n>Traditional implementations of GPs involve stationary kernels that limit their flexibility.<n>We derive an alternative kernel that can discover and encode both sparsity and nonstationarity.
arXiv Detail & Related papers (2024-11-07T20:07:21Z) - Almost-sure convergence of iterates and multipliers in stochastic
sequential quadratic optimization [21.022322975077653]
Methods for solving continuous optimization problems with equality constraints have attracted attention recently.
convergence guarantees have been limited to the expected value of a gradient to measure zero.
New almost-sure convergence guarantees for the primals, Lagrange measures and station measures generated by a SQP algorithm are proved.
arXiv Detail & Related papers (2023-08-07T16:03:40Z) - Sampling from Gaussian Process Posteriors using Stochastic Gradient
Descent [43.097493761380186]
gradient algorithms are an efficient method of approximately solving linear systems.
We show that gradient descent produces accurate predictions, even in cases where it does not converge quickly to the optimum.
Experimentally, gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks.
arXiv Detail & Related papers (2023-06-20T15:07:37Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable togradient-based optimization and allow to trade off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Compressed Empirical Measures (in finite dimensions) [4.73194777046253]
We study approaches for compressing the empirical measure in the context of finite dimensional reproducing kernel Hilbert spaces (RKHSs)
A key quantity that controls how large such a coreset has to be is the size of the largest ball around the empirical measure that is contained within the empirical convex set.
arXiv Detail & Related papers (2022-04-19T12:25:41Z) - Gaussian Processes and Statistical Decision-making in Non-Euclidean
Spaces [96.53463532832939]
We develop techniques for broadening the applicability of Gaussian processes.
We introduce a wide class of efficient approximations built from this viewpoint.
We develop a collection of Gaussian process models over non-Euclidean spaces.
arXiv Detail & Related papers (2022-02-22T01:42:57Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.