A Coreset-based, Tempered Variational Posterior for Accurate and
Scalable Stochastic Gaussian Process Inference
- URL: http://arxiv.org/abs/2311.01409v1
- Date: Thu, 2 Nov 2023 17:22:22 GMT
- Title: A Coreset-based, Tempered Variational Posterior for Accurate and
Scalable Stochastic Gaussian Process Inference
- Authors: Mert Ketenci and Adler Perotte and Noémie Elhadad and Iñigo Urteaga
- Abstract summary: We present a novel variational Gaussian process ($\mathcal{GP}$) inference method, based on a posterior over a learnable set of weighted pseudo input-output points (coresets).
We derive CVTGP's lower bound for the log-marginal likelihood via marginalization of latent $\mathcal{GP}$ coreset variables.
- Score: 2.7855886538423187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel stochastic variational Gaussian process ($\mathcal{GP}$)
inference method, based on a posterior over a learnable set of weighted pseudo
input-output points (coresets). Instead of a free-form variational family, the
proposed coreset-based, variational tempered family for $\mathcal{GP}$s (CVTGP)
is defined in terms of the $\mathcal{GP}$ prior and the data-likelihood; hence,
accommodating the modeling inductive biases. We derive CVTGP's lower bound for
the log-marginal likelihood via marginalization of the proposed posterior over
latent $\mathcal{GP}$ coreset variables, and show it is amenable to stochastic
optimization. CVTGP reduces the learnable parameter size to $\mathcal{O}(M)$,
enjoys numerical stability, and maintains $\mathcal{O}(M^3)$ time- and
$\mathcal{O}(M^2)$ space-complexity, by leveraging a coreset-based tempered
posterior that, in turn, provides sparse and explainable representations of the
data. Results on simulated and real-world regression problems with Gaussian
observation noise validate that CVTGP provides better evidence lower-bound
estimates and predictive root mean squared error than alternative stochastic
$\mathcal{GP}$ inference methods.
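To make the stated complexity concrete, the sketch below (plain NumPy, not the authors' code) conditions a GP on $M$ weighted pseudo input-output points; the RBF kernel, the tempering-as-per-point-noise weighting, and all names are illustrative assumptions rather than the paper's CVTGP construction. The single $M \times M$ Cholesky factorization dominates the cost, matching the stated $\mathcal{O}(M^3)$ time and $\mathcal{O}(M^2)$ space.

```python
# Minimal illustrative sketch: GP prediction conditioned on M weighted pseudo
# input-output points ("coreset"-style). NOT the authors' CVTGP implementation;
# the RBF kernel, the tempering-as-per-point-noise weighting (noise / (beta * w)),
# and all names are assumptions made for illustration only.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix k(A, B) for row-wise inputs."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def coreset_gp_predict(X_star, Z, u, w, beta=1.0, noise=0.1):
    """Posterior mean/variance at X_star given M weighted pseudo points (Z, u).

    Z: (M, D) pseudo inputs, u: (M,) pseudo outputs, w: (M,) positive weights;
    larger beta * w_m means pseudo point m is trusted more (smaller effective noise).
    The single M x M Cholesky dominates: O(M^3) time, O(M^2) memory.
    """
    M = len(Z)
    K_zz = rbf_kernel(Z, Z)
    K_sz = rbf_kernel(X_star, Z)
    A = K_zz + np.diag(noise / (beta * w)) + 1e-8 * np.eye(M)
    L = np.linalg.cholesky(A)                           # O(M^3)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, u))
    mean = K_sz @ alpha
    V = np.linalg.solve(L, K_sz.T)
    var = 1.0 - np.sum(V**2, axis=0)                    # k(x, x) = 1 for this unit-variance RBF
    return mean, var

# Toy usage: M = 20 pseudo points summarising N = 2000 noisy sine observations.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(2000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(2000)
Z = np.linspace(-3.0, 3.0, 20)[:, None]   # pseudo inputs (learnable in CVTGP)
u = np.sin(Z[:, 0])                       # pseudo outputs (learnable in CVTGP)
w = np.full(20, 2000.0 / 20.0)            # weights: each pseudo point "stands for" ~100 data points
mean, var = coreset_gp_predict(X, Z, u, w)
print("train RMSE:", float(np.sqrt(np.mean((mean - y) ** 2))))
```

In CVTGP itself the pseudo points, weights, and kernel hyperparameters would instead be learned by stochastic optimization of the derived lower bound; the sketch only illustrates where the $M$-dependent costs arise.
Related papers follow below.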
Related papers
- Restricted Strong Convexity of Deep Learning Models with Smooth
Activations [31.003601717265006]
We study the problem of optimization of deep learning models with smooth activation functions.
We introduce a new analysis of optimization based on Restricted Strong Convexity (RSC).
Ours is the first result establishing geometric convergence of GD based on RSC for deep learning models.
arXiv Detail & Related papers (2022-09-29T21:24:26Z)
- Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
- Hybrid Model-based / Data-driven Graph Transform for Image Coding [54.31406300524195]
We present a hybrid model-based / data-driven approach to encode an intra-prediction residual block.
The first $K$ eigenvectors of a transform matrix are derived from a statistical model, e.g., the asymmetric discrete sine transform (ADST) for stability.
Using WebP as a baseline image codec, experimental results show that our hybrid graph transform achieves better energy compaction than the default discrete cosine transform (DCT) and better stability than the KLT.
arXiv Detail & Related papers (2022-03-02T15:36:44Z)
- An Improved Analysis of Gradient Tracking for Decentralized Machine Learning [34.144764431505486]
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents.
The agents' common goal is to find a model that minimizes the average of all local loss functions.
We improve the dependency of the convergence rate on the parameter $p$ in the noiseless case.
arXiv Detail & Related papers (2022-02-08T12:58:14Z)
- Inverting brain grey matter models with likelihood-free inference: a tool for trustable cytoarchitecture measurements [62.997667081978825]
Characterisation of the brain grey matter cytoarchitecture with quantitative sensitivity to soma density and volume remains an unsolved challenge in diffusion MRI (dMRI).
We propose a new forward model, specifically a new system of equations, requiring a few relatively sparse b-shells.
We then apply modern tools from Bayesian analysis known as likelihood-free inference (LFI) to invert our proposed model.
arXiv Detail & Related papers (2021-11-15T09:08:27Z)
- Input Dependent Sparse Gaussian Processes [1.1470070927586014]
We use a neural network that receives the observed data as input and outputs the inducing-point locations and the parameters of the variational distribution $q$ (a toy sketch of this amortization idea appears after this list).
We evaluate our method in several experiments, showing that it performs similarly to or better than other state-of-the-art sparse variational GP approaches.
arXiv Detail & Related papers (2021-07-15T12:19:10Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
- Quadruply Stochastic Gaussian Processes [10.152838128195466]
We introduce a variational inference procedure for training scalable Gaussian process (GP) models whose per-iteration complexity is independent of both the number of training points, $n$, and the number of basis functions used in the kernel approximation, $m$.
We demonstrate accurate inference on large classification and regression datasets using GPs and relevance vector machines with up to $m = 10^7$ basis functions.
arXiv Detail & Related papers (2020-06-04T17:06:25Z)
- On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration [115.1954841020189]
We study the non-asymptotic concentration properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
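As referenced from the Input Dependent Sparse Gaussian Processes entry above, here is a toy PyTorch sketch of the amortization idea: a small network maps a batch of observed inputs to $M$ inducing-point locations and the parameters of $q$. The architecture, the batch pooling, and all names are assumptions for illustration, not the authors' implementation.

```python
# Toy sketch (assumption, not the paper's code): a small network amortizes the
# sparse-GP variational parameters by mapping a batch of inputs to M inducing
# locations and the mean/log-scale of q(u).
import torch
import torch.nn as nn

class AmortizedInducing(nn.Module):
    def __init__(self, d_in: int, M: int):
        super().__init__()
        self.d_in, self.M = d_in, M
        # Output size: M inducing locations (M * d_in) plus mean and log-std of q(u) (2 * M).
        self.net = nn.Sequential(nn.Linear(d_in, 64), nn.Tanh(),
                                 nn.Linear(64, M * d_in + 2 * M))

    def forward(self, x: torch.Tensor):
        # x: (batch, d_in); pool the per-point outputs so the whole batch
        # proposes one shared set of inducing points.
        out = self.net(x).mean(dim=0)
        Z = out[: self.M * self.d_in].view(self.M, self.d_in)           # inducing inputs
        q_mean = out[self.M * self.d_in : self.M * self.d_in + self.M]  # mean of q(u)
        q_logstd = out[self.M * self.d_in + self.M :]                   # log-std of q(u)
        return Z, q_mean, q_logstd

model = AmortizedInducing(d_in=1, M=16)
x_batch = torch.randn(128, 1)
Z, q_mean, q_logstd = model(x_batch)
print(Z.shape, q_mean.shape, q_logstd.shape)  # torch.Size([16, 1]) torch.Size([16]) torch.Size([16])
```

In practice the returned Z, q_mean, and q_logstd would feed a sparse variational GP objective rather than being used directly.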
This list is automatically generated from the titles and abstracts of the papers in this site.