Spatio-Temporal Variational Gaussian Processes
- URL: http://arxiv.org/abs/2111.01732v1
- Date: Tue, 2 Nov 2021 16:53:31 GMT
- Title: Spatio-Temporal Variational Gaussian Processes
- Authors: Oliver Hamelijnck, William J. Wilkinson, Niki A. Loppi, Arno Solin,
Theodoros Damoulas
- Abstract summary: We introduce a scalable approach to Gaussian process inference that combines spatio-temporal filtering with natural gradient variational inference.
We derive a sparse approximation that constructs a state-space model over a reduced set of inducing points.
We show that for separable Markov kernels the full and sparse cases exactly recover the standard variational GP.
- Score: 26.60276485130467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a scalable approach to Gaussian process inference that combines
spatio-temporal filtering with natural gradient variational inference,
resulting in a non-conjugate GP method for multivariate data that scales
linearly with respect to time. Our natural gradient approach enables
application of parallel filtering and smoothing, further reducing the temporal
span complexity to be logarithmic in the number of time steps. We derive a
sparse approximation that constructs a state-space model over a reduced set of
spatial inducing points, and show that for separable Markov kernels the full
and sparse cases exactly recover the standard variational GP, whilst exhibiting
favourable computational properties. To further improve the spatial scaling we
propose a mean-field assumption of independence between spatial locations
which, when coupled with sparsity and parallelisation, leads to an efficient
and accurate method for large spatio-temporal problems.
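To make the state-space construction concrete, here is a minimal sketch (our own illustration under assumed settings, not the authors' released code) of the separable-kernel case: with a Matern-1/2 (exponential) temporal kernel, the latent function values at M spatial inducing points evolve as a linear Gaussian state-space model, so one Kalman sweep with jax.lax.scan costs time linear in the number of steps T. The kernel choices, sizes, and synthetic observations are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def rbf(X, Y, ell=0.5):
    # squared-exponential spatial kernel
    d2 = jnp.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * d2 / ell**2)

M, T, dt, ell_t, noise = 10, 500, 0.1, 1.0, 0.1
key_z, key_y = jax.random.split(jax.random.PRNGKey(0))
Z = jax.random.uniform(key_z, (M, 1))        # spatial inducing points
Ks = rbf(Z, Z) + 1e-6 * jnp.eye(M)           # spatial kernel matrix
a = jnp.exp(-dt / ell_t)                     # Matern-1/2 temporal transition
Q = (1.0 - a**2) * Ks                        # process noise preserves stationarity
Y = jax.random.normal(key_y, (T, M))         # stand-in observations at Z

def kalman_step(carry, y):
    m, P = carry
    m_pred, P_pred = a * m, a**2 * P + Q     # predict z_t | y_{1:t-1}
    S = P_pred + noise**2 * jnp.eye(M)       # innovation covariance
    K = jnp.linalg.solve(S, P_pred).T        # Kalman gain P_pred S^{-1}
    m_new = m_pred + K @ (y - m_pred)
    P_new = P_pred - K @ P_pred
    return (m_new, P_new), m_new

_, means = jax.lax.scan(kalman_step, (jnp.zeros(M), Ks), Y)
print(means.shape)                           # (T, M): one linear-time sweep
```

The logarithmic temporal span claimed in the abstract comes from replacing this sequential scan with an associative parallel scan over filtering elements; the sequential version above is the simplest correct baseline.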
Related papers
- Adaptive Consensus Gradients Aggregation for Scaled Distributed Training [6.234802839923543]
We analyze the distributed gradient aggregation process through the lens of subspace optimization.
Our method demonstrates improved performance over the ubiquitous averaging on multiple tasks while remaining extremely efficient in both communicational and computational complexity.
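As a hedged illustration of the subspace-optimization view (not the paper's exact aggregation rule), the sketch below replaces plain averaging of simulated worker gradients with a small inner optimization over combination weights, searching inside the subspace spanned by the workers' gradients. The toy objective, noise model, and all constants are assumptions.

```python
import jax
import jax.numpy as jnp

w_curv = jnp.linspace(0.5, 2.0, 8)               # curvature of the toy quadratic

def loss(x):                                     # stand-in training objective
    return 0.5 * jnp.sum(w_curv * x**2)

x = jnp.ones(8)
keys = jax.random.split(jax.random.PRNGKey(1), 4)
# simulate K=4 workers: each reports the true gradient plus its own noise
grads = jnp.stack([jax.grad(loss)(x) + 0.1 * jax.random.normal(k, x.shape)
                   for k in keys])               # shape (K, D)

def subspace_loss(alpha):                        # loss restricted to span{g_i}
    return loss(x - grads.T @ alpha)

avg = jnp.full(4, 0.25)                          # plain averaging, unit step
alpha = avg
for _ in range(100):                             # inner optimization over weights
    alpha = alpha - 0.01 * jax.grad(subspace_loss)(alpha)
print(subspace_loss(avg), subspace_loss(alpha))  # tuned weights beat averaging
```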
arXiv Detail & Related papers (2024-11-06T08:16:39Z)
- Aiming towards the minimizers: fast convergence of SGD for overparametrized problems [25.077446336619378]
We propose a regularity regime which endows the stochastic gradient method with the same worst-case complexity as the deterministic gradient method.
All existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence.
We demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
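A toy, hedged illustration of the interpolation setting studied here (our construction, not the authors' experiment): an overparametrized, consistent least-squares problem admits a zero-loss solution, and per-sample SGD with large, curvature-matched steps drives the residual to zero without any step-size decay.

```python
import jax
import jax.numpy as jnp

key_A, key_w = jax.random.split(jax.random.PRNGKey(2))
A = jax.random.normal(key_A, (20, 100))          # 20 samples, 100 parameters
b = A @ jax.random.normal(key_w, (100,))         # consistent: a zero-loss fit exists
w = jnp.zeros(100)
for i in range(400):
    j = i % 20                                   # one sample per step, as in SGD
    r = A[j] @ w - b[j]                          # residual of sample j
    w = w - (r / (A[j] @ A[j])) * A[j]           # large, curvature-matched step
print(jnp.linalg.norm(A @ w - b))                # residual driven near zero
```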
arXiv Detail & Related papers (2023-06-05T05:21:01Z)
- Traversing Time with Multi-Resolution Gaussian Process State-Space Models [17.42262122708566]
We propose a novel Gaussian process state-space architecture composed of multiple components, each trained on a different resolution, to model effects on different timescales.
We benchmark our novel method on semi-synthetic data and on an engine modeling task.
In both experiments, our approach compares favorably against its state-of-the-art alternatives that operate on a single time-scale only.
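A minimal sketch of the multi-resolution idea under assumed parameters (the paper's exact architecture may differ): a sum of Markovian GP components with different timescales is itself a state-space model whose state stacks the components and whose observation sums them.

```python
import jax
import jax.numpy as jnp

dt, ells = 0.1, jnp.array([0.2, 5.0])            # fast and slow timescales
a = jnp.exp(-dt / ells)                          # per-component OU transitions
q = jnp.sqrt(1.0 - a**2)                         # keeps each component stationary

def step(state, key):
    state = a * state + q * jax.random.normal(key, (2,))
    return state, state.sum()                    # observe the sum of components

keys = jax.random.split(jax.random.PRNGKey(3), 1000)
_, f = jax.lax.scan(step, jnp.zeros(2), keys)
print(f.shape)                                   # (1000,): mixes both timescales
```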
arXiv Detail & Related papers (2021-12-06T18:39:27Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
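For concreteness, a hedged sketch of Random Reshuffling on a toy 1-D regression (all values illustrative): each epoch permutes the data once and then sweeps it, so gradients within an epoch are drawn without replacement.

```python
import jax
import jax.numpy as jnp

X = jnp.linspace(-1.0, 1.0, 64)
y = 3.0 * X + 0.5                                # targets for a 1-D linear model
w, b, key = 0.0, 0.0, jax.random.PRNGKey(4)
for epoch in range(30):
    key, sub = jax.random.split(key)
    perm = jax.random.permutation(sub, X.shape[0])   # reshuffle once per epoch
    for i in perm:                                   # then draw WITHOUT replacement
        g = w * X[i] + b - y[i]                      # per-sample gradient factor
        w, b = w - 0.05 * g * X[i], b - 0.05 * g
print(w, b)                                          # approaches (3.0, 0.5)
```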
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
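The sketch below is a hedged, generic version of this idea (not the paper's exact algorithm): anneal from N(0, 1) to an unnormalized target using unadjusted Langevin transitions, so no accept/reject step breaks differentiability, and accumulate the standard AIS log-weights to estimate the log normalizing constant.

```python
import jax
import jax.numpy as jnp

def log_p0(x):                                   # N(0, 1) starting distribution
    return -0.5 * x**2 - 0.5 * jnp.log(2 * jnp.pi)

def log_target(x):                               # 7 * N(2, 0.5^2): true log Z = log 7
    return (jnp.log(7.0) - 0.5 * ((x - 2.0) / 0.5)**2
            - 0.5 * jnp.log(2 * jnp.pi * 0.25))

def ais(key, n=2000, steps=100, eps=0.05):
    betas = jnp.linspace(0.0, 1.0, steps + 1)
    key, sub = jax.random.split(key)
    x = jax.random.normal(sub, (n,))
    logw = jnp.zeros(n)
    for k in range(1, steps + 1):
        # standard AIS weight increment for the geometric annealing path
        logw += (betas[k] - betas[k - 1]) * (log_target(x) - log_p0(x))
        lp = lambda z: (1 - betas[k]) * log_p0(z) + betas[k] * log_target(z)
        key, sub = jax.random.split(key)
        # unadjusted Langevin move: no Metropolis-Hastings correction
        x = (x + eps * jax.vmap(jax.grad(lp))(x)
             + jnp.sqrt(2 * eps) * jax.random.normal(sub, x.shape))
    return jax.scipy.special.logsumexp(logw) - jnp.log(n)

print(ais(jax.random.PRNGKey(5)))                # roughly log 7 ~= 1.95
```

Dropping the accept/reject step introduces a small discretization bias, which is the trade made here in exchange for gradients and mini-batch computation.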
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We analyze the stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with iteration averaging, SEG provably converges to the Nash equilibrium, and that this rate is provably accelerated by incorporating a scheduled restarting procedure.
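A hedged sketch on a 2-D bilinear game (illustrative constants; the scheduled restarting procedure is omitted): SEG takes an extrapolation step, then updates at the lookahead point, and averaging the iterates lands much closer to the Nash equilibrium at the origin than the noisy last iterate.

```python
import jax
import jax.numpy as jnp

A = jnp.array([[2.0, 0.0], [0.0, 1.0]])         # min_x max_y x^T A y, Nash at 0
x, y = jnp.ones(2), jnp.ones(2)
avg = jnp.zeros(4)
eta, key = 0.1, jax.random.PRNGKey(6)
for t in range(1, 2001):
    key, k1, k2 = jax.random.split(key, 3)
    n1 = 0.5 * jax.random.normal(k1, (2,))       # stochastic gradient noise
    n2 = 0.5 * jax.random.normal(k2, (2,))
    xh = x - eta * (A @ y + n1)                  # extrapolation (lookahead)
    yh = y + eta * (A.T @ x + n1)
    x = x - eta * (A @ yh + n2)                  # update using lookahead grads
    y = y + eta * (A.T @ xh + n2)
    avg += (jnp.concatenate([x, y]) - avg) / t   # running iterate average
print(jnp.linalg.norm(jnp.concatenate([x, y])), jnp.linalg.norm(avg))
```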
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
- Combining Pseudo-Point and State Space Approximations for Sum-Separable Gaussian Processes [48.64129867897491]
We show that there is a simple and elegant way to combine pseudo-point methods with the state space GP approximation framework to get the best of both worlds.
We demonstrate that the combined approach is more scalable and applicable to a greater range of spatio-temporal problems than either method on its own.
arXiv Detail & Related papers (2021-06-18T16:30:09Z)
- Sparse Algorithms for Markovian Gaussian Processes [18.999495374836584]
Sparse Markovian Gaussian processes combine the use of inducing variables with efficient Kalman filter-like recursions.
We derive a general site-based approach to approximate the non-Gaussian likelihood with local Gaussian terms, called sites.
Our approach results in a suite of novel sparse extensions to algorithms from both the machine learning and signal processing literatures, including variational inference, expectation propagation, and the classical nonlinear Kalman smoothers.
The derived methods are suited to spatio-temporal data, where the model has separate inducing points in both time and space.
arXiv Detail & Related papers (2021-03-19T09:50:53Z)
- The Connection between Discrete- and Continuous-Time Descriptions of Gaussian Continuous Processes [60.35125735474386]
We show that discretizations yielding consistent estimators have the property of invariance under coarse-graining.
This result explains why combining differencing schemes for derivative reconstruction with local-in-time inference approaches does not work for time series analysis of second- or higher-order differential equations.
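A small numerical illustration of the invariance claim under assumed AR(1)/discretized-OU dynamics (our construction, not the paper's examples): a consistent least-squares estimator of the transition coefficient commutes with subsampling the path.

```python
import jax
import jax.numpy as jnp

a_true, T = 0.95, 100_000
eps = jax.random.normal(jax.random.PRNGKey(7), (T,))

def step(x, e):                                  # stationary AR(1) / discretized OU
    x = a_true * x + jnp.sqrt(1 - a_true**2) * e
    return x, x

_, path = jax.lax.scan(step, 0.0, eps)

def fit_ar1(z):                                  # least-squares AR(1) coefficient
    return jnp.sum(z[1:] * z[:-1]) / jnp.sum(z[:-1]**2)

# a consistent estimator commutes with subsampling: a_coarse ~= a_fine**k
print(fit_ar1(path)**10, fit_ar1(path[::10]))    # both close to 0.95**10 ~= 0.599
```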
arXiv Detail & Related papers (2021-01-16T17:11:02Z)
- Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
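As a concrete instance of the class analyzed here, a hedged sketch of TD(0) with linear function approximation on a toy ring chain (the chain, features, and reward are all assumptions): the semi-gradient update stochastically solves the projected Bellman fixed-point equation.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(8)
n, d, gamma = 5, 3, 0.9
Phi = jax.random.normal(key, (n, d))             # fixed random features per state
w = jnp.zeros(d)
s = 0
for _ in range(5000):
    key, sub = jax.random.split(key)
    move = int(jax.random.choice(sub, jnp.array([-1, 1])))
    s_next = (s + move) % n                      # random walk on a 5-state ring
    r = 1.0 if s_next == 0 else 0.0              # reward on reaching state 0
    td = r + gamma * Phi[s_next] @ w - Phi[s] @ w    # temporal-difference error
    w = w + 0.01 * td * Phi[s]                   # semi-gradient projected update
    s = s_next
print(Phi @ w)                                   # learned approximate values
```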
arXiv Detail & Related papers (2020-12-09T20:19:32Z)
- Fast Variational Learning in State-Space Gaussian Process Models [29.630197272150003]
We build upon an existing method called conjugate-computation variational inference.
We provide an efficient JAX implementation which exploits just-in-time compilation.
Our approach leads to fast and stable variational inference in state-space GP models that can be scaled to time series with millions of data points.
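A hedged sketch of the engineering point on a scalar state-space model (illustrative parameters, not the paper's implementation): wrapping a lax.scan filtering sweep in jax.jit compiles it once into a single XLA program, after which repeated sweeps over very long series are cheap.

```python
import time
import jax
import jax.numpy as jnp

def filter_sweep(y, a=0.9, q=0.19, r=0.1):
    def step(carry, yt):
        m, p = carry
        m, p = a * m, a * a * p + q              # predict
        k = p / (p + r)                          # scalar Kalman gain
        return (m + k * (yt - m), (1 - k) * p), m
    return jax.lax.scan(step, (0.0, 1.0), y)[1]

y = jax.random.normal(jax.random.PRNGKey(9), (1_000_000,))
fast = jax.jit(filter_sweep)
fast(y).block_until_ready()                      # first call compiles
t0 = time.perf_counter()
fast(y).block_until_ready()                      # later calls reuse the XLA program
print(f"{time.perf_counter() - t0:.4f}s for a million-step filtering sweep")
```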
arXiv Detail & Related papers (2020-07-09T12:06:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.