SigGPDE: Scaling Sparse Gaussian Processes on Sequential Data
- URL: http://arxiv.org/abs/2105.04211v1
- Date: Mon, 10 May 2021 09:10:17 GMT
- Title: SigGPDE: Scaling Sparse Gaussian Processes on Sequential Data
- Authors: Maud Lemercier, Cristopher Salvi, Thomas Cass, Edwin V. Bonilla,
Theodoros Damoulas, Terry Lyons
- Abstract summary: We develop SigGPDE, a new scalable sparse variational inference framework for Gaussian Processes (GPs) on sequential data.
We show that the gradients of the GP signature kernel are solutions of a hyperbolic partial differential equation (PDE).
This theoretical insight allows us to build an efficient back-propagation algorithm to optimize the ELBO.
- Score: 16.463077353773603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Making predictions and quantifying their uncertainty when the input data is
sequential is a fundamental learning challenge, recently attracting increasing
attention. We develop SigGPDE, a new scalable sparse variational inference
framework for Gaussian Processes (GPs) on sequential data. Our contribution is
twofold. First, we construct inducing variables underpinning the sparse
approximation so that the resulting evidence lower bound (ELBO) does not
require any matrix inversion. Second, we show that the gradients of the GP
signature kernel are solutions of a hyperbolic partial differential equation
(PDE). This theoretical insight allows us to build an efficient
back-propagation algorithm to optimize the ELBO. We showcase the significant
computational gains of SigGPDE compared to existing methods, while achieving
state-of-the-art performance for classification tasks on large datasets of up
to 1 million multivariate time series.
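The hyperbolic PDE referenced in the abstract is of the same Goursat type as the equation the signature kernel itself satisfies. As a rough illustration of how such a kernel is evaluated, here is a minimal numpy sketch of a first-order finite-difference solve; the function name, discretization, and update rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def signature_kernel(x, y):
    """Solve the Goursat PDE  d^2 k / ds dt = <dx/ds, dy/dt> * k  on a grid,
    with boundary condition k(0, .) = k(., 0) = 1; the value at the far
    corner is the signature kernel of the two paths.
    x: (m, d) array of samples of one path, y: (n, d) of the other."""
    dx = np.diff(x, axis=0)          # path increments, shape (m-1, d)
    dy = np.diff(y, axis=0)          # path increments, shape (n-1, d)
    inner = dx @ dy.T                # <dx_i, dy_j> for every grid cell
    k = np.ones((len(x), len(y)))    # boundary condition k = 1
    for i in range(len(dx)):
        for j in range(len(dy)):
            # first-order explicit update over grid cell (i, j)
            k[i + 1, j + 1] = k[i + 1, j] + k[i, j + 1] + k[i, j] * (inner[i, j] - 1.0)
    return k[-1, -1]
```

Because the abstract's result says the ELBO gradients solve a PDE of the same hyperbolic type, back-propagation can run a second solver of this shape rather than differentiating through the grid recursion.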
Related papers
- Domain Invariant Learning for Gaussian Processes and Bayesian Exploration [39.83530605880014]
We propose a domain invariant learning algorithm for Gaussian processes (DIL-GP) with a min-max optimization on the likelihood.
Numerical experiments demonstrate the superiority of DIL-GP for predictions on several synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-18T16:13:34Z)
- Large-Scale Gaussian Processes via Alternating Projection [23.79090469387859]
We propose an iterative method which only accesses subblocks of the kernel matrix, effectively enabling mini-batching.
Our algorithm, based on alternating projection, has $\mathcal{O}(n)$ per-iteration time and space complexity, solving many of the practical challenges of scaling GPs to very large datasets.
arXiv Detail & Related papers (2023-10-26T04:20:36Z)
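The alternating-projection entry above solves GP linear systems while touching only subblocks of the kernel matrix. The sketch below captures that access pattern with block Gauss-Seidel sweeps on $(K + \sigma^2 I)v = y$; kernel_fn, block_size, and the sweep count are hypothetical names, and this is a stand-in for, not a reproduction of, the paper's alternating-projection update.

```python
import numpy as np

def block_solve(kernel_fn, X, y, noise, block_size=256, sweeps=20):
    """Approximately solve (K + noise * I) v = y while materializing only
    the rows of K belonging to one block at a time.
    kernel_fn(A, B) is assumed to return the cross-kernel matrix."""
    n = len(y)
    v = np.zeros(n)
    blocks = [np.arange(i, min(i + block_size, n)) for i in range(0, n, block_size)]
    for _ in range(sweeps):
        for b in blocks:
            K_rows = kernel_fn(X[b], X)                # (|b|, n) subblock
            r = y[b] - K_rows @ v - noise * v[b]       # block residual
            K_bb = K_rows[:, b] + noise * np.eye(len(b))
            v[b] += np.linalg.solve(K_bb, r)           # exact block update
    return v
```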
- Heterogeneous Multi-Task Gaussian Cox Processes [61.67344039414193]
We present a novel extension of multi-task Gaussian Cox processes for modeling heterogeneous correlated tasks jointly.
A multi-output Gaussian process (MOGP) prior over the parameters of the dedicated likelihoods for classification, regression and point process tasks can facilitate the sharing of information between heterogeneous tasks.
We derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters.
arXiv Detail & Related papers (2023-08-29T15:01:01Z)
- Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we study two variants of this kind, namely Variance Reduced Langevin Dynamics and Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z)
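The two variants above add variance reduction to stochastic gradient Langevin dynamics. A minimal sketch of the SVRG-style construction, assuming user-supplied gradient oracles (grad_full and grad_single are hypothetical names):

```python
import numpy as np

def svrg_langevin(grad_full, grad_single, x0, n, step, temp=1.0,
                  epochs=10, inner=100, seed=0):
    """Variance-reduced stochastic gradient Langevin dynamics: an
    SVRG-style control variate anchors the stochastic gradient at a
    snapshot, and Gaussian noise drives the Langevin diffusion.
    A sketch of the idea, not the paper's exact algorithm or constants."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        snap = x.copy()
        g_snap = grad_full(snap)              # full gradient at the snapshot
        for _ in range(inner):
            i = rng.integers(n)
            # variance-reduced estimate: g_i(x) - g_i(snap) + full grad
            g = grad_single(i, x) - grad_single(i, snap) + g_snap
            noise = rng.normal(size=x.shape)
            x = x - step * g + np.sqrt(2 * step * temp) * noise
    return x
```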
- Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z)
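Randomized DP replaces exact sums over all latent states with sums over sampled subsets. The sketch below applies that idea to the HMM forward recursion with uniform state sampling and a simple rescaling; the estimator and all names are illustrative, not the paper's construction.

```python
import numpy as np

def randomized_forward(log_init, log_trans, log_emit, obs, k, rng):
    """Forward-algorithm log-likelihood estimated by summing over a random
    subset of k predecessor states per step instead of all S states,
    rescaled to approximate the full sum. A sketch of randomized DP in the
    spirit of the entry above, not the paper's estimator."""
    S = len(log_init)
    alpha = log_init + log_emit[:, obs[0]]           # (S,) log forward weights
    for t in range(1, len(obs)):
        keep = rng.choice(S, size=k, replace=False)  # sampled states
        # restrict the DP sum to sampled predecessors, rescale by S/k
        scores = alpha[keep, None] + log_trans[keep, :]        # (k, S)
        alpha = np.logaddexp.reduce(scores, axis=0) + np.log(S / k)
        alpha = alpha + log_emit[:, obs[t]]
    return np.logaddexp.reduce(alpha)
```

Setting k equal to the number of states recovers the exact forward pass.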
- Kernel Clustering with Sigmoid-based Regularization for Efficient Segmentation of Sequential Data [3.8326963933937885]
Segmentation aims at partitioning a data sequence into several non-overlapping segments that may have nonlinear and complex structures.
A popular algorithm for optimally solving this problem is dynamic programming (DP), which has quadratic computation and memory requirements.
Although many algorithms have been proposed to approximate the optimal segmentation, they have no guarantee on the quality of their solutions.
arXiv Detail & Related papers (2021-06-22T04:32:21Z)
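The quadratic-cost DP baseline mentioned in the entry can be written down directly for a 1-D sequence with squared-deviation segment costs; the sketch below uses that assumed cost (the paper itself works with kernels and a sigmoid-based relaxation).

```python
import numpy as np

def dp_segment(x, n_segments):
    """Exact minimum within-segment squared-deviation partition of a 1-D
    sequence into n_segments contiguous pieces, via the classic dynamic
    program (quadratic in the sequence length, as the entry notes)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s1 = np.concatenate([[0.0], np.cumsum(x)])        # prefix sums
    s2 = np.concatenate([[0.0], np.cumsum(x ** 2)])   # prefix sums of squares

    def cost(i, j):
        # squared deviation of x[i:j] around its mean, in O(1)
        m = s1[j] - s1[i]
        return (s2[j] - s2[i]) - m * m / (j - i)

    best = np.full((n_segments + 1, n + 1), np.inf)
    cut = np.zeros((n_segments + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1, i] + cost(i, j)
                if c < best[k, j]:
                    best[k, j], cut[k, j] = c, i
    # backtrack the interior segment boundaries
    bounds, j = [], n
    for k in range(n_segments, 0, -1):
        j = cut[k, j]
        bounds.append(j)
    return bounds[::-1][1:]
```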
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high-fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z)
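One reading of "correcting momentum with second-order information" is to transport the stale momentum buffer to the current iterate with a Hessian-vector product, so the buffer keeps estimating the current gradient. The sketch below implements that reading with a finite-difference HVP; it is an interpretation for illustration, not the paper's algorithm or its rate analysis.

```python
import numpy as np

def hvp(grad_fn, x, v, eps=1e-5):
    """Finite-difference Hessian-vector product H(x) v."""
    return (grad_fn(x + eps * v) - grad_fn(x - eps * v)) / (2 * eps)

def corrected_momentum(grad_fn, x0, lr=0.05, beta=0.9, steps=200):
    """Momentum SGD in which the stale momentum buffer is transported to
    the current iterate via a Hessian-vector product, so it tracks the
    gradient at x_t rather than at past iterates."""
    x = np.asarray(x0, dtype=float).copy()
    x_prev = x.copy()
    m = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        # transport: grad at old point + H * (x - x_prev) ~ grad at x
        m = beta * (m + hvp(grad_fn, x, x - x_prev)) + (1 - beta) * g
        x_prev = x.copy()
        x = x - lr * m
    return x
```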
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Quadruply Stochastic Gaussian Processes [10.152838128195466]
We introduce a variational inference procedure for training scalable Gaussian process (GP) models whose per-iteration complexity is independent of both the number of training points, $n$, and the number of basis functions used in the kernel approximation, $m$.
We demonstrate accurate inference on large classification and regression datasets using GPs and relevance vector machines with up to $m = 10^7$ basis functions.
arXiv Detail & Related papers (2020-06-04T17:06:25Z)
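The claim of per-iteration cost independent of both $n$ and $m$ in the entry above comes from minibatching over data points and over basis functions at the same time. The toy loop below shows that double-sampling pattern on a random-Fourier-feature regression with a squared loss; the loss, the sizes, and the use of two independent feature batches are illustrative assumptions and do not reproduce the paper's variational estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setup; sizes and names are illustrative only
n, m, d, batch, lr = 50_000, 100_000, 5, 64, 0.5
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)
omega = rng.normal(size=(m, d))          # random Fourier frequencies
phase = rng.uniform(0, 2 * np.pi, m)
w = np.zeros(m)                          # weights over all m basis functions

def feat(Xb, idx):
    """Evaluate only the sampled basis functions on a data minibatch."""
    return np.sqrt(2.0 / m) * np.cos(Xb @ omega[idx].T + phase[idx])

for step in range(1, 2001):
    db = rng.choice(n, size=batch)       # sample data points
    f1 = rng.choice(m, size=batch)       # sampled features to estimate f(x)
    f2 = rng.choice(m, size=batch)       # independent features to update
    f_hat = (m / batch) * feat(X[db], f1) @ w[f1]   # unbiased estimate of f
    grad = feat(X[db], f2).T @ (f_hat - y[db]) / batch
    w[f2] -= lr / np.sqrt(step) * grad   # per-step cost independent of n, m
```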
- Dual Stochastic Natural Gradient Descent and convergence of interior half-space gradient approximations [0.0]
Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
arXiv Detail & Related papers (2020-01-19T00:53:49Z)
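For context, the SGD baseline that the entry above calls standard for MLR fits in a few lines; this is the plain minibatch version, not the paper's dual natural-gradient variant.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlr_sgd(X, y, n_classes, lr=0.1, epochs=5, batch=32, seed=0):
    """Plain minibatch SGD for multinomial logistic regression."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]               # one-hot targets
    for _ in range(epochs):
        perm = rng.permutation(n)
        for i in range(0, n, batch):
            idx = perm[i:i + batch]
            P = softmax(X[idx] @ W)        # predicted class probabilities
            W -= lr * X[idx].T @ (P - Y[idx]) / len(idx)
    return W
```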