Random Fourier Signature Features
- URL: http://arxiv.org/abs/2311.12214v1
- Date: Mon, 20 Nov 2023 22:08:17 GMT
- Title: Random Fourier Signature Features
- Authors: Csaba Toth, Harald Oberhauser, Zoltan Szabo
- Abstract summary: Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length, called the signature kernel.
Previous algorithms to compute the signature kernel scale quadratically in both the length and the number of the sequences.
We develop a random Fourier feature-based acceleration of the signature kernel acting on the inherently non-Euclidean domain of sequences.
- Score: 9.85256783464329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tensor algebras give rise to one of the most powerful measures of similarity
for sequences of arbitrary length, called the signature kernel, accompanied by
attractive theoretical guarantees from stochastic analysis. Previous algorithms
to compute the signature kernel scale quadratically in both the length and the
number of the sequences. To mitigate this severe computational bottleneck,
we develop a random Fourier feature-based acceleration of the signature kernel
acting on the inherently non-Euclidean domain of sequences. We show uniform
approximation guarantees for the proposed unbiased estimator of the signature
kernel, while keeping its computation linear in the sequence length and number.
In addition, combined with recent advances on tensor projections, we derive two
even more scalable time series features with favourable concentration
properties and computational complexity both in time and memory. Our empirical
results show that the reduction in computational cost comes at a negligible
price in terms of accuracy on moderate-sized datasets, and it enables one to
scale to large datasets up to a million time series.
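To make the random Fourier feature idea concrete, here is a minimal sketch, not the paper's algorithm: it only uses the classical Rahimi-Recht random Fourier features for an RBF kernel, applied to each time step of a sequence and pooled by a naive average over time. The function names, dimensions, and the pooling step are illustrative assumptions; the signature feature construction developed in the paper is considerably more structured.

```python
import numpy as np

def rff_map(x, W, b):
    # Classical random Fourier features: phi(x) = sqrt(2/D) * cos(W x + b),
    # whose inner products approximate an RBF kernel in expectation.
    return np.sqrt(2.0 / W.shape[0]) * np.cos(W @ x + b)

def sequence_features(seq, W, b):
    # Illustrative pooling only: lift each time step with the RFF map and average over time.
    return np.mean([rff_map(x_t, W, b) for x_t in seq], axis=0)

rng = np.random.default_rng(0)
d, D, sigma = 3, 256, 1.0                       # state dimension, number of random features, RBF bandwidth
W = rng.normal(scale=1.0 / sigma, size=(D, d))  # frequencies sampled from the kernel's spectral measure
b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phases

seq_x = rng.normal(size=(50, d))                # two toy sequences of different lengths
seq_y = rng.normal(size=(80, d))
similarity = sequence_features(seq_x, W, b) @ sequence_features(seq_y, W, b)
print(similarity)
```

As in the abstract, the appeal of an explicit feature map is that computing such features is linear in both the length and the number of sequences, whereas evaluating the kernel Gram matrix directly scales quadratically in the number of sequences.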
Related papers
- Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions is a core algorithmic primitive that powers a variety of applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy across a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Bézier Gaussian Processes for Tall and Wide Data [24.00638575411818]
We introduce a kernel that allows the number of summarising variables to grow exponentially with the number of input features.
We show that our kernel has close similarities to some of the most used kernels in Gaussian process regression.
arXiv Detail & Related papers (2022-09-01T10:22:14Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under mild assumptions on the delays.
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - SigGPDE: Scaling Sparse Gaussian Processes on Sequential Data [16.463077353773603]
We develop SigGPDE, a new scalable sparse variational inference framework for Gaussian Processes (GPs) on sequential data.
We show that the gradients of the GP signature kernel are solutions of a hyperbolic partial differential equation (PDE).
This theoretical insight allows us to build an efficient back-propagation algorithm to optimize the ELBO.
arXiv Detail & Related papers (2021-05-10T09:10:17Z) - Photonic co-processors in HPC: using LightOn OPUs for Randomized Numerical Linear Algebra [53.13961454500934]
We show that the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware.
We show that randomization can be significantly accelerated, at negligible precision loss, in a wide range of important RandNLA algorithms.
arXiv Detail & Related papers (2021-04-29T15:48:52Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Sparse Algorithms for Markovian Gaussian Processes [18.999495374836584]
Sparse Markovian Gaussian processes combine the use of inducing variables with efficient Kalman filter-like recursions.
We derive a general site-based approach to approximate the non-Gaussian likelihood with local Gaussian terms, called sites.
Our approach results in a suite of novel sparse extensions to algorithms from both the machine learning and signal processing literatures, including variational inference, expectation propagation, and classical nonlinear Kalman smoothers.
The derived methods are suited to spatio-temporal data, where the model has separate inducing points in both time and space.
arXiv Detail & Related papers (2021-03-19T09:50:53Z) - Faster Kernel Interpolation for Gaussian Processes [30.04235162264955]
A key challenge in scaling Gaussian Process (GP) regression to massive datasets is that exact inference requires computation with a dense n x n kernel matrix.
Structured kernel interpolation (SKI) is among the most scalable methods.
We show that the per-iteration cost of SKI can be reduced to O(m log m) after a single O(n) time precomputation step.
We demonstrate speedups in practice for a wide range of m and n and apply the method to GP inference on a three-dimensional weather radar dataset with over 100 million points.
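For intuition about the structured kernel interpolation (SKI) approximation this entry refers to, here is a minimal sketch assuming a 1-D input space, an RBF kernel, and dense linear interpolation weights; all names and parameters are illustrative, and the O(m log m) per-iteration cost mentioned above additionally relies on structure of the grid kernel (e.g., Toeplitz algebra) that this toy example does not exploit.

```python
import numpy as np

def rbf(a, b, lengthscale=0.5):
    # Dense RBF kernel matrix between two sets of 1-D inputs.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def interp_weights(x, grid):
    # Linear interpolation weights W so that k(x, .) is approximated by W @ k(grid, .).
    # Stored densely here for clarity; SKI exploits that W has only two nonzeros per row.
    h = grid[1] - grid[0]
    idx = np.clip(np.floor((x - grid[0]) / h).astype(int), 0, len(grid) - 2)
    frac = (x - grid[idx]) / h
    W = np.zeros((len(x), len(grid)))
    W[np.arange(len(x)), idx] = 1.0 - frac
    W[np.arange(len(x)), idx + 1] = frac
    return W

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, size=200))   # n = 200 training inputs
grid = np.linspace(0.0, 1.0, 30)               # m = 30 regularly spaced grid points
W = interp_weights(x, grid)
K_ski = W @ rbf(grid, grid) @ W.T              # n x n kernel approximated through the m x m grid kernel
print(np.abs(K_ski - rbf(x, x)).max())         # worst-case error against the exact kernel matrix
```

The point of the approximation is that downstream linear algebra can work with the small m x m grid kernel plus sparse interpolation weights instead of the dense n x n matrix.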
arXiv Detail & Related papers (2021-01-28T00:09:22Z) - Stochastic Gradient Langevin with Delayed Gradients [29.6870062491741]
We show that the rate of convergence in measure is not significantly affected by the error caused by the delayed gradient information used for computation, suggesting significant potential for speedup in wall clock time.
arXiv Detail & Related papers (2020-06-12T17:51:30Z)