A General Theory for Kernel Packets: from state space model to compactly supported basis
- URL: http://arxiv.org/abs/2402.04022v4
- Date: Wed, 10 Apr 2024 07:24:59 GMT
- Title: A General Theory for Kernel Packets: from state space model to compactly supported basis
- Authors: Liang Ding, Rui Tuo
- Abstract summary: We prove that an $m$-dimensional SS model formulation of GP is equivalent to a concept we introduce as the general right Kernel Packet (KP).
KPs improve GP prediction time to $\mathcal{O}(\log n)$ or $\mathcal{O}(1)$, enable broader applications including GP's derivatives and kernel multiplications, and can be generalized to multi-dimensional additive and product kernels for scattered data.
- Score: 16.235214685688227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is well known that the state space (SS) model formulation of a Gaussian process (GP) can lower both its training and prediction time to $\mathcal{O}(n)$ for $n$ data points. We prove that an $m$-dimensional SS model formulation of GP is equivalent to a concept we introduce as the general right Kernel Packet (KP): a transformation for the GP covariance $K$ such that $\sum_{i=0}^{m}a_iD_t^{(j)}K(t,t_i)=0$ holds for any $t \leq t_1$, $0 \leq j \leq m-1$, and $m+1$ consecutive points $t_i$, where ${D}_t^{(j)}f(t)$ denotes the $j$-th derivative acting on $t$. We extend this idea to the backward SS model formulation, leading to the left KP for the next $m$ consecutive points: $\sum_{i=0}^{m}b_i{D}_t^{(j)}K(t,t_{m+i})=0$ for any $t\geq t_{2m}$. By combining both left and right KPs, we can prove that a suitable linear combination of these covariance functions yields $m$ KP functions compactly supported on $(t_0,t_{2m})$. KPs improve GP prediction time to $\mathcal{O}(\log n)$ or $\mathcal{O}(1)$, enable broader applications including GP's derivatives and kernel multiplications, and can be generalized to multi-dimensional additive and product kernels for scattered data.
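To make the compact-support claim concrete, below is a minimal numerical sketch for the simplest case, the Matérn-1/2 (exponential) kernel, whose SS model has dimension $m=1$: a suitable linear combination of $2m+1=3$ translates of the covariance vanishes outside $(t_0, t_2)$. The point locations and null-space construction are illustrative assumptions, not the paper's general algorithm.

```python
# Hedged sketch: compactly supported kernel packet for the Matern-1/2 kernel
# K(t, s) = exp(-|t - s|) (state-space dimension m = 1). The points t0 < t1 < t2
# are arbitrary choices; this illustrates the idea, not the paper's algorithm.
import numpy as np

def K(t, s):
    return np.exp(-np.abs(t - s))

t0, t1, t2 = 0.0, 1.0, 2.0  # 2m + 1 = 3 consecutive points

# phi(t) = a0*K(t,t0) + a1*K(t,t1) + a2*K(t,t2) must vanish on both tails:
#   t <= t0: phi(t) = e^{ t} * (a0*e^{-t0} + a1*e^{-t1} + a2*e^{-t2})
#   t >= t2: phi(t) = e^{-t} * (a0*e^{ t0} + a1*e^{ t1} + a2*e^{ t2})
A = np.array([[np.exp(-t0), np.exp(-t1), np.exp(-t2)],
              [np.exp(t0), np.exp(t1), np.exp(t2)]])

a = np.linalg.svd(A)[2][-1]  # null-space vector of the 2x3 system

def phi(t):
    return a[0] * K(t, t0) + a[1] * K(t, t1) + a[2] * K(t, t2)

t_out = np.array([-4.0, -1.0, t0, t2, 3.0, 8.0])  # outside (t0, t2)
t_in = np.array([0.5, 1.0, 1.5])                  # inside  (t0, t2)
print(np.abs(phi(t_out)).max())  # numerically zero: support contained in (t0, t2)
print(phi(t_in))                 # nonzero inside the support
```

Because such basis functions overlap only locally, the linear systems they induce become sparse, which is the mechanism behind the improved prediction costs quoted in the abstract.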
Related papers
- Further Understanding of a Local Gaussian Process Approximation: Characterising Convergence in the Finite Regime [1.3518297878940662]
We show that common choices of kernel functions for a highly accurate and massively scalable GPnn regression model exhibit gradual convergence to their limiting behaviour as the dataset size $n$ increases.
Similar bounds can be found under model misspecification and combined to give overall rates of convergence of both MSE and an important calibration metric.
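For orientation, a minimal nearest-neighbour GP predictor in the spirit of the GPnn model mentioned above: condition an exact GP only on the $k$ training points nearest to the test input. The kernel, hyperparameters, and $k$ are illustrative assumptions rather than the paper's settings.

```python
# Hedged sketch of a nearest-neighbour GP ("GPnn"-style) prediction; the RBF kernel,
# noise level, and k are illustrative choices, not the paper's configuration.
import numpy as np

def rbf(a, b, ls=1.0):
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gpnn_predict(X, y, x_star, k=32, noise=1e-2):
    """Posterior mean/variance at x_star using only its k nearest neighbours."""
    idx = np.argsort(np.sum((X - x_star) ** 2, axis=1))[:k]
    Xn, yn = X[idx], y[idx]
    Knn = rbf(Xn, Xn) + noise * np.eye(k)
    k_star = rbf(Xn, x_star[None, :])[:, 0]
    mean = k_star @ np.linalg.solve(Knn, yn)
    var = 1.0 + noise - k_star @ np.linalg.solve(Knn, k_star)  # unit prior variance assumed
    return mean, var

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(2000)
print(gpnn_predict(X, y, np.array([0.5])))
```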
arXiv Detail & Related papers (2024-04-09T10:47:01Z)
- On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes a convergence rate of $O(\frac{\sqrt{d}}{T^{1/4}})$ for the averaged gradient norm $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}[\|\nabla f(x_k)\|_1]$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}[\|\nabla f(x_k)\|_2^2] \leq O(\frac{1}{\sqrt{T}})$ rate of SGD measured by the $\ell_2$ norm.
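For reference, a sketch of the plain RMSProp update that the rate above concerns; the step size, decay factor, and toy quadratic objective are assumptions, and the momentum extension analysed in the paper is omitted.

```python
# Hedged sketch of the standard RMSProp update (no momentum); hyperparameters
# and the toy objective are illustrative, not the paper's setting.
import numpy as np

def rmsprop(grad, x0, lr=1e-2, beta=0.99, eps=1e-8, steps=1000):
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        v = beta * v + (1.0 - beta) * g ** 2   # running estimate of squared gradients
        x = x - lr * g / (np.sqrt(v) + eps)    # coordinate-wise scaled step
    return x

# Toy example: minimize ||x||^2 from a random start (gradient is 2x).
rng = np.random.default_rng(0)
print(rmsprop(lambda x: 2.0 * x, rng.standard_normal(10)))
```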
arXiv Detail & Related papers (2024-02-01T07:21:32Z)
- A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
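For context on the 1-bit case named above, here is a sketch of the canonical 1-bit measurement model $y = \mathrm{sign}(A\mathbf{x}^*)$ with the simplest normalized linear estimate; this is not the paper's uniform-recovery framework or generative prior, only the observation model it cites, and the dimensions are arbitrary choices.

```python
# Hedged sketch: 1-bit compressed-sensing observations y = sign(A x*) and the
# simple linear estimate proportional to A^T y. Illustrative only; not the
# paper's framework (no generative prior, no uniform guarantee).
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 4000                      # ambient dimension, number of 1-bit measurements
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)      # only the direction is identifiable from signs

A = rng.standard_normal((m, n))       # Gaussian sensing matrix
y = np.sign(A @ x_star)               # 1-bit (sign) observations

x_hat = A.T @ y / m                   # linear estimate; mean proportional to x* for Gaussian A
x_hat /= np.linalg.norm(x_hat)
print(float(x_hat @ x_star))          # close to 1: the direction is well recovered
```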
arXiv Detail & Related papers (2023-09-25T17:54:19Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
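A rough illustration of the multiplication-free idea: project with a matrix whose entries lie in {-1, 0, +1}, so products reduce to signed additions and each weight fits in 2 bits. The sparsity level, scaling, and sign nonlinearity below are assumptions, not the exact construction analysed in the paper.

```python
# Hedged sketch of ternary random features: W has entries in {-1, 0, +1}, so X @ W
# needs only additions/subtractions in principle. Sparsity, scaling, and the sign
# nonlinearity are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 500, 32, 1024                       # samples, input dim, feature dim
X = rng.standard_normal((n, d))

p_zero = 0.5                                  # fraction of zeroed (pruned) weights
W = rng.choice([-1, 0, 1], size=(d, D),
               p=[(1 - p_zero) / 2, p_zero, (1 - p_zero) / 2])

Z = np.sign(X @ W) / np.sqrt(D)               # ternary random features
K_approx = Z @ Z.T                            # implicit kernel matrix Z Z^T
print(W.nbytes, "bytes as int64; 2 bits per entry suffice for ternary storage")
print(K_approx.shape)
```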
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that the deterministic equivalent $\tilde{R}(z)$ is close to the expected resolvent in Frobenius norm, i.e., we bound $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$.
arXiv Detail & Related papers (2021-09-06T14:21:43Z)
- Kernel Thinning [26.25415159542831]
Kernel thinning is a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning.
We derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Matérn, and B-spline kernels.
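This is not the kernel thinning algorithm itself, but a quick sketch of the quality metric those bounds are stated in: the maximum mean discrepancy (MMD) between the full sample and a coreset, here an i.i.d.-thinned one under a Gaussian kernel (kernel choice and sizes are assumptions).

```python
# Hedged sketch: (V-statistic) MMD between a sample and a thinned coreset under a
# Gaussian kernel. This illustrates the metric used in the bounds above; it does
# not implement the kernel thinning procedure.
import numpy as np

def gauss_kernel(a, b, bw=1.0):
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * bw ** 2))

def mmd(x, y, bw=1.0):
    return np.sqrt(gauss_kernel(x, x, bw).mean()
                   - 2.0 * gauss_kernel(x, y, bw).mean()
                   + gauss_kernel(y, y, bw).mean())

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 2))                       # full sample from P
coreset = x[rng.choice(len(x), size=32, replace=False)]  # i.i.d. thinning to sqrt(n) points
print(mmd(x, coreset))  # kernel thinning constructs coresets with provably smaller MMD
```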
arXiv Detail & Related papers (2021-05-12T17:56:42Z)
- Convergence of Graph Laplacian with kNN Self-tuned Kernels [14.645468999921961]
A self-tuned kernel adaptively sets a bandwidth $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance.
This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels.
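A small sketch of the self-tuned construction described above: each point gets a bandwidth $\sigma_i$ equal to its $k$-th nearest-neighbour distance, the affinity uses the product $\sigma_i\sigma_j$, and an unnormalized graph Laplacian is formed. The bandwidth convention and normalization are assumptions; the paper analyses a specific family of such kernels.

```python
# Hedged sketch of a kNN self-tuned kernel and its (unnormalized) graph Laplacian:
# sigma_i is the distance from x_i to its k-th nearest neighbour, and
# W_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)). Conventions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 3))        # sample points
k = 10

D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # squared distances
sigma = np.sqrt(np.sort(D2, axis=1)[:, k])                   # k-th NN distance per point

W = np.exp(-D2 / (sigma[:, None] * sigma[None, :]))          # self-tuned affinities
np.fill_diagonal(W, 0.0)
L_N = np.diag(W.sum(axis=1)) - W                             # unnormalized graph Laplacian
print(L_N.shape, float(np.abs(L_N.sum(axis=1)).max()))       # rows sum to ~0
```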
arXiv Detail & Related papers (2020-11-03T04:55:33Z)
- Convergence of Sparse Variational Inference in Gaussian Processes Regression [29.636483122130027]
We show that a method with an overall computational cost of $\mathcal{O}(N(\log N)^{2D}(\log\log N)^{2})$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^{r}_{+}$, with $r \ll n$.
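A compact sketch of why such ground costs give linear-time iterations: with $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$ and unit regularization, the Gibbs kernel factorizes as $K = \Phi_x\Phi_y^\top$ with rank $r \ll n$, so each Sinkhorn matrix-vector product costs $O(nr)$ instead of $O(n^2)$. The feature map below is an arbitrary positive one, chosen only to illustrate the cost structure, not the paper's construction.

```python
# Hedged sketch: Sinkhorn iterations with a Gibbs kernel that factorizes through
# positive features, K = Phi_x @ Phi_y.T (rank r << n), so K @ v costs O(n r).
# The feature map is an illustrative stand-in for the paper's positive features.
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 2000, 5, 16
X, Y = rng.standard_normal((n, d)), rng.standard_normal((n, d))

U = rng.standard_normal((d, r))
def phi(Z):                        # any map into the positive orthant R^r_+
    P = Z @ U
    F = np.exp(P - P.max(axis=1, keepdims=True))
    return F / F.sum(axis=1, keepdims=True)

Phi_x, Phi_y = phi(X), phi(Y)      # implicit kernel K_ij = <phi(x_i), phi(y_j)>
a = b = np.full(n, 1.0 / n)        # uniform marginals

u, v = np.ones(n), np.ones(n)
for _ in range(200):               # Sinkhorn updates via O(n r) matvecs
    u = a / (Phi_x @ (Phi_y.T @ v))
    v = b / (Phi_y @ (Phi_x.T @ u))

row = u * (Phi_x @ (Phi_y.T @ v))  # row marginals of the implicit plan diag(u) K diag(v)
print(float(np.abs(row - a).max()))  # small after convergence: marginals nearly matched
```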
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness [151.67113334248464]
We show that extending the smoothing technique to defend against other attack models can be challenging.
We present experimental results on CIFAR to validate our theory.
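For context on what "the smoothing technique" refers to, a minimal randomized-smoothing prediction-and-radius sketch for the $\ell_2$ threat model; it uses the raw empirical top-class probability (a real certificate would use a confidence lower bound), and the base classifier, noise level, and sample count are illustrative assumptions.

```python
# Hedged sketch of l2 randomized smoothing: classify many Gaussian-perturbed copies
# of the input, take the majority class, and report a radius of sigma * Phi^{-1}(p_A).
# The empirical p_A (no confidence bound) and the toy linear classifier are simplifications.
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
d, n_samples, sigma = 64, 1000, 0.5

W = rng.standard_normal((10, d))        # toy 10-class linear base classifier
x = rng.standard_normal(d)
noisy = x + sigma * rng.standard_normal((n_samples, d))
votes = np.bincount(np.argmax(noisy @ W.T, axis=1), minlength=10)

top = int(votes.argmax())
p_hat = min(votes[top] / n_samples, 1.0 - 1e-3)   # clip so the radius stays finite
radius = sigma * NormalDist().inv_cdf(p_hat) if p_hat > 0.5 else 0.0
print(top, round(radius, 3))            # smoothed prediction and heuristic l2 radius
```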
arXiv Detail & Related papers (2020-02-08T22:02:14Z)