The Price of Linear Time: Error Analysis of Structured Kernel Interpolation
- URL: http://arxiv.org/abs/2502.00298v2
- Date: Tue, 04 Feb 2025 04:07:24 GMT
- Title: The Price of Linear Time: Error Analysis of Structured Kernel Interpolation
- Authors: Alexander Moreno, Justin Xiao, Jonathan Mei
- Abstract summary: Structured Kernel Interpolation (SKI) helps scale Gaussian Processes (GPs) by approximating the kernel matrix via inducing points, achieving linear computational complexity.
This paper bridges the gap: we prove error bounds for the SKI Gram matrix and examine the error's effect on hyperparameter estimation and posterior inference.
We identify two dimensionality regimes governing the trade-off between SKI Gram matrix spectral norm error and computational complexity.
- Abstract: Structured Kernel Interpolation (SKI) (Wilson et al. 2015) helps scale Gaussian Processes (GPs) by approximating the kernel matrix via interpolation at inducing points, achieving linear computational complexity. However, it lacks rigorous theoretical error analysis. This paper bridges the gap: we prove error bounds for the SKI Gram matrix and examine the error's effect on hyperparameter estimation and posterior inference. We further provide a practical guide to selecting the number of inducing points under convolutional cubic interpolation: they should grow as $n^{d/3}$ for error control. Crucially, we identify two dimensionality regimes governing the trade-off between SKI Gram matrix spectral norm error and computational complexity. For $d \leq 3$, any error tolerance can achieve linear time for sufficiently large sample size. For $d > 3$, the error must increase with sample size to maintain linear time. Our analysis provides key insights into SKI's scalability-accuracy trade-offs, establishing precise conditions for achieving linear-time GP inference with controlled approximation error.
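To make the object of the analysis concrete, here is a minimal 1-D sketch of the SKI construction the abstract describes: the Gram matrix is approximated as $W K_{UU} W^\top$, where $W$ holds interpolation weights from data points to a regular grid of inducing points. All names and parameter choices below are my own illustration (using Keys' cubic convolution kernel, the interpolation scheme the $n^{d/3}$ guidance assumes); a practical implementation would store $W$ sparsely and exploit the Toeplitz structure of $K_{UU}$.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.1):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def cubic_conv_weights(x, grid):
    """Interpolation weight matrix W (n x m) using Keys' cubic convolution
    kernel (a = -0.5): 4 nonzero weights per row in 1-D. Dense here for
    clarity; real SKI code stores W as a sparse matrix."""
    m, h = len(grid), grid[1] - grid[0]
    W = np.zeros((len(x), m))
    for i, xi in enumerate(x):
        j = int(np.floor((xi - grid[0]) / h))  # grid cell containing xi; stencil is j-1..j+2
        for k in range(j - 1, j + 3):
            if 0 <= k < m:
                s = abs(xi - grid[k]) / h
                if s < 1:
                    W[i, k] = 1.5 * s**3 - 2.5 * s**2 + 1
                elif s < 2:
                    W[i, k] = -0.5 * s**3 + 2.5 * s**2 - 4 * s + 2
    return W

rng = np.random.default_rng(0)
n, m = 1000, 100                    # the paper's guidance: grow m like n^{d/3} (d = 1 here)
x = rng.uniform(0.0, 1.0, n)
grid = np.linspace(-0.1, 1.1, m)    # regular inducing grid covering the data

W = cubic_conv_weights(x, grid)
K_uu = rbf_kernel(grid, grid)       # Toeplitz on a regular grid
K_ski = W @ K_uu @ W.T              # SKI approximation of the n x n Gram matrix

err = np.linalg.norm(rbf_kernel(x, x) - K_ski, 2)
print(f"spectral-norm error: {err:.2e}")
```

The spectral-norm error printed at the end is exactly the quantity the paper bounds; the constant factor in $m = \Theta(n^{d/3})$ is problem-dependent, so the sketch uses a generous $m$.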
Related papers
- Scaling Gaussian Processes for Learning Curve Prediction via Latent Kronecker Structure [16.319561844942886]
We show that our GP model can match the performance of a Transformer on a learning curve prediction task.
Our method only requires $\mathcal{O}(n^3 + m^3)$ time and $\mathcal{O}(n^2 + m^2)$ space.
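The quoted $\mathcal{O}(n^3 + m^3)$ cost reflects a standard Kronecker identity: if the Gram matrix factors as $A \otimes B$, eigendecomposing the factors yields the eigendecomposition of the whole. A hedged sketch of that identity (my own illustration, not this paper's code), solving a regularized $(nm \times nm)$ system with two small eigendecompositions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 40, 30
sigma2 = 0.1

def random_spd(k):
    """Random symmetric positive-definite matrix standing in for a kernel factor."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

A, B = random_spd(n), random_spd(m)
y = rng.standard_normal(n * m)

# Eigendecompose the factors: O(n^3 + m^3) instead of O((nm)^3).
wa, Qa = np.linalg.eigh(A)
wb, Qb = np.linalg.eigh(B)

# Row-major vec convention: (A kron B) @ X.ravel() == (A @ X @ B.T).ravel(),
# so the Kronecker product is never formed explicitly.
Y = y.reshape(n, m)
Z = Qa.T @ Y @ Qb                   # rotate into the joint eigenbasis
Z /= np.outer(wa, wb) + sigma2      # diagonal solve with eigenvalue products
x = (Qa @ Z @ Qb.T).ravel()         # rotate back

# Sanity check against the naive O((nm)^3) dense solve.
K = np.kron(A, B)
x_ref = np.linalg.solve(K + sigma2 * np.eye(n * m), y)
print(np.allclose(x, x_ref))        # True
```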
arXiv Detail & Related papers (2024-10-11T20:24:33Z)
- Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias [13.642712817536072]
We show that as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error increases.
A key technical challenge we address is the lack of a one-step contraction property in the $W_{2,\ell^\infty}$ metric to measure convergence.
arXiv Detail & Related papers (2024-08-20T01:24:54Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Polynomial-Time Solutions for ReLU Network Training: A Complexity
Classification via Max-Cut and Zonotopes [70.52097560486683]
We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $\epsilon \leq \sqrt{84/83} - 1 \approx 0.006$, we show that it is NP-hard to find an approximate global optimum of the ReLU network objective with relative error $\epsilon$ with respect to the objective value.
arXiv Detail & Related papers (2023-11-18T04:41:07Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Precise Learning Curves and Higher-Order Scaling Limits for Dot Product
Kernel Regression [41.48538038768993]
We focus on the problem of kernel ridge regression for dot-product kernels.
We observe a peak in the learning curve whenever $m \approx d^r/r!$ for any integer $r$, leading to multiple sample-wise descent and nontrivial behavior at multiple scales.
arXiv Detail & Related papers (2022-05-30T04:21:31Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2D + (N^2)^3)$.
We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
arXiv Detail & Related papers (2021-02-15T13:24:41Z)
- Faster Kernel Interpolation for Gaussian Processes [30.04235162264955]
A key challenge in scaling Gaussian Process (GP) regression to massive datasets is that exact inference requires computation with a dense $n \times n$ kernel matrix.
Structured kernel interpolation (SKI) is among the most scalable methods.
We show that SKI per-iteration time can be reduced to $O(m \log m)$ after a single $O(n)$-time precomputation step.
We demonstrate speedups in practice for a wide range of m and n and apply the method to GP inference on a three-dimensional weather radar dataset with over 100 million points.
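The $O(m \log m)$ per-iteration figure comes from the structure SKI induces: for a stationary kernel on a regular 1-D grid, $K_{UU}$ is Toeplitz, so matrix-vector products can be done with the FFT via circulant embedding. A minimal sketch of that multiply (my own illustration, not this paper's algorithm):

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_matvec(first_col, v):
    """Multiply a symmetric Toeplitz matrix (given by its first column) with a
    vector in O(m log m) by embedding it in a circulant of size 2m - 2."""
    m = len(first_col)
    c = np.concatenate([first_col, first_col[-2:0:-1]])  # circulant first column
    v_pad = np.concatenate([v, np.zeros(len(c) - m)])    # zero-pad to circulant size
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(v_pad))[:m])

# K_uu for a stationary (RBF) kernel on a regular grid is Toeplitz, so its
# first column determines the whole matrix.
grid = np.linspace(0.0, 1.0, 512)
first_col = np.exp(-0.5 * ((grid - grid[0]) / 0.1) ** 2)

v = np.random.default_rng(2).standard_normal(512)
fast = toeplitz_matvec(first_col, v)   # O(m log m)
exact = toeplitz(first_col) @ v        # O(m^2) dense reference
print(np.allclose(fast, exact))        # True
```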
arXiv Detail & Related papers (2021-01-28T00:09:22Z)
- Inexact and Stochastic Generalized Conditional Gradient with Augmented Lagrangian and Proximal Step [2.0196229393131726]
We analyze inexact and stochastic versions of the CGALP algorithm developed in the authors' previous paper.
This allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion.
We show convergence of the Lagrangian to an optimum and feasibility of the affine constraint.
arXiv Detail & Related papers (2020-05-11T14:52:16Z)