Kernel approximation on algebraic varieties
- URL: http://arxiv.org/abs/2106.02755v1
- Date: Fri, 4 Jun 2021 23:42:19 GMT
- Title: Kernel approximation on algebraic varieties
- Authors: Jason M. Altschuler and Pablo A. Parrilo
- Abstract summary: We show that significantly better approximations are obtainable in problems involving sparse or low-rank data.
Our results are presented for smooth isotropic kernels, the predominant class of kernels used in applications.
- Score: 0.7614628596146599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-rank approximation of kernels is a fundamental mathematical problem with
widespread algorithmic applications. Often the kernel is restricted to an
algebraic variety, e.g., in problems involving sparse or low-rank data. We show
that significantly better approximations are obtainable in this setting: the
rank required to achieve a given error depends on the variety's dimension
rather than the ambient dimension, which is typically much larger. This is true
in both high-precision and high-dimensional regimes. Our results are presented
for smooth isotropic kernels, the predominant class of kernels used in
applications. Our main technical insight is to approximate smooth kernels by
polynomial kernels, and leverage two key properties of polynomial kernels that
hold when they are restricted to a variety. First, their ranks decrease
exponentially in the variety's co-dimension. Second, their maximum values are
governed by their values over a small set of points. Together, our results
provide a general approach for exploiting (approximate) "algebraic structure"
in datasets in order to efficiently solve large-scale data science problems.
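To make the abstract's main point concrete, here is a minimal numerical sketch (not the authors' code): it compares the numerical rank of a Gaussian kernel matrix on generic high-dimensional data with its rank on data confined to a low-dimensional linear subspace, the simplest instance of an algebraic variety. The kernel choice, data sizes, and rank tolerance are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): the numerical
# rank of a smooth isotropic kernel matrix drops sharply when the data lies on a
# low-dimensional subspace, tracking the variety's dimension rather than the
# ambient dimension.
import numpy as np

rng = np.random.default_rng(0)
n, ambient_dim, variety_dim = 500, 50, 3

def gaussian_kernel(X, lengthscale=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(d2, 0) / (2 * lengthscale**2))

def numerical_rank(K, tol=1e-6):
    s = np.linalg.svd(K, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Generic data filling the ambient space
X_full = rng.standard_normal((n, ambient_dim)) / np.sqrt(ambient_dim)

# Data of the same ambient dimension, restricted to a 3-dimensional subspace
basis = np.linalg.qr(rng.standard_normal((ambient_dim, variety_dim)))[0]
X_var = (rng.standard_normal((n, variety_dim)) / np.sqrt(variety_dim)) @ basis.T

print("numerical rank, generic data: ", numerical_rank(gaussian_kernel(X_full)))
print("numerical rank, subspace data:", numerical_rank(gaussian_kernel(X_var)))
```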
Related papers
- The phase diagram of kernel interpolation in large dimensions [8.707305374058794]
The generalization ability of kernel interpolation in large dimensions might be one of the most interesting problems in the recent renaissance of kernel regression.
We fully characterized the exact order of both the variance and bias of large-dimensional kernel interpolation under various source conditions $s \geq 0$.
We determined the regions in the $(s,\gamma)$-plane where kernel interpolation is minimax optimal, sub-optimal, and inconsistent.
arXiv Detail & Related papers (2024-04-19T03:04:06Z) - On the Approximation of Kernel functions [0.0]
The paper addresses approximations of the kernel itself.
For the Hilbert Gauss kernel on the unit cube, the paper establishes an upper bound on the associated eigenfunctions.
This improvement confirms low-rank approximation methods such as the Nyström method.
arXiv Detail & Related papers (2024-03-11T13:50:07Z) - Reconstructing Kernel-based Machine Learning Force Fields with
Super-linear Convergence [0.18416014644193063]
We consider the broad class of Nyström-type methods to construct preconditioners.
All considered methods aim to identify a representative subset of inducing (kernel) columns to approximate the dominant kernel spectrum.
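A minimal sketch of the Nyström-type construction described above, building a low-rank kernel approximation from a subset of columns; the RBF kernel, uniform column sampling, and all sizes are illustrative assumptions rather than the paper's method.

```python
# Minimal sketch (illustrative assumptions): Nyström approximation K ≈ C W^+ C^T
# from a uniformly sampled subset of kernel columns.
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 400, 10, 40                       # m = number of inducing columns

X = rng.standard_normal((n, d))

def rbf(A, B, lengthscale=2.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(d2, 0) / (2 * lengthscale**2))

K = rbf(X, X)

idx = rng.choice(n, size=m, replace=False)  # leverage-score sampling is a common refinement
C = K[:, idx]                               # n x m block of selected columns
W = K[np.ix_(idx, idx)]                     # m x m block among the selected points
K_nys = C @ np.linalg.pinv(W) @ C.T

rel_err = np.linalg.norm(K - K_nys) / np.linalg.norm(K)
print(f"relative Frobenius error with {m} columns: {rel_err:.3e}")
```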
arXiv Detail & Related papers (2022-12-24T13:45:50Z) - Interpolation with the polynomial kernels [5.8720142291102135]
Polynomial kernels are widely used in machine learning, and they are one of the default choices for developing kernel-based regression models.
However, they are rarely used or considered in numerical analysis due to their lack of strict positive definiteness.
This paper is devoted to establishing some initial results for the study of these kernels and their related algorithms.
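A minimal sketch of interpolation with a polynomial kernel on a toy 1-D problem; the target function, degree, and node count are illustrative assumptions. In one dimension a degree-$p$ polynomial kernel spans only $p+1$ monomials, which is one way the lack of strict positive definiteness shows up: at most $p+1$ nodes can be interpolated exactly.

```python
# Minimal sketch (illustrative assumptions): interpolation with the polynomial
# kernel k(x, y) = (x.y + c)^degree, using exactly degree + 1 nodes in 1-D.
import numpy as np

def poly_kernel(A, B, degree=5, c=1.0):
    return (A @ B.T + c) ** degree

X = np.linspace(-1, 1, 6)[:, None]            # 6 nodes = degree + 1
y = np.sin(np.pi * X[:, 0])                   # toy target values

K = poly_kernel(X, X)
alpha = np.linalg.solve(K, y)                 # interpolation weights

X_test = np.linspace(-1, 1, 200)[:, None]
y_pred = poly_kernel(X_test, X) @ alpha

print("max error on the nodes:  ", np.max(np.abs(K @ alpha - y)))
print("max error off the nodes: ", np.max(np.abs(y_pred - np.sin(np.pi * X_test[:, 0]))))
```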
arXiv Detail & Related papers (2022-12-15T08:30:23Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Fast Sketching of Polynomial Kernels of Polynomial Degree [61.83993156683605]
The polynomial kernel is especially important, as other kernels can often be approximated by polynomial kernels via a Taylor series expansion.
Recent techniques in oblivious sketching reduce the dependence of the running time on the degree $q$ of the kernel.
We give a new sketch which greatly improves upon this running time, by removing the dependence on $q$ in the leading order term.
arXiv Detail & Related papers (2021-08-21T02:14:55Z) - How rotational invariance of common kernels prevents generalization in
high dimensions [8.508198765617196]
Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings.
Recent work establishes consistency for kernel regression under certain assumptions on the ground truth function and the distribution of the input data.
arXiv Detail & Related papers (2021-04-09T08:27:37Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction for the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2 D + (N^2)^3)$.
We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
arXiv Detail & Related papers (2021-02-15T13:24:41Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
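A minimal sketch of randomized preconditioning for least squares with a Gaussian embedding, in the spirit of the methods summarized above; the sketch size and the condition-number check are illustrative assumptions, not the paper's algorithm. Once the matrix is right-preconditioned by the R factor of the sketched matrix, a first-order solver converges in a number of steps essentially independent of the original conditioning.

```python
# Minimal sketch (illustrative assumptions): a Gaussian sketch followed by a QR
# factorization yields a right preconditioner that flattens the condition number.
import numpy as np

rng = np.random.default_rng(3)
n, d = 2000, 50
A = rng.standard_normal((n, d)) @ np.diag(np.logspace(0, 3, d))  # ill-conditioned tall matrix

m = 4 * d                                       # sketch size, a small multiple of d
S = rng.standard_normal((m, n)) / np.sqrt(m)    # Gaussian embedding
_, R = np.linalg.qr(S @ A)                      # R factor of the sketched matrix

print("cond(A):       ", np.linalg.cond(A))
print("cond(A R^{-1}):", np.linalg.cond(A @ np.linalg.inv(R)))
```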
arXiv Detail & Related papers (2020-02-21T17:45:32Z) - Improved guarantees and a multiple-descent curve for Column Subset
Selection and the Nyström method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.