Fast Sketching of Polynomial Kernels of Polynomial Degree
- URL: http://arxiv.org/abs/2108.09420v1
- Date: Sat, 21 Aug 2021 02:14:55 GMT
- Title: Fast Sketching of Polynomial Kernels of Polynomial Degree
- Authors: Zhao Song, David P. Woodruff, Zheng Yu, Lichen Zhang
- Abstract summary: The polynomial kernel is especially important as other kernels can often be approximated by the polynomial kernel via a Taylor series expansion.
Recent techniques in oblivious sketching reduce the dependence of the running time on the degree $q$ of the polynomial kernel from exponential to polynomial.
We give a new oblivious sketch which greatly improves upon this running time by removing the dependence on $q$ in the leading order term.
- Score: 61.83993156683605
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Kernel methods are fundamental in machine learning, and faster algorithms for
kernel approximation provide direct speedups for many core tasks in machine
learning. The polynomial kernel is especially important as other kernels can
often be approximated by the polynomial kernel via a Taylor series expansion.
Recent techniques in oblivious sketching reduce the dependence of the running
time on the degree $q$ of the polynomial kernel from exponential to polynomial,
which is useful for the Gaussian kernel, for which $q$ can be chosen to be
polylogarithmic. However, for more slowly growing kernels, such as the neural
tangent and arc-cosine kernels, $q$ needs to be polynomial, and previous work
incurs a polynomial factor slowdown in the running time. We give a new
oblivious sketch which greatly improves upon this running time, by removing the
dependence on $q$ in the leading order term. Combined with a novel sampling
scheme, we give the fastest algorithms for approximating a large family of
slow-growing kernels.
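To make the Taylor-series connection concrete, here is a minimal numpy sketch (illustrative only, not the paper's sketching algorithm; all names are ours) showing how the exponential kernel $e^{\langle x, y \rangle}$ truncates to a weighted sum of polynomial kernels $\langle x, y \rangle^j$ of degrees $j = 0, \dots, q$:

```python
import numpy as np

def exp_kernel_taylor(X, Y, q):
    """Degree-q Taylor approximation of exp(<x, y>) as a sum of
    polynomial kernels <x, y>^j / j! for j = 0, ..., q."""
    G = X @ Y.T                    # pairwise inner products <x, y>
    K = np.zeros_like(G)
    term = np.ones_like(G)         # j = 0 term: <x, y>^0 / 0!
    for j in range(q + 1):
        K += term
        term = term * G / (j + 1)  # next term: <x, y>^(j+1) / (j+1)!
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3)) / np.sqrt(3)   # modest norms => fast convergence
err = np.max(np.abs(np.exp(X @ X.T) - exp_kernel_taylor(X, X, q=8)))
print(f"max truncation error at q=8: {err:.2e}")
```

For kernels like the Gaussian, the inner products can be rescaled so that a polylogarithmic $q$ already gives small truncation error; slow-growing kernels such as the neural tangent and arc-cosine kernels need $q$ polynomial in the problem size, which is exactly the regime the new sketch targets.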
Related papers
- Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines [12.11705128358537]
We propose a new class of positive definite kernels based on spectral truncation.
We show that noncommutativity is a governing factor leading to performance enhancement.
We also propose a deep learning perspective to increase the representation capacity of spectral truncation kernels.
arXiv Detail & Related papers (2024-05-28T04:47:12Z)
- Fast Evaluation of Additive Kernels: Feature Arrangement, Fourier Methods, and Kernel Derivatives [0.5735035463793009]
We present a technique based on the non-equispaced fast Fourier transform (NFFT) with rigorous error analysis.
We show that this approach is also well suited to allow the approximation of the matrix that arises when the kernel is differentiated.
We illustrate the performance of the additive kernel scheme with fast matrix-vector products on a number of data sets.
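For intuition, here is a minimal sketch (illustrative only; the one-dimensional sums are done naively rather than with the NFFT) of the structure such methods exploit: an additive kernel matrix-vector product in $d$ dimensions splits into $d$ independent 1-D kernel sums, and it is these 1-D sums that the NFFT accelerates.

```python
import numpy as np

def additive_kernel_matvec(X, w, kernel_1d):
    """Compute K @ w for the additive kernel
    K[i, j] = sum_l kernel_1d(X[i, l], X[j, l]).
    Each coordinate contributes an independent 1-D kernel sum,
    computed naively here for clarity."""
    n, d = X.shape
    out = np.zeros(n)
    for l in range(d):
        x = X[:, l]
        K1 = kernel_1d(x[:, None], x[None, :])  # n x n 1-D kernel matrix
        out += K1 @ w
    return out

rng = np.random.default_rng(0)
X, w = rng.normal(size=(200, 4)), rng.normal(size=200)
gauss_1d = lambda a, b: np.exp(-(a - b) ** 2)   # 1-D Gaussian kernel
y = additive_kernel_matvec(X, w, gauss_1d)
print(y[:3])
```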
arXiv Detail & Related papers (2024-04-26T11:50:16Z)
- Fast Kernel Summation in High Dimensions via Slicing and Fourier Transforms [0.0]
Kernel-based methods are heavily used in machine learning.
They suffer from $O(N^2)$ complexity in the number $N$ of considered data points.
We propose an approximation procedure, which reduces this complexity to $O(N)$.
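As a hedged illustration of the one-dimensional building block that slicing reduces to (not the paper's Fourier-based method; the $|x - y|$ kernel is chosen for simplicity), a 1-D kernel sum can drop from $O(N^2)$ to $O(N \log N)$ via sorting and prefix sums:

```python
import numpy as np

def abs_kernel_sums(x, w):
    """S[i] = sum_j w[j] * |x[i] - x[j]| in O(N log N) via sorting
    and prefix sums, instead of the naive O(N^2) double loop."""
    order = np.argsort(x)
    xs, ws = x[order], w[order]
    P = np.cumsum(ws)             # P[k] = sum_{j<=k} w_j
    Q = np.cumsum(ws * xs)        # Q[k] = sum_{j<=k} w_j * x_j
    # split |x_i - x_j| by whether x_j lies left or right of x_i
    S_sorted = xs * (2 * P - P[-1]) + Q[-1] - 2 * Q
    S = np.empty_like(S_sorted)
    S[order] = S_sorted           # undo the sort
    return S

rng = np.random.default_rng(0)
x, w = rng.normal(size=1000), rng.normal(size=1000)
naive = np.abs(x[:, None] - x[None, :]) @ w
assert np.allclose(abs_kernel_sums(x, w), naive)
```

Slicing methods apply such 1-D solvers along random projection directions and average the results to treat the high-dimensional sum.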
arXiv Detail & Related papers (2024-01-16T10:31:27Z)
- Neural Networks can Learn Representations with Gradient Descent [68.95262816363288]
In specific regimes, neural networks trained by gradient descent behave like kernel methods.
In practice, it is known that neural networks strongly outperform their associated kernels.
arXiv Detail & Related papers (2022-06-30T09:24:02Z)
- Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in input sparsity time.
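For background (a classical degree-2 TensorSketch rather than the paper's leverage-score sampler; all names here are illustrative), the Gram matrix of a 2-fold tensor product can be approximated without ever materializing the $d^2$-dimensional tensored features:

```python
import numpy as np

def count_sketch(X, m, seed):
    """Apply a CountSketch with m buckets to the columns of X."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    h = rng.integers(0, m, size=d)        # random bucket per input column
    s = rng.choice([-1.0, 1.0], size=d)   # random sign per input column
    S = np.zeros((X.shape[0], m))
    for j in range(d):
        S[:, h[j]] += s[j] * X[:, j]
    return S

def tensor_sketch_deg2(X, m):
    """Sketch the 2-fold column-wise tensor product of X with itself:
    <z_i, z_j> approximates <x_i, x_j>^2, the degree-2 polynomial kernel."""
    C1 = np.fft.fft(count_sketch(X, m, seed=1), axis=1)
    C2 = np.fft.fft(count_sketch(X, m, seed=2), axis=1)
    # elementwise product in Fourier domain = circular convolution of sketches
    return np.fft.ifft(C1 * C2, axis=1).real

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
Z = tensor_sketch_deg2(X, m=4096)
exact = (X @ X.T) ** 2                    # Gram matrix of the tensored features
rel_err = np.max(np.abs(Z @ Z.T - exact)) / np.max(exact)
print(f"relative error: {rel_err:.3f}")   # typically a few percent
```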
arXiv Detail & Related papers (2022-02-09T15:26:03Z)
- Kernel Identification Through Transformers [54.3795894579111]
Kernel selection plays a central role in determining the performance of Gaussian Process (GP) models.
This work addresses the challenge of constructing custom kernel functions for high-dimensional GP regression models.
We introduce a novel approach named KITT: Kernel Identification Through Transformers.
arXiv Detail & Related papers (2021-06-15T14:32:38Z)
- The Fast Kernel Transform [21.001203328543006]
We propose the Fast Kernel Transform (FKT), a general algorithm to compute matrix-vector multiplications for datasets in moderate dimensions with quasilinear complexity.
The FKT is easily applied to a broad class of kernels, including Gaussian, Matérn, and Rational Quadratic covariance functions and physically motivated Green's functions.
We illustrate the efficacy and versatility of the FKT by providing timing and accuracy benchmarks and by applying it to scale stochastic neighbor embedding (t-SNE) and Gaussian processes to large real-world data sets.
arXiv Detail & Related papers (2021-06-08T16:15:47Z)
- Kernel approximation on algebraic varieties [0.7614628596146599]
We show that significantly better approximations are obtainable in problems involving sparse or low-rank data.
Our results are presented for smooth isotropic kernels, the predominant class of kernels used in applications.
arXiv Detail & Related papers (2021-06-04T23:42:19Z)
- Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)
- PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives [55.79741270235602]
We develop a hybrid approach to building deep learning kernels.
We use advanced polyhedral technology to automatically tune the outer loops for performance.
arXiv Detail & Related papers (2020-02-06T08:02:34Z)