On The Relative Error of Random Fourier Features for Preserving Kernel
Distance
- URL: http://arxiv.org/abs/2210.00244v2
- Date: Thu, 13 Apr 2023 13:57:13 GMT
- Title: On The Relative Error of Random Fourier Features for Preserving Kernel
Distance
- Authors: Kuan Cheng, Shaofeng H.-C. Jiang, Luojian Wei, Zhide Wei
- Abstract summary: We show that for a significant range of kernels, including the well-known Laplacian kernels, RFF cannot approximate the kernel distance with small relative error using low dimensions.
We make the first step towards data-oblivious dimension-reduction for general shift-invariant kernels, and we obtain a similar $\mathrm{poly}(\epsilon^{-1} \log n)$ dimension bound for Laplacian kernels.
- Score: 7.383448514243228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The method of random Fourier features (RFF), proposed in a seminal paper by
Rahimi and Recht (NIPS'07), is a powerful technique to find approximate
low-dimensional representations of points in (high-dimensional) kernel space,
for shift-invariant kernels. While RFF has been analyzed under various notions
of error guarantee, the ability to preserve the kernel distance with
\emph{relative} error is less understood. We show that for a significant range
of kernels, including the well-known Laplacian kernels, RFF cannot approximate
the kernel distance with small relative error using low dimensions. We
complement this by showing that, as long as the shift-invariant kernel is analytic,
RFF with $\mathrm{poly}(\epsilon^{-1} \log n)$ dimensions achieves
$\epsilon$-relative error for pairwise kernel distance of $n$ points, and the
dimension bound is improved to $\mathrm{poly}(\epsilon^{-1}\log k)$ for the
specific application of kernel $k$-means. Finally, going beyond RFF, we make
the first step towards data-oblivious dimension-reduction for general
shift-invariant kernels, and we obtain a similar $\mathrm{poly}(\epsilon^{-1}
\log n)$ dimension bound for Laplacian kernels. We also validate the
dimension-error tradeoff of our methods on simulated datasets, and they
demonstrate superior performance compared with other popular methods including
random-projection and Nystr\"{o}m methods.
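To make the dimension-error tradeoff above concrete, the following is a minimal sketch (not the authors' code) of the standard RFF embedding of Rahimi and Recht for the Gaussian kernel, an analytic shift-invariant kernel covered by the positive result; the target dimension $D$, bandwidth $\sigma$, and data below are illustrative assumptions. It compares the exact kernel distance $\sqrt{k(x,x) + k(y,y) - 2k(x,y)}$ with the Euclidean distance between the RFF embeddings.

```python
# Minimal sketch of random Fourier features (Rahimi-Recht) for the Gaussian kernel
# k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); D and sigma are illustrative choices,
# not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 10, 2000, 1.0

# Spectral distribution of the Gaussian kernel: w ~ N(0, I / sigma^2), b ~ Uniform[0, 2*pi).
W = rng.normal(scale=1.0 / sigma, size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, D)

def rff(x):
    """Embed x into R^D so that <rff(x), rff(y)> approximates k(x, y)."""
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

def kernel(x, y):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

x, y = rng.normal(size=d), rng.normal(size=d)
# Exact kernel distance in feature space vs. its RFF proxy, and the relative error between them.
exact = np.sqrt(kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y))
approx = np.linalg.norm(rff(x) - rff(y))
print(f"exact {exact:.4f}  rff {approx:.4f}  relative error {abs(approx - exact) / exact:.3f}")
```

Increasing $D$ shrinks the relative error; the abstract's results quantify how large $D$ must be for a prescribed $\epsilon$-relative error over $n$ points.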
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions learned by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - The phase diagram of kernel interpolation in large dimensions [8.707305374058794]
The generalization ability of kernel interpolation in large dimensions might be one of the most interesting problems in the recent renaissance of kernel regression.
We fully characterized the exact order of both the variance and bias of large-dimensional kernel interpolation under various source conditions $s \geq 0$.
We determined the regions in the $(s,\gamma)$-plane where kernel interpolation is minimax optimal, sub-optimal and inconsistent.
arXiv Detail & Related papers (2024-04-19T03:04:06Z) - Fast Kernel Summation in High Dimensions via Slicing and Fourier Transforms [0.0]
Kernel-based methods are heavily used in machine learning.
They suffer from $O(N^2)$ complexity in the number $N$ of considered data points.
We propose an approximation procedure, which reduces this complexity to $O(N)$.
arXiv Detail & Related papers (2024-01-16T10:31:27Z) - Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z) - Uniform Approximations for Randomized Hadamard Transforms with
Applications [8.985261743452986]
Randomized Hadamard Transforms (RHTs) have emerged as a computationally efficient alternative to the use of dense unstructured random matrices.
We use our inequality to derive improved guarantees for two applications in the high-dimensional regime: kernel approximation and distance estimation.
arXiv Detail & Related papers (2022-03-03T09:56:39Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Taming Nonconvexity in Kernel Feature Selection---Favorable Properties
of the Laplace Kernel [77.73399781313893]
A challenge is to establish the objective function of kernel-based feature selection.
The gradient-based algorithms available for nonconvex optimization can only guarantee convergence to local minima.
arXiv Detail & Related papers (2021-06-17T11:05:48Z) - Towards Unbiased Random Features with Lower Variance For Stationary
Indefinite Kernels [26.57122949130266]
Our algorithm achieves lower variance and approximation error compared with existing kernel approximation methods.
With better approximation of the originally selected kernels, improved classification accuracy and regression ability are obtained.
arXiv Detail & Related papers (2021-04-13T13:56:50Z) - Fourier Sparse Leverage Scores and Approximate Kernel Learning [29.104055676527913]
We prove new explicit upper bounds on the leverage scores of Fourier functions under both the Gaussian and Laplace measures.
Our bounds are motivated by two important applications in machine learning.
arXiv Detail & Related papers (2020-06-12T17:25:39Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle \varphi(x), \varphi(y)\rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$ with $r \ll n$ (see the sketch after this list).
arXiv Detail & Related papers (2020-06-12T10:21:40Z) - Improved guarantees and a multiple-descent curve for Column Subset
Selection and the Nystr\"om method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z)
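To illustrate why the positive-feature factorization in the Sinkhorn entry above yields linear-time kernel-vector products, here is a minimal sketch under assumed data; the feature maps and sizes are hypothetical placeholders, not taken from any of the papers listed.

```python
# Sketch: if K_ij = <phi(x_i), phi(y_j)> with phi mapping into the positive orthant R^r_+,
# then K @ v = Phi_x @ (Phi_y.T @ v) costs O(n r) instead of O(n^2).
import numpy as np

rng = np.random.default_rng(0)
n, r = 5000, 16                      # hypothetical sizes with r << n
Phi_x = rng.random((n, r))           # placeholder positive features phi(x_i)
Phi_y = rng.random((n, r))           # placeholder positive features phi(y_j)
v = rng.random(n)

Kv_fast = Phi_x @ (Phi_y.T @ v)      # linear-time kernel-vector product

# Check against the dense kernel on a small block of rows.
K_block = Phi_x[:100] @ Phi_y.T
assert np.allclose(K_block @ v, Kv_fast[:100])
```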