Fourier Sparse Leverage Scores and Approximate Kernel Learning
- URL: http://arxiv.org/abs/2006.07340v3
- Date: Thu, 8 Jul 2021 01:55:27 GMT
- Title: Fourier Sparse Leverage Scores and Approximate Kernel Learning
- Authors: Tamás Erdélyi and Cameron Musco and Christopher Musco
- Abstract summary: We prove new explicit upper bounds on the leverage scores of Fourier sparse functions under both the Gaussian and Laplace measures.
Our bounds are motivated by two important applications in machine learning.
- Score: 29.104055676527913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove new explicit upper bounds on the leverage scores of Fourier sparse
functions under both the Gaussian and Laplace measures. In particular, we study
$s$-sparse functions of the form $f(x) = \sum_{j=1}^s a_j e^{i \lambda_j x}$
for coefficients $a_j \in \mathbb{C}$ and frequencies $\lambda_j \in
\mathbb{R}$. Bounding Fourier sparse leverage scores under various measures is
of pure mathematical interest in approximation theory, and our work extends
existing results for the uniform measure [Erd17,CP19a]. Practically, our bounds
are motivated by two important applications in machine learning:
1. Kernel Approximation. They yield a new random Fourier features algorithm
for approximating Gaussian and Cauchy (rational quadratic) kernel matrices. For
low-dimensional data, our method uses a near optimal number of features, and
its runtime is polynomial in the $statistical\ dimension$ of the approximated
kernel matrix. It is the first "oblivious sketching method" with this property
for any kernel besides the polynomial kernel, resolving an open question of
[AKM+17,AKK+20b].
2. Active Learning. They can be used as non-uniform sampling distributions
for robust active learning when data follows a Gaussian or Laplace
distribution. Using the framework of [AKM+19], we provide essentially optimal
results for bandlimited and multiband interpolation, and Gaussian process
regression. These results generalize existing work that only applies to
uniformly distributed data.
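For context, below is a minimal sketch of the classical random Fourier features construction for the Gaussian kernel (in the style of Rahimi and Recht), which the paper improves upon; the leverage-score-based frequency sampling that yields the near-optimal feature counts described above is not implemented here, and the bandwidth `sigma`, feature count `D`, and function names are illustrative assumptions.

```python
import numpy as np

def gaussian_rff(X, D=512, sigma=1.0, seed=None):
    """Classical random Fourier features for the Gaussian kernel.

    Returns Z with E[Z @ Z.T][i, j] = exp(-||x_i - x_j||^2 / (2 * sigma**2)).
    Frequencies are drawn from the kernel's Fourier transform (a Gaussian);
    the paper instead samples frequencies from a leverage-score-based density.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(d, D))   # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)        # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Usage: Z @ Z.T is a low-rank approximation of the exact Gaussian kernel matrix.
X = np.random.default_rng(0).normal(size=(200, 2))   # low-dimensional data
Z = gaussian_rff(X, D=512, sigma=1.0, seed=1)
K_approx = Z @ Z.T
```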
Related papers
- Universality of kernel random matrices and kernel regression in the quadratic regime [18.51014786894174]
In this work, we extend the study of kernel ridge regression (KRR) to the quadratic regime.
We establish an operator norm approximation bound for the difference between the original kernel random matrix and a quadratic kernel random matrix.
We characterize the precise training and generalization errors for KRR in the quadratic regime when $n/d^2$ converges to a nonzero constant.
arXiv Detail & Related papers (2024-08-02T07:29:49Z)
- Dimensionality Reduction for General KDE Mode Finding [12.779486428760373]
Finding the mode of a high dimensional probability distribution $D$ is a fundamental problem in statistics and data analysis.
We show that there is no polynomial-time algorithm for finding the mode of a kernel density estimate, unless $\mathit{P} = \mathit{NP}$.
arXiv Detail & Related papers (2023-05-30T05:39:59Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Kernel Packet: An Exact and Scalable Algorithm for Gaussian Process Regression with Matérn Correlations [23.560067934682294]
We develop an exact and scalable algorithm for one-dimensional Gaussian process regression with Matérn correlations.
The proposed algorithm is significantly superior to existing alternatives in both computational time and predictive accuracy.
arXiv Detail & Related papers (2022-03-07T03:30:35Z)
- Uniform Approximations for Randomized Hadamard Transforms with Applications [8.985261743452986]
Randomized Hadamard Transforms (RHTs) have emerged as a computationally efficient alternative to the use of dense unstructured random matrices.
We use our inequality to derive improved guarantees for two applications in the high-dimensional regime: kernel approximation and distance estimation.
arXiv Detail & Related papers (2022-03-03T09:56:39Z)
- Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in input sparsity time.
arXiv Detail & Related papers (2022-02-09T15:26:03Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Quantization Algorithms for Random Fourier Features [25.356048456005023]
The method of random Fourier features (RFF) has become popular for approximating the Gaussian kernel.
RFF applies a specific nonlinear transformation on the projected data from random projections.
In this paper, we focus on developing quantization algorithms for RFF.
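As a hedged illustration of what quantizing RFF can look like (not the specific algorithms proposed in that paper), a simple baseline rounds each feature value, which lies in $[-\sqrt{2/D}, \sqrt{2/D}]$, to one of $2^b$ uniformly spaced levels; the function and parameter names below are assumptions.

```python
import numpy as np

def quantize_rff(Z, bits=4):
    """Round RFF feature values to 2**bits uniformly spaced levels.

    A generic uniform-quantization baseline, not the schemes from the paper:
    entries of Z = sqrt(2/D) * cos(...) lie in [-a, a] with a = sqrt(2/D).
    """
    D = Z.shape[1]
    a = np.sqrt(2.0 / D)
    levels = 2 ** bits
    step = 2.0 * a / (levels - 1)
    q = np.clip(np.round((Z + a) / step), 0, levels - 1)  # integer codes in {0, ..., levels-1}
    return q * step - a                                    # dequantized values for kernel estimates

# Kernel estimates then use inner products of the (cheaply stored) quantized features.
```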
arXiv Detail & Related papers (2021-02-25T18:51:39Z)
- Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log \langle \varphi(x), \varphi(y) \rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
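A minimal sketch of the resulting speedup, under the simplifying assumption that the entropic regularization is absorbed into the feature map (so the Sinkhorn kernel factors exactly as $K = \Phi_x \Phi_y^\top$): kernel-vector products then cost $O(nr)$ rather than $O(n^2)$. The toy positive feature map and all names below are illustrative, not the paper's construction.

```python
import numpy as np

def kernel_vector_product(phi_x, phi_y, v):
    """Apply K with K[i, j] = <phi_x[i], phi_y[j]> to v without forming K.

    Cost is O(n * r) for r-dimensional positive features, versus O(n^2)
    for the dense kernel matrix used in standard Sinkhorn iterations.
    """
    return phi_x @ (phi_y.T @ v)

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(500, 2)), rng.normal(size=(400, 2))
W = rng.normal(size=(2, 16))

def phi(Z):
    E = np.exp(Z @ W)                        # positive random features (toy choice)
    return E / E.sum(axis=1, keepdims=True)  # keeps phi(Z) in the positive orthant

v = rng.random(400)
Kv = kernel_vector_product(phi(X), phi(Y), v)  # length-500 result, no 500 x 400 matrix formed
```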
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.