A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian
Kernel, a Precise Phase Transition, and the Corresponding Double Descent
- URL: http://arxiv.org/abs/2006.05013v2
- Date: Sat, 19 Dec 2020 06:28:07 GMT
- Title: A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian
Kernel, a Precise Phase Transition, and the Corresponding Double Descent
- Authors: Zhenyu Liao, Romain Couillet, Michael W. Mahoney
- Abstract summary: This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable.
This analysis also provides accurate estimates of training and test regression errors for large $n,p,N$.
- Score: 85.77233010209368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article characterizes the exact asymptotics of random Fourier feature
(RFF) regression, in the realistic setting where the number of data samples
$n$, their dimension $p$, and the dimension of feature space $N$ are all large
and comparable. In this regime, the random RFF Gram matrix no longer converges
to the well-known limiting Gaussian kernel matrix (as it does when $N \to
\infty$ alone), but it still has a tractable behavior that is captured by our
analysis. This analysis also provides accurate estimates of training and test
regression errors for large $n,p,N$. Based on these estimates, a precise
characterization of two qualitatively different phases of learning, including
the phase transition between them, is provided; and the corresponding double
descent test error curve is derived from this phase transition behavior. These
results do not depend on strong assumptions on the data distribution, and they
perfectly match empirical results on real-world data sets.
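To make the setting concrete, here is a minimal numpy sketch of RFF ridge regression (the synthetic Gaussian data, dimensions, and ridge level are illustrative assumptions, not the paper's experiments): inputs are mapped through random Fourier features $[\cos(Wx); \sin(Wx)]$ and the feature dimension $N$ is swept past the interpolation point.

```python
import numpy as np

def rff(X, W):
    """Random Fourier features: concatenated cos/sin projections, kernel-normalized."""
    Z = X @ W.T                                   # (n_samples, N) random projections
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(W.shape[0])

rng = np.random.default_rng(0)
n, p, lam = 512, 32, 1e-4                         # samples, input dim, small ridge level
w_star = rng.standard_normal(p)
X = rng.standard_normal((n, p)) / np.sqrt(p)
y = X @ w_star + 0.1 * rng.standard_normal(n)
X_test = rng.standard_normal((n, p)) / np.sqrt(p)
y_test = X_test @ w_star

for N in [64, 128, 256, 512, 1024, 2048]:         # sweep feature dimension past n
    W = rng.standard_normal((N, p))               # Gaussian frequencies -> Gaussian kernel as N -> inf
    Phi, Phi_test = rff(X, W), rff(X_test, W)
    beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(2 * N), Phi.T @ y)
    print(f"N = {N:5d}  test MSE = {np.mean((Phi_test @ beta - y_test) ** 2):.4f}")
```

With a small ridge level, the test error typically spikes near the interpolation threshold (here $2N \approx n$ trainable coefficients) and decreases again in the over-parameterized regime, tracing the double descent curve the paper derives.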
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Analysis of the expected $L_2$ error of an over-parametrized deep neural
network estimate learned by gradient descent without regularization [7.977229957867868]
Recent results show that estimates defined by over-parametrized deep neural networks learned by applying gradient descent to a regularized empirical $L_2$ risk are universally consistent.
In this paper, we show that the regularization term is not necessary to obtain similar results.
arXiv Detail & Related papers (2023-11-24T17:04:21Z) - Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariate, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z) - Precise Learning Curves and Higher-Order Scaling Limits for Dot Product
Kernel Regression [41.48538038768993]
We focus on the problem of kernel ridge regression for dot-product kernels.
We observe a peak in the learning curve whenever $m \approx d^r/r!$ for any integer $r$, leading to multiple sample-wise descents and nontrivial behavior at multiple scales.
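To see where those peaks should sit, a quick computation of the predicted sample sizes $m \approx d^r/r!$ (the dimension $d = 100$ is an arbitrary illustrative choice):

```python
from math import factorial

d = 100                                    # input dimension (arbitrary illustrative value)
for r in range(1, 5):
    m_peak = d ** r / factorial(r)         # predicted location of the r-th learning-curve peak
    print(f"r = {r}: peak near m ≈ {m_peak:,.0f} samples")
```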
arXiv Detail & Related papers (2022-05-30T04:21:31Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds for RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
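A minimal sketch of that pipeline under the constant step-size setting (the cosine feature map, dimensions, and step size are illustrative assumptions, not the paper's exact model):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, N, eta = 2000, 20, 400, 0.05                # samples, input dim, features, constant step size
W = rng.standard_normal((N, p)) / np.sqrt(p)      # fixed random first layer
w_star = rng.standard_normal(p)
theta = np.zeros(N)                               # trainable second-layer weights

for _ in range(n):                                # single pass: one SGD step per fresh sample
    x = rng.standard_normal(p)
    y = x @ w_star + 0.1 * rng.standard_normal()
    phi = np.sqrt(2.0 / N) * np.cos(W @ x)        # cosine random-features map (illustrative)
    theta -= eta * (phi @ theta - y) * phi        # squared-loss SGD update

X_test = rng.standard_normal((200, p))
Phi_test = np.sqrt(2.0 / N) * np.cos(X_test @ W.T)
print("test MSE:", np.mean((Phi_test @ theta - X_test @ w_star) ** 2))
```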
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication, and a factor of $b$ fewer bits for storage compared to classical random features.
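One way to picture the ternary idea (an illustrative sketch under assumed details, not necessarily the paper's construction; the fraction of zeroed weights is a free parameter here): quantize the random projection matrix to $\{-1, 0, +1\}$, so projections need only additions and subtractions and each weight fits in about two bits.

```python
import numpy as np

rng = np.random.default_rng(0)
p, N, zero_frac = 64, 256, 0.5                   # input dim, features, fraction of zeroed weights

W = rng.standard_normal((N, p))
thresh = np.quantile(np.abs(W), zero_frac)       # drop the smallest-magnitude entries
W_ternary = np.sign(W) * (np.abs(W) > thresh)    # entries now in {-1, 0, +1}

x = rng.standard_normal(p)
z = W_ternary @ x                                # needs only additions and subtractions
features = np.cos(z)                             # nonlinearity as in classical random features
print("weight values:", np.unique(W_ternary))    # -> [-1.  0.  1.]
```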
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Convergence Rates of Stochastic Gradient Descent under Infinite Noise
Variance [14.06947898164194]
Heavy tails emerge in stochastic gradient descent (SGD) in various scenarios.
We provide convergence guarantees for SGD under a state-dependent and heavy-tailed noise with a potentially infinite variance.
Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum.
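A toy demonstration of that claim (the quadratic objective, symmetrized Pareto noise, and step-size schedule are illustrative choices, not the paper's assumptions): the gradient noise below has zero mean but infinite variance, yet SGD with a decaying step size still approaches the optimum.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.full(10, 5.0)                              # start away from the optimum at the origin
for t in range(1, 50001):
    noise = rng.pareto(1.5, 10) * rng.choice([-1.0, 1.0], 10)  # zero mean, infinite variance
    grad = x + noise                              # gradient of ||x||^2 / 2 plus heavy-tailed noise
    x -= 0.5 / t ** 0.9 * grad                    # decaying step size
print("distance to optimum:", np.linalg.norm(x))  # small despite the infinite-variance noise
```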
arXiv Detail & Related papers (2021-02-20T13:45:11Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression, where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Tight Nonparametric Convergence Rates for Stochastic Gradient Descent
under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model.
As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
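A compact version of the noiseless setting (dimensions and step size are illustrative): stream observations $y = \langle x, w^* \rangle$ with no label noise and run single-pass, fixed step-size SGD on the least-squares risk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, eta = 50, 0.1                                  # dimension and fixed step size (illustrative)
w_star = rng.standard_normal(d)
w = np.zeros(d)

for t in range(1, 10001):                         # single pass: every sample is seen exactly once
    x = rng.standard_normal(d) / np.sqrt(d)
    y = x @ w_star                                # noiseless linear model: no label noise at all
    w -= eta * (x @ w - y) * x                    # least-squares SGD step
    if t % 2500 == 0:
        print(t, np.linalg.norm(w - w_star))      # error keeps shrinking: no noise floor
```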
arXiv Detail & Related papers (2020-06-15T08:25:50Z) - Phase retrieval in high dimensions: Statistical and computational phase
transitions [27.437775143419987]
We consider the problem of reconstructing a signal $\mathbf{X}^\star$ from $m$ (possibly noisy) observations.
In particular, the information-theoretic transition to perfect recovery for full-rank matrices appears at $\alpha = 1$ and $\alpha = 2$, where $\alpha = m/n$ is the sampling ratio.
Our work provides an extensive classification of the statistical and algorithmic thresholds in high-dimensional phase retrieval.
arXiv Detail & Related papers (2020-06-09T13:03:29Z)