Largest Eigenvalues of the Conjugate Kernel of Single-Layered Neural
Networks
- URL: http://arxiv.org/abs/2201.04753v1
- Date: Thu, 13 Jan 2022 00:48:20 GMT
- Title: Largest Eigenvalues of the Conjugate Kernel of Single-Layered Neural
Networks
- Authors: Lucas Benigni, Sandrine Péché
- Abstract summary: We show that the largest eigenvalue has the same limit (in probability) as that of some well-known linear random matrix ensembles.
This may be of interest for applications to machine learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper is concerned with the asymptotic distribution of the largest
eigenvalues for some nonlinear random matrix ensemble stemming from the study
of neural networks. More precisely we consider $M= \frac{1}{m} YY^\top$ with
$Y=f(WX)$ where $W$ and $X$ are random rectangular matrices with i.i.d.
centered entries. This models the data covariance matrix or the Conjugate
Kernel of a single-layered random Feed-Forward Neural Network. The function $f$
is applied entrywise and can be seen as the activation function of the neural
network. We show that the largest eigenvalue has the same limit (in
probability) as that of some well-known linear random matrix ensembles. In
particular, we relate the asymptotic limit of the largest eigenvalue for the
nonlinear model to that of an information-plus-noise random matrix,
establishing a possible phase transition depending on the function $f$ and the
distribution of $W$ and $X$. This may be of interest for applications to
machine learning.
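To make the model concrete, here is a minimal numerical sketch (an illustration, not the paper's method): it draws Gaussian $W$ and $X$, forms the conjugate kernel $M = \frac{1}{m} YY^\top$ with $Y = f(WX)$, and reports the top of the spectrum. The specific dimensions, the $1/\sqrt{d}$ normalization inside the activation, and the choice $f=\tanh$ are assumptions made for the example; the paper's statements concern the asymptotic regime where all dimensions grow proportionally.

```python
# Simulation sketch of the conjugate kernel M = (1/m) Y Y^T with Y = f(W X).
# All concrete choices below (sizes, scaling, activation) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Proportional regime: width n, input dimension d, sample size m of the same order.
n, d, m = 800, 1200, 1600

W = rng.standard_normal((n, d))     # i.i.d. centered weights
X = rng.standard_normal((d, m))     # i.i.d. centered data

def f(t):
    # Entrywise activation of the random one-layer network; tanh is one admissible choice.
    return np.tanh(t)

Y = f(W @ X / np.sqrt(d))           # assumed 1/sqrt(d) scaling so f sees O(1) entries
M = (Y @ Y.T) / m                   # conjugate kernel / data covariance of the features

# M is symmetric positive semi-definite; eigvalsh returns eigenvalues in ascending order.
eigs = np.linalg.eigvalsh(M)
print(f"largest eigenvalue: {eigs[-1]:.4f}   second largest: {eigs[-2]:.4f}")
```

Rerunning the sketch with different activations and entry distributions is a quick way to see the behaviour the abstract refers to: depending on $f$ and the laws of $W$ and $X$, the largest eigenvalue either stays at the edge of the bulk or detaches from it, mirroring the phase transition known for information-plus-noise matrices.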
Related papers
- Universality of kernel random matrices and kernel regression in the quadratic regime [18.51014786894174]
In this work, we extend the study of kernel regression to the quadratic regime.
We establish an operator norm approximation bound for the difference between the original kernel random matrix and a quadratic kernel random matrix.
We characterize the precise training and generalization errors for kernel ridge regression (KRR) in the quadratic regime when $n/d^2$ converges to a nonzero constant.
arXiv Detail & Related papers (2024-08-02T07:29:49Z) - Private Covariance Approximation and Eigenvalue-Gap Bounds for Complex
Gaussian Perturbations [28.431572772564518]
We show that the Frobenius norm of the difference between the matrix output by this mechanism and the best rank-$k$ approximation to $M$ is bounded by roughly $\tilde{O}(\sqrt{kd})$.
This improves on previous work that requires that the gap between every pair of top-$k$ eigenvalues of $M$ is at least $\sqrt{d}$ for a similar bound.
arXiv Detail & Related papers (2023-06-29T03:18:53Z) - Near-optimal fitting of ellipsoids to random points [68.12685213894112]
The basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis.
We resolve this conjecture (that an ellipsoid can be fit to roughly $d^2$ random points) up to logarithmic factors, by constructing a fitting ellipsoid for some $n = \Omega\!\left(d^2/\mathrm{polylog}(d)\right)$.
Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix.
arXiv Detail & Related papers (2022-08-19T18:00:34Z) - An Equivalence Principle for the Spectrum of Random Inner-Product Kernel
Matrices with Polynomial Scalings [21.727073594338297]
This study is motivated by applications in machine learning and statistics.
We establish the weak limit of the empirical spectral distribution of these random matrices in a polynomial scaling regime.
The limiting distribution can be characterized as the free additive convolution of a Marchenko-Pastur law and a semicircle law.
arXiv Detail & Related papers (2022-05-12T18:50:21Z) - When Random Tensors meet Random Matrices [50.568841545067144]
This paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise.
We show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric block-wise random matrix.
arXiv Detail & Related papers (2021-12-23T04:05:01Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Deformed semicircle law and concentration of nonlinear random matrices
for ultra-wide neural networks [29.03095282348978]
We study the limiting spectral distributions of two empirical kernel matrices associated with $f(X)$.
We show that random feature regression induced by the empirical kernel achieves the same performance as its limiting kernel regression under the ultra-wide regime.
arXiv Detail & Related papers (2021-09-20T05:25:52Z) - Analysis of One-Hidden-Layer Neural Networks via the Resolvent Method [0.0]
Motivated by random neural networks, we consider the random matrix $M = YY^\ast$ with $Y = f(WX)$.
We prove that the Stieltjes transform of the limiting spectral distribution satisfies a quartic self-consistent equation up to some error terms.
In addition, we extend the previous results to the case of additive bias $Y=f(WX+B)$ with $B$ being an independent rank-one Gaussian random matrix.
arXiv Detail & Related papers (2021-05-11T15:17:39Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to within $\epsilon$ in normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}\!\left(\frac{kr}{\epsilon^2}\log^2\frac{1}{\epsilon}\right)$ samples, a number linear in the high dimension $k$ and nearly linear in the typically low rank $r$.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer ReLU neural network.
We show that an over-parametrized two-layer neural network trained by gradient descent can provably learn the ground-truth network with small loss, beyond what Neural Tangent Kernel analyses can certify with the same number of samples.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n\times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r\ll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
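As a rough illustration of the positive-features idea in the last entry above (a sketch under stated assumptions, not the authors' implementation): with the ground cost $c(x,y) = -\log\langle\varphi(x),\varphi(y)\rangle$ and entropic regularization set to $1$, the Gibbs kernel $K_{ij} = e^{-c(x_i, y_j)}$ equals the rank-$r$ matrix $\Phi_x\Phi_y^\top$, so each Sinkhorn step only needs $n\times r$ matrix-vector products. The feature map $\varphi$ below (exponentials of shared random projections) is just one convenient map onto the positive orthant.

```python
# Sinkhorn iterations with a rank-r positive-feature kernel K = Px @ Py.T,
# never forming the n x n matrix explicitly. The map phi is an illustrative choice.
import numpy as np

rng = np.random.default_rng(1)
n, d, r = 2000, 5, 64                  # points, ambient dimension, feature rank (r << n)

x = rng.standard_normal((n, d))
y = rng.standard_normal((n, d))
a = np.full(n, 1.0 / n)                # uniform source marginal
b = np.full(n, 1.0 / n)                # uniform target marginal

omega = rng.standard_normal((d, r))    # shared random projections

def phi(z):
    # Strictly positive features, so c(x, y) = -log <phi(x), phi(y)> is well defined.
    return np.exp(z @ omega - 0.5 * np.linalg.norm(z, axis=1, keepdims=True) ** 2) / np.sqrt(r)

Px, Py = phi(x), phi(y)                # n x r factors of the kernel

u, v = np.ones(n), np.ones(n)
for _ in range(200):                   # each update costs O(n r) instead of O(n^2)
    u = a / (Px @ (Py.T @ v))          # u <- a / (K v)
    v = b / (Py @ (Px.T @ u))          # v <- b / (K^T u)

# u, v are the Sinkhorn scalings; the transport plan is diag(u) K diag(v).
print("scalings computed:", u[:3], v[:3])
```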
This list is automatically generated from the titles and abstracts of the papers in this site.