Largest Eigenvalues of the Conjugate Kernel of Single-Layered Neural
Networks
- URL: http://arxiv.org/abs/2201.04753v1
- Date: Thu, 13 Jan 2022 00:48:20 GMT
- Title: Largest Eigenvalues of the Conjugate Kernel of Single-Layered Neural
Networks
- Authors: Lucas Benigni, Sandrine Péché
- Abstract summary: We show that the largest eigenvalue has the same limit (in probability) as that of some well-known linear random matrix ensembles.
This may be of interest for applications to machine learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper is concerned with the asymptotic distribution of the largest
eigenvalues for some nonlinear random matrix ensemble stemming from the study
of neural networks. More precisely we consider $M= \frac{1}{m} YY^\top$ with
$Y=f(WX)$ where $W$ and $X$ are random rectangular matrices with i.i.d.
centered entries. This models the data covariance matrix or the Conjugate
Kernel of a single-layered random Feed-Forward Neural Network. The function $f$
is applied entrywise and can be seen as the activation function of the neural
network. We show that the largest eigenvalue has the same limit (in
probability) as that of some well-known linear random matrix ensembles. In
particular, we relate the asymptotic limit of the largest eigenvalue for the
nonlinear model to that of an information-plus-noise random matrix,
establishing a possible phase transition depending on the function $f$ and the
distribution of $W$ and $X$. This may be of interest for applications to
machine learning.
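To make the model concrete, here is a minimal numerical sketch (an illustration, not the paper's method): it draws Gaussian $W$ and $X$, forms the conjugate kernel $M = \frac{1}{m} YY^\top$ with $Y = f(WX)$, and reports the top of the spectrum. The specific dimensions, the $1/\sqrt{d}$ normalization inside the activation, and the choice $f=\tanh$ are assumptions made for the example; the paper's statements concern the asymptotic regime where all dimensions grow proportionally.

```python
# Simulation sketch of the conjugate kernel M = (1/m) Y Y^T with Y = f(W X).
# All concrete choices below (sizes, scaling, activation) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Proportional regime: width n, input dimension d, sample size m of the same order.
n, d, m = 800, 1200, 1600

W = rng.standard_normal((n, d))     # i.i.d. centered weights
X = rng.standard_normal((d, m))     # i.i.d. centered data

def f(t):
    # Entrywise activation of the random one-layer network; tanh is one admissible choice.
    return np.tanh(t)

Y = f(W @ X / np.sqrt(d))           # assumed 1/sqrt(d) scaling so f sees O(1) entries
M = (Y @ Y.T) / m                   # conjugate kernel / data covariance of the features

# M is symmetric positive semi-definite; eigvalsh returns eigenvalues in ascending order.
eigs = np.linalg.eigvalsh(M)
print(f"largest eigenvalue: {eigs[-1]:.4f}   second largest: {eigs[-2]:.4f}")
```

Rerunning the sketch with different activations and entry distributions is a quick way to see the behaviour the abstract refers to: depending on $f$ and the laws of $W$ and $X$, the largest eigenvalue either stays at the edge of the bulk or detaches from it, mirroring the phase transition known for information-plus-noise matrices.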
Related papers
- Universality of kernel random matrices and kernel regression in the quadratic regime [18.51014786894174]
In this work, we extend the study of kernel regression to the quadratic regime.
We establish an operator norm approximation bound for the difference between the original kernel random matrix and a quadratic kernel random matrix.
We characterize the precise training and generalization errors for kernel ridge regression (KRR) in the quadratic regime when $n/d^2$ converges to a nonzero constant.
arXiv Detail & Related papers (2024-08-02T07:29:49Z) - Private Covariance Approximation and Eigenvalue-Gap Bounds for Complex
Gaussian Perturbations [28.431572772564518]
We show that the Frobenius norm of the difference between the matrix output by this mechanism and the best rank-$k$ approximation to $M$ is bounded by roughly $\tilde{O}(\sqrt{kd})$.
This improves on previous work that requires that the gap between every pair of top-$k$ eigenvalues of $M$ is at least $\sqrt{d}$ for a similar bound.
arXiv Detail & Related papers (2023-06-29T03:18:53Z) - Near-optimal fitting of ellipsoids to random points [68.12685213894112]
The basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis.
We resolve this conjecture (that an ellipsoid can be fit to roughly $d^2$ random points) up to logarithmic factors, by constructing a fitting ellipsoid for some $n = \Omega\!\left(d^2/\mathrm{polylog}(d)\right)$.
Our proof demonstrates feasibility of the least squares construction of Saunderson et al. using a convenient decomposition of a certain non-standard random matrix.
arXiv Detail & Related papers (2022-08-19T18:00:34Z) - An Equivalence Principle for the Spectrum of Random Inner-Product Kernel
Matrices with Polynomial Scalings [21.727073594338297]
This study is motivated by applications in machine learning and statistics.
We establish the weak limit of the empirical spectral distribution of these random matrices in a polynomial scaling regime.
The limiting distribution can be characterized as the free additive convolution of a Marchenko-Pastur law and a semicircle law.
arXiv Detail & Related papers (2022-05-12T18:50:21Z) - When Random Tensors meet Random Matrices [50.568841545067144]
This paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise.
We show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric block-wise random matrix.
arXiv Detail & Related papers (2021-12-23T04:05:01Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Deformed semicircle law and concentration of nonlinear random matrices
for ultra-wide neural networks [29.03095282348978]
We study the limiting spectral distributions of two empirical kernel matrices associated with $f(X)$.
We show that random feature regression induced by the empirical kernel achieves the same performance as its limiting kernel regression under the ultra-wide regime.
arXiv Detail & Related papers (2021-09-20T05:25:52Z) - Analysis of One-Hidden-Layer Neural Networks via the Resolvent Method [0.0]
Motivated by random neural networks, we consider the random matrix $M = YY^\ast$ with $Y = f(WX)$.
We prove that the Stieltjes transform of the limiting spectral distribution satisfies a quartic self-consistent equation up to some error terms.
In addition, we extend the previous results to the case of additive bias $Y=f(WX+B)$ with $B$ being an independent rank-one Gaussian random matrix.
arXiv Detail & Related papers (2021-05-11T15:17:39Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$ matrices to within $\epsilon$ in normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}\!\left(\frac{kr}{\epsilon^2}\log^2\frac{1}{\epsilon}\right)$ samples, a number linear in the high dimension $k$ and nearly linear in the typically low rank $r$.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer ReLU neural network.
We show that an over-parametrized two-layer neural network trained by gradient descent can provably learn the ground-truth network with small loss, beyond what Neural Tangent Kernel analyses can certify with the same number of samples.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n\times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r\ll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
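As a rough illustration of the positive-features idea in the last entry above (a sketch under stated assumptions, not the authors' implementation): with the ground cost $c(x,y) = -\log\langle\varphi(x),\varphi(y)\rangle$ and entropic regularization set to $1$, the Gibbs kernel $K_{ij} = e^{-c(x_i, y_j)}$ equals the rank-$r$ matrix $\Phi_x\Phi_y^\top$, so each Sinkhorn step only needs $n\times r$ matrix-vector products. The feature map $\varphi$ below (exponentials of shared random projections) is just one convenient map onto the positive orthant.

```python
# Sinkhorn iterations with a rank-r positive-feature kernel K = Px @ Py.T,
# never forming the n x n matrix explicitly. The map phi is an illustrative choice.
import numpy as np

rng = np.random.default_rng(1)
n, d, r = 2000, 5, 64                  # points, ambient dimension, feature rank (r << n)

x = rng.standard_normal((n, d))
y = rng.standard_normal((n, d))
a = np.full(n, 1.0 / n)                # uniform source marginal
b = np.full(n, 1.0 / n)                # uniform target marginal

omega = rng.standard_normal((d, r))    # shared random projections

def phi(z):
    # Strictly positive features, so c(x, y) = -log <phi(x), phi(y)> is well defined.
    return np.exp(z @ omega - 0.5 * np.linalg.norm(z, axis=1, keepdims=True) ** 2) / np.sqrt(r)

Px, Py = phi(x), phi(y)                # n x r factors of the kernel

u, v = np.ones(n), np.ones(n)
for _ in range(200):                   # each update costs O(n r) instead of O(n^2)
    u = a / (Px @ (Py.T @ v))          # u <- a / (K v)
    v = b / (Py @ (Px.T @ u))          # v <- b / (K^T u)

# u, v are the Sinkhorn scalings; the transport plan is diag(u) K diag(v).
print("scalings computed:", u[:3], v[:3])
```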
This list is automatically generated from the titles and abstracts of the papers in this site.