Random matrix analysis of deep neural network weight matrices
- URL: http://arxiv.org/abs/2203.14661v1
- Date: Mon, 28 Mar 2022 11:22:12 GMT
- Title: Random matrix analysis of deep neural network weight matrices
- Authors: Matthias Thamm, Max Staats, Bernd Rosenow
- Abstract summary: We study the weight matrices of trained deep neural networks using methods from random matrix theory (RMT).
We show that the statistics of most of the singular values follow universal RMT predictions.
This suggests that they are random and do not contain system-specific information.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks have been used successfully in a variety of fields, which has
led to a great deal of interest in developing a theoretical understanding of
how they store the information needed to perform a particular task. We study
the weight matrices of trained deep neural networks using methods from random
matrix theory (RMT) and show that the statistics of most of the singular values
follow universal RMT predictions. This suggests that they are random and do not
contain system-specific information, which we investigate further by comparing
the statistics of eigenvector entries to the universal Porter-Thomas
distribution. We find that for most eigenvectors the hypothesis of randomness
cannot be rejected, and that only eigenvectors belonging to the largest
singular values deviate from the RMT prediction, indicating that they may
encode learned information. We analyze the spectral distribution of such large
singular values using the Hill estimator and find that the distribution cannot
be characterized by a tail index, i.e., it is not of power-law type.
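As a rough illustration of the kind of analysis described above, the sketch below compares the empirical singular-value spectrum of a weight matrix against the Marchenko-Pastur law, a standard RMT prediction for matrices with i.i.d. entries. This is not the authors' exact pipeline: the matrix here is random rather than trained, and the layer shape, variance, and bin count are arbitrary illustrative choices.

```python
# Sketch: compare the singular-value spectrum of a weight matrix with the
# Marchenko-Pastur (MP) law, the RMT prediction for i.i.d. entries.
# W is drawn at random here; in practice it would be a trained layer's
# weight matrix from the network under study.
import numpy as np

def marchenko_pastur_pdf(x, ratio, sigma2=1.0):
    """MP density for eigenvalues of (1/M) W W^T, W of shape N x M, ratio = N/M <= 1."""
    lam_minus = sigma2 * (1.0 - np.sqrt(ratio)) ** 2
    lam_plus = sigma2 * (1.0 + np.sqrt(ratio)) ** 2
    pdf = np.zeros_like(x)
    inside = (x > lam_minus) & (x < lam_plus)
    pdf[inside] = np.sqrt((lam_plus - x[inside]) * (x[inside] - lam_minus)) / (
        2.0 * np.pi * sigma2 * ratio * x[inside]
    )
    return pdf

N, M = 512, 1024                         # illustrative layer shape
rng = np.random.default_rng(0)
W = rng.normal(0.0, 1.0, size=(N, M))    # stand-in for a trained weight matrix

s = np.linalg.svd(W, compute_uv=False)   # singular values of W
eigs = s**2 / M                          # eigenvalues of (1/M) W W^T

hist, edges = np.histogram(eigs, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mp = marchenko_pastur_pdf(centers, ratio=N / M)

# Agreement of the bulk with MP indicates "random-like" singular values;
# isolated eigenvalues beyond lam_plus are candidates for learned structure.
print("largest eigenvalue:", eigs.max())
print("MP upper edge lam_plus:", (1.0 + np.sqrt(N / M)) ** 2)
print("max |empirical density - MP| over the bins:", np.max(np.abs(hist - mp)))
```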
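The Porter-Thomas comparison can be sketched as a per-vector hypothesis test: for a random real unit vector v of length N, the normalized squared entries N*v_i^2 follow a chi-squared distribution with one degree of freedom (the Porter-Thomas distribution). The Kolmogorov-Smirnov test and the 1% level used below are illustrative choices, not necessarily the paper's exact procedure.

```python
# Sketch: test singular-vector entries against the Porter-Thomas prediction.
import numpy as np
from scipy import stats

def porter_thomas_pvalue(v):
    """KS p-value for N * v_i^2 against a chi-squared distribution with 1 dof."""
    v = np.asarray(v, dtype=float)
    eta = len(v) * v**2               # Porter-Thomas variable
    return stats.kstest(eta, "chi2", args=(1,)).pvalue

# Random matrix standing in for a trained layer's weights.
rng = np.random.default_rng(1)
W = rng.normal(size=(512, 1024))
U, s, Vt = np.linalg.svd(W, full_matrices=False)   # columns of U are left singular vectors

# Small p-values reject randomness and flag vectors that may encode learned
# information; the paper finds such deviations only for the largest singular values.
pvals = np.array([porter_thomas_pvalue(U[:, k]) for k in range(U.shape[1])])
print("fraction of vectors where randomness is NOT rejected at the 1% level:",
      np.mean(pvals > 0.01))
```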
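Finally, a minimal version of the Hill-estimator check on the tail of the singular-value distribution: the tail index is estimated from the k largest values, and a plateau of the estimate over a range of k would indicate a power-law tail (the paper reports that no such characterization holds for the large singular values). The choice of k values and the Pareto reference sample below are purely illustrative.

```python
# Sketch: Hill estimator of the tail index from the k largest samples.
import numpy as np

def hill_estimator(samples, k):
    """Hill estimate of the tail index alpha based on the k largest samples."""
    x = np.sort(np.asarray(samples, dtype=float))[::-1]    # descending order
    if not 1 <= k < len(x):
        raise ValueError("need 1 <= k < number of samples")
    gamma = np.mean(np.log(x[:k]) - np.log(x[k]))           # estimate of 1/alpha
    return 1.0 / gamma

rng = np.random.default_rng(2)
# Singular values of a random matrix (stand-in for a trained layer) and a
# Pareto sample with known tail index alpha = 3 for comparison.
s = np.linalg.svd(rng.normal(size=(512, 1024)), compute_uv=False)
pareto = rng.pareto(a=3.0, size=512) + 1.0

for k in (10, 20, 40, 80):
    print(f"k={k:3d}  alpha_hat(singular values)={hill_estimator(s, k):8.2f}"
          f"  alpha_hat(Pareto)={hill_estimator(pareto, k):6.2f}")
```

For the Pareto reference the estimates stay close to 3 across k, whereas for a light-tailed sample such as the singular values above they drift strongly with k, which is the kind of behaviour that argues against a power-law characterization.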
Related papers
- Symmetry Discovery for Different Data Types [52.2614860099811]
Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks.
We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
arXiv Detail & Related papers (2024-10-13T13:39:39Z) - Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
We derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition.
A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues.
We validate our theory with an empirical study of a collection of synthetic and real-world datasets.
arXiv Detail & Related papers (2024-05-23T12:26:25Z) - Nonlinear spiked covariance matrices and signal propagation in deep neural networks [22.84097371842279]
We study the eigenvalue spectrum of the Conjugate Kernel defined by a nonlinear feature map of a feedforward neural network.
In this work, we characterize these signal eigenvalues and eigenvectors for a nonlinear version of the spiked covariance model.
We also study a simple regime of representation learning where the weight matrix develops a rank-one signal component over training.
arXiv Detail & Related papers (2024-02-15T17:31:19Z) - Trade-Offs of Diagonal Fisher Information Matrix Estimators [53.35448232352667]
The Fisher information matrix can be used to characterize the local geometry of the parameter space of neural networks.
We examine two popular estimators whose accuracy and sample complexity depend on their associated variances.
We derive bounds of the variances and instantiate them in neural networks for regression and classification.
arXiv Detail & Related papers (2024-02-08T03:29:10Z) - When Random Tensors meet Random Matrices [50.568841545067144]
This paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise.
We show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric block-wise random matrix.
arXiv Detail & Related papers (2021-12-23T04:05:01Z) - On some theoretical limitations of Generative Adversarial Networks [77.34726150561087]
It is a general assumption that GANs can generate any probability distribution.
We provide a new result based on Extreme Value Theory showing that GANs cannot generate heavy-tailed distributions.
arXiv Detail & Related papers (2021-10-21T06:10:38Z) - Geometry and Generalization: Eigenvalues as predictors of where a network will fail to generalize [0.30586855806896046]
We study the deformation of the input space by a trained autoencoder via the Jacobians of the trained weight matrices.
This is a dataset independent means of testing an autoencoder's ability to generalize on new input.
arXiv Detail & Related papers (2021-07-13T21:03:42Z) - Minimax Estimation of Linear Functions of Eigenvectors in the Face of Small Eigen-Gaps [95.62172085878132]
Eigenvector perturbation analysis plays a vital role in various statistical data science applications.
We develop a suite of statistical theory that characterizes the perturbation of arbitrary linear functions of an unknown eigenvector.
In order to mitigate a non-negligible bias issue inherent to the natural "plug-in" estimator, we develop de-biased estimators.
arXiv Detail & Related papers (2021-04-07T17:55:10Z) - A simpler spectral approach for clustering in directed networks [1.52292571922932]
We show that using the eigenvalue/eigenvector decomposition of the adjacency matrix is simpler than all common methods.
We provide numerical evidence for the superiority of the Gaussian Mixture clustering over the widely used k-means algorithm.
arXiv Detail & Related papers (2021-02-05T14:16:45Z) - On Random Matrices Arising in Deep Neural Networks: General I.I.D. Case [0.0]
We study the distribution of singular values of product of random matrices pertinent to the analysis of deep neural networks.
We use another, more streamlined, version of the techniques of random matrix theory to generalize the results of [22] to the case where the entries of the synaptic weight matrices are just independent identically distributed random variables with zero mean and finite fourth moment.
arXiv Detail & Related papers (2020-11-20T14:39:24Z) - On Random Matrices Arising in Deep Neural Networks. Gaussian Case [1.6244541005112747]
The paper deals with distribution of singular values of product of random matrices arising in the analysis of deep neural networks.
The problem has been considered in recent work by using the techniques of free probability theory.
arXiv Detail & Related papers (2020-01-17T08:30:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.