Beyond Random Matrix Theory for Deep Networks
- URL: http://arxiv.org/abs/2006.07721v2
- Date: Wed, 3 Nov 2021 14:10:31 GMT
- Title: Beyond Random Matrix Theory for Deep Networks
- Authors: Diego Granziol
- Abstract summary: We investigate whether the Wigner semi-circle and Marcenko-Pastur distributions, often used in theoretical analyses of deep neural networks, match empirically observed spectral densities.
We find that even allowing for outliers, the observed spectral shapes strongly deviate from such theoretical predictions.
We consider two new classes of matrix ensembles: random Wigner/Wishart ensemble products and percolated Wigner/Wishart ensembles, both of which better match observed spectra.
- Score: 0.7614628596146599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate whether the Wigner semi-circle and Marcenko-Pastur
distributions, often used for deep neural network theoretical analysis, match
empirically observed spectral densities. We find that even allowing for
outliers, the observed spectral shapes strongly deviate from such theoretical
predictions. This raises major questions about the usefulness of these models
in deep learning. We further show that theoretical results, such as the layered
nature of critical points, are strongly dependent on the use of the exact form
of these limiting spectral densities. We consider two new classes of matrix
ensembles: random Wigner/Wishart ensemble products and percolated
Wigner/Wishart ensembles, both of which better match observed spectra. They
also give large discrete spectral peaks at the origin, providing a theoretical
explanation for the observation that various optima can be connected by
one-dimensional paths of low loss values. We further show that, in the case of a random
matrix product, the weight of the discrete spectral component at $0$ depends on
the ratio of the dimensions of the weight matrices.
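The dependence of the zero-eigenvalue weight on the dimension ratio follows from a simple rank argument, which is easy to check numerically. Below is a minimal NumPy sketch of that mechanism (an illustration, not the paper's own code; the sizes and tolerance are arbitrary choices): a product of rectangular Gaussian matrices has rank at most its smallest inner dimension, so the associated Wishart-type product matrix carries a discrete spectral peak at $0$ whose weight is fixed by the ratio of the dimensions.

```python
import numpy as np

# Sketch of the rank mechanism behind the discrete peak at 0.
# For W1 of shape (n, m) and W2 of shape (m, n) with m < n, the product
# W1 @ W2 has rank at most m, so the Wishart-type matrix
# P = (W1 W2)(W1 W2)^T places at least a fraction (n - m)/n of its
# spectrum exactly at 0.
n, m = 1000, 400                     # illustrative dimensions, m/n = 0.4
rng = np.random.default_rng(0)
W1 = rng.standard_normal((n, m)) / np.sqrt(m)
W2 = rng.standard_normal((m, n)) / np.sqrt(n)
P = W1 @ W2 @ W2.T @ W1.T            # symmetric PSD, rank <= m
eigs = np.linalg.eigvalsh(P)
zero_frac = np.mean(eigs < 1e-8)     # tolerance for numerically-zero eigenvalues
print(f"weight at 0: {zero_frac:.3f}  (rank bound predicts >= {(n - m) / n:.3f})")
```

Varying m/n moves the weight of the atom at $0$, consistent with the abstract's claim for random matrix products; the nonzero bulk is the part the paper models with product and percolated ensembles rather than a single Marcenko-Pastur law.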
Related papers
- Coupled unidirectional chaotic microwave graphs [0.0]
We investigate an undirected open microwave network $\Gamma$ with internal absorption, composed of two coupled directed halves.
The two-port scattering matrix of the network $\Gamma$ is measured, and the spectral statistics and the elastic enhancement factor of the network are evaluated.
arXiv Detail & Related papers (2024-09-05T13:00:25Z)
- Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
We derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition.
A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues.
We validate our theory with an empirical study of a collection of synthetic and real-world datasets.
arXiv Detail & Related papers (2024-05-23T12:26:25Z)
- Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction [55.57072563835959]
Spectral graph neural networks are characterized by their filters.
We propose an eigenvalue correction strategy that can free filters from the constraints of repeated eigenvalue inputs.
arXiv Detail & Related papers (2024-01-28T08:12:00Z)
- Quantum tomography of helicity states for general scattering processes [55.2480439325792]
Quantum tomography has become an indispensable tool for computing the density matrix $\rho$ of quantum systems in physics.
We present the theoretical framework for reconstructing the helicity quantum initial state of a general scattering process.
arXiv Detail & Related papers (2023-10-16T21:23:42Z)
- Hodge-Aware Contrastive Learning [101.56637264703058]
Simplicial complexes prove effective in modeling data with multiway dependencies.
We develop a contrastive self-supervised learning approach for processing simplicial data.
arXiv Detail & Related papers (2023-09-14T00:40:07Z)
- Heavy-Tailed Regularization of Weight Matrices in Deep Neural Networks [8.30897399932868]
A key finding is that the generalization performance of a neural network is associated with the degree of heavy tails in the spectrum of its weight matrices.
We introduce a novel regularization technique, termed Heavy-Tailed Regularization, which explicitly promotes a more heavy-tailed spectrum in the weight matrix through regularization.
We empirically show that heavy-tailed regularization outperforms conventional regularization techniques in terms of generalization performance.
arXiv Detail & Related papers (2023-04-06T07:50:14Z)
- Curvature-informed multi-task learning for graph networks [56.155331323304]
State-of-the-art graph neural networks attempt to predict multiple properties simultaneously.
We investigate a potential explanation for this phenomenon: the curvature of each property's loss surface significantly varies, leading to inefficient learning.
arXiv Detail & Related papers (2022-08-02T18:18:41Z)
- Leave-one-out Singular Subspace Perturbation Analysis for Spectral Clustering [7.342677574855651]
Singular subspace perturbation theory is of fundamental importance in probability and statistics.
We consider two arbitrary matrices where one is a leave-one-column-out submatrix of the other one.
It is well-suited for mixture models and results in a sharper and finer statistical analysis than classical perturbation bounds such as Wedin's Theorem.
arXiv Detail & Related papers (2022-05-30T05:07:09Z)
- Spectral embedding and the latent geometry of multipartite networks [67.56499794542228]
Many networks are multipartite, meaning their nodes can be divided into partitions and nodes of the same partition are never connected.
This paper demonstrates that the node representations obtained via spectral embedding live near partition-specific low-dimensional subspaces of a higher-dimensional ambient space.
We propose a follow-on step after spectral embedding, to recover node representations in their intrinsic rather than ambient dimension.
arXiv Detail & Related papers (2022-02-08T15:52:03Z)
- Implicit Data-Driven Regularization in Deep Neural Networks under SGD [0.0]
This work performs a spectral analysis of the large random matrices that arise in a trained deep neural network (DNN).
The observed spectra fall into three main types: Marcenko-Pastur spectrum (MP), Marcenko-Pastur spectrum with a few bleeding outliers (MPB), and heavy-tailed spectrum (HT); a minimal sketch comparing an empirical spectrum against the MP baseline follows this list.
arXiv Detail & Related papers (2021-11-26T06:36:16Z)
- Spectral learning of multivariate extremes [0.0]
We propose a spectral clustering algorithm for analyzing the dependence structure of multivariate extremes.
Our work studies the theoretical performance of spectral clustering based on a random $k$-nearest neighbor graph constructed from an extremal sample.
We propose a simple consistent estimation strategy for learning the angular measure.
arXiv Detail & Related papers (2021-11-15T14:33:06Z)
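Several entries above, like the main abstract, test empirical DNN spectra against the Marcenko-Pastur baseline. As a point of reference, here is a hedged NumPy sketch, assuming iid unit-variance noise and aspect ratio $\gamma = p/n \le 1$ (all sizes are illustrative and not taken from any paper above), that compares the empirical eigenvalue histogram of a pure-noise sample covariance matrix against the closed-form MP density. A trained network's spectrum would instead show the outliers and heavy tails these papers report.

```python
import numpy as np

# Compare the empirical spectrum of a pure-noise sample covariance matrix
# with the Marcenko-Pastur (MP) density (sigma^2 = 1, gamma = p/n <= 1).
n, p = 4000, 1000                       # samples and dimension (illustrative)
gamma = p / n
rng = np.random.default_rng(1)
X = rng.standard_normal((n, p))         # iid unit-variance entries
S = X.T @ X / n                         # sample covariance matrix
eigs = np.linalg.eigvalsh(S)

lam_minus = (1 - np.sqrt(gamma)) ** 2   # MP support edges
lam_plus = (1 + np.sqrt(gamma)) ** 2

# MP density evaluated at the empirical histogram bin centres
hist, edges = np.histogram(eigs, bins=60, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
inside = np.clip((lam_plus - centres) * (centres - lam_minus), 0.0, None)
mp = np.sqrt(inside) / (2 * np.pi * gamma * centres)

print(f"empirical support: [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"MP support:        [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"max |empirical - MP| density gap: {np.abs(hist - mp).max():.3f}")
# For a trained DNN, one would substitute weight or Hessian eigenvalues
# for `eigs`; deviations from the MP curve (outliers, heavy tails) are
# what the papers above set out to explain.
```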
This list is automatically generated from the titles and abstracts of the papers in this site.