Universal characteristics of deep neural network loss surfaces from
random matrix theory
- URL: http://arxiv.org/abs/2205.08601v1
- Date: Tue, 17 May 2022 19:42:23 GMT
- Title: Universal characteristics of deep neural network loss surfaces from
random matrix theory
- Authors: Nicholas P Baskerville, Jonathan P Keating, Francesco Mezzadri, Joseph
Najnudel, Diego Granziol
- Abstract summary: We use universal properties of random matrices related to local statistics to derive practical implications for deep neural networks.
In particular we derive universal aspects of outliers in the spectra of deep neural networks and demonstrate the important role of random matrix local laws in popular pre-conditioning gradient descent algorithms.
- Score: 0.5249805590164901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper considers several aspects of random matrix universality in deep
neural networks. Motivated by recent experimental work, we use universal
properties of random matrices related to local statistics to derive practical
implications for deep neural networks based on a realistic model of their
Hessians. In particular we derive universal aspects of outliers in the spectra
of deep neural networks and demonstrate the important role of random matrix
local laws in popular pre-conditioning gradient descent algorithms. We also
present insights into deep neural network loss surfaces from quite general
arguments based on tools from statistical physics and random matrix theory.
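As a toy illustration of the outlier phenomenon mentioned in the abstract (purely illustrative, not the authors' code or Hessian model), the sketch below adds a rank-one spike of strength theta to an n x n Wigner matrix standing in for the Hessian bulk. For theta > 1 the largest eigenvalue detaches from the semicircle edge at 2 and concentrates near theta + 1/theta, the classical BBP-type prediction for such spectral outliers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Symmetric Wigner matrix, normalised so the bulk spectrum fills [-2, 2].
g = rng.standard_normal((n, n))
wigner = (g + g.T) / np.sqrt(2 * n)

# Unit-norm spike direction.
u = rng.standard_normal(n)
u /= np.linalg.norm(u)

for theta in (0.5, 2.0, 5.0):
    spiked = wigner + theta * np.outer(u, u)   # rank-one perturbation
    top = np.linalg.eigvalsh(spiked)[-1]       # largest eigenvalue
    pred = theta + 1.0 / theta if theta > 1 else 2.0
    print(f"theta={theta:4.1f}: top eigenvalue {top:6.3f}, RMT prediction {pred:6.3f}")
```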
Related papers
- Implicit Regularization via Spectral Neural Networks and Non-linear
Matrix Sensing [2.171120568435925]
Spectral Neural Networks (abbrv. SNN) are particularly suitable for matrix learning problems.
We show that the SNN architecture is inherently much more amenable to theoretical analysis than vanilla neural nets.
We believe that the SNN architecture has the potential to be of wide applicability in a broad class of matrix learning scenarios.
arXiv Detail & Related papers (2024-02-27T15:28:01Z)
- Random matrix theory and the loss surfaces of neural networks [0.0]
We use random matrix theory to understand and describe the loss surfaces of large neural networks.
We derive powerful and novel results about the Hessians of neural network loss surfaces and their spectra.
This thesis provides important contributions to cement the place of random matrix theory in the theoretical study of modern neural networks.
arXiv Detail & Related papers (2023-06-03T13:16:17Z)
- Norm-based Generalization Bounds for Compositionally Sparse Neural
Networks [11.987589603961622]
We prove generalization bounds for multilayered sparse ReLU neural networks, including convolutional neural networks.
Taken together, these results suggest that compositional sparsity of the underlying target function is critical to the success of deep neural networks.
arXiv Detail & Related papers (2023-01-28T00:06:22Z)
- Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z)
- Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
Work on the NTK has focused on typical neural network architectures, but remains incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, namely polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains poorly understood.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Statistical Guarantees for Approximate Stationary Points of Simple
Neural Networks [4.254099382808598]
We develop statistical guarantees for approximate stationary points of simple neural networks that coincide, up to logarithmic factors, with those for the global optima.
We take a step forward in describing the practical properties of neural networks in mathematical terms.
arXiv Detail & Related papers (2022-05-09T18:09:04Z)
- More Than a Toy: Random Matrix Models Predict How Real-World Neural
Representations Generalize [94.70343385404203]
We find that most theoretical analyses fall short of capturing qualitative phenomena even for kernel regression.
We prove that the classical GCV estimator converges to the generalization risk whenever a local random matrix law holds.
Our findings suggest that random matrix theory may be central to understanding the properties of neural representations in practice.
arXiv Detail & Related papers (2022-03-11T18:59:01Z)
- What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z)
- Formalizing Generalization and Robustness of Neural Networks to Weight
Perturbations [58.731070632586594]
We provide the first formal analysis for feed-forward neural networks with non-negative monotone activation functions against weight perturbations.
We also design a new theory-driven loss function for training generalizable and robust neural networks against weight perturbations.
arXiv Detail & Related papers (2021-03-03T06:17:03Z)
- Applicability of Random Matrix Theory in Deep Learning [0.966840768820136]
We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks; a toy illustration of such local statistics appears after this list.
Our results shed new light on the applicability of Random Matrix Theory to modelling neural networks.
We propose a novel model for the true loss surfaces of neural networks.
arXiv Detail & Related papers (2021-02-12T19:49:19Z)
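Several entries above (e.g. "More Than a Toy" and "Applicability of Random Matrix Theory in Deep Learning") rely on local spectral statistics and local random matrix laws. The sketch below (a generic numpy illustration, assuming the GOE universality class and not tied to any particular network) compares the mean consecutive-eigenvalue spacing ratio of a GOE matrix, roughly 0.53 under Wigner-Dyson statistics, with that of independent Poisson-like levels, roughly 0.39; this ratio is the kind of universal local observable such analyses test for.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

def spacing_ratios(eigs):
    """Ratios min(s_i, s_{i+1}) / max(s_i, s_{i+1}) of consecutive level spacings."""
    s = np.diff(np.sort(eigs))
    return np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])

# GOE matrix: eigenvalues repel, giving Wigner-Dyson local statistics.
g = rng.standard_normal((n, n))
goe_eigs = np.linalg.eigvalsh((g + g.T) / np.sqrt(2 * n))

# Independent levels: no repulsion, Poisson local statistics.
poisson_eigs = rng.uniform(-2.0, 2.0, size=n)

print("mean spacing ratio, GOE:    ", round(spacing_ratios(goe_eigs).mean(), 3))     # ~0.53
print("mean spacing ratio, Poisson:", round(spacing_ratios(poisson_eigs).mean(), 3)) # ~0.39
```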
This list is automatically generated from the titles and abstracts of the papers on this site.