Random matrix theory and the loss surfaces of neural networks
- URL: http://arxiv.org/abs/2306.02108v1
- Date: Sat, 3 Jun 2023 13:16:17 GMT
- Title: Random matrix theory and the loss surfaces of neural networks
- Authors: Nicholas P Baskerville
- Abstract summary: We use random matrix theory to understand and describe the loss surfaces of large neural networks.
We derive powerful and novel results about the Hessians of neural network loss surfaces and their spectra.
This thesis provides important contributions to cement the place of random matrix theory in the theoretical study of modern neural networks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network models are one of the most successful approaches to machine
learning, enjoying an enormous amount of development and research over recent
years and finding concrete real-world applications in almost any conceivable
area of science, engineering and modern life in general. The theoretical
understanding of neural networks trails significantly behind their practical
success and the engineering heuristics that have grown up around them. Random
matrix theory provides a rich framework of tools with which aspects of neural
network phenomenology can be explored theoretically. In this thesis, we
establish significant extensions of prior work using random matrix theory to
understand and describe the loss surfaces of large neural networks,
particularly generalising to different architectures. Informed by the
historical applications of random matrix theory in physics and elsewhere, we
establish the presence of local random matrix universality in real neural
networks and then utilise this as a modelling assumption to derive powerful and
novel results about the Hessians of neural network loss surfaces and their
spectra. In addition to these major contributions, we make use of random matrix
models for neural network loss surfaces to shed light on modern neural network
training approaches and even to derive a novel and effective variant of a
popular optimisation algorithm.
Overall, this thesis provides important contributions to cement the place of
random matrix theory in the theoretical study of modern neural networks,
reveals some of the limits of existing approaches and begins the study of an
entirely new role for random matrix theory in the theory of deep learning with
important experimental discoveries and novel theoretical results based on local
random matrix universality.
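As a concrete illustration of the kind of local spectral statistic that underpins the universality claims above, the following is a minimal sketch (not taken from the thesis) comparing adjacent-eigenvalue spacing ratios of a GOE matrix, used here as a stand-in for a symmetric loss-surface Hessian, against a Poisson baseline with no eigenvalue repulsion. The reference values quoted in the comments (roughly 0.536 for GOE and 0.386 for Poisson) are standard figures from the random matrix literature.

```python
# Minimal sketch: adjacent-eigenvalue spacing ratios as a probe of local
# random matrix universality. A GOE matrix stands in for a symmetric
# neural network Hessian; i.i.d. points give the Poisson baseline.
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Gaussian Orthogonal Ensemble: real symmetric matrix with Gaussian entries.
A = rng.standard_normal((n, n))
goe = (A + A.T) / np.sqrt(2 * n)
goe_eigs = np.sort(np.linalg.eigvalsh(goe))

# Poisson baseline: independent points with no eigenvalue repulsion.
poisson_eigs = np.sort(rng.uniform(-1.0, 1.0, size=n))

def mean_spacing_ratio(eigs):
    """Mean of r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}) over consecutive spacings."""
    s = np.diff(eigs)
    r = np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])
    return r.mean()

# Reference values: roughly 0.536 for GOE statistics, roughly 0.386 for Poisson.
print(f"GOE mean spacing ratio:     {mean_spacing_ratio(goe_eigs):.3f}")
print(f"Poisson mean spacing ratio: {mean_spacing_ratio(poisson_eigs):.3f}")
```

The appeal of the spacing-ratio statistic is that it probes local eigenvalue correlations without requiring the spectrum to be unfolded, which makes it a convenient diagnostic for universality in empirical Hessian spectra.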
Related papers
- Reasoning Algorithmically in Graph Neural Networks [1.8130068086063336]
We aim to integrate the structured, rule-based reasoning of algorithms with the adaptive learning capabilities of neural networks.
This dissertation provides theoretical and practical contributions to this area of research.
arXiv Detail & Related papers (2024-02-21T12:16:51Z) - Mechanistic Neural Networks for Scientific Machine Learning [58.99592521721158]
We present Mechanistic Neural Networks, a neural network design for machine learning applications in the sciences.
It incorporates a new Mechanistic Block in standard architectures to explicitly learn governing differential equations as representations.
Central to our approach is a novel Relaxed Linear Programming solver (NeuRLP) inspired by a technique that reduces solving linear ODEs to solving linear programs.
arXiv Detail & Related papers (2024-02-20T15:23:24Z) - Riemannian Residual Neural Networks [58.925132597945634]
We show how to extend the residual neural network (ResNet) to operate on general Riemannian manifolds.
ResNets have become ubiquitous in machine learning due to their beneficial learning properties, excellent empirical results, and easy-to-incorporate nature when building varied neural networks.
arXiv Detail & Related papers (2023-10-16T02:12:32Z) - Deep Learning Meets Sparse Regularization: A Signal Processing
Perspective [17.12783792226575]
We present a mathematical framework that characterizes the functional properties of neural networks that are trained to fit data.
Key mathematical tools which support this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory.
This framework explains the effect of weight decay regularization in neural network training, the use of skip connections and low-rank weight matrices in network architectures, the role of sparsity in neural networks, and why neural networks can perform well in high-dimensional problems.
arXiv Detail & Related papers (2023-01-23T17:16:21Z) - Spiking neural network for nonlinear regression [68.8204255655161]
Spiking neural networks carry the potential for a massive reduction in memory and energy consumption.
They introduce temporal and neuronal sparsity, which can be exploited by next-generation neuromorphic hardware.
A framework for regression using spiking neural networks is proposed.
arXiv Detail & Related papers (2022-10-06T13:04:45Z) - Gaussian Process Surrogate Models for Neural Networks [6.8304779077042515]
In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque.
We construct a class of surrogate models for neural networks using Gaussian processes.
We demonstrate that our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems.
arXiv Detail & Related papers (2022-08-11T20:17:02Z) - Universal characteristics of deep neural network loss surfaces from
random matrix theory [0.5249805590164901]
We use universal properties of random matrices related to local statistics to derive practical implications for deep neural networks.
In particular, we derive universal aspects of outliers in the spectra of deep neural networks and demonstrate the important role of random matrix local laws in popular pre-conditioning gradient descent algorithms (a minimal spiked-matrix illustration of such spectral outliers is sketched after this list).
arXiv Detail & Related papers (2022-05-17T19:42:23Z) - Quasi-orthogonality and intrinsic dimensions as measures of learning and
generalisation [55.80128181112308]
We show that dimensionality and quasi-orthogonality of neural networks' feature space may jointly serve as discriminants of a network's performance.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
arXiv Detail & Related papers (2022-03-30T21:47:32Z) - What can linearized neural networks actually say about generalization? [67.83999394554621]
In certain infinitely-wide neural networks, the neural tangent kernel (NTK) theory fully characterizes generalization.
We show that the linear approximations can indeed rank the learning complexity of certain tasks for neural networks.
Our work provides concrete examples of novel deep learning phenomena which can inspire future theoretical research.
arXiv Detail & Related papers (2021-06-12T13:05:11Z) - Formalizing Generalization and Robustness of Neural Networks to Weight
Perturbations [58.731070632586594]
We provide the first formal analysis for feed-forward neural networks with non-negative monotone activation functions against weight perturbations.
We also design a new theory-driven loss function for training generalizable and robust neural networks against weight perturbations.
arXiv Detail & Related papers (2021-03-03T06:17:03Z) - Applicability of Random Matrix Theory in Deep Learning [0.966840768820136]
We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks.
Our results shed new light on the applicability of Random Matrix Theory to modelling neural networks.
We propose a novel model for the true loss surfaces of neural networks.
arXiv Detail & Related papers (2021-02-12T19:49:19Z)
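The "Universal characteristics of deep neural network loss surfaces from random matrix theory" entry above refers to outliers in Hessian spectra. The sketch below is a generic spiked-Wigner illustration of that phenomenon, not the construction used in that paper: with the normalisation chosen here the bulk spectrum fills roughly [-2, 2], and a rank-one spike of strength theta detaches an outlier near theta + 1/theta once theta exceeds 1 (the BBP transition).

```python
# Minimal sketch (not the construction from the papers above): spectral
# outliers from a rank-one spike added to a Wigner matrix. With this
# normalisation the bulk occupies roughly [-2, 2]; a spike of strength
# theta > 1 produces an outlier near theta + 1/theta.
import numpy as np

rng = np.random.default_rng(1)
n = 2000

A = rng.standard_normal((n, n))
wigner = (A + A.T) / np.sqrt(2 * n)   # bulk edge near +/- 2

v = rng.standard_normal(n)
v /= np.linalg.norm(v)                # unit vector for the rank-one spike

for theta in (0.5, 2.0, 4.0):
    spiked = wigner + theta * np.outer(v, v)
    top = np.linalg.eigvalsh(spiked)[-1]          # largest eigenvalue
    predicted = theta + 1.0 / theta if theta > 1 else 2.0
    print(f"theta={theta:.1f}: largest eigenvalue {top:.3f}, prediction {predicted:.3f}")
```

Running this for theta in {0.5, 2, 4} should show the largest eigenvalue sticking to the bulk edge for the sub-critical spike and landing close to the predicted outlier locations otherwise.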