The Loss Surfaces of Neural Networks with General Activation Functions
- URL: http://arxiv.org/abs/2004.03959v3
- Date: Tue, 8 Jun 2021 07:08:49 GMT
- Title: The Loss Surfaces of Neural Networks with General Activation Functions
- Authors: Nicholas P. Baskerville, Jonathan P. Keating, Francesco Mezzadri,
Joseph Najnudel
- Abstract summary: We chart a new path through the spin glass complexity calculations using supersymmetric methods in Random Matrix Theory.
Our results shed new light on both the strengths and the weaknesses of spin glass models in this context.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The loss surfaces of deep neural networks have been the subject of several
studies, theoretical and experimental, over the last few years. One strand of
work considers the complexity, in the sense of local optima, of high
dimensional random functions with the aim of informing how local optimisation
methods may perform in such complicated settings. Prior work of Choromanska et
al (2015) established a direct link between the training loss surfaces of deep
multi-layer perceptron networks and spherical multi-spin glass models under
some very strong assumptions on the network and its data. In this work, we test
the validity of this approach by removing the undesirable restriction to ReLU
activation functions. In doing so, we chart a new path through the spin glass
complexity calculations using supersymmetric methods in Random Matrix Theory
which may prove useful in other contexts. Our results shed new light on both
the strengths and the weaknesses of spin glass models in this context.
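For context, the object at the centre of this line of work is the spherical p-spin glass. A minimal sketch of the identification, under the assumptions of Choromanska et al (2015), where the number of interacting spins p matches the network depth (notation here is the standard one, not quoted from the paper):

```latex
% Spherical p-spin glass Hamiltonian (standard form); under the assumptions
% of Choromanska et al (2015) the training loss of a depth-p multi-layer
% perceptron is modelled by
H_{N,p}(w) \;=\; \frac{1}{N^{(p-1)/2}}
  \sum_{i_1,\dots,i_p=1}^{N} J_{i_1 \dots i_p}\, w_{i_1} w_{i_2} \cdots w_{i_p},
\qquad \|w\|_2^2 = N, \qquad J_{i_1 \dots i_p} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1).
```

The "complexity" referred to in the abstract is the expected number of critical points of H_{N,p} below a given loss level, classically obtained from a Kac-Rice integral; the paper's contribution is to push that calculation through for general activation functions using supersymmetric Random Matrix Theory methods.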
Related papers
- Improving Generalization of Deep Neural Networks by Optimum Shifting [33.092571599896814]
We propose a novel method called optimum shifting, which changes the parameters of a neural network from a sharp minimum to a flatter one.
Our method is based on the observation that when the input and output of a neural network are fixed, the matrix multiplications within the network can be treated as systems of under-determined linear equations.
arXiv Detail & Related papers (2024-05-23T02:31:55Z)
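A minimal numpy sketch of that observation (shapes and names are assumptions for illustration, not the authors' implementation): for a linear layer with batch inputs A and weights W, any update W + D with A D = 0 leaves the layer's outputs on that batch unchanged, so W can be shifted freely within the null space of A.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a batch of 32 inputs to a layer mapping 64 -> 16 features.
# With 32 < 64, the system A @ W' = A @ W is under-determined in W'.
A = rng.standard_normal((32, 64))   # fixed layer inputs
W = rng.standard_normal((64, 16))   # current layer weights

# Right singular vectors with zero singular value span the null space of A.
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[len(s):].T          # shape (64, 32): columns v satisfy A @ v ~ 0

# Shift W along the null space: outputs on this batch are preserved.
coeffs = rng.standard_normal((null_basis.shape[1], 16))
W_shifted = W + null_basis @ coeffs

assert np.allclose(A @ W, A @ W_shifted, atol=1e-8)
```

On this reading, the method would choose the null-space coefficients so as to reduce a sharpness measure, moving the network toward a flatter minimum without changing its fit on the fixed inputs and outputs.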
- A singular Riemannian Geometry Approach to Deep Neural Networks III. Piecewise Differentiable Layers and Random Walks on $n$-dimensional Classes [49.32130498861987]
Two recent works introduced a geometric framework to study neural networks.
We study the case of non-differentiable activation functions, such as ReLU.
We illustrate our findings with some numerical experiments on classification of images and thermodynamic problems.
arXiv Detail & Related papers (2024-04-09T08:11:46Z)
- From NeurODEs to AutoencODEs: a mean-field control framework for width-varying Neural Networks [68.8204255655161]
We propose a new type of continuous-time control system, called AutoencODE, based on a controlled field that drives the dynamics.
We show that many architectures can be recovered in regions where the loss function is locally convex.
arXiv Detail & Related papers (2023-07-05T13:26:17Z)
- When Deep Learning Meets Polyhedral Theory: A Survey [6.899761345257773]
In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks.
Meanwhile, the structure of neural networks converged back to simpler piecewise and linear functions.
arXiv Detail & Related papers (2023-04-29T11:46:53Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
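As a toy illustration of the setting described there (not the paper's construction or its learning guarantee), one can train a depth-2 network with two layers of sigmoidal activations on a ball indicator target using plain gradient descent; the hyperparameters below are assumptions.

```python
import torch

torch.manual_seed(0)

# Assumed toy target: the indicator of the unit ball in R^2.
x = torch.randn(4096, 2) * 1.5
y = (x.norm(dim=1) <= 1.0).float().unsqueeze(1)

# Depth-2 network: a hidden sigmoidal layer plus a sigmoidal output unit.
model = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Sigmoid(),
    torch.nn.Linear(32, 1), torch.nn.Sigmoid(),
)

# Plain full-batch gradient descent on the binary cross-entropy loss.
opt = torch.optim.SGD(model.parameters(), lr=1.0)
for step in range(3000):
    opt.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy(model(x), y)
    loss.backward()
    opt.step()

accuracy = ((model(x) > 0.5).float() == y).float().mean()
print(f"loss {loss.item():.3f}, train accuracy {accuracy.item():.3f}")
```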
- Revisit Geophysical Imaging in A New View of Physics-informed Generative Adversarial Learning [2.12121796606941]
Full waveform inversion (FWI) produces high-resolution subsurface models.
FWI with a least-squares loss function suffers from many drawbacks, such as the local-minima problem.
Recent works relying on partial differential equations and neural networks show promising performance for two-dimensional FWI.
We propose an unsupervised learning paradigm that integrates the wave equation with a discriminative network to accurately estimate physically consistent models.
arXiv Detail & Related papers (2021-09-23T15:54:40Z)
- Towards Understanding Theoretical Advantages of Complex-Reaction Networks [77.34726150561087]
We show that a class of functions can be approximated by a complex-reaction network using a polynomial number of parameters.
For empirical risk minimization, our theoretical result shows that the critical point set of complex-reaction networks is a proper subset of that of real-valued networks.
arXiv Detail & Related papers (2021-08-15T10:13:49Z)
- LocalDrop: A Hybrid Regularization for Deep Neural Networks [98.30782118441158]
We propose LocalDrop, a new approach to regularizing neural networks based on their local Rademacher complexity.
A new regularization function for both fully-connected networks (FCNs) and convolutional neural networks (CNNs) has been developed based on the proposed upper bound of the local Rademacher complexity.
arXiv Detail & Related papers (2021-03-01T03:10:11Z)
- Applicability of Random Matrix Theory in Deep Learning [0.966840768820136]
We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks.
Our results shed new light on the applicability of Random Matrix Theory to modelling neural networks.
We propose a novel model for the true loss surfaces of neural networks.
arXiv Detail & Related papers (2021-02-12T19:49:19Z)
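To make "local spectral statistics" concrete, here is a hedged numpy sketch (illustrative, not that paper's pipeline): compute consecutive-eigenvalue spacing ratios of a symmetric random matrix standing in for a loss Hessian, and compare their mean against the GOE value predicted by Random Matrix Theory.

```python
import numpy as np

rng = np.random.default_rng(0)

def spacing_ratios(eigs):
    """Ratios r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}) of consecutive spacings."""
    s = np.diff(np.sort(eigs))
    return np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])

# GOE stand-in for a Hessian: symmetrised Gaussian matrix.
n = 500
M = rng.standard_normal((n, n))
H = (M + M.T) / np.sqrt(2 * n)

r = spacing_ratios(np.linalg.eigvalsh(H))
# The GOE surmise predicts a mean ratio of about 4 - 2*sqrt(3) ~ 0.536.
print(f"mean spacing ratio: {r.mean():.3f} (GOE prediction ~ 0.536)")
```

In the same spirit, one would substitute the eigenvalues of an actual network Hessian for those of H and check whether the same universal local statistics appear.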
This list is automatically generated from the titles and abstracts of the papers on this site.