Asymptotic Analysis of Deep Residual Networks
- URL: http://arxiv.org/abs/2212.08199v1
- Date: Thu, 15 Dec 2022 23:55:01 GMT
- Title: Asymptotic Analysis of Deep Residual Networks
- Authors: Rama Cont, Alain Rossier, and Renyuan Xu
- Abstract summary: We investigate the properties of deep Residual networks (ResNets) as the number of layers increases.
We first show the existence of scaling regimes for trained weights markedly different from those implicitly assumed in the neural ODE literature.
We study the convergence of the hidden state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE), or neither of these.
- Score: 6.308539010172309
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the asymptotic properties of deep Residual networks (ResNets)
as the number of layers increases. We first show the existence of scaling
regimes for trained weights markedly different from those implicitly assumed in
the neural ODE literature. We study the convergence of the hidden state
dynamics in these scaling regimes, showing that one may obtain an ODE, a
stochastic differential equation (SDE) or neither of these. In particular, our
findings point to the existence of a diffusive regime in which the deep network
limit is described by a class of stochastic differential equations (SDEs).
Finally, we derive the corresponding scaling limits for the backpropagation
dynamics.
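The contrast between the ODE and diffusive regimes can be illustrated numerically. The sketch below is a simplified illustration under i.i.d. Gaussian residual weights (an assumption of ours, not the trained-weight setting analysed in the paper): with depth scaling 1/L the hidden state barely moves away from its input as depth grows, whereas with scaling 1/sqrt(L) the accumulated fluctuations remain of order one, the signature of a diffusive, SDE-type limit.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_displacement(depth, beta, dim=32, trials=100):
    """Average ||x_L - x_0|| for the residual recursion
    x_{k+1} = x_k + depth**(-beta) * W_k @ tanh(x_k), with i.i.d.
    Gaussian residual weights W_k (entries ~ N(0, 1/dim))."""
    out = []
    for _ in range(trials):
        x0 = np.ones(dim) / np.sqrt(dim)
        x = x0.copy()
        for _ in range(depth):
            W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
            x = x + depth ** (-beta) * W @ np.tanh(x)
        out.append(np.linalg.norm(x - x0))
    return np.mean(out)

for depth in (20, 80, 320):
    # beta = 1.0: the 1/L "neural ODE" scaling; with zero-mean i.i.d. weights
    # the displacement shrinks as depth grows (a degenerate limit).
    # beta = 0.5: the 1/sqrt(L) scaling; the displacement stays of order one,
    # consistent with a diffusive (SDE-type) limit.
    print(depth, mean_displacement(depth, 1.0), mean_displacement(depth, 0.5))
```

The paper's analysis concerns the scaling of trained weights, which can place a deep ResNet in either regime (or in neither); the toy simulation above only shows how the depth scaling alone changes the nature of the limit.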
Related papers
- Theory on variational high-dimensional tensor networks [2.0307382542339485]
We investigate the emergent statistical properties of random high-dimensional tensor-network states and the trainability of variational tensor networks.
We prove that variational high-dimensional tensor networks suffer from barren plateaus for global loss functions.
Our results pave the way for their future theoretical studies and practical applications.
arXiv Detail & Related papers (2023-03-30T15:26:30Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs are prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
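As a minimal illustration of the implicit update rule theta_{k+1} = theta_k - lr * grad L(theta_{k+1}) (a sketch of ours on a toy mini-batch least-squares problem, not the PINN training loop from the paper): because each mini-batch loss here is quadratic, the implicit equation reduces to a small linear system; for a PINN loss one would solve it with a few inner Newton or fixed-point iterations instead.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear least-squares problem standing in for a PINN residual loss
# (an assumption for illustration only).
A = rng.standard_normal((200, 10))
b = A @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200)

theta = np.zeros(10)
lr = 2.0  # a step size this large would make explicit SGD diverge here

for step in range(300):
    idx = rng.choice(len(b), size=20, replace=False)
    Ai, bi = A[idx], b[idx]
    H = Ai.T @ Ai / len(idx)     # mini-batch Hessian
    g = Ai.T @ bi / len(idx)
    # Implicit SGD: theta_new = theta - lr * grad(theta_new).
    # For a quadratic mini-batch loss this is the linear system
    # (I + lr * H) theta_new = theta + lr * g.
    theta = np.linalg.solve(np.eye(10) + lr * H, theta + lr * g)

print("final loss:", 0.5 * np.mean((A @ theta - b) ** 2))
```

The point of the implicit (proximal) form is stability: each update is a contraction regardless of the step size, whereas the explicit update is only stable for sufficiently small steps.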
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - From high-dimensional & mean-field dynamics to dimensionless ODEs: A
unifying approach to SGD in two-layers networks [26.65398696336828]
This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels.
We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk.
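A minimal sketch of this setting (under assumptions of ours: a tanh soft-committee teacher-student pair, not necessarily the exact model of the paper) is given below. It runs one-pass SGD on fresh Gaussian samples and tracks the sufficient statistics in which the low-dimensional description is written, namely the student-teacher and student-student overlap matrices M = W W*^T / d and Q = W W^T / d.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 300, 2              # input dimension, hidden width
lr, steps = 1.0, 100000    # one-pass (online) SGD: a fresh sample per step

Ws = rng.standard_normal((k, d))   # teacher first-layer weights
W = rng.standard_normal((k, d))    # student first-layer weights

def committee(Wmat, x):
    """Soft-committee output: mean of tanh of the k pre-activations."""
    return np.mean(np.tanh(Wmat @ x / np.sqrt(d)))

for t in range(steps):
    x = rng.standard_normal(d)               # Gaussian data, never reused
    pre = W @ x / np.sqrt(d)
    err = np.mean(np.tanh(pre)) - committee(Ws, x)
    # Gradient of 0.5 * err**2 with respect to each student neuron.
    grad = (err / k) * (1 - np.tanh(pre) ** 2)[:, None] * x[None, :] / np.sqrt(d)
    W -= lr * grad                           # overlaps move by O(1/d) per step

    if t % 20000 == 0:
        M = W @ Ws.T / d                     # student-teacher overlaps
        Q = W @ W.T / d                      # student-student overlaps
        print(t, np.round(M, 3).tolist(), np.round(Q, 3).tolist())
```

In the large-d limit, with time measured in units of d SGD steps, (M, Q) evolve along deterministic trajectories, which is the kind of low-dimensional description referred to in the summary above.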
arXiv Detail & Related papers (2023-02-12T09:50:52Z) - Learning Discretized Neural Networks under Ricci Flow [51.36292559262042]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.
DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z) - A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z) - A PDE-based Explanation of Extreme Numerical Sensitivities and Edge of Stability in Training Neural Networks [12.355137704908042]
We show restrained numerical instabilities in current training practices of deep networks with stochastic gradient descent (SGD).
We do this by presenting a theoretical framework using numerical analysis of partial differential equations (PDE), and analyzing the gradient descent PDE of convolutional neural networks (CNNs).
We show this is a consequence of the non-linear PDE associated with the descent of the CNN, whose local linearization changes when over-driving the step size of the discretization, resulting in a stabilizing effect.
arXiv Detail & Related papers (2022-06-04T14:54:05Z) - Decimation technique for open quantum systems: a case study with
driven-dissipative bosonic chains [62.997667081978825]
Unavoidable coupling of quantum systems to external degrees of freedom leads to dissipative (non-unitary) dynamics.
We introduce a method to deal with these systems based on the calculation of the (dissipative) lattice Green's function.
We illustrate the power of this method with several examples of driven-dissipative bosonic chains of increasing complexity.
arXiv Detail & Related papers (2022-02-15T19:00:09Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via stochastic gradient descent (SGD).
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
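A minimal sketch of the kind of experiment behind that observation (our own toy setup, using full-batch gradient descent on made-up 1-D data rather than the paper's exact protocol): fit a small two-layer ReLU network, read off the knots of the learned piecewise-linear function, which sit at -b_i / w_i for each hidden neuron, and compare them with the training inputs.

```python
import numpy as np

rng = np.random.default_rng(3)

# A few 1-D training points (toy data, not from the paper).
x = np.array([-1.0, -0.3, 0.2, 0.8])
y = np.array([0.5, -0.2, 0.1, 0.9])

m = 50                                   # hidden width
w = rng.standard_normal(m)               # input weights
b = rng.standard_normal(m)               # biases
a = rng.standard_normal(m) / np.sqrt(m)  # output weights
lr = 0.01

for step in range(20000):
    pre = np.outer(x, w) + b             # (n, m) pre-activations
    act = np.maximum(pre, 0.0)
    err = act @ a - y                    # (n,) residuals
    mask = (pre > 0).astype(float)
    # Full-batch gradient descent on the mean squared loss.
    grad_a = act.T @ err / len(x)
    grad_w = ((err[:, None] * mask * a[None, :]) * x[:, None]).sum(0) / len(x)
    grad_b = (err[:, None] * mask * a[None, :]).sum(0) / len(x)
    a -= lr * grad_a
    w -= lr * grad_w
    b -= lr * grad_b

# Knots of the learned piecewise-linear function lie at -b_i / w_i for the
# neurons whose activation boundary falls inside the data range.
knots = -b / np.where(np.abs(w) > 1e-8, w, np.nan)
inside = knots[(knots > x.min()) & (knots < x.max())]
pred = np.maximum(np.outer(x, w) + b, 0.0) @ a
print("final training loss:", 0.5 * np.mean((pred - y) ** 2))
print("training points:", x)
print("knots inside the data range:", np.sort(inside))
```

Comparing the printed knot locations with the training inputs shows whether the fitted piecewise-linear function bends away from the data points, the phenomenon the summary above refers to.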
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Scaling Properties of Deep Residual Networks [2.6763498831034043]
We investigate the properties of weights trained by gradient descent and their scaling with network depth through numerical experiments.
We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature.
These findings cast doubt on the validity of the neural ODE model as an adequate description of deep ResNets.
arXiv Detail & Related papers (2021-05-25T22:31:30Z) - Quantitative Propagation of Chaos for SGD in Wide Neural Networks [39.35545193410871]
In this paper, we investigate the limiting behavior of a continuous-time counterpart of Stochastic Gradient Descent (SGD).
We show 'propagation of chaos' for the particle system defined by this continuous-time dynamics under different scenarios.
We identify two scenarios under which different mean-field limits are obtained, one of them corresponding to an implicitly regularized version of the minimization problem at hand.
arXiv Detail & Related papers (2020-07-13T12:55:21Z) - Stochasticity in Neural ODEs: An Empirical Study [68.8204255655161]
Stochastic regularization of neural networks (e.g. dropout) is a widespread technique in deep learning that allows for better generalization.
We show that data augmentation during training improves the performance of both the deterministic and stochastic versions of the same model.
However, the improvements obtained by data augmentation completely eliminate the empirical gains of stochastic regularization, making the difference in performance between neural ODE and neural SDE negligible.
arXiv Detail & Related papers (2020-02-22T22:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.