Scaling Properties of Deep Residual Networks
- URL: http://arxiv.org/abs/2105.12245v1
- Date: Tue, 25 May 2021 22:31:30 GMT
- Title: Scaling Properties of Deep Residual Networks
- Authors: Alain-Sam Cohen, Rama Cont, Alain Rossier, Renyuan Xu
- Abstract summary: We investigate the properties of weights trained by gradient descent and their scaling with network depth through numerical experiments.
We observe the existence of scaling regimes markedly different from those assumed in neural ODE literature.
These findings cast doubts on the validity of the neural ODE model as an adequate description of deep ResNets.
- Score: 2.6763498831034043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Residual networks (ResNets) have displayed impressive results in pattern
recognition and, recently, have garnered considerable theoretical interest due
to a perceived link with neural ordinary differential equations (neural ODEs).
This link relies on the convergence of network weights to a smooth function as
the number of layers increases. We investigate the properties of weights
trained by stochastic gradient descent and their scaling with network depth
through detailed numerical experiments. We observe the existence of scaling
regimes markedly different from those assumed in neural ODE literature.
Depending on certain features of the network architecture, such as the
smoothness of the activation function, one may obtain an alternative ODE limit,
a stochastic differential equation or neither of these. These findings cast
doubts on the validity of the neural ODE model as an adequate asymptotic
description of deep ResNets and point to an alternative class of differential
equations as a better description of the deep network limit.
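To make the link under discussion concrete, the display below is a schematic of the standard ResNet-to-ODE correspondence, written as a minimal sketch rather than the paper's own derivation; the diffusive alternative shown is only an illustrative hypothesis of what an SDE-type limit looks like, and the precise scaling regimes and limiting equations are those identified in the paper.

```latex
% Residual recursion for a network with L layers and hidden state h_k:
\[
  h_{k+1} = h_k + \delta_L\, f(h_k, \theta_k), \qquad k = 0, \dots, L-1.
\]
% Neural ODE limit assumed in the literature: take \delta_L = 1/L and suppose the
% trained weights converge to a smooth function \theta(\cdot), i.e.
% \theta_{\lfloor tL \rfloor} \to \theta(t); the recursion is then an Euler scheme for
\[
  \frac{\mathrm{d}h}{\mathrm{d}t} = f\bigl(h(t), \theta(t)\bigr), \qquad t \in [0, 1].
\]
% Illustrative diffusive alternative: weight increments of size O(L^{-1/2}) that act
% like independent noise suggest instead an SDE-type limit of the generic form
\[
  \mathrm{d}h(t) = b\bigl(h(t), t\bigr)\,\mathrm{d}t + \Sigma\bigl(h(t), t\bigr)\,\mathrm{d}W(t).
\]
```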
Related papers
- Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis [5.016205338484259]
The proposed method is more robust to network size variations than the existing method.
When applied to Physics-Informed Neural Networks, the method exhibits faster convergence and robustness to variations of the network size.
arXiv Detail & Related papers (2024-10-03T06:30:27Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Asymptotic Analysis of Deep Residual Networks [6.308539010172309]
We investigate the properties of deep Residual networks (ResNets) as the number of layers increases.
We first show the existence of scaling regimes for trained weights markedly different from those implicitly assumed in the neural ODE literature.
We study the hidden state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE), or neither of these; a toy simulation of these depth scalings is sketched after this list.
arXiv Detail & Related papers (2022-12-15T23:55:01Z) - Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z) - Deep Architecture Connectivity Matters for Its Convergence: A Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - On the eigenvector bias of Fourier feature networks: From regression to solving multi-scale PDEs with physics-informed neural networks [0.0]
We show that physics-informed neural networks (PINNs) struggle in cases where the target functions to be approximated exhibit high-frequency or multi-scale features.
We construct novel architectures that employ multi-scale random Fourier features and justify how such coordinate embedding layers can lead to robust and accurate PINN models.
arXiv Detail & Related papers (2020-12-18T04:19:30Z) - Delay Differential Neural Networks [0.2538209532048866]
We propose a novel model, delay differential neural networks (DDNN), inspired by delay differential equations (DDEs).
For training DDNNs, we provide a memory-efficient adjoint method for computing gradients and back-propagating through the network.
Experiments conducted on synthetic and real-world image classification datasets such as CIFAR-10 and CIFAR-100 show the effectiveness of the proposed models.
arXiv Detail & Related papers (2020-12-12T12:20:54Z) - Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data in a suitable structure.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z) - Mean-Field and Kinetic Descriptions of Neural Differential Equations [0.0]
In this work, we focus on a particular class of neural networks, namely residual neural networks.
We analyze steady states and sensitivity with respect to the parameters of the network, namely the weights and the bias.
A modification of the microscopic dynamics, inspired by residual neural networks, leads to a Fokker-Planck formulation of the network.
arXiv Detail & Related papers (2020-01-07T13:41:27Z)
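The following is the toy simulation referenced from the Asymptotic Analysis entry above: a minimal sketch, not any of the papers' actual experiments, that runs the residual recursion with untrained i.i.d. Gaussian weights under two hypothetical depth scalings (1/L and 1/sqrt(L)) and reports how far the hidden state moves as the depth L grows. All function names and parameters here are illustrative.

```python
# Toy sketch (not the papers' trained-weight experiments): iterate
#   h_{k+1} = h_k + L**(-beta) * tanh(A_k @ h_k)
# with i.i.d. Gaussian weight matrices A_k and compare how the displacement
# ||h_L - h_0|| behaves as depth L grows for two candidate scalings:
# beta = 1.0 (the 1/L step size behind the neural ODE picture) and
# beta = 0.5 (a diffusive 1/sqrt(L) scaling).
import numpy as np


def hidden_state_displacement(L: int, beta: float, dim: int = 32, seed: int = 0) -> float:
    """Run the residual recursion for L layers and return ||h_L - h_0||."""
    rng = np.random.default_rng(seed)
    h0 = rng.standard_normal(dim)
    h = h0.copy()
    for _ in range(L):
        # Fresh i.i.d. weights per layer, scaled so ||A @ h|| stays of order ||h||.
        A = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        h = h + L ** (-beta) * np.tanh(A @ h)
    return float(np.linalg.norm(h - h0))


if __name__ == "__main__":
    for beta in (1.0, 0.5):
        for L in (64, 256, 1024, 4096):
            d = hidden_state_displacement(L, beta)
            print(f"beta={beta:.1f}  L={L:5d}  ||h_L - h_0|| = {d:.3f}")
```

With i.i.d. weights, the 1/sqrt(L) scaling keeps the accumulated update of order one as L grows, while the 1/L scaling lets it shrink; weights trained by SGD can behave quite differently, which is precisely what the scaling-regime papers above investigate.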
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.