Doubly infinite residual neural networks: a diffusion process approach
- URL: http://arxiv.org/abs/2007.03253v2
- Date: Sun, 19 Sep 2021 03:19:59 GMT
- Title: Doubly infinite residual neural networks: a diffusion process approach
- Authors: Stefano Peluchetti and Stefano Favaro
- Abstract summary: We show that deep ResNets do not suffer from undesirable forward-propagation properties.
We focus on doubly infinite fully-connected ResNets, for which we consider i.i.d. initializations.
Our results highlight a limited expressive power of doubly infinite ResNets when the unscaled network's parameters are i.i.d. and the residual blocks are shallow.
- Score: 8.642603456626393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern neural networks (NN) featuring a large number of layers (depth) and
units per layer (width) have achieved a remarkable performance across many
domains. While there exists a vast literature on the interplay between
infinitely wide NNs and Gaussian processes, little is known about analogous
interplays with respect to infinitely deep NNs. NNs with independent and
identically distributed (i.i.d.) initializations exhibit undesirable forward
and backward propagation properties as the number of layers increases. To
overcome these drawbacks, Peluchetti and Favaro (2020) considered
fully-connected residual networks (ResNets) with the network's parameters
initialized by means of distributions that shrink as the number of layers
increases, thus establishing an interplay between infinitely deep ResNets and
solutions to stochastic differential equations, i.e. diffusion processes, and
showing that infinitely deep ResNets do not suffer from undesirable
forward-propagation properties. In this paper, we review the results of
Peluchetti and Favaro (2020), extending them to convolutional ResNets, and we
establish analogous backward-propagation results, which directly relate to the
problem of training fully-connected deep ResNets. Then, we investigate the more
general setting of doubly infinite NNs, where both the network's width and
depth grow unboundedly. We focus on doubly infinite fully-connected
ResNets, for which we consider i.i.d. initializations. Under this setting, we
show that the dynamics of quantities of interest converge, at initialization,
to deterministic limits. This allows us to provide analytical expressions for
inference, both in the case of weakly trained and fully trained ResNets. Our
results highlight a limited expressive power of doubly infinite ResNets when
the unscaled network's parameters are i.i.d. and the residual blocks are
shallow.
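The depth scaling described in the abstract can be illustrated with a short simulation. Below is a minimal NumPy sketch (not the authors' code; the single tanh residual block, weight scale, and all names are illustrative assumptions) contrasting forward propagation in a vanilla network with i.i.d. parameters against a ResNet whose residual updates shrink like 1/sqrt(depth), the regime the paper relates to discretized diffusion processes.
```python
import numpy as np

rng = np.random.default_rng(0)

def vanilla_forward(x, width=256, depth=200):
    """Vanilla net with i.i.d. Gaussian weights: with a tanh block at this
    scale, activation norms drift toward a degenerate fixed point as depth grows
    (an example of the undesirable forward-propagation behaviour noted above)."""
    h = x
    for _ in range(depth):
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        h = np.tanh(W @ h)
    return h

def resnet_forward(x, width=256, depth=200):
    """Residual blocks with depth-scaled updates: h <- h + f(h) / sqrt(depth).
    The 1/sqrt(depth) factor plays the role of the sqrt(dt) step of an SDE
    discretization, so the layer index acts like a time variable and the
    activation norms remain stable as depth grows (sketch under the assumption
    of shallow one-layer tanh residual blocks)."""
    h = x
    for _ in range(depth):
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        h = h + np.tanh(W @ h) / np.sqrt(depth)
    return h

x = rng.normal(size=256)
print("vanilla ||h_L||:", np.linalg.norm(vanilla_forward(x)))
print("resnet  ||h_L||:", np.linalg.norm(resnet_forward(x)))
```
Running the sketch shows the vanilla activations shrinking with depth while the depth-scaled ResNet keeps activation norms of the same order as the input, which is the qualitative forward-propagation property the paper formalizes via diffusion limits.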
Related papers
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNets in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Convergence Analysis of Deep Residual Networks [3.274290296343038]
Deep Residual Networks (ResNets) are of particular importance because they demonstrated great usefulness in computer vision.
We aim at characterizing the convergence of deep ResNets as the depth tends to infinity in terms of the parameters of the networks.
arXiv Detail & Related papers (2022-05-13T11:53:09Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Deep Learning without Shortcuts: Shaping the Kernel with Tailored
Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
arXiv Detail & Related papers (2022-03-15T17:49:08Z) - Adversarial Examples in Multi-Layer Random ReLU Networks [39.797621513256026]
Adversarial examples arise in ReLU networks with independent Gaussian parameters.
Bottleneck layers in the network play a key role: the minimal width up to some point determines scales and sensitivities of mappings computed up to that point.
arXiv Detail & Related papers (2021-06-23T18:16:34Z) - The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width
Limit at Initialization [18.613475245655806]
We study ReLU ResNets in the infinite-depth-and-width limit, where both depth and width tend to infinity as their ratio, $d/n$, remains constant.
Using Monte Carlo simulations, we demonstrate that even basic properties of standard ResNet architectures are poorly captured by the Gaussian limit.
arXiv Detail & Related papers (2021-06-07T23:47:37Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable
Optimization Via Overparameterization From Depth [19.866928507243617]
Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks.
We propose a new continuum limit of infinitely deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global.
arXiv Detail & Related papers (2020-03-11T20:14:47Z) - On Random Kernels of Residual Architectures [93.94469470368988]
We derive finite width and depth corrections for the Neural Tangent Kernel (NTK) of ResNets and DenseNets.
Our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity.
In DenseNets, however, convergence of the NTK to its limit as the width tends to infinity is guaranteed.
arXiv Detail & Related papers (2020-01-28T16:47:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.