A Rigorous Framework for the Mean Field Limit of Multilayer Neural
Networks
- URL: http://arxiv.org/abs/2001.11443v2
- Date: Tue, 4 May 2021 17:44:02 GMT
- Title: A Rigorous Framework for the Mean Field Limit of Multilayer Neural
Networks
- Authors: Phan-Minh Nguyen, Huy Tuan Pham
- Abstract summary: We develop a mathematically rigorous framework for multilayer neural networks in the mean field regime.
As the network's widths increase, its learning trajectory is shown to be well captured by a dynamically nonlinear mean field limit.
We prove several properties of large-width multilayer networks.
- Score: 9.89901717499058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a mathematically rigorous framework for multilayer neural networks
in the mean field regime. As the network's widths increase, the network's
learning trajectory is shown to be well captured by a meaningful and
dynamically nonlinear limit (the \textit{mean field} limit), which is
characterized by a system of ODEs. Our framework applies to a broad range of
network architectures, learning dynamics and network initializations. Central
to the framework is the new idea of a \textit{neuronal embedding}, which
comprises a non-evolving probability space that allows one to embed neural
networks of arbitrary widths.
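For intuition, the two-layer instance of such an ODE system can be sketched as follows (our notation, not the paper's exact multilayer formulation): index each neuron by a coordinate $c$ of the fixed probability space, write the network output as

\[
\hat{y}(t,x) = \mathbb{E}_{C}\!\left[a(t,C)\,\sigma\!\left(w(t,C)^{\top}x\right)\right],
\]

and let the weights evolve by

\[
\partial_t a(t,c) = -\,\mathbb{E}_{(X,Y)}\!\left[\partial_{2}\mathcal{L}\!\left(Y,\hat{y}(t,X)\right)\,\sigma\!\left(w(t,c)^{\top}X\right)\right],
\]
\[
\partial_t w(t,c) = -\,\mathbb{E}_{(X,Y)}\!\left[\partial_{2}\mathcal{L}\!\left(Y,\hat{y}(t,X)\right)\,a(t,c)\,\sigma'\!\left(w(t,c)^{\top}X\right)X\right],
\]

where $\mathcal{L}$ is the loss, $\sigma$ the activation, and $(X,Y)$ a training sample drawn from the data distribution.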
Using our framework, we prove several properties of large-width multilayer
neural networks. Firstly, we show that independent and identically distributed
initializations cause strong degeneracy effects on the network's learning
trajectory when the network's depth is at least four. Secondly, we obtain
several global convergence guarantees for feedforward multilayer networks under
a number of different setups. These include two-layer and three-layer networks
with independent and identically distributed initializations, and multilayer
networks of arbitrary depths with a special type of correlated initialization
that is motivated by the new concept of \textit{bidirectional diversity}.
Unlike previous works that rely on convexity, our results admit non-convex
losses and hinge on a certain universal approximation property, which is a
distinctive feature of infinite-width neural networks and is shown to hold
throughout the training process. Aside from being the first known results for
global convergence of multilayer networks in the mean field regime, they
demonstrate the flexibility of our framework and incorporate several new ideas and
insights that depart from the conventional convex optimization wisdom.
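To make the neuronal-embedding and width-limit claims concrete, below is a runnable sketch. It is our own illustrative construction, not the authors' code: the names (embed_init, train) and the specific embedding functions A, W are assumptions. Initial weights are deterministic functions of i.i.d. neuron codes on a fixed probability space, the output averages over hidden units, and gradient descent uses the width-rescaled mean-field learning rate.

```python
# A minimal, self-contained sketch (ours, not the authors' code) of the
# neuronal-embedding idea for a two-layer network in mean-field scaling.
# Initial weights are fixed functions A(c), W(c) of i.i.d. uniform
# "neuron codes" c, so networks of every width n are draws from the same
# underlying probability space and their trajectories can be compared.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: 64 samples in R^2.
X = rng.normal(size=(64, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

def embed_init(n):
    """Sample n neuron codes and map them through fixed functions A, W."""
    c = rng.uniform(size=n)                   # codes on ([0, 1], Uniform)
    a = np.cos(2 * np.pi * c)                 # A(c): hidden -> output weight
    w = np.stack([np.sin(2 * np.pi * c),      # W(c): input -> hidden weights
                  np.cos(4 * np.pi * c)], axis=1)
    return a, w

def train(n, steps=200, lr=0.5):
    """Full-batch gradient descent on the squared loss; returns final loss."""
    a, w = embed_init(n)
    for _ in range(steps):
        h = np.tanh(X @ w.T)                  # hidden activations, (64, n)
        err = h @ a / n - y                   # mean-field output: average over neurons
        # Per-neuron ("mean-field") gradients: the usual gradients rescaled
        # by n, so update sizes stay O(1) as the width grows; the constant
        # factor 2 of the squared loss is absorbed into lr.
        grad_a = err @ h / len(X)
        grad_w = ((err[:, None] * (1.0 - h**2)) * a).T @ X / len(X)
        a -= lr * grad_a
        w -= lr * grad_w
    return float(np.mean((np.tanh(X @ w.T) @ a / n - y) ** 2))

# Wider networks should trace nearly the same trajectory: the printed
# final losses approach a common (width-independent) limiting value.
for n in (100, 1000, 10000):
    print(f"width={n:6d}  final loss={train(n):.5f}")
```

Under these conventions, the printed final losses should nearly coincide for widths 100, 1000, and 10000: the finite-width shadow of a width-independent limiting trajectory.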
Related papers
- Local Kernel Renormalization as a mechanism for feature learning in
overparametrized Convolutional Neural Networks [0.0]
Empirical evidence shows that fully-connected neural networks in the infinite-width limit eventually outperform their finite-width counterparts.
State-of-the-art architectures with convolutional layers achieve optimal performances in the finite-width regime.
We show that the generalization performance of a finite-width FC network can be obtained by an infinite-width network, with a suitable choice of the Gaussian priors.
arXiv Detail & Related papers (2023-07-21T17:22:04Z)
- Feature-Learning Networks Are Consistent Across Widths At Realistic Scales [72.27228085606147]
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets.
Early in training, wide neural networks trained on online data not only have identical loss curves but also agree in their pointwise test predictions throughout training.
We observe, however, that ensembles of narrower networks perform worse than a single wide network.
arXiv Detail & Related papers (2023-05-28T17:09:32Z)
- Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z)
- Global Convergence of Three-layer Neural Networks in the Mean Field Regime [3.553493344868413]
In the mean field regime, neural networks are appropriately scaled so that, as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit (this scaling convention is sketched after this list).
Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees.
We prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime.
arXiv Detail & Related papers (2021-05-11T17:45:42Z)
- Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched-prior-based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure- and geometrical-structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- A Note on the Global Convergence of Multilayer Neural Networks in the Mean Field Regime [9.89901717499058]
We introduce a rigorous framework to describe the mean field limit of gradient-based learning dynamics of multilayer neural networks.
We prove a global convergence guarantee for multilayer networks of any depths.
arXiv Detail & Related papers (2020-06-16T17:50:34Z)
- Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives [73.15276998621582]
We propose a generic feature learning mechanism to advance CNN training with enhanced generalization ability.
Partially inspired by DSN, we fork delicately designed side branches from the intermediate layers of a given neural network.
Experiments on both category and instance recognition tasks demonstrate the substantial improvements of our proposed method.
arXiv Detail & Related papers (2020-03-24T09:56:13Z)
- Quasi-Equivalence of Width and Depth of Neural Networks [10.365556153676538]
We investigate whether the design of artificial neural networks should have a directional preference.
Inspired by the De Morgan law, we establish a quasi-equivalence between the width and depth of ReLU networks.
Based on our findings, a deep network has a wide equivalent, subject to an arbitrarily small error.
arXiv Detail & Related papers (2020-02-06T21:17:32Z)
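As referenced in the three-layer mean-field entry above, the "appropriate scaling" in question is, in the two-layer case and as a standard convention of the field rather than a claim specific to that paper, an average over hidden units,

\[
f_n(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} a_i\,\sigma\!\left(w_i^{\top}x\right)
\qquad\text{(mean-field scaling)},
\]

in contrast with the $1/\sqrt{n}$ (NTK) normalization, under which the infinite-width training dynamics linearizes rather than remaining genuinely nonlinear.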
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.