Limiting fluctuation and trajectorial stability of multilayer neural
networks with mean field training
- URL: http://arxiv.org/abs/2110.15954v1
- Date: Fri, 29 Oct 2021 17:58:09 GMT
- Title: Limiting fluctuation and trajectorial stability of multilayer neural
networks with mean field training
- Authors: Huy Tuan Pham, Phan-Minh Nguyen
- Abstract summary: We study the fluctuation in the case of multilayer networks at any network depth.
We demonstrate through the framework the complex interaction among neurons in this second-order MF limit.
A limit theorem is proven to relate this limit to the fluctuation of large-width networks.
- Score: 3.553493344868413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The mean field (MF) theory of multilayer neural networks centers around a
particular infinite-width scaling, where the learning dynamics is closely
tracked by the MF limit. A random fluctuation around this infinite-width limit
is expected from a large-width expansion to the next order. This fluctuation
has been studied only in shallow networks, where previous works employ heavily
technical notions or additional formulation ideas amenable only to that case.
A treatment of the multilayer case has been missing, the chief difficulty being
to find a formulation that captures the stochastic dependency across not only
time but also depth.
In this work, we initiate the study of the fluctuation for multilayer networks
at any network depth. Leveraging the neuronal embedding framework recently
introduced by Nguyen and Pham, we systematically derive a system of dynamical
equations, called the second-order MF limit, that captures the limiting
fluctuation distribution. Through this framework we demonstrate the complex
interaction among neurons in the second-order MF limit, the stochasticity with
cross-layer dependency, and the nonlinear time evolution inherent in the
limiting fluctuation. A limit theorem is proven that quantitatively relates
this limit to the fluctuation of large-width networks.
We apply the result to show a stability property of gradient-descent MF
training: in the large-width regime, along the training trajectory, training
progressively biases towards a solution with "minimal fluctuation" (in fact,
vanishing fluctuation) in the learned output function, even after the network
has been initialized at, or has converged (sufficiently fast) to, a global
optimum. This extends a phenomenon previously shown only for shallow networks
with the squared loss in the ERM setting to multilayer networks in a more
general setting with a loss function that is not necessarily convex.
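As a concrete, highly simplified illustration of the scaling in question, the
following is a minimal numerical sketch under assumptions that are not from the
paper: a two-layer network with mean-field 1/n output scaling, examined at
random initialization only. The output concentrates on its infinite-width
limit, while the centered output rescaled by sqrt(n) keeps a stable,
Gaussian-scale spread, which is the kind of next-order fluctuation the abstract
refers to; the paper's actual results concern the multilayer, trained case.

```python
# Minimal sketch (illustrative assumptions, not the paper's construction):
# a two-layer network with mean-field (1/n) output scaling at random
# initialization. f_n(x) concentrates on its infinite-width limit, and
# sqrt(n) * (f_n(x) - limit) keeps a stable spread as n grows.
import numpy as np

rng = np.random.default_rng(0)

def mf_output(x, n, rng):
    """Mean-field-scaled two-layer network: (1/n) * sum_i a_i * tanh(w_i * x)."""
    w = rng.normal(size=n)   # first-layer weights
    a = rng.normal(size=n)   # second-layer weights
    return np.mean(a * np.tanh(w * x))

x = 1.3
limit = 0.0  # E[a * tanh(w * x)] = 0: a is centered and independent of w

for n in (100, 1_000, 10_000):
    samples = np.array([mf_output(x, n, rng) for _ in range(1_000)])
    print(f"n={n:>6}  mean f_n={samples.mean():+.4f}  "
          f"std of sqrt(n)*(f_n - limit)={np.sqrt(n) * samples.std():.4f}")
```

At initialization this is just the central limit theorem; the second-order MF
limit derived in the paper is what tracks how such fluctuations propagate
across layers and through training time.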
Related papers
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNet in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that accounts for the errors due to finite-particle approximation, time discretization, and gradient approximation. (A minimal particle-level sketch of MFLD follows this list.)
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
- A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z)
- Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence [17.63517562327928]
This paper focuses on neural networks with two and three layers and provides a rigorous understanding of the properties of the solutions found by SHB.
We show convergence to the global optimum and give a quantitative bound between the mean-field limit and the SHB dynamics of a finite-width network.
arXiv Detail & Related papers (2022-10-13T08:08:25Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit [0.0]
Large-width dynamics has emerged as a fruitful viewpoint and led to practical insights on real-world deep networks.
For two-layer neural networks, it has been understood that the nature of the trained model radically changes depending on the scale of the initial random weights.
We propose various methods to avoid this trivial behavior and analyze in detail the resulting dynamics.
arXiv Detail & Related papers (2021-10-29T07:53:35Z)
- Global Convergence of Three-layer Neural Networks in the Mean Field Regime [3.553493344868413]
In the mean field regime, neural networks are appropriately scaled so that as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit.
Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees.
We prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime.
arXiv Detail & Related papers (2021-05-11T17:45:42Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for neural network optimization analysis, such as mean field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show its global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks [9.89901717499058]
We develop a mathematically rigorous framework for embedding neural networks in the mean field regime.
As the network's widths increase, its learning trajectory is shown to be well captured by a limit.
We prove several properties of large-width multilayer networks.
arXiv Detail & Related papers (2020-01-30T16:43:34Z)
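Regarding the mean-field Langevin dynamics (MFLD) entry above: the sketch below
is a toy, finite-particle illustration of MFLD for a two-layer network, not the
cited paper's algorithm or analysis. Each particle is one neuron's parameters,
the gradient on a particle depends on all particles through the network output
(the distribution-dependent drift), and the added Gaussian noise corresponds to
the entropy regularization; all names and hyperparameters here are made up for
illustration.

```python
# Toy sketch (not the cited paper's algorithm): finite-particle mean-field
# Langevin dynamics for a two-layer network. Each particle is one neuron's
# parameters (w_i, a_i); its gradient depends on the empirical measure of
# all particles through the network output, making the drift
# distribution-dependent.
import numpy as np

rng = np.random.default_rng(0)
N, lr, temp, steps = 200, 0.1, 1e-3, 500  # particles, step size, temperature, iterations

x = rng.normal(size=64)          # toy 1-d inputs
y = np.sin(x)                    # toy targets
theta = rng.normal(size=(N, 2))  # particle i = (w_i, a_i)

def net(theta, x):
    # Mean-field output: average of a_i * tanh(w_i * x) over particles.
    w, a = theta[:, 0], theta[:, 1]
    return np.tanh(np.outer(x, w)) @ a / N

for _ in range(steps):
    h = np.tanh(np.outer(x, theta[:, 0]))  # (len(x), N) hidden features
    resid = h @ theta[:, 1] / N - y        # residual of the squared loss
    # Per-particle gradients of the squared loss (mean-field parameterization,
    # so the 1/N factor is absorbed into the learning-rate scaling).
    grad_a = h.T @ resid / len(x)
    grad_w = ((1.0 - h**2) * theta[:, 1]).T @ (resid * x) / len(x)
    grad = np.stack([grad_w, grad_a], axis=1)
    # Langevin update: gradient drift plus noise from the entropy regularizer.
    theta = theta - lr * grad + np.sqrt(2.0 * lr * temp) * rng.normal(size=theta.shape)

print("final mse:", np.mean((net(theta, x) - y) ** 2))
```

The uniform-in-time propagation of chaos mentioned in that entry quantifies how
close such a finite-particle system stays to its mean-field limit.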