A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks
- URL: http://arxiv.org/abs/2210.16286v1
- Date: Fri, 28 Oct 2022 17:26:27 GMT
- Title: A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks
- Authors: Zhengdao Chen, Eric Vanden-Eijnden, Joan Bruna
- Abstract summary: We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
- Score: 49.870593940818715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To understand the training dynamics of neural networks (NNs), prior studies
have considered the infinite-width mean-field (MF) limit of the two-layer NN,
establishing theoretical guarantees of its convergence under gradient flow
training as well as its approximation and generalization capabilities. In this
work, we study the infinite-width limit of a type of three-layer NN model whose
first layer is random and fixed. To define the limiting model rigorously, we
generalize the MF theory of two-layer NNs by treating the neurons as belonging
to functional spaces. Then, by writing the MF training dynamics as a kernel
gradient flow with a time-varying kernel that remains positive-definite, we
prove that its training loss in $L_2$ regression decays to zero at a linear
rate. Furthermore, we define function spaces that include the solutions
obtainable through the MF training dynamics and prove Rademacher complexity
bounds for these spaces. Our theory accommodates different scaling choices of
the model, resulting in two regimes of the MF limit that demonstrate
distinctive behaviors while both exhibiting feature learning.
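A minimal sketch of the rate mechanism, in generic notation of my own choosing (the symbols $u$, $K_t$, $\lambda$ below are not necessarily the paper's): if $u(t) \in \mathbb{R}^n$ collects the training residuals $f_t(x_i) - y_i$ and the MF training dynamics act on them as a kernel gradient flow $\dot u(t) = -K_t\, u(t)$ with $K_t \succeq \lambda I$ uniformly in $t$ for some $\lambda > 0$, then Gronwall's inequality gives a linear (exponential-in-time) decay of the $L_2$ training loss:

```latex
% Positive-definite time-varying kernel => linear (exponential-in-time) loss decay.
\frac{d}{dt}\,\|u(t)\|^2 \;=\; -2\,u(t)^{\top} K_t\, u(t) \;\le\; -2\lambda\,\|u(t)\|^2
\quad\Longrightarrow\quad
\frac{1}{2n}\,\|u(t)\|^2 \;\le\; \frac{1}{2n}\,\|u(0)\|^2\, e^{-2\lambda t}.
```

The substantive part of the result is then showing that the time-varying MF kernel indeed remains positive-definite along the training dynamics.

The following NumPy sketch illustrates the kind of partially-trained three-layer model described above: the first layer is random and kept fixed, and only the second and third layers are trained, by gradient descent on an $L_2$ regression loss. The widths, step size, synthetic data, and the particular $1/\sqrt{D_1}$ and $1/D_2$ normalizations are illustrative assumptions, not the paper's exact construction (the paper analyzes specific scaling choices that yield two MF regimes).

```python
# A minimal sketch of a "partially-trained" three-layer network: the first
# layer is random and kept FIXED; only the second and third layers are
# trained, by plain gradient descent on an L2 regression loss.
# Widths, step size, data, and the 1/sqrt(D1) and 1/D2 normalizations are
# illustrative assumptions, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

n, d = 64, 16          # training-set size and input dimension (assumed)
D1, D2 = 512, 512      # width of the frozen random layer / trainable layer

# Synthetic L2 regression data (for illustration only).
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.sin(X @ rng.standard_normal(d))

relu = lambda z: np.maximum(z, 0.0)

W1 = rng.standard_normal((d, D1))     # first layer: random and fixed
W2 = rng.standard_normal((D1, D2))    # trainable middle layer
a = rng.standard_normal(D2)           # trainable output weights

def forward(X):
    h1 = relu(X @ W1)                 # frozen random features
    z2 = h1 @ W2 / np.sqrt(D1)        # illustrative normalization
    h2 = relu(z2)
    f = h2 @ a / D2                   # mean-field-style 1/width output scaling
    return h1, z2, h2, f

lr = 0.5 * D2   # ad-hoc step size, scaled up to offset the 1/D2 prefactor
for step in range(801):
    h1, z2, h2, f = forward(X)
    r = f - y
    loss = 0.5 * np.mean(r ** 2)
    if step % 200 == 0:
        print(f"step {step:4d}  loss {loss:.5f}")
    # Manual backprop through the two trainable layers only.
    g_f = r / n                                   # dL/df
    g_a = h2.T @ g_f / D2                         # dL/da
    g_z2 = (np.outer(g_f, a) / D2) * (z2 > 0)     # dL/dz2
    g_W2 = h1.T @ g_z2 / np.sqrt(D1)              # dL/dW2
    a -= lr * g_a
    W2 -= lr * g_W2
```

Watching the printed loss fall gives only a finite-width caricature of the kernel-gradient-flow picture above; the paper's linear-rate guarantee concerns the infinite-width MF limit, not any particular finite-width run.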
Related papers
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates [7.094295642076582]
The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime.
We establish a new linear convergence result for two-layer neural networks trained by continuous-time noisy descent in the mean-field regime.
arXiv Detail & Related papers (2022-05-19T21:05:40Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Limiting fluctuation and trajectorial stability of multilayer neural networks with mean field training [3.553493344868413]
We study the fluctuation in the case of multilayer networks at any network depth.
Through this framework, we demonstrate the complex interaction among neurons in this second-order MF limit.
A limit theorem is proven to relate this limit to the fluctuation of large-width networks.
arXiv Detail & Related papers (2021-10-29T17:58:09Z)
- Global Convergence of Three-layer Neural Networks in the Mean Field Regime [3.553493344868413]
In the mean field regime, neural networks are appropriately scaled so that, as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit (see the formula sketch after this list).
Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees.
We prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime.
arXiv Detail & Related papers (2021-05-11T17:45:42Z)
- Phase diagram for two-layer ReLU neural networks at infinite-width limit [6.380166265263755]
We draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit.
We identify three regimes in the phase diagram, i.e., linear regime, critical regime and condensed regime.
In the linear regime, the NN training dynamics is approximately linear, similar to a random feature model, with an exponential loss decay.
In the condensed regime, we demonstrate through experiments that active neurons are condensed at several discrete orientations.
arXiv Detail & Related papers (2020-07-15T06:04:35Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for neural network optimization analysis, such as mean-field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game in which both players are parameterized by neural networks (NNs), and we learn the parameters of these networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs, with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
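For reference, the standard two-layer mean-field picture that several of the entries above (and the abstract itself) build on can be written, in generic notation of my own choosing, as a width-$m$ network together with its infinite-width MF limit:

```latex
% Two-layer mean-field parameterization and its infinite-width limit.
f_m(x) \;=\; \frac{1}{m}\sum_{i=1}^{m} a_i\,\sigma(\langle w_i, x\rangle),
\qquad
f_\mu(x) \;=\; \int a\,\sigma(\langle w, x\rangle)\, d\mu(a, w),
```

where $\mu$ is a probability measure over the parameters $(a, w)$ of a single neuron, and gradient-flow training of the finite network corresponds, in the limit, to a Wasserstein gradient flow on $\mu$. The paper above generalizes this picture to three layers with a random, fixed first layer by treating the neurons as elements of function spaces.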