Global Convergence of Three-layer Neural Networks in the Mean Field Regime
- URL: http://arxiv.org/abs/2105.05228v1
- Date: Tue, 11 May 2021 17:45:42 GMT
- Title: Global Convergence of Three-layer Neural Networks in the Mean Field Regime
- Authors: Huy Tuan Pham, Phan-Minh Nguyen
- Abstract summary: In the mean field regime, neural networks are appropriately scaled so that as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit.
Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees.
We prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime.
- Score: 3.553493344868413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the mean field regime, neural networks are appropriately scaled so that as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit. This lends a way to study large-width neural networks via analyzing the mean field limit. Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees. The extension to multilayer ones, however, has been a highly challenging puzzle, and little is known about the optimization efficiency in the mean field regime when there are more than two layers.
In this work, we prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime. We first develop a rigorous framework to establish the mean field limit of three-layer networks under stochastic gradient descent training. To that end, we propose the idea of a \textit{neuronal embedding}, which comprises a fixed probability space that encapsulates neural networks of arbitrary sizes. The identified mean field limit is then used to prove a global convergence guarantee under suitable regularity and convergence mode assumptions, which -- unlike previous works on two-layer networks -- does not rely critically on convexity. Underlying the result is a universal approximation property, natural to neural networks, which importantly is shown to hold at \textit{any} finite training time (not necessarily at convergence) via an algebraic topology argument.
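A minimal sketch of the mean-field scaling discussed above may help fix ideas: each hidden layer averages its inputs with a 1/width factor (rather than 1/sqrt(width)), and learning rates are scaled per layer so that every weight moves by an O(1) amount as the widths grow. The snippet below is an illustration under simplifying assumptions (tanh activation, squared loss on a single training point, hand-rolled backpropagation); it is not the paper's exact parameterization or step-size schedule.

```python
import numpy as np

# Illustrative mean-field scaling for a three-layer network (assumptions:
# tanh activation, squared loss, one data point, hand-rolled backprop).
rng = np.random.default_rng(0)
d, n1, n2 = 3, 500, 500             # input dimension and the two hidden widths

W1 = rng.normal(size=(n1, d))       # first-layer weights  w1_i
W2 = rng.normal(size=(n2, n1))      # second-layer weights w2_{ji}
W3 = rng.normal(size=n2)            # output weights       w3_j

def forward(x):
    """Mean-field scaling: a 1/n average (not 1/sqrt(n)) at each layer."""
    h1 = np.tanh(W1 @ x)                 # (n1,)
    h2 = np.tanh((W2 @ h1) / n1)         # (n2,), note the 1/n1 average
    return (W3 @ h2) / n2                # scalar output, 1/n2 average

def sgd_step(x, y, lr=0.1):
    """One SGD step on 0.5 * (f(x) - y)^2, with per-layer learning rates
    scaled by the widths so each weight moves O(1) as n1, n2 grow.
    (This layer-wise scaling is an assumption of the sketch, not the
    paper's exact schedule.)"""
    global W1, W2, W3
    h1 = np.tanh(W1 @ x)
    z2 = (W2 @ h1) / n1
    h2 = np.tanh(z2)
    err = (W3 @ h2) / n2 - y
    d2 = err * (W3 / n2) * (1.0 - h2**2)        # dLoss/dz2
    d1 = (W2.T @ d2) / n1 * (1.0 - h1**2)       # dLoss/dz1
    W3 -= lr * n2 * (err * h2 / n2)             # i.e. lr * err * h2
    W2 -= lr * n1 * n2 * np.outer(d2, h1) / n1
    W1 -= lr * n1 * np.outer(d1, x)

x, y = rng.normal(size=d), 1.0
for _ in range(200):
    sgd_step(x, y)
print(forward(x))                    # should move toward the target y = 1.0
```

With this scaling, the network output and each per-weight update stay O(1) as n1 and n2 grow, which is the regime in which the mean field limit is taken.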
Related papers
- Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z)
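As a toy illustration of the indicator construction mentioned in the Data Topology-Dependent Upper Bounds entry above (not the paper's actual construction, which covers simplicial complexes), a three-layer ReLU network, meaning two hidden ReLU layers followed by a linear output, can approximate the indicator of a compact convex polytope: the first hidden layer measures how far each facet constraint is violated, and the second clamps an affine combination of those violations to [0, 1]. The triangle, facet encoding, and sharpness parameter eps below are illustrative choices.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Compact convex set {x : A x <= b}: the triangle with vertices
# (0,0), (1,0), (0,1), i.e. the constraints x >= 0, y >= 0, x + y <= 1.
A = np.array([[-1.0,  0.0],
              [ 0.0, -1.0],
              [ 1.0,  1.0]])
b = np.array([0.0, 0.0, 1.0])
eps = 1e-2                # controls how sharply the boundary is resolved

def approx_indicator(x):
    """Two hidden ReLU layers + linear output: equals 1 on the triangle and
    0 wherever some facet constraint is violated by at least eps."""
    h1 = relu(A @ x - b)                      # layer 1: facet violations
    s = 1.0 - h1.sum() / eps                  # affine input to layer 2
    h2 = np.array([relu(s), relu(s - 1.0)])   # layer 2
    return h2[0] - h2[1]                      # output = clamp(s, 0, 1)

print(approx_indicator(np.array([0.2, 0.2])))   # inside the triangle -> 1.0
print(approx_indicator(np.array([0.9, 0.9])))   # well outside        -> 0.0
```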
- A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z)
- Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence [17.63517562327928]
This paper focuses on neural networks with two and three layers and provides a rigorous understanding of the properties of the solutions found by stochastic heavy ball (SHB) training.
We show convergence to the global optimum and give a quantitative bound between the mean-field limit and the SHB dynamics of a finite-width network.
arXiv Detail & Related papers (2022-10-13T08:08:25Z)
- Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates [7.094295642076582]
The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime.
We establish a new linear convergence result for two-layer neural networks trained by continuous-time noisy descent in the mean-field regime.
arXiv Detail & Related papers (2022-05-19T21:05:40Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- On Feature Learning in Neural Networks with Global Convergence Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF).
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z)
- Limiting fluctuation and trajectorial stability of multilayer neural networks with mean field training [3.553493344868413]
We study the fluctuation in the case of multilayer networks at any network depth.
We demonstrate through the framework the complex interaction among neurons in this second-order MF limit.
A limit theorem is proven to relate this limit to the fluctuation of large-width networks.
arXiv Detail & Related papers (2021-10-29T17:58:09Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, within time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for neural network optimization analysis, such as mean field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- A Note on the Global Convergence of Multilayer Neural Networks in the Mean Field Regime [9.89901717499058]
We introduce a rigorous framework to describe the mean field limit of gradient-based learning dynamics of multilayer neural networks.
We prove a global convergence guarantee for multilayer networks of any depths.
arXiv Detail & Related papers (2020-06-16T17:50:34Z)
- A Rigorous Framework for the Mean Field Limit of Multilayer Neural Networks [9.89901717499058]
We develop a mathematically rigorous framework for embedding neural networks in the mean field regime.
As the network's widths increase, the network's learning trajectory is shown to be well captured by a limit.
We prove several properties of large-width multilayer networks.
arXiv Detail & Related papers (2020-01-30T16:43:34Z)