A Note on the Global Convergence of Multilayer Neural Networks in the
Mean Field Regime
- URL: http://arxiv.org/abs/2006.09355v1
- Date: Tue, 16 Jun 2020 17:50:34 GMT
- Title: A Note on the Global Convergence of Multilayer Neural Networks in the
Mean Field Regime
- Authors: Huy Tuan Pham, Phan-Minh Nguyen
- Abstract summary: We introduce a rigorous framework to describe the mean field limit of gradient-based learning dynamics of multilayer neural networks.
We prove a global convergence guarantee for multilayer networks of any depths.
- Score: 9.89901717499058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a recent work, we introduced a rigorous framework to describe the mean
field limit of the gradient-based learning dynamics of multilayer neural
networks, based on the idea of a neuronal embedding. There we also proved a
global convergence guarantee for three-layer (as well as two-layer) networks
using this framework.
In this companion note, we point out that the insights in our previous work
can be readily extended to prove a global convergence guarantee for multilayer
networks of any depths. Unlike our previous three-layer global convergence
guarantee that assumes i.i.d. initializations, our present result applies to a
type of correlated initialization. This initialization allows to, at any finite
training time, propagate a certain universal approximation property through the
depth of the neural network. To achieve this effect, we introduce a
bidirectional diversity condition.
Related papers
- Compositional Curvature Bounds for Deep Neural Networks [7.373617024876726]
A key challenge that threatens the widespread use of neural networks in safety-critical applications is their vulnerability to adversarial attacks.
We study the second-order behavior of continuously differentiable deep neural networks, focusing on robustness against adversarial perturbations.
We introduce a novel algorithm to analytically compute provable upper bounds on the second derivative of neural networks.
arXiv Detail & Related papers (2024-06-07T17:50:15Z) - WLD-Reg: A Data-dependent Within-layer Diversity Regularizer [98.78384185493624]
Neural networks are composed of multiple layers arranged in a hierarchical structure jointly trained with a gradient-based optimization.
We propose to complement this traditional 'between-layer' feedback with additional 'within-layer' feedback to encourage the diversity of the activations within the same layer.
We present an extensive empirical study confirming that the proposed approach enhances the performance of several state-of-the-art neural network models in multiple tasks.
arXiv Detail & Related papers (2023-01-03T20:57:22Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel which we call textitbias-generalized NTK
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, RelU activations, Gaussian data instances, adversarial labels.
They strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the NTK regime''
arXiv Detail & Related papers (2022-12-05T14:47:52Z) - A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z) - On Feature Learning in Neural Networks with Global Convergence
Guarantees [49.870593940818715]
We study the optimization of wide neural networks (NNs) via gradient flow (GF)
We show that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF.
We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
arXiv Detail & Related papers (2022-04-22T15:56:43Z) - Global Convergence of Three-layer Neural Networks in the Mean Field
Regime [3.553493344868413]
In the mean field regime, neural networks are appropriately scaled so that as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit.
Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees.
We prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime.
arXiv Detail & Related papers (2021-05-11T17:45:42Z) - A global universality of two-layer neural networks with ReLU activations [5.51579574006659]
We investigate a universality of neural networks, which concerns a density of the set of two-layer neural networks in a function spaces.
We consider a global convergence by introducing a norm suitably, so that our results will be uniform over any compact set.
arXiv Detail & Related papers (2020-11-20T05:39:10Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable
Optimization Via Overparameterization From Depth [19.866928507243617]
Training deep neural networks with gradient descent (SGD) can often achieve zero training loss on real-world landscapes.
We propose a new limit of infinity deep residual networks, which enjoys a good training in the sense that everyr is global.
arXiv Detail & Related papers (2020-03-11T20:14:47Z) - A Rigorous Framework for the Mean Field Limit of Multilayer Neural
Networks [9.89901717499058]
We develop a mathematically rigorous framework for embedding neural networks in the mean field regime.
As the network's widths increase, the network's learning trajectory is shown to be well captured by a limit.
We prove several properties of large-width multilayer networks.
arXiv Detail & Related papers (2020-01-30T16:43:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.