Mean-field analysis for heavy ball methods: Dropout-stability,
connectivity, and global convergence
- URL: http://arxiv.org/abs/2210.06819v1
- Date: Thu, 13 Oct 2022 08:08:25 GMT
- Title: Mean-field analysis for heavy ball methods: Dropout-stability,
connectivity, and global convergence
- Authors: Diyuan Wu, Vyacheslav Kungurtsev, Marco Mondelli
- Abstract summary: This paper focuses on neural networks with two and three layers and provides a rigorous understanding of the properties of the solutions found by the stochastic heavy ball method (SHB).
We show convergence to the global optimum and give a quantitative bound between the mean-field limit and the SHB dynamics of a finite-width network.
- Score: 17.63517562327928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The stochastic heavy ball method (SHB), also known as stochastic gradient
descent (SGD) with Polyak's momentum, is widely used in training neural
networks. However, despite the remarkable success of this algorithm in
practice, its theoretical characterization remains limited. In this paper, we
focus on neural networks with two and three layers and provide a rigorous
understanding of the properties of the solutions found by SHB: \emph{(i)}
stability after dropping out part of the neurons, \emph{(ii)} connectivity
along a low-loss path, and \emph{(iii)} convergence to the global optimum. To
achieve this goal, we take a mean-field view and relate the SHB dynamics to a
certain partial differential equation in the limit of large network widths.
This mean-field perspective has inspired a recent line of work focusing on SGD
while, in contrast, our paper considers an algorithm with momentum. More
specifically, after proving existence and uniqueness of the limit differential
equations, we show convergence to the global optimum and give a quantitative
bound between the mean-field limit and the SHB dynamics of a finite-width
network. Armed with this last bound, we are able to establish the
dropout-stability and connectivity of SHB solutions.
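To make the algorithm under study concrete, below is a minimal sketch (not the paper's exact formulation) of the stochastic heavy ball update, i.e. SGD with Polyak's momentum, applied to a two-layer ReLU network in a standard mean-field parameterization with 1/N output scaling. The width, step size, momentum coefficient, and toy regression data are illustrative assumptions only.

```python
import numpy as np

# Sketch: two-layer ReLU network in a mean-field parameterization,
#   f(x) = (1/N) * sum_i a_i * relu(<w_i, x>),
# trained with the stochastic heavy ball (SHB) update, i.e. SGD with Polyak momentum:
#   v_{k+1} = beta * v_k - eta * g_k,   theta_{k+1} = theta_k + v_{k+1}.
# Width N, step size eta, momentum beta, and the toy data are placeholder choices.

rng = np.random.default_rng(0)
N, d = 512, 10                        # network width and input dimension (arbitrary)
W = rng.normal(size=(N, d))           # first-layer weights
a = rng.normal(size=N)                # second-layer weights
vW, va = np.zeros_like(W), np.zeros_like(a)
eta, beta = 0.05, 0.9                 # step size and momentum coefficient

def forward(X):
    """Mean-field-scaled two-layer ReLU network."""
    return np.maximum(X @ W.T, 0.0) @ a / N

def grads(X, y):
    """Gradients of the squared loss 0.5 * mean((f(X) - y)^2) w.r.t. W and a."""
    H = np.maximum(X @ W.T, 0.0)      # (batch, N) hidden activations
    err = H @ a / N - y               # (batch,)
    ga = H.T @ err / (N * len(y))
    gW = (np.outer(err, a / N) * (H > 0)).T @ X / len(y)
    return gW, ga

X_all = rng.normal(size=(1024, d))
y_all = np.sin(X_all[:, 0])           # arbitrary target function

for _ in range(200):                  # SHB over random mini-batches
    idx = rng.choice(len(X_all), size=32, replace=False)
    gW, ga = grads(X_all[idx], y_all[idx])
    vW = beta * vW - eta * gW         # heavy-ball velocity updates
    va = beta * va - eta * ga
    W, a = W + vW, a + va

print("train loss:", 0.5 * np.mean((forward(X_all) - y_all) ** 2))
```

Dropout-stability in the paper's sense can then be probed empirically by zeroing out a random subset of the N neurons after training and comparing the resulting loss with that of the full network.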
Related papers
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems, including Max- and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and the Traveling Salesman Problem.
As a byproduct of our analysis, we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective at solving forward and inverse differential equation problems.
However, PINNs can get trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that exhibit distinct behaviors.
arXiv Detail & Related papers (2022-10-28T17:26:27Z) - Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with
Linear Convergence Rates [7.094295642076582]
The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime.
We establish a new linear convergence result for two-layer neural networks trained by continuous-time noisy gradient descent in the mean-field regime (a minimal sketch of this kind of noisy dynamics appears after this list).
arXiv Detail & Related papers (2022-05-19T21:05:40Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean-field Langevin dynamics is presented.
The proximal Gibbs measure $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via stochastic gradient descent (SGD).
We show that SGD is biased towards a simple solution: the trained network implements a piecewise linear map with a small number of "knot" points.
We also provide empirical evidence that knots may occur at locations distinct from the data points.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Limiting fluctuation and trajectorial stability of multilayer neural
networks with mean field training [3.553493344868413]
We study the fluctuations of multilayer networks, at any network depth, trained in the mean-field regime.
Through this framework, we demonstrate the complex interaction among neurons in this second-order mean-field (MF) limit.
A limit theorem is proven to relate this limit to the fluctuation of large-width networks.
arXiv Detail & Related papers (2021-10-29T17:58:09Z) - Global Convergence of Three-layer Neural Networks in the Mean Field
Regime [3.553493344868413]
In the mean field regime, neural networks are appropriately scaled so that as the width tends to infinity, the learning dynamics tends to a nonlinear and nontrivial dynamical limit, known as the mean field limit.
Recent works have successfully applied such analysis to two-layer networks and provided global convergence guarantees.
We prove a global convergence result for unregularized feedforward three-layer networks in the mean field regime.
arXiv Detail & Related papers (2021-05-11T17:45:42Z) - Generalization bound of globally optimal non-convex neural network
training: Transportation map estimation by infinite dimensional Langevin
dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization with connection to its generalization error.
Existing frameworks for neural network optimization analysis, such as mean-field theory and neural tangent kernel theory, typically require taking the limit of infinite network width to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
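Several of the entries above (the two-layer linear convergence result for noisy gradient descent and the convex analysis of the mean-field Langevin dynamics) study essentially the same training dynamics. The sketch below illustrates one standard discretization of it; the width, step size, temperature, and toy data are placeholder assumptions and are not taken from any of the listed papers.

```python
import numpy as np

# Sketch of noisy gradient descent on a mean-field two-layer ReLU network,
# i.e. an Euler-Maruyama discretization of the mean-field Langevin dynamics:
#   theta <- theta - eta * grad(loss + (lam/2)*|theta|^2) + sqrt(2*eta*lam) * xi,
# where lam plays the role of the temperature / entropic regularization strength.
# All constants and the toy data below are placeholder assumptions.

rng = np.random.default_rng(1)
N, d = 256, 5
theta = rng.normal(size=(N, d + 1))          # per-neuron parameters: (w_i, a_i)
eta, lam = 0.05, 1e-3                        # step size and temperature

X = rng.normal(size=(512, d))
y = np.tanh(X[:, 0] + X[:, 1])               # arbitrary target

def forward(theta, X):
    W, a = theta[:, :d], theta[:, d]
    return np.maximum(X @ W.T, 0.0) @ a / N  # mean-field 1/N output scaling

for _ in range(300):
    W, a = theta[:, :d], theta[:, d]
    H = np.maximum(X @ W.T, 0.0)
    err = H @ a / N - y
    ga = H.T @ err / (N * len(y))
    gW = (np.outer(err, a / N) * (H > 0)).T @ X / len(y)
    grad = np.concatenate([gW, ga[:, None]], axis=1)
    noise = np.sqrt(2.0 * eta * lam) * rng.normal(size=theta.shape)
    theta = theta - eta * (grad + lam * theta) + noise

print("train loss:", 0.5 * np.mean((forward(theta, X) - y) ** 2))
```

As lam tends to zero the injected noise vanishes and the update reduces to plain full-batch gradient descent on the unregularized loss.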
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.