From high-dimensional & mean-field dynamics to dimensionless ODEs: A
unifying approach to SGD in two-layers networks
- URL: http://arxiv.org/abs/2302.05882v1
- Date: Sun, 12 Feb 2023 09:50:52 GMT
- Title: From high-dimensional & mean-field dynamics to dimensionless ODEs: A
unifying approach to SGD in two-layers networks
- Authors: Luca Arnaboldi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro
- Abstract summary: This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function.
We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk.
- Score: 26.65398696336828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This manuscript investigates the one-pass stochastic gradient descent (SGD)
dynamics of a two-layer neural network trained on Gaussian data and labels
generated by a similar, though not necessarily identical, target function. We
rigorously analyse the limiting dynamics via a deterministic and
low-dimensional description in terms of the sufficient statistics for the
population risk. Our unifying analysis bridges different regimes of interest,
such as the classical gradient-flow regime of vanishing learning rate, the
high-dimensional regime of large input dimension, and the overparameterised
"mean-field" regime of large network width, covering as well the intermediate
regimes where the limiting dynamics is determined by the interplay between
these behaviours. In particular, in the high-dimensional limit, the
infinite-width dynamics is found to remain close to a low-dimensional subspace
spanned by the target principal directions. Our results therefore provide a
unifying picture of the limiting SGD dynamics with synthetic data.
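As a purely illustrative companion to this setting (not the authors' code), the following sketch runs one-pass SGD for a small two-layer student network on fresh Gaussian inputs labelled by a fixed teacher, and records the sufficient statistics (the student-teacher and student-student overlap matrices) that parametrize the population risk; the widths, tanh activation, 1/d step-size scaling and the choice to train only the first layer are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, k = 500, 4, 2          # input dimension, student width, teacher width (assumed)
lr, steps = 0.5, 20_000      # learning rate and number of one-pass SGD steps (assumed)

W_star = rng.standard_normal((k, d)) / np.sqrt(d)   # fixed teacher first layer
a_star = np.ones(k)                                  # teacher second layer
W = rng.standard_normal((p, d)) / np.sqrt(d)         # student first layer (trained)
a = np.ones(p)                                       # student second layer (kept fixed here)

act = np.tanh                                        # illustrative activation
dact = lambda z: 1 - np.tanh(z) ** 2

for t in range(steps):
    x = rng.standard_normal(d)                       # fresh Gaussian sample (one-pass)
    y = a_star @ act(W_star @ x)                     # teacher label
    z = W @ x
    y_hat = a @ act(z)
    # online SGD step on the squared loss, only the first layer is updated
    grad_W = -(y - y_hat) * (a * dact(z))[:, None] * x[None, :]
    W -= (lr / d) * grad_W                           # 1/d step-size scaling of the high-dimensional regime

    if t % 5000 == 0:
        M = W @ W_star.T          # student-teacher overlaps (sufficient statistics)
        Q = W @ W.T               # student-student overlaps
        print(t, np.round(M, 3), np.round(np.diag(Q), 3))
```

In the limits discussed in the abstract, these overlap matrices concentrate and their evolution is described by deterministic, low-dimensional equations.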
Related papers
- Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
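For intuition only, here is a minimal sketch of a time-discretized, finite-particle approximation of MFLD on a toy mean-field regression objective: each particle follows a gradient whose drift depends on the empirical measure of all particles, plus Gaussian noise whose strength matches the entropic regularization. The feature map, data, step size and regularization strength are illustrative assumptions, not the construction analysed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 2                        # number of particles and parameter dimension (assumed)
lam, eta, steps = 1e-3, 0.05, 2000   # entropic regularization, step size, iterations (assumed)

# toy data; the functional is F(mu) = E_x[(E_mu[phi(theta, x)] - y(x))^2] + lam * Ent(mu)
X = rng.standard_normal((64, d))
y = np.tanh(X @ np.array([1.0, -0.5]))

theta = rng.standard_normal((N, d))      # finite-particle approximation of the measure mu

for t in range(steps):
    feats = np.tanh(X @ theta.T)                     # phi(theta_i, x) for all particles
    preds = feats.mean(axis=1)                       # E_mu[phi] under the empirical measure
    resid = preds - y                                # distribution-dependent part of the drift
    sech2 = 1.0 - feats ** 2                         # derivative of tanh
    # gradient of the first variation of F evaluated at each particle
    grad = (resid[:, None, None] * sech2[:, :, None] * X[:, None, :]).mean(axis=0)
    noise = rng.standard_normal((N, d))
    # time-discretized, finite-particle MFLD update: noisy gradient step
    theta -= eta * grad - np.sqrt(2.0 * lam * eta) * noise
```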
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Non-Separable Multi-Dimensional Network Flows for Visual Computing [62.50191141358778]
We propose a novel formalism for non-separable multi-dimensional network flows.
Since the flow is defined on a per-dimension basis, the maximizing flow automatically chooses the best matching feature dimensions.
As a proof of concept, we apply our formalism to the multi-object tracking problem and demonstrate that our approach outperforms scalar formulations on the MOT16 benchmark in terms of robustness to noise.
arXiv Detail & Related papers (2023-05-15T13:21:44Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs can become trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
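To illustrate the idea of an implicit (backward-Euler) SGD step, here is a hedged sketch on a stand-in quadratic least-squares loss rather than an actual PINN residual; for a quadratic minibatch loss the implicit update has a closed form, whereas a general PINN loss would need an inner iterative solve. The problem sizes and step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 256, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

theta = np.zeros(d)
eta, epochs, m = 2.0, 30, 16   # deliberately large step size: explicit SGD would diverge here,
                               # while the implicit step remains stable for any eta > 0

for _ in range(epochs):
    for _ in range(n // m):
        idx = rng.choice(n, size=m, replace=False)
        Ai, bi = A[idx], b[idx]
        H = Ai.T @ Ai / m                       # minibatch Hessian of the quadratic loss
        g = Ai.T @ bi / m
        # implicit step: theta_new = theta - eta * grad(theta_new); for a quadratic loss
        # this is the linear system (I + eta * H) theta_new = theta + eta * g
        theta = np.linalg.solve(np.eye(d) + eta * H, theta + eta * g)

print("relative residual:", np.linalg.norm(A @ theta - b) / np.linalg.norm(b))
```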
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Asymptotic Analysis of Deep Residual Networks [6.308539010172309]
We investigate the properties of deep Residual networks (ResNets) as the number of layers increases.
We first show the existence of scaling regimes for trained weights markedly different from those implicitly assumed in the neural ODE literature.
We study the hidden state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE), or neither of these.
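A toy sketch of the hidden-state recursion with an explicit depth-scaling exponent, just to make the notion of a scaling regime concrete; the width, depth, activation and the particular exponents compared are assumptions, not the paper's measurements on trained networks.

```python
import numpy as np

d, L = 32, 2000                           # width and depth (assumed)

def hidden_trajectory(beta, seed=0):
    """Residual recursion h_{k+1} = h_k + L**(-beta) * tanh(W_k h_k) with i.i.d. weights."""
    rng = np.random.default_rng(seed)
    h = np.ones(d) / np.sqrt(d)
    norms = []
    for k in range(L):
        W = rng.standard_normal((d, d)) / np.sqrt(d)   # i.i.d. layer weights
        h = h + L ** (-beta) * np.tanh(W @ h)          # residual update with depth scaling
        norms.append(np.linalg.norm(h))
    return np.array(norms)

# beta = 1.0: increments of size O(1/L), an ODE-like (smooth) limit;
# beta = 0.5: increments of size O(1/sqrt(L)) with i.i.d. weights, a diffusive, SDE-like limit
for beta in (1.0, 0.5):
    print(beta, hidden_trajectory(beta)[-1])
```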
arXiv Detail & Related papers (2022-12-15T23:55:01Z) - A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer
Neural Networks [49.870593940818715]
We study the infinite-width limit of a type of three-layer NN model whose first layer is random and fixed.
Our theory accommodates different scaling choices of the model, resulting in two regimes of the MF limit that demonstrate distinctive behaviors.
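For concreteness, a small sketch of the architecture described here: the first layer is drawn at random and frozen, while the second and output layers are trained with SGD; the widths, toy target, activation and learning rate are illustrative assumptions and make no attempt to reproduce the paper's mean-field scaling choices.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m1, m2, n = 20, 128, 64, 1000                   # input dim, hidden widths, sample size (assumed)

W1 = rng.standard_normal((m1, d)) / np.sqrt(d)     # first layer: random and kept fixed
W2 = rng.standard_normal((m2, m1)) / np.sqrt(m1)   # second layer: trained
a = np.zeros(m2)                                   # output layer: trained

X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]                # toy target (assumed)

lr, steps, batch = 0.01, 3000, 32
for t in range(steps):
    idx = rng.integers(0, n, batch)
    h1 = np.tanh(X[idx] @ W1.T)                    # frozen random features from the untrained layer
    h2 = np.tanh(h1 @ W2.T)
    err = h2 @ a - y[idx]
    # plain SGD on the squared loss; only the second and output layers are updated
    a -= lr * (h2.T @ err) / batch
    W2 -= lr * ((err[:, None] * a * (1 - h2 ** 2)).T @ h1) / batch
    if t % 1000 == 0:
        print(t, "minibatch mse:", round(float((err ** 2).mean()), 4))
```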
arXiv Detail & Related papers (2022-10-28T17:26:27Z) - High-dimensional limit theorems for SGD: Effective dynamics and critical
scaling [6.950316788263433]
We prove limit theorems for the trajectories of summary statistics of stochastic gradient descent (SGD).
We show a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss.
Near the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate.
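As a toy illustration of the step-size scaling (an assumed single-neuron teacher-student model, not the paper's general framework), the sketch below runs online SGD with step size delta = c/d and records the summary statistic m = <w, w*> along the trajectory.

```python
import numpy as np

d = 1000                                       # input dimension (assumed)
w_star = np.zeros(d); w_star[0] = 1.0          # teacher direction

def run_online_sgd(c, steps=20 * d, seed=0):
    """One-pass SGD for a single tanh neuron with step size delta = c / d."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d); w /= np.linalg.norm(w)
    delta = c / d
    overlaps = []
    for t in range(steps):
        x = rng.standard_normal(d)                    # fresh sample at every step
        y = np.tanh(w_star @ x)
        pred = np.tanh(w @ x)
        g = -(y - pred) * (1 - pred ** 2) * x         # per-sample gradient of the squared loss
        w -= delta * g
        if t % d == 0:
            overlaps.append(w @ w_star)               # summary statistic m = <w, w*>
    return np.array(overlaps)

# with delta = c/d the trajectory of m stays close to a deterministic ("ballistic") curve;
# taking c much larger eventually leaves this regime and the trajectory becomes noisy
for c in (0.5, 2.0):
    print("c =", c, "final overlaps:", np.round(run_online_sgd(c)[-3:], 3))
```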
arXiv Detail & Related papers (2022-06-08T17:42:18Z) - Phase diagram of Stochastic Gradient Descent in high-dimensional
two-layer neural networks [22.823904789355495]
We investigate the connection between the mean-field hydrodynamic regime and the seminal approach of Saad & Solla.
Our work builds on a deterministic description of SGD in high dimensions from statistical physics.
arXiv Detail & Related papers (2022-02-01T09:45:07Z) - Data-Driven Reduced-Order Modeling of Spatiotemporal Chaos with Neural
Ordinary Differential Equations [0.0]
We present a data-driven reduced-order modeling method for partial differential equations that exhibit chaotic dynamics.
We find that dimension reduction improves performance relative to predictions in the ambient space.
With the low-dimensional model, we find excellent short- and long-time statistical recreation of the true dynamics for widely spaced data.
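A compressed, hedged sketch of the general recipe (an encoder to a low-dimensional latent space, a learned latent vector field integrated with a fixed-step RK4 scheme, and a decoder back to the ambient space); the dummy snapshot data, network sizes and single-step training loss are assumptions and do not reproduce the paper's experiments.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D, r, dt = 64, 3, 0.01        # ambient dimension, latent dimension, time step (assumed)

encoder = nn.Sequential(nn.Linear(D, 32), nn.Tanh(), nn.Linear(32, r))
decoder = nn.Sequential(nn.Linear(r, 32), nn.Tanh(), nn.Linear(32, D))
f = nn.Sequential(nn.Linear(r, 64), nn.Tanh(), nn.Linear(64, r))   # latent vector field dh/dt = f(h)

def rk4_step(h):
    """One fixed-step RK4 integration step of the latent ODE."""
    k1 = f(h); k2 = f(h + 0.5 * dt * k1); k3 = f(h + 0.5 * dt * k2); k4 = f(h + dt * k3)
    return h + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# dummy trajectory data standing in for PDE snapshots: pairs (u_t, u_{t+dt})
u_t = torch.randn(256, D)
u_next = u_t + 0.01 * torch.randn(256, D)

opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters(), *f.parameters()], lr=1e-3)
for step in range(500):
    h = encoder(u_t)
    loss = ((decoder(h) - u_t) ** 2).mean()                        # reconstruction (dimension reduction)
    loss = loss + ((decoder(rk4_step(h)) - u_next) ** 2).mean()    # one-step latent dynamics
    opt.zero_grad(); loss.backward(); opt.step()
```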
arXiv Detail & Related papers (2021-08-31T20:00:33Z) - An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where the time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows provides stable and effective training and provably solves the vanishing/exploding gradient problem.
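To make the constraint concrete, here is a tiny sketch in which the parameter matrix of the hidden-state flow is evolved on O(d) by exponentiating a skew-symmetric generator, so it stays exactly orthogonal; the hand-picked generator and the tanh flow are placeholders for the learned auxiliary flow of the actual method.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(6)
d, T, dt = 8, 50, 0.1

W = np.eye(d)                        # main-flow parameter, constrained to O(d)
B = rng.standard_normal((d, d))
A = B - B.T                          # skew-symmetric generator (stand-in for the learned flow)

h = rng.standard_normal(d)
for t in range(T):
    A_t = np.cos(0.1 * t) * A        # toy time dependence of the generator
    W = W @ expm(dt * A_t)           # matrix flow on O(d): exp of a skew matrix is orthogonal
    h = h + dt * np.tanh(W @ h)      # hidden-state flow driven by the evolving parameters

# orthogonality is preserved exactly, which keeps |W h| = |h| and tames gradient norms
print("orthogonality error:", np.linalg.norm(W.T @ W - np.eye(d)))
```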
arXiv Detail & Related papers (2020-06-19T22:05:19Z) - Dynamical mean-field theory for stochastic gradient descent in Gaussian
mixture classification [25.898873960635534]
We analyze in closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture.
We define a prototype process that can be extended to a continuous-time gradient flow.
In the full-batch limit, we recover the standard gradient flow.
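A simulation-level sketch of the setup being analysed (not the dynamical mean-field equations themselves): online SGD for a single-layer network classifying a two-cluster high-dimensional Gaussian mixture, tracking the overlap of the weights with the cluster-mean direction; the squared loss, tanh output and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
d, lr = 1000, 0.5
steps = 20 * d
mu = np.ones(d) / np.sqrt(d)                 # cluster mean direction, |mu| = 1 (assumed)

w = rng.standard_normal(d) / np.sqrt(d)
for t in range(steps):
    s = rng.choice([-1.0, 1.0])              # cluster label
    x = s * mu + rng.standard_normal(d)      # sample from the two-cluster Gaussian mixture
    pred = np.tanh(w @ x)
    g = -(s - pred) * (1 - pred ** 2) * x    # per-sample gradient of the squared loss
    w -= (lr / d) * g                        # step-size scaling matching the high-dimensional analysis
    if t % 5000 == 0:
        print(t, "overlap with mu:", round(w @ mu, 3))
```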
arXiv Detail & Related papers (2020-06-10T22:49:41Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
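The transition can be illustrated on the standard diagonal-linear-network example of sparse regression, where the initialization scale interpolates between a kernel-like (dense, minimum-l2-flavoured) interpolator and a rich (sparse, minimum-l1-flavoured) one; the particular sizes, learning-rate scaling and reporting thresholds below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n, d = 40, 100
X = rng.standard_normal((n, d))
beta_star = np.zeros(d); beta_star[:3] = 1.0     # sparse ground truth (assumed)
y = X @ beta_star

def train(alpha, steps=30_000):
    """Full-batch gradient descent on the 'diagonal' reparametrization beta = wp**2 - wm**2."""
    lr = 0.01 / max(1.0, alpha ** 2)             # step size shrunk with the initialization scale
    wp = alpha * np.ones(d)
    wm = alpha * np.ones(d)
    for _ in range(steps):
        r = X @ (wp ** 2 - wm ** 2) - y
        g = X.T @ r / n
        wp, wm = wp - lr * 2 * wp * g, wm + lr * 2 * wm * g
    return wp ** 2 - wm ** 2

# large initialization: near-"kernel" behaviour, a dense (roughly minimum-l2) interpolator;
# small initialization: "rich" behaviour, a sparse (roughly minimum-l1) interpolator
for alpha in (10.0, 0.01):
    beta = train(alpha)
    print(alpha, "l1 norm:", round(np.abs(beta).sum(), 2),
          "coords with |beta| > 1e-2:", int((np.abs(beta) > 1e-2).sum()))
```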
arXiv Detail & Related papers (2020-02-20T15:43:02Z)