Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time
- URL: http://arxiv.org/abs/2504.13110v1
- Date: Thu, 17 Apr 2025 17:24:38 GMT
- Title: Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time
- Authors: Margalit Glasgow, Denny Wu, Joan Bruna
- Abstract summary: We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart. We demonstrate how to tightly bound this approximation gap through a differential equation governed by the mean-field dynamics.
- Score: 39.09304480125516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart, both trained using projected gradient descent in the mean-field scaling regime. We demonstrate how to tightly bound this approximation gap through a differential equation governed by the mean-field dynamics. A key factor influencing the growth of this ODE is the local Hessian of each particle, defined as the derivative of the particle's velocity in the mean-field dynamics with respect to its position. We apply our results to the canonical feature learning problem of estimating a well-specified single-index model; we permit the information exponent to be arbitrarily large, leading to convergence times that grow polynomially in the ambient dimension $d$. We show that, due to a certain ``self-concordance'' property in these problems -- where the local Hessian of a particle is bounded by a constant times the particle's velocity -- polynomially many neurons are sufficient to closely approximate the mean-field dynamics throughout training.
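To fix notation for the abstract above, here is a minimal sketch of the objects involved, assuming a standard mean-field parameterization of a one-hidden-layer network; the symbols $\sigma$, $v$, and $H$ below are illustrative choices, not notation taken from the paper.

    \[
      f_{\mu_t}(x) \;=\; \mathbb{E}_{w \sim \mu_t}\!\big[\sigma(w, x)\big],
      \qquad
      \dot{w}_t \;=\; v(w_t;\, \mu_t),
      \qquad
      H(w;\, \mu) \;:=\; \nabla_w\, v(w;\, \mu).
    \]
    % Each particle (neuron) w follows a velocity field v determined by the current
    % population \mu_t; the local Hessian H is the derivative of that velocity with
    % respect to the particle's position. The self-concordance-type property discussed
    % above reads, roughly, \|H(w_t; \mu_t)\| \le C\,\|v(w_t; \mu_t)\| along the
    % mean-field trajectory, which keeps the ODE bounding the finite-width
    % approximation gap under control throughout training.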
Related papers
- Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble [36.19164064733151]
Mean-field Langevin dynamics (MFLD) is an optimization method derived by taking the mean-field limit of noisy gradient descent for two-layer neural networks.
Recent work shows that the approximation error due to finite particles remains uniform in time and diminishes as the number of particles increases.
In this paper, we establish an improved propagation of chaos (PoC) result for MFLD, which removes the exponential dependence on the regularization coefficient from the particle approximation term.
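As a rough illustration of the finite-particle system whose $N \to \infty$ limit defines MFLD, the sketch below runs noisy gradient descent for a two-layer network; the function name, tanh activation, squared loss, and hyperparameters are assumptions for illustration, not details from the cited paper.

    import numpy as np

    def noisy_gd_two_layer(X, y, N=512, eta=1e-2, lam=1e-3, steps=1000, seed=0):
        """Finite-particle (N-neuron) noisy gradient descent; its N -> infinity,
        eta -> 0 limit is mean-field Langevin dynamics (illustrative sketch only)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W = rng.standard_normal((N, d))      # each row is one particle / neuron
        for _ in range(steps):
            act = np.tanh(X @ W.T)           # (n, N) neuron outputs
            pred = act.mean(axis=1)          # mean-field scaling: average over neurons
            resid = pred - y                 # squared-loss residual
            # per-particle gradient in the mean-field (first-variation) scaling,
            # i.e. without an extra 1/N factor
            grad = ((resid[:, None] * (1.0 - act**2)).T @ X) / n
            W -= eta * grad
            # Langevin noise coming from the entropic regularization (temperature lam)
            W += np.sqrt(2.0 * lam * eta) * rng.standard_normal(W.shape)
        return W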
arXiv Detail & Related papers (2025-02-09T05:58:46Z)
- Dynamic metastability in the self-attention model [22.689695473655906]
We consider the self-attention model - an interacting particle system on the unit sphere - which serves as a toy model for Transformers.
We prove the appearance of dynamic metastability conjectured in [GLPR23].
We show that under an appropriate time-rescaling, the energy reaches its global maximum in finite time and has a staircase profile.
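For orientation, one common way to write such an interacting particle system on the unit sphere is sketched below; this is an assumed formulation for illustration, not an equation quoted from the paper.

    \[
      \dot{x}_i(t) \;=\; \mathbf{P}_{x_i(t)}\!\left(
        \frac{\sum_{j=1}^{n} e^{\beta \langle x_i(t),\, x_j(t)\rangle}\, x_j(t)}
             {\sum_{k=1}^{n} e^{\beta \langle x_i(t),\, x_k(t)\rangle}}
      \right),
      \qquad x_i(t) \in \mathbb{S}^{d-1},
    \]
    % where P_x denotes projection onto the tangent space of the sphere at x, beta plays
    % the role of an inverse temperature, and the softmax factors act as attention weights.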
arXiv Detail & Related papers (2024-10-09T12:50:50Z)
- Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics [21.55547541297847]
We study the problem of learning multi-index models in high-dimensions using a two-layer neural network trained with the mean-field Langevin algorithm.
Under mild distributional assumptions, we characterize the effective dimension $d_{\mathrm{eff}}$ that controls both sample and computational complexity.
arXiv Detail & Related papers (2024-08-14T02:13:35Z)
- Improved Particle Approximation Error for Mean Field Neural Networks [9.817855108627452]
Mean-field Langevin dynamics (MFLD) minimizes an entropy-regularized nonlinear convex functional defined over the space of probability distributions.
Recent works have demonstrated the uniform-in-time propagation of chaos for MFLD.
We improve the dependence on logarithmic Sobolev inequality (LSI) constants in their particle approximation errors.
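For reference, the entropy-regularized objective and the associated dynamics take, in standard form, roughly the following shape (a sketch with generic notation, not taken from the cited paper):

    \[
      \min_{\mu}\; F(\mu) + \lambda\,\mathrm{Ent}(\mu),
      \qquad
      \mathrm{d}X_t = -\nabla \frac{\delta F}{\delta \mu}(\mu_t)(X_t)\,\mathrm{d}t
        + \sqrt{2\lambda}\,\mathrm{d}W_t,
      \qquad \mu_t = \mathrm{Law}(X_t),
    \]
    % where F is a (nonlinear) convex functional over probability distributions,
    % Ent is the negative entropy, and \delta F / \delta \mu is the first variation of F.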
arXiv Detail & Related papers (2024-05-24T17:59:06Z)
- Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
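The three error sources mentioned above correspond, schematically, to the following one-step update of the practical algorithm (a sketch with generic notation, not an equation from the cited paper):

    \[
      X^{i}_{k+1} \;=\; X^{i}_{k}
        - \eta\, \widehat{\nabla} \frac{\delta F}{\delta \mu}\big(\hat{\mu}^{N}_{k}\big)\big(X^{i}_{k}\big)
        + \sqrt{2\lambda\eta}\;\xi^{i}_{k},
      \qquad
      \hat{\mu}^{N}_{k} = \frac{1}{N}\sum_{i=1}^{N}\delta_{X^{i}_{k}},
    \]
    % with \xi^i_k standard Gaussian noise: N < infinity gives the finite-particle error,
    % the step size \eta > 0 gives the time-discretization error, and the stochastic
    % estimate \widehat{\nabla} of the first-variation gradient gives the
    % gradient-approximation error.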
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
- Decimation technique for open quantum systems: a case study with driven-dissipative bosonic chains [62.997667081978825]
Unavoidable coupling of quantum systems to external degrees of freedom leads to dissipative (non-unitary) dynamics.
We introduce a method to deal with these systems based on the calculation of (dissipative) lattice Green's function.
We illustrate the power of this method with several examples of driven-dissipative bosonic chains of increasing complexity.
arXiv Detail & Related papers (2022-02-15T19:00:09Z)
- Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean-field Langevin dynamics is presented.
The distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z)
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion [29.489737359897312]
We study the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
We show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space.
arXiv Detail & Related papers (2021-07-19T20:18:57Z)
- A Dynamical Central Limit Theorem for Shallow Neural Networks [48.66103132697071]
We prove that the fluctuations around the mean limit remain bounded in mean square throughout training.
If the mean-field dynamics converges to a measure that interpolates the training data, we prove that the deviation eventually vanishes in the CLT scaling.
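Concretely, the CLT scaling referred to above looks at fluctuations of the empirical measure blown up by $\sqrt{n}$, tested against a smooth function (generic notation, not quoted from the paper):

    \[
      \sup_{t \le T}\; \mathbb{E}\Big[\big(\sqrt{n}\,\langle \mu^{n}_{t} - \bar{\mu}_{t},\, \varphi\rangle\big)^{2}\Big] \;<\; \infty,
    \]
    % where \mu^n_t is the empirical distribution of the n neurons, \bar{\mu}_t the
    % mean-field limit, and \varphi a test function; "bounded in mean square" means this
    % quantity stays finite over the training horizon, and it vanishes in the
    % interpolating regime mentioned above.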
arXiv Detail & Related papers (2020-08-21T18:00:50Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)