High-Dimensional Limit of Stochastic Gradient Flow via Dynamical Mean-Field Theory
- URL: http://arxiv.org/abs/2602.06320v1
- Date: Fri, 06 Feb 2026 02:37:10 GMT
- Title: High-Dimensional Limit of Stochastic Gradient Flow via Dynamical Mean-Field Theory
- Authors: Sota Nishiyama, Masaaki Imaizumi
- Abstract summary: Modern machine learning models are typically trained via multi-pass stochastic gradient descent (SGD) with small batch sizes. We analyze the high-dimensional dynamics of a stochastic differential equation called a \emph{stochastic gradient flow} (SGF). We show that the resulting DMFT equations recover several existing high-dimensional descriptions of SGD dynamics as special cases.
- Score: 6.2000582635449994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern machine learning models are typically trained via multi-pass stochastic gradient descent (SGD) with small batch sizes, and understanding their dynamics in high dimensions is of great interest. However, an analytical framework for describing the high-dimensional asymptotic behavior of multi-pass SGD with small batch sizes for nonlinear models is currently missing. In this study, we address this gap by analyzing the high-dimensional dynamics of a stochastic differential equation called a \emph{stochastic gradient flow} (SGF), which approximates multi-pass SGD in this regime. In the limit where the number of data samples $n$ and the dimension $d$ grow proportionally, we derive a closed system of low-dimensional and continuous-time equations and prove that it characterizes the asymptotic distribution of the SGF parameters. Our theory is based on the dynamical mean-field theory (DMFT) and is applicable to a wide range of models encompassing generalized linear models and two-layer neural networks. We further show that the resulting DMFT equations recover several existing high-dimensional descriptions of SGD dynamics as special cases, thereby providing a unifying perspective on prior frameworks such as online SGD and high-dimensional linear regression. Our proof builds on the existing DMFT technique for gradient flow and extends it to handle the stochasticity in SGF using tools from stochastic calculus.
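For intuition about the central object, the following is a minimal sketch (not the paper's construction) contrasting multi-pass mini-batch SGD on a least-squares problem with an Euler-Maruyama discretization of a generic stochastic gradient flow. The SDE form used here, $d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{\eta/b}\,\Sigma(\theta_t)^{1/2}\,dW_t$, is the standard small-learning-rate approximation of mini-batch SGD; the model, the parameter values, and the batch-based noise proxy are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 200                       # samples and dimension grow proportionally
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ theta_star + 0.1 * rng.standard_normal(n)

def batch_grad(theta, idx):
    """Mini-batch gradient of the average squared loss over rows idx."""
    r = X[idx] @ theta - y[idx]
    return X[idx].T @ r / len(idx)

b, eta, steps = 8, 0.05, 2000

# Multi-pass mini-batch SGD: batches are repeatedly resampled from the same n points.
theta_sgd = np.zeros(d)
for _ in range(steps):
    idx = rng.choice(n, size=b, replace=False)
    theta_sgd -= eta * batch_grad(theta_sgd, idx)

# Euler-Maruyama for d(theta) = -grad L dt + sqrt(eta/b) Sigma^(1/2) dW, using the
# deviation of a fresh batch gradient from the full gradient as a noise proxy
# (its covariance is approximately Sigma / b).
dt = eta / 4                           # time step finer than the learning rate
theta_sgf = np.zeros(d)
for _ in range(int(steps * eta / dt)):
    g = batch_grad(theta_sgf, np.arange(n))          # full-data gradient (drift)
    idx = rng.choice(n, size=b, replace=False)
    xi = batch_grad(theta_sgf, idx) - g              # stochastic fluctuation
    theta_sgf += -dt * g + np.sqrt(dt * eta) * xi

print("SGD train risk:", np.mean((X @ theta_sgd - y) ** 2))
print("SGF train risk:", np.mean((X @ theta_sgf - y) ** 2))
```

With these (assumed) scalings the two trajectories reach comparable training risk, which is the sense in which the SGF approximates multi-pass SGD in this regime.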
Related papers
- Riemannian Langevin Dynamics: Strong Convergence of Geometric Euler-Maruyama Scheme [51.56484100374058]
Low-dimensional structure in real-world data plays an important role in the success of generative models. We prove a convergence theory for numerical schemes for manifold-valued stochastic differential equations.
arXiv Detail & Related papers (2026-03-04T01:29:35Z) - High-dimensional learning dynamics of multi-pass Stochastic Gradient Descent in multi-index models [2.2129910930772]
We study the learning dynamics of a multi-pass, mini-batch Stochastic Gradient Descent (SGD) procedure for empirical risk minimization. In a limit regime where the sample size $n$ and the data dimension $d$ increase proportionally, for any sub-linear batch size $\asymp n^{\alpha}$ where $\alpha \in [0,1)$, we provide an asymptotically exact characterization of the coordinate-wise dynamics of SGD.
arXiv Detail & Related papers (2026-01-28T22:28:12Z) - Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning [52.26396748560348]
We provide an overview of high-dimensional dynamical systems driven by random matrices. We focus on applications to simple models of learning and generalization in machine learning theory.
arXiv Detail & Related papers (2026-01-03T00:12:32Z) - Exact Dynamics of Multi-class Stochastic Gradient Descent [4.1538344141902135]
We develop a framework for analyzing the training and learning-rate dynamics on a variety of high-dimensional optimization problems trained using one-pass stochastic gradient descent (SGD). We give exact expressions for a large class of functions of the limiting dynamics, including the risk and the overlap with the true signal, in terms of a deterministic solution to a system of ODEs.
arXiv Detail & Related papers (2025-10-15T20:31:49Z) - Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models [51.85815025140659]
Modern Machine Learning (ML) and Deep Neural Networks (DNNs) often operate on high-dimensional data. In particular, the proportional regime where the data dimension, sample size, and number of model parameters are all large gives rise to novel and sometimes counterintuitive behaviors. This paper extends traditional Random Matrix Theory (RMT) beyond eigenvalue-based analysis of linear models to address the challenges posed by nonlinear ML models.
arXiv Detail & Related papers (2025-06-16T06:54:08Z) - Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models [10.781866671930857]
We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit.
We demonstrate a deterministic equivalent of SGD in the form of a system of ordinary differential equations; a toy instance of such an ODE is sketched at the end of this list.
In addition to the deterministic equivalent, we introduce an SDE with a simplified diffusion coefficient.
arXiv Detail & Related papers (2023-08-17T13:33:02Z) - Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of Langevin dynamics that incorporates a distribution-dependent drift; a minimal particle-system sketch appears after this list.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability [69.01076284478151]
In machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS).
This paper studies the convergence and implicit bias of constant-stepsize GD for logistic regression on linearly separable data in the EoS regime.
arXiv Detail & Related papers (2023-05-19T16:24:47Z) - From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks [26.65398696336828]
This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels.
We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk.
arXiv Detail & Related papers (2023-02-12T09:50:52Z) - NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
We propose a novel, robust and accelerated iteration that relies on two key elements.
The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively.
We show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models.
arXiv Detail & Related papers (2022-09-29T16:54:53Z) - High-dimensional limit theorems for SGD: Effective dynamics and critical scaling [6.950316788263433]
We prove limit theorems for the trajectories of summary statistics of stochastic gradient descent (SGD).
We show a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss.
Around the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate.
arXiv Detail & Related papers (2022-06-08T17:42:18Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
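As referenced from the mean-field Langevin entry above, here is a minimal particle-system sketch of MFLD. The confining potential V, the pairwise interaction W, and all parameter values are illustrative assumptions rather than the cited paper's setting; the point is only the distribution-dependent drift, approximated by averaging the interaction over the empirical measure of N particles.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, dt, lam = 500, 5.0, 0.01, 0.1    # particles, horizon, step, temperature

def grad_V(x):
    """Gradient of an assumed double-well confining potential V(x) = x^4/4 - x^2/2."""
    return x**3 - x

def grad_W(z):
    """Gradient of an assumed quadratic interaction W(z) = z^2/2."""
    return z

x = 2.0 * rng.standard_normal(N)       # initial particle positions
for _ in range(int(T / dt)):
    # distribution-dependent drift: interaction averaged over the empirical measure
    interaction = grad_W(x[:, None] - x[None, :]).mean(axis=1)
    drift = -grad_V(x) - interaction
    # Euler-Maruyama step for dX = drift dt + sqrt(2 * lam) dW
    x += dt * drift + np.sqrt(2.0 * lam * dt) * rng.standard_normal(N)

print("empirical mean:", x.mean(), " empirical variance:", x.var())
```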
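Finally, the toy instance promised in the "Hitting the High-Dimensional Notes" entry: for one-pass SGD on isotropic least squares (an assumption made here purely for tractability), the squared parameter error $R(t) = \|\theta - \theta^*\|^2$ concentrates, as $d$ grows, on the solution of $dR/dt = -(2\gamma - \gamma^2) R + \gamma^2 \sigma^2$ with time $t$ equal to iterations over $d$. The sketch below checks this deterministic equivalent against a simulation; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, gamma, sigma = 1000, 0.5, 0.5       # dimension, step-size, label noise
steps = 8 * d                          # corresponds to t = 8 in continuous time

theta_star = rng.standard_normal(d) / np.sqrt(d)   # roughly unit-norm signal
v = -theta_star.copy()                 # error vector for theta initialized at 0

# One-pass SGD: each step consumes a fresh sample (x, y = <x, theta*> + sigma * eps).
for _ in range(steps):
    x = rng.standard_normal(d)
    eps = rng.standard_normal()
    v -= (gamma / d) * x * (x @ v - sigma * eps)

# Deterministic equivalent: Euler integration of the limiting risk ODE.
R, dt = theta_star @ theta_star, 1.0 / d
for _ in range(steps):
    R += dt * (-(2.0 * gamma - gamma**2) * R + gamma**2 * sigma**2)

print("simulated squared error:", v @ v)
print("ODE prediction:         ", R)
```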