Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
- URL: http://arxiv.org/abs/2511.02258v1
- Date: Tue, 04 Nov 2025 04:52:19 GMT
- Title: Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks
- Authors: Parsa Rangriz
- Abstract summary: We study the high-dimensional scaling limits of online stochastic gradient descent (SGD) for single-layer networks. This paper builds on the work of Saad and Solla, which analyzed the deterministic (ballistic) scaling limits of SGD corresponding to the gradient flow of the population loss.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the high-dimensional scaling limits of online stochastic gradient descent (SGD) for single-layer networks. Building on the seminal work of Saad and Solla, which analyzed the deterministic (ballistic) scaling limits of SGD corresponding to the gradient flow of the population loss, we focus on the critical scaling regime of the step size. Below this critical scale, the effective dynamics are governed by ballistic (ODE) limits, but at the critical scale a new correction term appears that changes the phase diagram. In this regime, near the fixed points, the corresponding diffusive (SDE) limits of the effective dynamics reduce to an Ornstein-Uhlenbeck process under certain conditions. These results highlight how the information exponent controls sample complexity and illustrate the limitations of the deterministic scaling limit in capturing the stochastic fluctuations of high-dimensional learning dynamics.
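As a rough illustration of the setting (a toy sketch, not the paper's code or exact model), the snippet below runs online SGD for a single-index teacher-student problem with fresh Gaussian data, a tanh teacher, squared loss, and a step size at the 1/d scale, tracking the overlap m_t = <theta_t, theta*> as the summary statistic. The dimension, activation, prefactor, and initialization are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 2000                      # ambient dimension (toy value)
c = 0.5                       # step-size prefactor; eta = c / d is the 1/d scale
eta = c / d
n_steps = 40 * d              # O(d) online (one-pass) samples

act = np.tanh                 # single-index activation (illustrative choice)
dact = lambda z: 1.0 - np.tanh(z) ** 2

theta_star = np.zeros(d)
theta_star[0] = 1.0                        # unit-norm teacher direction
theta = rng.normal(size=d) / np.sqrt(d)    # random init: |theta| ~ 1, overlap ~ d^{-1/2}

overlaps = []
for t in range(n_steps):
    x = rng.normal(size=d)                 # fresh sample each step (online SGD)
    y = act(x @ theta_star)                # noiseless teacher label
    z = x @ theta
    grad = (act(z) - y) * dact(z) * x      # gradient of 0.5 * (act(z) - y)^2 in theta
    theta -= eta * grad
    if t % d == 0:                         # record the summary statistic every d steps
        overlaps.append(theta @ theta_star)

print(np.round(overlaps, 3))               # overlap m_t drifting from ~0 toward 1
```

In the spirit of the abstract, one would compare this recorded trajectory with the ballistic ODE prediction; near the fixed points of that effective dynamics, the residual stochastic fluctuations are what the diffusive (Ornstein-Uhlenbeck type) limits are meant to capture.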
Related papers
- Non-Singularity of the Gradient Descent map for Neural Networks with Piecewise Analytic Activations [53.348574336527854]
We investigate the neural network map as a function on the space of weights and biases.
We prove, for the first time, the non-singularity of the gradient descent (GD) map on the loss landscape of realistic neural network architectures.
arXiv Detail & Related papers (2025-10-28T14:34:33Z) - High-Dimensional Learning Dynamics of Quantized Models with Straight-Through Estimator [7.837881800517111]
Quantized neural network training optimizes a discrete, non-differentiable objective.
The straight-through estimator (STE) enables backpropagation through surrogate gradients.
We theoretically show that in the high-dimensional limit, STE dynamics converge to a deterministic ordinary differential equation.
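As a minimal sketch of the STE idea summarized above (not the paper's model): the forward pass uses sign-quantized weights, while the backward pass copies the gradient straight through to the latent full-precision weights, ignoring the zero derivative of sign(). The linear teacher, dimensions, and step size below are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_steps, eta = 50, 4000, 0.05

w_star = rng.choice([-1.0, 1.0], size=d)   # binary teacher weights (toy setup)
w = 0.1 * rng.normal(size=d)               # latent full-precision weights

for _ in range(n_steps):
    x = rng.normal(size=d)
    y = x @ w_star
    w_q = np.sign(w)                       # forward pass uses the quantized weights
    err = x @ w_q - y
    # STE backward pass: the gradient w.r.t. w_q is passed through unchanged to w.
    w -= (eta / d) * err * x

print("signs recovered:", np.mean(np.sign(w) == w_star))
```

In this toy setting the printed fraction typically approaches 1; the example only shows the STE mechanism, not the high-dimensional ODE limit the paper derives.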
arXiv Detail & Related papers (2025-10-12T16:43:46Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - On the Convergence of Gradient Descent for Large Learning Rates [55.33626480243135]
We show that convergence is impossible when a fixed step size is used.
We provide a proof of this in the case of linear neural networks with a squared loss.
We also prove the impossibility of convergence for more general losses without requiring strong assumptions such as Lipschitz continuity for the gradient.
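The summary above concerns linear networks; as a generic toy analogue (not the paper's setting), fixed-step gradient descent on a quadratic with curvature L multiplies the iterate by (1 - eta * L) each step, so any fixed step size eta > 2 / L makes the iterates diverge:

```python
# Fixed-step gradient descent on f(x) = 0.5 * L * x**2, where grad f(x) = L * x.
# The update is x <- (1 - eta * L) * x, which diverges whenever eta > 2 / L.
L = 1.0
for eta in (0.5, 1.9, 2.1):          # below, just below, and above the 2/L threshold
    x = 1.0
    for _ in range(50):
        x -= eta * L * x             # one gradient step
    print(f"eta={eta}: |x_50| = {abs(x):.3e}")
```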
arXiv Detail & Related papers (2024-02-20T16:01:42Z) - Convergence of mean-field Langevin dynamics: Time and space
discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift.
Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures.
We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
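A minimal finite-particle sketch of the kind of dynamics summarized above, under assumptions chosen only for illustration: N particles follow an Euler-Maruyama discretization of noisy gradient descent on a confining potential plus a pairwise interaction whose drift depends on the empirical measure (here only through its mean), with the entropic regularization entering through the temperature lam.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_steps, dt, lam = 500, 2000, 0.01, 0.1   # particles, Euler steps, step size, temperature

X = rng.normal(loc=3.0, scale=1.0, size=N)   # 1-D particle positions (toy initialization)

for _ in range(n_steps):
    m = X.mean()
    # Drift = -grad of the x^2/2 confinement minus a mean-field pull toward the
    # empirical mean; the second term is where the distribution dependence enters.
    drift = -X - 0.5 * (X - m)
    X += dt * drift + np.sqrt(2.0 * lam * dt) * rng.normal(size=N)

print("mean:", round(X.mean(), 3), "variance:", round(X.var(), 3))
```

This only illustrates the type of distribution-dependent SDE involved; the paper's results concern how well such a finite-particle, discrete-time, stochastic-gradient scheme tracks the idealized MFLD, uniformly in time.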
arXiv Detail & Related papers (2023-06-12T16:28:11Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been shown to be effective for solving forward and inverse differential equation problems.
However, PINNs can suffer from training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
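To see why an implicit update can improve stability (a generic illustration on an assumed quadratic curvature, not the PINN training loop itself): an explicit step scales the error by (1 - eta * L) and can blow up for large eta, whereas the implicit step x_{k+1} = x_k - eta * grad f(x_{k+1}) scales it by 1 / (1 + eta * L), which contracts for every eta > 0.

```python
# Explicit vs. implicit gradient steps on f(x) = 0.5 * L * x**2 (grad f(x) = L * x).
# Explicit:  x <- x - eta * L * x       = (1 - eta * L) * x
# Implicit:  x <- x - eta * L * x_next  => x_next = x / (1 + eta * L)
L, eta, n_steps = 10.0, 1.0, 20          # deliberately large step size
x_exp, x_imp = 1.0, 1.0
for _ in range(n_steps):
    x_exp = (1.0 - eta * L) * x_exp      # explicit step: factor -9, diverges
    x_imp = x_imp / (1.0 + eta * L)      # implicit step: factor 1/11, contracts
print(f"explicit |x| = {abs(x_exp):.3e}, implicit |x| = {abs(x_imp):.3e}")
```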
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - From high-dimensional & mean-field dynamics to dimensionless ODEs: A
unifying approach to SGD in two-layers networks [26.65398696336828]
This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels.
We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk.
arXiv Detail & Related papers (2023-02-12T09:50:52Z) - Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
arXiv Detail & Related papers (2022-06-08T21:32:50Z) - High-dimensional limit theorems for SGD: Effective dynamics and critical
scaling [6.950316788263433]
We prove limit theorems for the trajectories of summary statistics of stochastic gradient descent (SGD).
We show a critical scaling regime for the step size, below which the effective ballistic dynamics matches gradient flow for the population loss.
Near the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate.
arXiv Detail & Related papers (2022-06-08T17:42:18Z) - A Dynamical Central Limit Theorem for Shallow Neural Networks [48.66103132697071]
We prove that the fluctuations around the mean limit remain bounded in mean square throughout training.
If the mean-field dynamics converges to a measure that interpolates the training data, we prove that the deviation eventually vanishes in the CLT scaling.
arXiv Detail & Related papers (2020-08-21T18:00:50Z) - Dynamical mean-field theory for stochastic gradient descent in Gaussian
mixture classification [25.898873960635534]
We analyze in closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture.
We define a prototype process for which SGD can be extended to a continuous-time limit, the stochastic gradient flow.
In the full-batch limit, we recover the standard gradient flow.
arXiv Detail & Related papers (2020-06-10T22:49:41Z)