Related papers: Effective continuous equations for adaptive SGD: a stochastic analysis view

URL: http://arxiv.org/abs/2509.21614v1
Date: Thu, 25 Sep 2025 21:31:20 GMT
Title: Effective continuous equations for adaptive SGD: a stochastic analysis view
Authors: Luca Callisti, Marco Romito, Francesco Triggiano
Abstract summary: We present a theoretical analysis of some popular adaptive Stochastic Gradient Descent (SGD) methods in the small learning rate regime. Our key contribution is that sampling-induced noise in SGD manifests as independent Brownian motions driving the parameter and gradient second momentum evolutions.
Score: 1.2744523252873352
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We present a theoretical analysis of some popular adaptive Stochastic Gradient Descent (SGD) methods in the small learning rate regime. Using the stochastic modified equations framework introduced by Li et al., we derive effective continuous stochastic dynamics for these methods. Our key contribution is that sampling-induced noise in SGD manifests in the limit as independent Brownian motions driving the parameter and gradient second momentum evolutions. Furthermore, extending the approach of Malladi et al., we investigate scaling rules between the learning rate and key hyperparameters in adaptive methods, characterising all non-trivial limiting dynamics.
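To make concrete which discrete iteration is being studied, here is a minimal sketch of one popular adaptive SGD method, RMSprop, driven by minibatch sampling on a toy least-squares problem. The minibatch sampling is the source of the noise, and `v` is the gradient second-moment estimate whose evolution the abstract refers to. The toy problem, the function names `rmsprop_minibatch` and `full_loss`, and all hyperparameter values are assumptions for illustration; the paper's limiting continuous equations are not reproduced here.

```python
import numpy as np

def rmsprop_minibatch(A, b, lr=0.01, beta=0.99, eps=1e-8, batch=8, steps=2000, seed=0):
    """RMSprop on f(x) = mean_i 0.5 * (a_i . x - b_i)^2 with sampled minibatches.

    Hypothetical helper: the minibatch sampling injects gradient noise, and
    v is the running second-moment estimate used to rescale each coordinate.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    v = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)  # sample a minibatch
        residual = A[idx] @ x - b[idx]
        g = A[idx].T @ residual / batch                 # stochastic gradient
        v = beta * v + (1.0 - beta) * g**2              # EMA of squared gradient
        x = x - lr * g / (np.sqrt(v) + eps)             # per-coordinate adaptive step
    return x, v

def full_loss(A, b, x):
    """Full-batch least-squares loss, used only to monitor progress."""
    return 0.5 * np.mean((A @ x - b) ** 2)

# Toy data (assumed): 200 noisy linear measurements of a 5-dimensional vector.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
b = A @ x_true + 0.1 * rng.standard_normal(200)
x_hat, v_hat = rmsprop_minibatch(A, b)
print(full_loss(A, b, np.zeros(5)), full_loss(A, b, x_hat))
```

Shrinking `lr` while adjusting `beta` and `batch` is exactly the kind of joint scaling between the learning rate and the adaptive hyperparameters that the abstract says it characterises.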

Related papers

Convergence of Stochastic Gradient Langevin Dynamics in the Lazy Training Regime [4.297070083645049]
Continuous-time models provide insights into the training dynamics of optimization algorithms in deep learning. We establish a non-asymptotic convergence analysis of stochastic gradient Langevin dynamics (SGLD) in the lazy training regime. We show that, under regularity conditions on the Hessian of the loss function, SGLD with multiplicative and state-dependent noise yields a non-degenerate kernel throughout the training process with high probability.
arXiv Detail & Related papers (2025-10-24T08:28:53Z)
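For reference, the basic Langevin update behind SGLD is x ← x − η ∇f(x) + sqrt(2ηT) ξ with ξ standard Gaussian. The sketch below uses an exact gradient on the quadratic f(x) = x²/2 for simplicity (plugging a minibatch gradient into `grad` gives SGLD proper), and runs many independent chains in parallel; the names and values are assumptions, not the paper's setup. At temperature T = 1 the target density is N(0, 1), and this step size gives a stationary variance of 2/(2 − η) ≈ 1.005, a small discretisation bias.

```python
import numpy as np

def sgld_chains(grad, x0, lr=0.01, temperature=1.0, steps=2000, seed=0):
    """Langevin updates on many independent chains at once (hypothetical helper).

    Update: x <- x - lr * grad(x) + sqrt(2 * lr * temperature) * N(0, 1).
    `grad` may be an exact or a noisy (minibatch) gradient estimate.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        noise = rng.standard_normal(x.shape)
        x = x - lr * grad(x) + np.sqrt(2.0 * lr * temperature) * noise
    return x

# Quadratic potential f(x) = x^2 / 2, whose gradient is x; 5000 chains started at 0.
samples = sgld_chains(lambda x: x, x0=np.zeros(5000))
print(samples.mean(), samples.var())  # mean near 0, variance near 1
```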
A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning. These problems are often formalized as Bi-Level Optimizations (BLO). We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems. In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version. We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
Variational Inference for SDEs Driven by Fractional Noise [16.434973057669676]
We present a novel variational framework for performing inference in (neural) stochastic differential equations (SDEs) driven by Markov-approximate fractional Brownian motion (fBM). We propose the use of neural networks to learn the drift, diffusion and control terms within our variational posterior, leading to the variational training of neural-SDEs.
arXiv Detail & Related papers (2023-10-19T17:59:21Z)
Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent [1.2031796234206138]
We propose new limiting dynamics for stochastic gradient descent in the small learning rate regime, called stochastic modified flows. These SDEs are driven by a cylindrical Brownian motion and improve the so-called stochastic modified equations by having regular diffusion coefficients and by matching the multi-point statistics.
arXiv Detail & Related papers (2023-02-14T15:33:59Z)
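For orientation, the first-order stochastic modified equation of Li et al. for plain SGD, the object these flows refine, is dX_t = −∇f(X_t) dt + (η Σ(X_t))^{1/2} dW_t, with η the learning rate and Σ the gradient-noise covariance. The sketch below is a weak-approximation check under an assumed toy model (f(x) = x²/2 with additive Gaussian gradient noise of scale `sigma`): it compares the mean and variance of SGD after N steps with an Euler–Maruyama solution of the SME at time T = Nη. The helper names are hypothetical.

```python
import numpy as np

def sgd_paths(x0, lr, sigma, n_steps, n_paths, rng):
    """SGD on f(x) = x^2 / 2 with noisy gradient g = x + sigma * xi."""
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(n_steps):
        x = x - lr * (x + sigma * rng.standard_normal(n_paths))
    return x

def sme_paths(x0, lr, sigma, horizon, n_sub, n_paths, rng):
    """Euler-Maruyama for dX = -X dt + sqrt(lr) * sigma dW on [0, horizon]."""
    h = horizon / n_sub
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(n_sub):
        dw = np.sqrt(h) * rng.standard_normal(n_paths)  # Brownian increment
        x = x - x * h + np.sqrt(lr) * sigma * dw
    return x

rng = np.random.default_rng(0)
lr, sigma, n_steps = 0.01, 1.0, 100
x_sgd = sgd_paths(1.0, lr, sigma, n_steps, 20000, rng)
x_sme = sme_paths(1.0, lr, sigma, n_steps * lr, 1000, 20000, rng)
print(x_sgd.mean(), x_sme.mean())  # both near exp(-1) ≈ 0.368
print(x_sgd.var(), x_sme.var())    # both near lr * sigma^2 * (1 - exp(-2)) / 2 ≈ 0.0043
```

The agreement is in distributional moments, not path by path, which is what "weak approximation" means in this line of work.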
NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer [45.47667026025716]
We propose a novel, robust and accelerated iteration that relies on two key elements. The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively. We show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models.
arXiv Detail & Related papers (2022-09-29T16:54:53Z)
Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent [3.0079490585515343]
Stochastic gradient descent (SGD) is relatively well understood in the vanishing learning rate regime. We propose to study the basic properties of SGD and its variants in the non-vanishing learning rate regime.
arXiv Detail & Related papers (2020-12-07T12:31:43Z)
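A toy worked example (not taken from the paper) of why the non-vanishing regime differs: for SGD on f(x) = a x²/2 with additive gradient noise of scale σ, the iteration x_{k+1} = (1 − ηa) x_k − ησ ξ_k has stationary variance v satisfying v = (1 − ηa)² v + η²σ², so v = ησ² / (a(2 − ηa)). The vanishing-learning-rate SDE dX = −aX dt + sqrt(η) σ dW predicts only ησ²/(2a). The helper names and the values below are assumptions for illustration.

```python
import numpy as np

def sgd_stationary_var_exact(lr, curvature, sigma):
    """Exact stationary variance of x <- x - lr * (curvature * x + sigma * xi).

    Valid for 0 < lr * curvature < 2 (otherwise the iteration is unstable).
    """
    return lr * sigma**2 / (curvature * (2.0 - lr * curvature))

def sgd_stationary_var_small_lr(lr, curvature, sigma):
    """Leading-order prediction from the SDE dX = -a X dt + sqrt(lr) * sigma dW."""
    return lr * sigma**2 / (2.0 * curvature)

# Simulate many independent SGD chains at a large learning rate.
rng = np.random.default_rng(0)
lr, a, sigma = 0.5, 1.0, 1.0
x = np.zeros(20000)
for _ in range(200):
    x = x - lr * (a * x + sigma * rng.standard_normal(x.size))
emp_var = x.var()
print(emp_var, sgd_stationary_var_exact(lr, a, sigma), sgd_stationary_var_small_lr(lr, a, sigma))
# empirical ≈ 1/3, exact = 1/3, small-learning-rate prediction = 0.25
```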
Training Generative Adversarial Networks by Solving Ordinary Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training. From this perspective, we hypothesise that instabilities in training GANs arise from the integration error. We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z)
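The integration-error viewpoint can be illustrated on the classic bilinear game min_x max_y x·y (an assumed toy, not a GAN). Its continuous-time gradient dynamics (dx/dt, dy/dt) = (−y, x) conserve x² + y². Simultaneous gradient descent-ascent is explicit Euler on this ODE and spirals outward by a factor sqrt(1 + h²) per step, while a classical fourth-order Runge–Kutta step keeps the norm almost exactly constant. Function names are hypothetical.

```python
import numpy as np

def game_field(z):
    """Simultaneous-gradient vector field for min_x max_y x*y: (dx, dy) = (-y, x)."""
    x, y = z
    return np.array([-y, x])

def euler_step(z, h):
    # Simultaneous gradient descent-ascent = one explicit Euler step on the ODE.
    return z + h * game_field(z)

def rk4_step(z, h):
    # Classical fourth-order Runge-Kutta step on the same ODE.
    k1 = game_field(z)
    k2 = game_field(z + 0.5 * h * k1)
    k3 = game_field(z + 0.5 * h * k2)
    k4 = game_field(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def run(step, h=0.1, n=1000):
    """Integrate from (1, 0) and return the final distance from the equilibrium."""
    z = np.array([1.0, 0.0])
    for _ in range(n):
        z = step(z, h)
    return np.linalg.norm(z)

print(run(euler_step), run(rk4_step))  # Euler spirals out (about 145), RK4 stays near 1
```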
A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces [53.47210316424326]
KeRNS is an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes. We prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time.
arXiv Detail & Related papers (2020-07-09T21:37:13Z)
Responsive Safety in Reinforcement Learning by PID Lagrangian Methods [74.49173841304474]
Lagrangian methods exhibit oscillations and overshoot which, in safe reinforcement learning, lead to constraint-violating behavior. We propose a novel Lagrange multiplier update method that utilizes derivatives of the constraint function. We apply our PID Lagrangian methods in deep RL, setting a new state of the art in Safety Gym, a safe RL benchmark.
arXiv Detail & Related papers (2020-07-08T08:43:14Z)
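A sketch of a PID-style multiplier update, assuming the form in which the multiplier is the clipped sum of a proportional term on the constraint violation, an integral term on its (non-negative) running sum, and a derivative term on the positive part of the change in the constraint cost. The class name, gains and cost values are hypothetical; consult the paper for the exact algorithm used in Safety Gym.

```python
class PIDLagrangeMultiplier:
    """Hypothetical PID-style Lagrange multiplier for a constraint J_C <= limit.

    lambda = max(0, kp * error + ki * integral + kd * max(0, J_C - previous J_C)),
    where error = J_C - limit and the integral is kept non-negative.
    """

    def __init__(self, kp, ki, kd, limit):
        self.kp, self.ki, self.kd, self.limit = kp, ki, kd, limit
        self.integral = 0.0
        self.prev_cost = None

    def update(self, cost):
        error = cost - self.limit
        self.integral = max(0.0, self.integral + error)
        # Derivative term reacts only to increases in cost, anticipating violations.
        deriv = 0.0 if self.prev_cost is None else max(0.0, cost - self.prev_cost)
        self.prev_cost = cost
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * deriv)

pid = PIDLagrangeMultiplier(kp=1.0, ki=0.1, kd=0.5, limit=1.0)
print([round(pid.update(c), 3) for c in (2.0, 3.0, 0.5)])  # [1.1, 2.8, 0.0]
```

With `kp = kd = 0` this reduces to the usual gradient-ascent (integral-only) multiplier update, which is the baseline the abstract says oscillates.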
Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification [25.898873960635534]
We analyze in closed form the learning dynamics of stochastic gradient descent (SGD) for a single-layer neural network classifying a high-dimensional Gaussian mixture. We define a prototype process for which SGD can be extended to a continuous-time gradient flow. In the full-batch limit, we recover the standard gradient flow.
arXiv Detail & Related papers (2020-06-10T22:49:41Z)
Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients. We prove new complexity bounds that outperform state-of-the-art results in this case. We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
arXiv Detail & Related papers (2020-05-21T17:05:27Z)
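The clipping operation itself is simple: clip(g, λ) = min(1, λ/‖g‖) g. Below is a sketch of plain clipped SGD (not the accelerated clipped-SSTM method, whose details are not reproduced here) on f(x) = ‖x‖²/2 with heavy-tailed Student-t gradient noise; with 1.5 degrees of freedom the noise has a finite mean but infinite variance. The toy problem, helper names and parameters are assumptions.

```python
import numpy as np

def clip(g, level):
    """Rescale g so that its Euclidean norm is at most `level`."""
    norm = np.linalg.norm(g)
    return g if norm <= level else g * (level / norm)

def clipped_sgd(x0, lr, level, steps, rng, df=1.5):
    """Clipped SGD on f(x) = ||x||^2 / 2 with Student-t (heavy-tailed) gradient noise."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        g = x + rng.standard_t(df, size=x.shape)  # noisy gradient of f
        x = x - lr * clip(g, level)               # each step has length at most lr * level
    return x

x_end = clipped_sgd([10.0, 10.0], lr=0.05, level=1.0, steps=2000, rng=np.random.default_rng(0))
print(np.linalg.norm(x_end))  # small: the iterate settles near the minimiser at 0
```

Clipping bounds every step, so a single extreme noise sample cannot throw the iterate far away; the price is a bias whenever the true gradient norm exceeds the clipping level.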
Stochastic Modified Equations for Continuous Limit of Stochastic ADMM [13.694172299830315]
We put different variants of ADMM into a unified form, which includes standard, linearized and gradient-based ADMM with relaxation, and study their dynamics via a continuous-time model approach. We show that the dynamics of ADMM is approximated by a class of stochastic differential equations with small noise parameters in the sense of weak approximation.
arXiv Detail & Related papers (2020-03-07T08:01:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.