Related papers: Modified Equations for Stochastic Optimization

Modified Equations for Stochastic Optimization

URL: http://arxiv.org/abs/2511.20322v1
Date: Tue, 25 Nov 2025 13:54:31 GMT
Title: Modified Equations for Stochastic Optimization
Authors: Stefan Perko,
Abstract summary: We study time-inhomogeneous SDEs driven by Brownian motion.<n>In Ch. 6 we motivate and define the notion of an epoched Brownian motion (EBM)<n>In Ch. 7 we study scaling limits of families of random walks (RW) that share the same increments up to a random permutation.
Score: 1.7767466724342065
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this thesis, we extend the recently introduced theory of stochastic modified equations (SMEs) for stochastic gradient optimization algorithms. In Ch. 3 we study time-inhomogeneous SDEs driven by Brownian motion. For certain SDEs we prove a 1st and 2nd-order weak approximation properties, and we compute their linear error terms explicitly, under certain regularity conditions. In Ch. 4 we instantiate our results for SGD, working out the example of linear regression explicitly. We use this example to compare the linear error terms of gradient flow and two commonly used 1st-order SMEs for SGD in Ch. 5. In the second part of the thesis we introduce and study a novel diffusion approximation for SGD without replacement (SGDo) in the finite-data setting. In Ch. 6 we motivate and define the notion of an epoched Brownian motion (EBM). We argue that Young differential equations (YDEs) driven by EBMs serve as continuous-time models for SGDo for any shuffling scheme whose induced permutations converge to a det. permuton. Further, we prove a.s. convergence for these YDEs in the strongly convex setting. Moreover, we compute an upper asymptotic bound on the convergence rate which is as sharp as, or better than previous results for SGDo. In Ch. 7 we study scaling limits of families of random walks (RW) that share the same increments up to a random permutation. We show weak convergence under the assumption that the sequence of permutations converges to a det. (higher-dimensional) permuton. This permuton determines the covariance function of the limiting Gaussian process. Conversely, we show that every Gaussian process with a covariance function determined by a permuton in this way arises as a weak scaling limit of families of RW with shared increments. Finally, we apply our weak convergence theory to show that EBMs arise as scaling limits of RW with finitely many distinct increments.

Related papers

Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation [22.652143194356864]
We address the problem of solving strongly convex and smooth problems using gradient descent (SGD) with a constant step size.<n>We provide an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$.<n>Our analysis relies on the properties of the SGDs viewed as a time-homogeneous Markov chain.
arXiv Detail & Related papers (2024-10-07T15:02:48Z)
Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach [49.97755400231656]
We show that a new accelerated DDPM sampler achieves accelerated performance for three broad distribution classes not considered before.<n>Our results show an improved dependency on the data dimension $d$ among accelerated DDPM type samplers.
arXiv Detail & Related papers (2024-02-21T16:11:47Z)
Convergence and concentration properties of constant step-size SGD through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using constant step-size gradient descent (SGD) We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance. All our results are non-asymptotic and their consequences are discussed through a few applications.
arXiv Detail & Related papers (2023-06-20T12:36:28Z)
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling [6.950316788263433]
We prove limit theorems for the trajectories of summary statistics of gradient descent (SGD) We show a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss. About the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate.
arXiv Detail & Related papers (2022-06-08T17:42:18Z)
Utilising the CLT Structure in Stochastic Gradient based Sampling : Improved Analysis and Faster Algorithms [14.174806471635403]
We consider approximations of sampling algorithms, such as Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM) for Interacting Particle Dynamcs (IPD) We observe that the noise introduced by the approximation is nearly Gaussian due to the Central Limit Theorem (CLT) while the driving Brownian motion is exactly Gaussian. We harness this structure to absorb the approximation error inside the diffusion process, and obtain improved convergence guarantees for these algorithms.
arXiv Detail & Related papers (2022-06-08T10:17:40Z)
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-eps optimization problems. In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z)
The Schr\"odinger Bridge between Gaussian Measures has a Closed Form [101.79851806388699]
We focus on the dynamic formulation of OT, also known as the Schr"odinger bridge (SB) problem. In this paper, we provide closed-form expressions for SBs between Gaussian measures.
arXiv Detail & Related papers (2022-02-11T15:59:01Z)
On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by gradient descent (SGD) We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting. We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO. We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size. Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence. We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
inductive biases are central in preventing overfitting empirically. This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression. We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms. We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
Convergence rates and approximation results for SGD and its continuous-time counterpart [16.70533901524849]
This paper proposes a thorough theoretical analysis of convex Gradient Descent (SGD) with non-increasing step sizes. First, we show that the SGD can be provably approximated by solutions of inhomogeneous Differential Equation (SDE) using coupling. Recent analyses of deterministic and optimization methods by their continuous counterpart, we study the long-time behavior of the continuous processes at hand and non-asymptotic bounds.
arXiv Detail & Related papers (2020-04-08T18:31:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.