Stationary Behavior of Constant Stepsize SGD Type Algorithms: An
Asymptotic Characterization
- URL: http://arxiv.org/abs/2111.06328v1
- Date: Thu, 11 Nov 2021 17:39:50 GMT
- Title: Stationary Behavior of Constant Stepsize SGD Type Algorithms: An
Asymptotic Characterization
- Authors: Zaiwei Chen, Shancong Mou, and Siva Theja Maguluri
- Abstract summary: We study the behavior of the appropriately scaled stationary distribution, in the limit when the constant stepsize goes to zero.
We show that the limiting scaled stationary distribution is a solution of an integral equation.
- Score: 4.932130498861987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic approximation (SA) and stochastic gradient descent (SGD)
algorithms are workhorses of modern machine learning. Their
constant stepsize variants are preferred in practice due to fast convergence
behavior. However, constant-stepsize stochastic iterative algorithms do not
converge asymptotically to the optimal solution, but instead have a stationary
distribution, which in general cannot be analytically characterized. In this
work, we study the asymptotic behavior of the appropriately scaled stationary
distribution, in the limit when the constant stepsize goes to zero.
Specifically, we consider the following three settings: (1) SGD algorithms with
smooth and strongly convex objective, (2) linear SA algorithms involving a
Hurwitz matrix, and (3) nonlinear SA algorithms involving a contractive
operator. When the iterate is scaled by $1/\sqrt{\alpha}$, where $\alpha$ is
the constant stepsize, we show that the limiting scaled stationary distribution
is a solution of an integral equation. Under a uniqueness assumption (which can
be removed in certain settings) on this equation, we further characterize the
limiting distribution as a Gaussian distribution whose covariance matrix is the
unique solution of a suitable Lyapunov equation. For SA algorithms beyond these
cases, our numerical experiments suggest that unlike central limit theorem type
results: (1) the scaling factor need not be $1/\sqrt{\alpha}$, and (2) the
limiting distribution need not be Gaussian. Based on the numerical study, we
derive a formula to determine the right scaling factor, and make an
insightful connection to the Euler-Maruyama discretization scheme for
approximating stochastic differential equations.
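The stationary-distribution phenomenon described in the abstract can be illustrated with a minimal simulation. The setup below (a scalar quadratic with additive Gaussian gradient noise, and the specific values of `a`, `sigma2`, and `alpha`) is an assumed toy example, not taken from the paper's experiments: constant-stepsize SGD never converges to the minimizer, and after scaling the iterate by $1/\sqrt{\alpha}$ its variance should match the solution of the scalar Lyapunov equation $2a\Sigma = \sigma^2$.

```python
import numpy as np

# Sketch (assumed scalar setup): constant-stepsize SGD on f(x) = a*x^2/2
# with additive Gaussian gradient noise of variance sigma2.  The iterates
# do not converge to the minimizer x* = 0; they settle into a stationary
# distribution.  Scaled by 1/sqrt(alpha), its variance should solve the
# scalar Lyapunov equation 2*a*Sigma = sigma2.
rng = np.random.default_rng(0)
a, sigma2 = 2.0, 1.0                # curvature and gradient-noise variance
alpha = 1e-3                        # constant stepsize
n_steps, burn_in = 500_000, 50_000  # iterations and burn-in for stationarity

noise = np.sqrt(sigma2) * rng.standard_normal(n_steps)
xs = np.empty(n_steps)
x = 0.0
for k in range(n_steps):
    x -= alpha * (a * x + noise[k])  # SGD step with a noisy gradient
    xs[k] = x

scaled_var = np.var(xs[burn_in:]) / alpha  # variance of x / sqrt(alpha)
lyapunov_var = sigma2 / (2 * a)            # Sigma solving 2*a*Sigma = sigma2
print(scaled_var, lyapunov_var)            # the two should be close
```

Note that in the scaled variable $y = x/\sqrt{\alpha}$ this recursion is exactly the Euler-Maruyama discretization, with step $\alpha$, of the Ornstein-Uhlenbeck SDE $dY = -aY\,dt + \sigma\,dW$, whose stationary law is $N(0, \sigma^2/(2a))$ — the kind of connection the abstract alludes to.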
Related papers
- Fully Zeroth-Order Bilevel Programming via Gaussian Smoothing [7.143879014059895]
We analyze zeroth-order approximation algorithms for solving bilevel problems.
To the best of our knowledge, this is the first time that sample bounds are established for a fully zeroth-order bilevel optimization algorithm.
arXiv Detail & Related papers (2024-03-29T21:12:25Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems.
We apply a Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - A stochastic linearized proximal method of multipliers for convex
stochastic optimization with expectation constraints [8.133190610747974]
We present a computable approximation type algorithm, namely the linearized proximal convex method of multipliers.
Some preliminary numerical results demonstrate the performance of the proposed algorithm.
arXiv Detail & Related papers (2021-06-22T07:24:17Z) - Sample Complexity Bounds for Two Timescale Value-based Reinforcement
Learning Algorithms [65.09383385484007]
Two-timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Inexact and Stochastic Generalized Conditional Gradient with Augmented
Lagrangian and Proximal Step [2.0196229393131726]
We analyze inexact and stochastic versions of the CGALP algorithm developed in the authors' previous paper.
This allows one to compute some gradients, proximal terms, and/or linear minimization oracles in an inexact fashion.
We show convergence of the Lagrangian to an optimum and feasibility of the affine constraint.
arXiv Detail & Related papers (2020-05-11T14:52:16Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the non-asymptotic concentration properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.