A Generalized Version of Chung's Lemma and its Applications
- URL: http://arxiv.org/abs/2406.05637v1
- Date: Sun, 9 Jun 2024 04:25:10 GMT
- Title: A Generalized Version of Chung's Lemma and its Applications
- Authors: Li Jiang, Xiao Li, Andre Milzarek, Junwen Qiu
- Abstract summary: We develop a generalized version of Chung's lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules.
As a by-product of our analysis, we observe that exponential step sizes can adapt to the objective function's geometry, achieving the optimal convergence rate without requiring exact knowledge of the underlying landscape.
- Score: 10.570672679063394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chung's lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate the broad applicability of the proposed generalized Chung's lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as stochastic gradient descent and random reshuffling, under a general $(\theta,\mu)$-Polyak-Lojasiewicz (PL) condition and for various step size strategies, including polynomial, constant, exponential, and cosine step size rules. Notably, as a by-product of our analysis, we observe that exponential step sizes can adapt to the objective function's geometry, achieving the optimal convergence rate without requiring exact knowledge of the underlying landscape. Our results demonstrate that the developed variant of Chung's lemma offers a versatile, systematic, and streamlined approach to establishing non-asymptotic convergence rates under general step size rules.
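For context, the classical Polyak-Lojasiewicz condition, which the $(\theta,\mu)$-PL condition above generalizes, reads $\frac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^*)$, where $f^*$ is the minimal objective value (the paper's exact parametrization in $\theta$ may differ from this classical special case). As a rough illustration of the four step size families named in the abstract, here is a minimal Python sketch, not taken from the paper: it runs plain gradient descent on a one-dimensional quadratic satisfying the classical PL condition, and all schedule constants (alpha0, beta, p, and the horizon T) are hypothetical choices.

```python
import math

def polynomial_step(t, alpha0=1.0, p=1.0):
    """Polynomial diminishing step size: alpha_t = alpha0 / (t + 1)^p."""
    return alpha0 / (t + 1) ** p

def constant_step(t, alpha0=0.1):
    """Constant step size, independent of the iteration counter."""
    return alpha0

def exponential_step(t, alpha0=0.5, beta=0.995):
    """Exponentially decaying step size: alpha_t = alpha0 * beta^t."""
    return alpha0 * beta ** t

def cosine_step(t, alpha0=0.5, T=1000):
    """Cosine annealing from alpha0 down to 0 over a horizon of T iterations."""
    return 0.5 * alpha0 * (1.0 + math.cos(math.pi * min(t, T) / T))

def gradient_descent(grad, x0, schedule, T=1000):
    """Plain gradient descent; a deterministic stand-in for SGD."""
    x = x0
    for t in range(T):
        x = x - schedule(t) * grad(x)
    return x

# f(x) = 0.5 * mu * x^2 has minimum f* = 0 and satisfies the classical PL
# inequality 0.5 * |f'(x)|^2 >= mu * (f(x) - f*), here with mu = 2.0.
mu = 2.0
grad = lambda x: mu * x
for name, sched in [("polynomial", polynomial_step),
                    ("constant", constant_step),
                    ("exponential", exponential_step),
                    ("cosine", cosine_step)]:
    x_final = gradient_descent(grad, x0=1.0, schedule=sched)
    print(f"{name:12s} final error f(x) - f* = {0.5 * mu * x_final**2:.2e}")
```

In this toy setting every schedule drives the error toward zero; the point of the generalized Chung's lemma is to supply tight non-asymptotic rates for each such schedule within a single framework.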
Related papers
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variations of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z) - Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $\mathcal{O}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
arXiv Detail & Related papers (2022-10-04T06:17:52Z) - A Unified Convergence Theorem for Stochastic Optimization Methods [4.94128206910124]
We provide a fundamental unified convergence theorem used for deriving convergence results for a series of stochastic optimization methods.
As a direct application, we recover almost sure convergence results under general settings.
arXiv Detail & Related papers (2022-06-08T14:01:42Z) - On the Convergence Rates of Policy Gradient Methods [9.74841674275568]
We consider geometrically discounted Markov decision problems with finite state and action spaces.
We show that, with direct policy parametrization, the iteration complexity of policy gradient methods can be analyzed.
arXiv Detail & Related papers (2022-01-19T07:03:37Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual variable regularizer, and Nesterov's gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization [20.651913793555163]
We revisit the classical entropy regularized policy gradient methods with the soft-max policy parametrization.
We establish a global optimality convergence result and a sample complexity of $\widetilde{\mathcal{O}}(1/\epsilon^2)$ for the proposed algorithm.
arXiv Detail & Related papers (2021-10-19T17:21:09Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We analyze the stochastic ExtraGradient (SEG) method with constant step size and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Conditional gradient methods for stochastically constrained convex minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z) - A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization [16.714109768541785]
We provide a single analysis covering SGD variants in the nonconvex regime.
We also provide a unified analysis for obtaining faster linear convergence in the nonconvex regime under the PL condition.
arXiv Detail & Related papers (2020-06-12T08:58:03Z)