Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity
- URL: http://arxiv.org/abs/2305.14161v2
- Date: Thu, 31 Oct 2024 02:34:43 GMT
- Title: Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity
- Authors: Xiao Li, Lei Zhao, Daoli Zhu, Anthony Man-Cho So
- Abstract summary: Subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization.
In this work, we first extend the typical iteration complexity results for the subgradient method to cover non-Lipschitz convex and weakly convex minimization.
- Score: 24.45688490844496
- License:
- Abstract: The subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization. The existing complexity and convergence results for this method are mainly derived for Lipschitz continuous objective functions. In this work, we first extend the typical iteration complexity results for the subgradient method to cover non-Lipschitz convex and weakly convex minimization. Specifically, for the convex case, we can drive the suboptimality gap to below $\varepsilon$ in $\mathcal{O}( \varepsilon^{-2} )$ iterations; for the weakly convex case, we can drive the gradient norm of the Moreau envelope of the objective function to below $\varepsilon$ in $\mathcal{O}( \varepsilon^{-4} )$ iterations. Then, we provide convergence results for the subgradient method in the non-Lipschitz setting when proper diminishing rules on the step size are used. In particular, when $f$ is convex, we establish an $\mathcal{O}(\log(k)/\sqrt{k})$ rate of convergence in terms of the suboptimality gap, where $k$ represents the iteration count. With an additional quadratic growth property, the rate is improved to $\mathcal{O}(1/k)$ in terms of the squared distance to the optimal solution set. When $f$ is weakly convex, asymptotic convergence is established. Our results neither require any modification to the subgradient method nor impose any growth condition on the subgradients, while our analysis is surprisingly simple. To further illustrate the wide applicability of our framework, we extend the aforementioned iteration complexity results to cover the truncated subgradient, the stochastic subgradient, and the proximal subgradient methods for non-Lipschitz convex / weakly convex objective functions.
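To make the setting concrete, here is a minimal sketch of the plain subgradient method with a diminishing step size, the classical scheme whose complexity and convergence the paper revisits. The step-size constants, the `subgradient_method` helper name, and the $\ell_1$ test function are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def subgradient_method(subgrad, x0, num_iters, step_rule="diminishing"):
    """Plain subgradient method: x_{k+1} = x_k - alpha_k * g_k with g_k in the
    subdifferential of f at x_k. A generic sketch; constants are illustrative."""
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for k in range(num_iters):
        g = subgrad(x)                       # any subgradient at the current point
        if step_rule == "diminishing":
            alpha = 1.0 / np.sqrt(k + 1)     # alpha_k -> 0, sum alpha_k = infinity
        else:
            alpha = 1.0 / np.sqrt(num_iters) # constant step tuned to the horizon
        x = x - alpha * g
        iterates.append(x.copy())
    return iterates

# Usage on f(x) = ||x||_1, a convex nonsmooth test function; reporting the best
# iterate is the standard way to read off an O(eps^-2)-type suboptimality guarantee.
if __name__ == "__main__":
    f = lambda x: np.sum(np.abs(x))
    subgrad = lambda x: np.sign(x)           # a valid subgradient of the l1 norm
    xs = subgradient_method(subgrad, x0=np.array([3.0, -2.0]), num_iters=1000)
    best = min(xs, key=f)
    print("best objective value:", f(best))
```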
Related papers
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes (a minimal sketch of both schemes appears after this list).
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - A Unified Analysis for the Subgradient Methods Minimizing Composite
Nonconvex, Nonsmooth and Non-Lipschitz Functions [8.960341489080609]
We present a novel convergence analysis in the context of non-Lipschitz and nonsmooth optimization problems.
Under any of the subgradient upper bounding conditions introduced in the paper, we show that an $\mathcal{O}(1/\sqrt{T})$ rate holds in terms of the squared gradient norm of the envelope function, which further improves to $\mathcal{O}(1/T)$ if, in addition, the uniform KL condition with exponent $1/2$ holds.
arXiv Detail & Related papers (2023-08-30T23:34:11Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions, respectively.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper, we consider finding an approximate second-order stationary point (SOSP) of a twice continuously differentiable function subject to general nonconvex conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - Randomized Coordinate Subgradient Method for Nonsmooth Composite
Optimization [11.017632675093628]
Coordinate-type subgradient methods for addressing nonsmooth problems are relatively underexplored, in part because of the restrictive properties imposed by Lipschitz-type assumptions.
arXiv Detail & Related papers (2022-06-30T02:17:11Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\ F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an \emph{accelerated gradient-extragradient} (AG-EG) descent-ascent algorithm that combines extragradient steps with acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Accelerated first-order methods for convex optimization with locally
Lipschitz continuous gradient [0.0]
We first consider unconstrained convex optimization with locally Lipschitz continuous gradient (LLCG) and propose accelerated proximal gradient (APG) methods for solving it.
The proposed APG methods are equipped with a verifiable termination criterion and enjoy an operation complexity of $\mathcal{O}(\varepsilon^{-1/2}\log \varepsilon^{-1})$.
Preliminary numerical results are presented to demonstrate the performance of our proposed methods.
arXiv Detail & Related papers (2022-06-02T10:34:26Z) - Generalized Optimistic Methods for Convex-Concave Saddle Point Problems [24.5327016306566]
The optimistic method has seen increasing popularity for solving convex-concave saddle point problems.
We develop a backtracking line search scheme to select the step sizes without knowledge of the smoothness coefficients.
arXiv Detail & Related papers (2022-02-19T20:31:05Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z) - Randomized Bregman Coordinate Descent Methods for Non-Lipschitz
Optimization [31.474280642125734]
A new \textit{randomized Bregman (block) coordinate descent} (CD) method is proposed.
We show that the proposed method requires $\mathcal{O}(\epsilon^{-2} n)$ iterations to reach an $\epsilon$-stationary point, where $n$ is the number of blocks of coordinates.
arXiv Detail & Related papers (2020-01-15T09:57:38Z)
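To ground the first related entry above (clipping and Polyak stepsizes for $(L_0,L_1)$-smooth problems), here is a minimal sketch of the two classical schemes it refers to. The learning rate, clipping threshold, function names, and the quadratic test problem are illustrative assumptions, not that paper's tuned choices.

```python
import numpy as np

def gd_with_clipping(grad, x0, num_iters, lr=0.1, clip=1.0):
    """Gradient descent with norm-based gradient clipping (textbook version)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm > clip:                     # rescale overly long gradients
            g = g * (clip / gnorm)
        x = x - lr * g
    return x

def gd_with_polyak_step(grad, f, f_star, x0, num_iters):
    """Gradient descent with the classical Polyak stepsize
    alpha_k = (f(x_k) - f*) / ||g_k||^2, which assumes f* is known."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad(x)
        denom = float(np.dot(g, g))
        if denom == 0.0:                     # gradient vanished: stationary point
            break
        x = x - (f(x) - f_star) / denom * g
    return x

# Usage on a simple smooth convex quadratic with known optimum f* = 0:
if __name__ == "__main__":
    f = lambda x: 0.5 * float(np.dot(x, x))
    grad = lambda x: x
    print(gd_with_clipping(grad, np.array([5.0, -3.0]), num_iters=200))
    print(gd_with_polyak_step(grad, f, 0.0, np.array([5.0, -3.0]), num_iters=200))
```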