Revisiting Subgradient Method: Complexity and Convergence Beyond
Lipschitz Continuity
- URL: http://arxiv.org/abs/2305.14161v1
- Date: Tue, 23 May 2023 15:26:36 GMT
- Title: Revisiting Subgradient Method: Complexity and Convergence Beyond
Lipschitz Continuity
- Authors: Xiao Li, Lei Zhao, Daoli Zhu, Anthony Man-Cho So
- Abstract summary: The subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization.
In this work, we first extend the typical complexity results for the subgradient method to convex and weakly convex objective functions.
We provide convergence results for non-Lipschitz convex and weakly convex objective functions using proper diminishing rules on the step sizes.
- Score: 29.56022052936922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The subgradient method is one of the most fundamental algorithmic schemes for
nonsmooth optimization. The existing complexity and convergence results for
this algorithm are mainly derived for Lipschitz continuous objective functions.
In this work, we first extend the typical complexity results for the
subgradient method to convex and weakly convex minimization without assuming
Lipschitz continuity. Specifically, we establish an $\mathcal{O}(1/\sqrt{T})$
bound in terms of the suboptimality gap ``$f(x) - f^*$'' for the convex case and
an $\mathcal{O}(1/{T}^{1/4})$ bound in terms of the gradient of the Moreau
envelope function for the weakly convex case. Furthermore, we provide convergence
results for non-Lipschitz convex and weakly convex objective functions using
proper diminishing rules on the step sizes. In particular, when $f$ is convex,
we show an $\mathcal{O}(\log(k)/\sqrt{k})$ rate of convergence in terms of the
suboptimality gap. With an additional quadratic growth condition, the rate is
improved to $\mathcal{O}(1/k)$ in terms of the squared distance to the optimal
solution set. When $f$ is weakly convex, asymptotic convergence is derived. The
central idea is that the dynamics of a properly chosen step-size rule fully
control the movement of the subgradient method, which leads to boundedness of
the iterates; a trajectory-based analysis can then be conducted to
establish the desired results. To further illustrate the wide applicability of
our framework, we extend the complexity results to the truncated subgradient,
the stochastic subgradient, the incremental subgradient, and the proximal
subgradient methods for non-Lipschitz functions.
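For context on the stationarity measure used in the weakly convex case, the bounds above are stated in terms of the gradient of the Moreau envelope. With the standard (textbook) definition, not specific to this paper,
$$ f_\lambda(x) := \min_{y} \Big\{ f(y) + \tfrac{1}{2\lambda}\|y - x\|^2 \Big\}, \qquad \nabla f_\lambda(x) = \tfrac{1}{\lambda}\big(x - \mathrm{prox}_{\lambda f}(x)\big), $$
and for $\rho$-weakly convex $f$ with $\lambda < 1/\rho$, a small $\|\nabla f_\lambda(x)\|$ certifies that $x$ lies close to a point $\hat{x} = \mathrm{prox}_{\lambda f}(x)$ that is itself near-stationary for $f$.

As a concrete illustration of the scheme analyzed in the abstract, the following is a minimal sketch of the plain subgradient method with the diminishing step-size rule $\eta_k = c/\sqrt{k+1}$, which is consistent with the $\mathcal{O}(\log(k)/\sqrt{k})$ convex rate quoted above. The objective, the subgradient oracle, and the constant `c` are hypothetical placeholders, not the authors' experimental setup.
```python
import numpy as np

def subgradient_method(subgrad, x0, c=1.0, iters=1000):
    """Plain subgradient method with diminishing steps eta_k = c / sqrt(k + 1).

    `subgrad(x)` may return any subgradient of f at x; no Lipschitz
    continuity of f is assumed, matching the setting of the paper.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        g = np.asarray(subgrad(x), dtype=float)
        eta = c / np.sqrt(k + 1)          # diminishing step-size rule
        x = x - eta * g
    return x

# Hypothetical example: minimize f(x) = ||x||_1; sign(x) is a valid subgradient.
if __name__ == "__main__":
    x_hat = subgradient_method(np.sign, x0=np.array([3.0, -2.0]), c=0.5, iters=2000)
    print(x_hat)  # the iterates oscillate ever closer to the minimizer 0
```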
Related papers
- On the Convergence of Single-Timescale Actor-Critic [49.19842488693726]
We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. We demonstrate that the step sizes for both the actor and the critic must decay as $\mathcal{O}(k^{-\frac{1}{2}})$ in $k$, diverging from the conventional rates commonly used in Markovian optimization frameworks.
arXiv Detail & Related papers (2024-10-11T14:46:29Z) - Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes (a minimal Polyak-stepsize sketch appears after this list).
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of a general nonconvex conic optimization problem, which minimizes a twice differentiable function subject to conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - Randomized Coordinate Subgradient Method for Nonsmooth Composite
Optimization [11.017632675093628]
Coordinate-type subgradient methods for addressing nonsmooth problems are relatively underexplored due to the restrictive nature of the Lipschitz-type assumption.
arXiv Detail & Related papers (2022-06-30T02:17:11Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an \emph{accelerated gradient-extragradient} (AG-EG) descent-ascent algorithm that combines extragradient steps with acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Accelerated first-order methods for convex optimization with locally
Lipschitz continuous gradient [0.0]
We first consider unconstrained convex optimization with locally Lipschitz continuous gradient (LLCG) and propose accelerated proximal gradient (APG) methods for solving it.
The proposed APG methods are equipped with a verifiable termination criterion and enjoy an operation complexity of $\mathcal{O}(\varepsilon^{-1/2}\log \varepsilon^{-1})$.
Preliminary numerical results are presented to demonstrate the performance of our proposed methods.
arXiv Detail & Related papers (2022-06-02T10:34:26Z) - Thinking Outside the Ball: Optimal Learning with Gradient Descent for
Generalized Linear Stochastic Convex Optimization [37.177329562964765]
We consider linear prediction with a convex Lipschitz loss, or more generally, convex optimization problems of generalized linear form.
We show that in this setting, early-stopped Gradient Descent (GD), without any explicit regularization or projection, ensures excess error at most $\epsilon$.
But instead of uniform convergence in a norm ball, which we show can only guarantee suboptimal learning using $\Theta(1/\epsilon^4)$ samples, we rely on uniform convergence in a distribution-dependent ball.
arXiv Detail & Related papers (2022-02-27T09:41:43Z) - Generalized Optimistic Methods for Convex-Concave Saddle Point Problems [24.5327016306566]
The optimistic method has seen increasing popularity for solving convex-concave saddle point problems.
We develop a backtracking line search scheme to select the step sizes without knowledge of the smoothness coefficients.
arXiv Detail & Related papers (2022-02-19T20:31:05Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z) - Randomized Bregman Coordinate Descent Methods for Non-Lipschitz
Optimization [31.474280642125734]
A new \textit{randomized Bregman (block) coordinate descent} (CD) method is proposed.
We show that the proposed method requires $O(\epsilon^{-2}n)$ iterations to achieve an $\epsilon$-stationary point, where $n$ is the number of blocks of coordinates.
arXiv Detail & Related papers (2020-01-15T09:57:38Z)
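As referenced in the $(L_0,L_1)$-smooth entry above, the following is a minimal sketch of the classical Polyak step size for gradient descent, $\eta_k = (f(x_k) - f^*)/\|\nabla f(x_k)\|^2$. It illustrates only the textbook rule that the stepsize name refers to, not the clipped or $(L_0,L_1)$-smooth variants analyzed in that paper, and the least-squares test problem is a hypothetical example.
```python
import numpy as np

def gradient_descent_polyak(f, grad, x0, f_star, iters=200):
    """Gradient descent with the classical Polyak step size
    eta_k = (f(x_k) - f_star) / ||grad f(x_k)||^2, assuming f_star is known."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gap = f(x) - f_star
        if gap <= 0.0 or not np.any(g):
            break                          # already optimal, or zero gradient
        x = x - (gap / np.dot(g, g)) * g
    return x

# Hypothetical example: f(x) = 0.5 * ||A x - b||^2 with a consistent system, so f_star = 0.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))
    b = A @ rng.standard_normal(5)
    f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
    grad = lambda x: A.T @ (A @ x - b)
    x_hat = gradient_descent_polyak(f, grad, np.zeros(5), f_star=0.0)
    print(f(x_hat))  # should be close to 0
```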
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.