Randomized Coordinate Subgradient Method for Nonsmooth Composite
Optimization
- URL: http://arxiv.org/abs/2206.14981v3
- Date: Wed, 23 Aug 2023 13:25:16 GMT
- Title: Randomized Coordinate Subgradient Method for Nonsmooth Composite
Optimization
- Authors: Lei Zhao and Ding Chen and Daoli Zhu and Xiao Li
- Abstract summary: Coordinate-type subgradient methods for addressing nonsmooth optimization problems are relatively underexplored due to the set-valued nature of the subdifferential.
- Score: 11.017632675093628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coordinate-type subgradient methods for addressing nonsmooth optimization
problems are relatively underexplored due to the set-valued nature of the
subdifferential. In this work, our study focuses on nonsmooth composite
optimization problems, encompassing a wide class of convex and weakly convex
(nonconvex nonsmooth) problems. By utilizing the chain rule of the composite
structure properly, we introduce the Randomized Coordinate Subgradient method
(RCS) for tackling this problem class. To the best of our knowledge, this is
the first coordinate subgradient method for solving general nonsmooth composite
optimization problems. In theory, we consider the linearly bounded subgradients
assumption for the objective function, which is more general than the
traditional Lipschitz continuity assumption, to account for practical
scenarios. We then conduct convergence analysis for RCS in both convex and
weakly convex cases based on this generalized Lipschitz-type assumption.
Specifically, we establish the $\widetilde{\mathcal{O}}(1/\sqrt{k})$
convergence rate in expectation and the $\tilde{o}(1/\sqrt{k})$ almost sure
asymptotic convergence rate in terms of the suboptimality gap when $f$ is
convex. For the case when $f$ is weakly convex and its subdifferential
satisfies the global metric subregularity property, we derive the
$\mathcal{O}(\varepsilon^{-4})$ iteration complexity in expectation. We also
establish an asymptotic convergence result. To justify the global metric
subregularity property utilized in the analysis, we establish this error bound
condition for the concrete (real-valued) robust phase retrieval problem. We
also provide a convergence lemma and the relationship between the global metric
subregularity properties of a weakly convex function and its Moreau envelope.
Finally, we conduct several experiments to demonstrate the possible superiority
of RCS over the subgradient method.
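To make the algorithmic idea concrete, below is a minimal sketch of a randomized coordinate subgradient iteration applied to the (real-valued) robust phase retrieval objective $f(x) = \frac{1}{m}\sum_{i=1}^m |\langle a_i, x\rangle^2 - b_i|$ discussed in the abstract. The uniform single-coordinate sampling, the diminishing step size $\gamma_0/\sqrt{k+1}$, and the function name are illustrative assumptions for this sketch, not the paper's exact specification.

    import numpy as np

    def rcs_robust_phase_retrieval(A, b, x0, steps=10000, gamma0=1e-3, seed=0):
        """Illustrative randomized coordinate subgradient iteration for
        f(x) = (1/m) * sum_i |<a_i, x>^2 - b_i|  (robust phase retrieval)."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        x = np.asarray(x0, dtype=float).copy()
        r = A @ x                                  # maintain r_i = <a_i, x> incrementally
        for k in range(steps):
            j = rng.integers(n)                    # sample one coordinate uniformly
            # chain-rule subgradient of f with respect to coordinate j:
            # (1/m) * sum_i sign(<a_i, x>^2 - b_i) * 2 <a_i, x> * A[i, j]
            g_j = (np.sign(r**2 - b) * 2.0 * r) @ A[:, j] / m
            step = gamma0 / np.sqrt(k + 1.0) * g_j
            x[j] -= step
            r -= A[:, j] * step                    # cheap update of the inner products
        return x

The per-iteration cost stays at $\mathcal{O}(m)$ because the inner products are updated incrementally after each coordinate move, which is the usual practical appeal of coordinate-type schemes.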
Related papers
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
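For context, a standard way to formalize such an interpolation step, stated here as general background on nonexpansive operators rather than as the paper's exact scheme, is the Krasnosel'skii-Mann iteration for an operator $T$ representing one optimizer update:
$x_{t+1} = (1-\alpha)\, x_t + \alpha\, T(x_t), \qquad \alpha \in (0,1),$
which is known to converge to a fixed point of $T$ whenever $T$ is nonexpansive and admits a fixed point.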
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Revisiting Subgradient Method: Complexity and Convergence Beyond Lipschitz Continuity [24.45688490844496]
The subgradient method is one of the most fundamental algorithmic schemes for nonsmooth optimization.
In this work, we first extend the typical iteration complexity results for the subgradient method to cover non-Lipschitz convex and weakly convex minimization.
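For reference, the classical subgradient iteration reads
$x_{k+1} = x_k - \gamma_k g_k, \qquad g_k \in \partial f(x_k), \quad \gamma_k > 0;$
a coordinate-type method such as RCS above instead updates only a randomly selected coordinate (or block of coordinates) of $x_k$ per iteration, rather than moving along the full subgradient.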
arXiv Detail & Related papers (2023-05-23T15:26:36Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $\mathcal{O}(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $\mathcal{O}(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - Decentralized Weakly Convex Optimization Over the Stiefel Manifold [28.427697270742947]
We focus on weakly convex optimization over the Stiefel manifold in the decentralized setting, where a connected network of agents cooperatively minimizes a shared objective.
We propose an iterative method called the decentralized Riemannian subgradient method (DRSM) and establish guarantees for forcing a natural stationarity measure below a prescribed tolerance.
arXiv Detail & Related papers (2023-03-31T02:56:23Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization, which minimizes a twice differentiable function subject to conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with Nesterov acceleration.
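As background on the extragradient component (the plain building block rather than the paper's accelerated AG-EG variant), one extragradient descent-ascent step for a smooth saddle function $L(\mathbf{x},\mathbf{y})$ with step size $\eta > 0$ reads
$\mathbf{x}_{k+1/2} = \mathbf{x}_k - \eta \nabla_{\mathbf{x}} L(\mathbf{x}_k,\mathbf{y}_k), \qquad \mathbf{y}_{k+1/2} = \mathbf{y}_k + \eta \nabla_{\mathbf{y}} L(\mathbf{x}_k,\mathbf{y}_k),$
$\mathbf{x}_{k+1} = \mathbf{x}_k - \eta \nabla_{\mathbf{x}} L(\mathbf{x}_{k+1/2},\mathbf{y}_{k+1/2}), \qquad \mathbf{y}_{k+1} = \mathbf{y}_k + \eta \nabla_{\mathbf{y}} L(\mathbf{x}_{k+1/2},\mathbf{y}_{k+1/2});$
evaluating the gradients at the look-ahead point $(\mathbf{x}_{k+1/2},\mathbf{y}_{k+1/2})$ is what distinguishes extragradient from plain gradient descent-ascent.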
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Generalization in Supervised Learning Through Riemannian Contraction [4.3604518673788135]
In a supervised learning setting, we show that if an optimizer is contracting in some Riemannian metric with rate $\lambda > 0$, it is uniformly algorithmically stable.
The results hold for stochastic and deterministic gradient descent, in both continuous and discrete time.
They can be shown to be optimal in certain linear settings, such as gradient descent over convex or strongly convex loss surfaces.
arXiv Detail & Related papers (2022-01-17T23:08:47Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\widetilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
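For reference, the Moreau envelope used as the stationarity measure here (and in the weakly convex analysis of RCS above) is, for a parameter $\lambda > 0$,
$f_\lambda(x) := \min_{y} \big\{ f(y) + \tfrac{1}{2\lambda}\|y - x\|^2 \big\};$
for a $\rho$-weakly convex $f$ and $\lambda < 1/\rho$, the envelope is smooth with $\nabla f_\lambda(x) = \lambda^{-1}\big(x - \mathrm{prox}_{\lambda f}(x)\big)$, so a small $\|\nabla f_\lambda(x)\|$ certifies that $x$ is close to a point that is nearly stationary for $f$.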
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Better Theory for SGD in the Nonconvex World [2.6397379133308214]
Large-scale nonconvex optimization problems are ubiquitous in modern machine learning.
We perform experiments on the effects of a wide array of synthetic minibatch sizes on Stochastic Gradient Descent (SGD).
arXiv Detail & Related papers (2020-02-09T09:56:06Z)