Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises:
High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation
- URL: http://arxiv.org/abs/2303.12277v3
- Date: Sat, 20 May 2023 04:12:11 GMT
- Title: Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises:
High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation
- Authors: Zijian Liu, Zhengyuan Zhou
- Abstract summary: In a heavy-tailed noise regime, the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment.
This paper provides a comprehensive analysis of nonsmooth convex optimization with heavy-tailed noises.
- Score: 22.758674468435302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, several studies consider the stochastic optimization problem but in
a heavy-tailed noise regime, i.e., the difference between the stochastic
gradient and the true gradient is assumed to have a finite $p$-th moment (say
being upper bounded by $\sigma^{p}$ for some $\sigma\geq0$) where $p\in(1,2]$,
which not only generalizes the traditional finite variance assumption ($p=2$)
but also has been observed in practice for several different tasks. Under this
challenging assumption, much new progress has been made for both convex and
nonconvex problems; however, most existing works consider only smooth objectives. In
contrast, this problem has not been fully explored or well understood when the
functions are nonsmooth. This paper aims to fill this crucial gap by providing
a comprehensive analysis of stochastic nonsmooth convex optimization with
heavy-tailed noises. We revisit a simple clipping-based algorithm, which so far
has only been proved to converge in expectation, and only under an additional
strong convexity assumption. Under appropriate choices of parameters, for both convex
and strongly convex functions, we not only establish the first high-probability
rates but also give refined in-expectation bounds compared with existing works.
Remarkably, all of our results are optimal (or nearly optimal up to logarithmic
factors) with respect to the time horizon $T$ even when $T$ is unknown in
advance. Additionally, we show how to make the algorithm parameter-free with
respect to $\sigma$; in other words, the algorithm can still guarantee
convergence without any prior knowledge of $\sigma$. Furthermore, an initial
distance adaptive convergence rate is provided if $\sigma$ is assumed to be
known.
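The algorithm revisited in the abstract is a clipping-based stochastic (sub)gradient method. Below is a minimal Python sketch of a generic clipped-SGD template, offered only as an illustration: the function name `clipped_sgd`, the oracle `subgrad_oracle`, and the $\sqrt{t}$ schedules for the step size and clipping threshold are assumptions of this sketch, not the specific parameter choices whose high-probability and in-expectation guarantees the paper establishes.

```python
import numpy as np

def clipped_sgd(subgrad_oracle, x0, T, eta0=0.1, tau0=1.0):
    """Generic clipped stochastic subgradient method (illustrative sketch only)."""
    x = np.asarray(x0, dtype=float).copy()
    x_avg = x.copy()
    for t in range(1, T + 1):
        g = subgrad_oracle(x)              # stochastic subgradient; its deviation from a true
                                           # subgradient may only have a finite p-th moment, p in (1, 2]
        tau_t = tau0 * np.sqrt(t)          # clipping threshold (placeholder schedule)
        eta_t = eta0 / np.sqrt(t)          # step size (placeholder schedule)
        g = g * min(1.0, tau_t / (np.linalg.norm(g) + 1e-12))  # clip the norm to at most tau_t
        x = x - eta_t * g                  # subgradient step with the clipped direction
        x_avg += (x - x_avg) / (t + 1)     # running average of all iterates so far
    return x_avg
```

Clipping is what tames the heavy tails: after clipping, each update has norm at most $\eta_t\tau_t$, even though the raw noise is only assumed to have a finite $p$-th moment.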
Related papers
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods [25.831462008050387]
The Stochastic Gradient Descent (SGD) algorithm has attracted interest due to its good performance in practice but a lack of theoretical understanding.
It remains unclear whether last-iterate convergence can be provably extended to the wider setting of composite optimization and non-Euclidean norms.
arXiv Detail & Related papers (2023-12-13T21:41:06Z) - An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization [37.300102993926046]
We study the complexity of producing approximately stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only zero-order evaluations.
Our analysis is based on a simple yet powerful property of the Goldstein-subdifferential set, which allows us to leverage recent advancements in nonsmooth nonconvex optimization.
arXiv Detail & Related papers (2023-07-10T11:56:04Z) - Gradient-free optimization of highly smooth functions: improved analysis
and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with acceleration.
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Mirror Descent Strikes Again: Optimal Stochastic Convex Optimization
under Infinite Noise Variance [17.199063087458907]
We quantify the convergence rate of the Mirror Descent algorithm with a class of uniformly convex mirror maps.
This algorithm does not require any explicit gradient clipping or normalization.
We complement our results with information-theoretic lower bounds showing that no other algorithm using only first-order oracles can achieve improved rates.
arXiv Detail & Related papers (2022-02-23T17:08:40Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by the recently increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data analysis, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - A New Randomized Primal-Dual Algorithm for Convex Optimization with
Optimal Last Iterate Rates [16.54912614895861]
We develop a novel unified randomized block-coordinate primal-dual algorithm to solve a class of nonsmooth constrained convex optimization problems.
We prove that our algorithm achieves optimal convergence rates in two cases: general convexity and strong convexity.
Our numerical experiments show that the proposed method has encouraging performance.
arXiv Detail & Related papers (2020-03-03T03:59:26Z)
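For intuition about the primal-dual template behind the last entry, here is a minimal deterministic sketch in Python for an equality-constrained problem $\min_{x} f(x)$ subject to $Ax=b$. It is an assumed illustration only: the function `primal_dual_gd`, its arguments, and the fixed step sizes are hypothetical, and the cited paper's method is instead a randomized block-coordinate primal-dual scheme that handles nonsmooth terms (e.g., via proximal steps) and attains the stated optimal last-iterate rates.

```python
import numpy as np

def primal_dual_gd(grad_f, A, b, x0, y0, steps=1000, tau=1e-2, sigma=1e-2):
    """Plain primal-dual gradient method for min_x f(x) s.t. Ax = b (illustration only)."""
    x = np.array(x0, dtype=float)
    y = np.array(y0, dtype=float)
    for _ in range(steps):
        x = x - tau * (grad_f(x) + A.T @ y)   # primal descent on the Lagrangian f(x) + <y, Ax - b>
        y = y + sigma * (A @ x - b)           # dual ascent on the constraint residual
    return x, y

# Example usage with the quadratic f(x) = 0.5 * ||x||^2, so grad_f(x) = x:
# A = np.array([[1.0, 1.0]]); b = np.array([1.0])
# x_star, y_star = primal_dual_gd(lambda x: x, A, b, x0=np.zeros(2), y0=np.zeros(1))
```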
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.