Related papers: Stochastic Weakly Convex Optimization Beyond Lipschitz Continuity

Stochastic Weakly Convex Optimization Beyond Lipschitz Continuity

URL: http://arxiv.org/abs/2401.13971v2
Date: Tue, 05 Nov 2024 22:41:13 GMT
Title: Stochastic Weakly Convex Optimization Beyond Lipschitz Continuity
Authors: Wenzhi Gao, Qi Deng,
Abstract summary: We show that a wide class of continuity algorithms, including the subgradient method, preserve the $mathO convergence rate with constant failure rate. Our analyses rest on rather weak assumptions: the Lipschitz parameter can be either bounded by a general growth function of $|x|$ or locally estimated through independent random samples.
Score: 5.866816093792934
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: This paper considers stochastic weakly convex optimization without the standard Lipschitz continuity assumption. Based on new adaptive regularization (stepsize) strategies, we show that a wide class of stochastic algorithms, including the stochastic subgradient method, preserve the $\mathcal{O} ( 1 / \sqrt{K})$ convergence rate with constant failure rate. Our analyses rest on rather weak assumptions: the Lipschitz parameter can be either bounded by a general growth function of $\|x\|$ or locally estimated through independent random samples.

Related papers

Towards Weaker Variance Assumptions for Stochastic Optimization [19.339358874690568]
We revisit a classical assumption for analyzing gradient algorithms where the squared norm of the subgradient is allowed to grow as fast as the squared norm of the optimization variable. We analyze convex problems with functional constraints or regularized convex-concave min-max problems. We obtain rates for optimality measures that do not require boundedness of the feasible set.
arXiv Detail & Related papers (2025-04-14T07:26:34Z)
Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density. We show that it is possible to get faster rates of convergence than $mathcalO(K-(alpha - 1)/alpha)$, when the gradients have finite moments of order. We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z)
Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quant clipping strategy for Gradient Descent (SGD) We use gradient new outliers as norm clipping chains. We propose an implementation of the algorithm using Huberiles.
arXiv Detail & Related papers (2023-09-29T15:24:48Z)
High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions. These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
Randomized Coordinate Subgradient Method for Nonsmooth Composite Optimization [11.017632675093628]
Coordinate-type subgradient methods for addressing nonsmooth problems are relatively underexplored due to the set of properties of the Lipschitz-type assumption.
arXiv Detail & Related papers (2022-06-30T02:17:11Z)
Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$. We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z)
Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process. We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator. We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the citetgarber 2020online, where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order. We show that citetgarber 2020online algorithms achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z)
High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non- Hilbert optimization using first-order algorithms for which the gradient estimates may have tails. We show that a combination of gradient, momentum, and normalized gradient descent convergence to critical points in high-probability with best-known iteration for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
Stochastic Online Convex Optimization. Application to probabilistic time series forecasting [0.0]
We prove algorithms such as online newton steps and a scale-free 10 version of Bernstein online achieve best-known rates in unbounded settings. Our fast-rate regret bounds are any-time valid.
arXiv Detail & Related papers (2021-02-01T09:49:15Z)
Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size gradient descent on the least-square risk under this model. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.