Convex and Non-convex Optimization Under Generalized Smoothness
- URL: http://arxiv.org/abs/2306.01264v2
- Date: Fri, 3 Nov 2023 10:06:01 GMT
- Title: Convex and Non-convex Optimization Under Generalized Smoothness
- Authors: Haochuan Li, Jian Qian, Yi Tian, Alexander Rakhlin, Ali Jadbabaie
- Abstract summary: Analysis of convex and non-convex optimization methods often requires the Lipschitzness of the gradient, which limits the analysis to functions bounded by quadratics.
Recent work relaxes this requirement to a non-uniform smoothness condition, with the Hessian norm bounded by an affine function of the gradient norm.
- Score: 69.69521650503431
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classical analysis of convex and non-convex optimization methods often
requires the Lipschitzness of the gradient, which limits the analysis to
functions bounded by quadratics. Recent work relaxed this requirement to a
non-uniform smoothness condition with the Hessian norm bounded by an affine
function of the gradient norm, and proved convergence in the non-convex setting
via gradient clipping, assuming bounded noise. In this paper, we further
generalize this non-uniform smoothness condition and develop a simple, yet
powerful analysis technique that bounds the gradients along the trajectory,
thereby leading to stronger results for both convex and non-convex optimization
problems. In particular, we obtain the classical convergence rates for
(stochastic) gradient descent and Nesterov's accelerated gradient method in the
convex and/or non-convex setting under this general smoothness condition. The
new analysis approach does not require gradient clipping and allows
heavy-tailed noise with bounded variance in the stochastic setting.
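As an illustration of the setting (not code from the paper), the following minimal NumPy sketch runs plain gradient descent on f(x) = x^4, a convex function that is not bounded by any quadratic and hence has no globally Lipschitz gradient, yet satisfies a generalized smoothness bound of the form |f''(x)| <= L0 + L1 |f'(x)|; the gradient magnitude stays bounded along the trajectory, which is the key quantity the abstract's analysis technique controls. The starting point and constants are illustrative.

```python
import numpy as np

# Illustration of the setting (not code from the paper): gradient descent on
# f(x) = x^4.  This convex function is not bounded by any quadratic, so its
# gradient is not globally Lipschitz, but it satisfies the generalized
# smoothness bound |f''(x)| <= L0 + L1 * |f'(x)| with, e.g., (L0, L1) = (12, 3),
# since 12 x^2 <= 12 + 3 * |4 x^3| for all x.

def f(x):     return x ** 4
def grad(x):  return 4.0 * x ** 3
def hess(x):  return 12.0 * x ** 2

L0, L1 = 12.0, 3.0
assert all(hess(v) <= L0 + L1 * abs(grad(v)) for v in np.linspace(-5, 5, 101))

x = 2.0                                   # starting point (illustrative)
eta = 1.0 / (L0 + L1 * abs(grad(x)))      # stepsize tied to the smoothness at x0

grad_norms = []
for k in range(200):
    g = grad(x)
    grad_norms.append(abs(g))
    x = x - eta * g

# The gradient magnitude never exceeds its initial value along the trajectory,
# which is the kind of trajectory-wise gradient bound the analysis exploits.
print(f"max |f'(x_k)| = {max(grad_norms):.3f} (initial {grad_norms[0]:.3f}), "
      f"final f(x) = {f(x):.2e}")
```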
Related papers
- Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results [60.92029979853314]
This paper investigates normalized SGD with clipping (NSGDC) and its variance-reduced variant (NSGDC-VR).
We present significant improvements in the theoretical results for both algorithms.
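For reference, here is a hedged NumPy sketch of the two update rules behind the acronyms above, a normalized SGD step and a clipped SGD step, on a toy quadratic with heavy-tailed (Student-t) gradient noise; the objective, stepsizes, and clipping threshold are assumptions, not the authors' setup.

```python
import numpy as np

# Hedged sketch of a normalized SGD step and a clipped SGD step on the toy
# quadratic f(x) = 0.5 * ||x||^2 with heavy-tailed Student-t gradient noise.
rng = np.random.default_rng(0)

def stochastic_grad(x):
    # true gradient x plus Student-t noise (df = 2: finite mean, infinite variance)
    return x + rng.standard_t(df=2.0, size=x.shape)

def normalized_step(x, eta):
    g = stochastic_grad(x)
    return x - eta * g / (np.linalg.norm(g) + 1e-12)                    # normalized SGD

def clipped_step(x, eta, tau):
    g = stochastic_grad(x)
    return x - eta * min(1.0, tau / (np.linalg.norm(g) + 1e-12)) * g    # clipped SGD

x_n = np.full(10, 5.0)
x_c = np.full(10, 5.0)
for t in range(2000):
    eta_t = 0.5 / np.sqrt(t + 1.0)        # decaying stepsize (illustrative)
    x_n = normalized_step(x_n, eta_t)
    x_c = clipped_step(x_c, eta_t, tau=1.0)

print("||x|| after normalized SGD:", round(float(np.linalg.norm(x_n)), 3))
print("||x|| after clipped SGD:   ", round(float(np.linalg.norm(x_c)), 3))
```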
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - Independently-Normalized SGD for Generalized-Smooth Nonconvex Optimization [19.000530691874516]
We show that many nonconvex machine learning problems satisfy this kind of generalized smoothness condition, which extends beyond traditional smoothness assumptions.
We propose an independently-normalized gradient descent algorithm, which leverages independent sampling and normalization.
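One plausible reading of "independent sampling and normalization" (an assumption on my part, not necessarily the paper's algorithm) is that the step direction and the normalizing gradient norm are computed from two independent mini-batches, which decorrelates the two quantities. A toy sketch under that assumption:

```python
import numpy as np

# Toy sketch: the direction and the normalizing norm come from independent
# mini-batches, so they are statistically independent given the iterate.
rng = np.random.default_rng(1)

# Toy objective: f(x) = 0.5 * E_i ||x - a_i||^2, so the mini-batch gradient is
# the batch average of (x - a_i).  Data and constants are illustrative.
data = rng.normal(size=(1000, 5)) + 3.0

def batch_grad(x, idx):
    return np.mean(x - data[idx], axis=0)

x = np.zeros(5)
for t in range(500):
    idx_dir = rng.integers(0, len(data), size=32)   # batch for the direction
    idx_nrm = rng.integers(0, len(data), size=32)   # independent batch for the norm
    g_dir = batch_grad(x, idx_dir)
    g_nrm = batch_grad(x, idx_nrm)
    eta_t = 1.0 / np.sqrt(t + 1.0)
    x = x - eta_t * g_dir / (np.linalg.norm(g_nrm) + 1e-8)   # independently normalized step

print("distance to the minimizer:", round(float(np.linalg.norm(x - data.mean(axis=0))), 3))
```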
arXiv Detail & Related papers (2024-10-17T21:52:00Z) - An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levenberg-Marquardt method to minimize the average of such smooth, non-negative functions.
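Under the structure f_i(x) = r_i(x)^2 with r_i = sqrt(f_i), a damped (Levenberg-Marquardt-style) Gauss-Newton step for the scalar residual r_i collapses to a scalar stepsize gamma = 2 f_i / (||grad f_i||^2 + 4*lam*f_i). The sketch below applies this per-sample stepsize to a toy least-squares problem; the derivation and constants are my own illustration, not necessarily the paper's exact rule.

```python
import numpy as np

# Per-sample damped Gauss-Newton stepsize for non-negative losses f_i = r_i^2,
# illustrated on a consistent least-squares problem (illustrative constants).
rng = np.random.default_rng(2)

A = rng.normal(size=(200, 5))
x_true = rng.normal(size=5)
b = A @ x_true                          # consistent system: f_i(x_true) = 0

def loss_and_grad(x, i):
    r = A[i] @ x - b[i]                 # real-valued square root of the loss
    return r * r, 2.0 * r * A[i]        # f_i(x) = r^2 and its gradient

x = np.zeros(5)
lam = 1e-3                              # damping parameter (illustrative)
for t in range(3000):
    i = rng.integers(len(A))
    f, g = loss_and_grad(x, i)
    gamma = 2.0 * f / (np.dot(g, g) + 4.0 * lam * f + 1e-16)   # Gauss-Newton stepsize
    x = x - gamma * g

print("distance to x_true:", round(float(np.linalg.norm(x - x_true)), 6))
```

In the undamped limit (lam -> 0), each such step exactly zeroes the sampled residual, i.e. it reduces to a randomized-Kaczmarz-like update on this toy problem.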
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Convergence Analysis of Fractional Gradient Descent [0.0]
For optimization, it is of interest to understand the convergence properties of methods that use fractional derivatives.
This paper analyzes the potential of fractional gradient descent.
Empirical results are also presented.
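As a toy illustration of the kind of update analyzed (my own construction, not the paper's algorithm), the 1-D sketch below replaces the ordinary derivative by a Caputo fractional derivative of order alpha, using its closed form for f(x) = x^2 with terminal point 0; alpha, the stepsize, and the starting point are assumptions.

```python
import math

# Toy 1-D fractional gradient descent on f(x) = x^2 for x > 0.
# With terminal point 0, the Caputo fractional derivative of order
# alpha in (0, 1) applied to x^2 has the closed form
#     D^alpha f(x) = 2 * x**(2 - alpha) / Gamma(3 - alpha),
# which recovers the ordinary derivative 2x as alpha -> 1.
alpha = 0.8          # fractional order (illustrative)
eta = 0.1            # stepsize (illustrative)
x = 2.0              # positive starting point

for k in range(100):
    frac_grad = 2.0 * x ** (2.0 - alpha) / math.gamma(3.0 - alpha)
    x = x - eta * frac_grad

print(f"x after 100 fractional-gradient steps: {x:.4f}")   # moves toward the minimizer 0
```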
arXiv Detail & Related papers (2023-11-30T10:24:07Z) - High Probability Analysis for Non-Convex Stochastic Optimization with
Clipping [13.025261730510847]
Gradient clipping is a technique for dealing with heavy-tailed noise when training neural networks.
Most theoretical guarantees provide only an in-expectation analysis of the optimization performance.
Our analysis provides a relatively complete picture for the theoretical guarantee of optimization algorithms with gradient clipping.
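To illustrate why a high-probability analysis (rather than an in-expectation one) is the relevant lens for clipping under heavy-tailed noise, here is a toy NumPy experiment (my construction, not the paper's) that compares a high quantile of the final error for clipped versus unclipped SGD over many independent runs.

```python
import numpy as np

# Compare the 95th percentile of the final error for SGD with and without
# clipping under heavy-tailed gradient noise -- the quantity a high-probability
# analysis controls.  Problem and constants are illustrative.
rng = np.random.default_rng(3)

def run(clip, trials=200, steps=300, eta=0.05, tau=2.0):
    finals = []
    for _ in range(trials):
        x = np.full(5, 3.0)                              # minimize f(x) = 0.5*||x||^2
        for t in range(steps):
            g = x + rng.standard_t(df=2.0, size=5)       # heavy-tailed noise
            if clip:
                g = g * min(1.0, tau / (np.linalg.norm(g) + 1e-12))
            x = x - eta * g
        finals.append(np.linalg.norm(x))
    return np.percentile(finals, 95)

print("95th percentile of final ||x||, clipped:  ", round(float(run(clip=True)), 3))
print("95th percentile of final ||x||, unclipped:", round(float(run(clip=False)), 3))
```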
arXiv Detail & Related papers (2023-07-25T17:36:56Z) - Projective Proximal Gradient Descent for A Class of Nonconvex Nonsmooth Optimization Problems: Fast Convergence Without Kurdyka-Lojasiewicz (KL) Property [19.988762532185884]
Nonconvex and nonsmooth optimization problems are important and challenging for learning.
In this paper, we present a new analysis showing fast convergence of PPGD.
arXiv Detail & Related papers (2023-04-20T17:39:24Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}\, F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragradient steps with acceleration.
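For context, here is a sketch of the plain extragradient descent-ascent update on a small strongly-convex-strongly-concave instance of the bilinearly-coupled problem above; it shows only the extragradient ingredient and is not the paper's AG-EG method, and all constants are illustrative.

```python
import numpy as np

# Extragradient descent-ascent on
#     min_x max_y  0.5*mu*||x||^2 + x^T B y - 0.5*mu*||y||^2,
# whose unique saddle point is (0, 0).
rng = np.random.default_rng(4)
d, mu = 4, 1.0
B = rng.normal(size=(d, d))

def grad_x(x, y): return mu * x + B @ y       # gradient in x
def grad_y(x, y): return B.T @ x - mu * y     # gradient in y

x, y = np.ones(d), -np.ones(d)
eta = 0.5 / (mu + np.linalg.norm(B, 2))       # stepsize from the problem constants

for k in range(500):
    # extrapolation step
    x_half = x - eta * grad_x(x, y)
    y_half = y + eta * grad_y(x, y)
    # update using gradients evaluated at the extrapolated point
    x, y = x - eta * grad_x(x_half, y_half), y + eta * grad_y(x_half, y_half)

print(f"distance to the saddle point: {np.linalg.norm(np.concatenate([x, y])):.2e}")
```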
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with the best-known iteration complexity for smooth losses.
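Below is a sketch combining the three ingredients named above, clipping of the stochastic gradient, a momentum average, and a normalized update, on a smooth non-convex toy objective; the exact combination and all constants are assumptions, not the paper's algorithm.

```python
import numpy as np

# Clipped, momentum-averaged, normalized stochastic gradient steps on the
# smooth non-convex toy objective f(x) = sum_i x_i^2 + 3*sin(x_i)^2, whose only
# stationary point is the origin.  Heavy-tailed Student-t gradient noise.
rng = np.random.default_rng(5)

def stochastic_grad(x):
    return 2.0 * x + 3.0 * np.sin(2.0 * x) + rng.standard_t(df=2.0, size=x.shape)

x = np.full(10, 4.0)
m = np.zeros_like(x)
beta, tau = 0.9, 2.0

for t in range(2000):
    g = stochastic_grad(x)
    g = g * min(1.0, tau / (np.linalg.norm(g) + 1e-12))   # clip the stochastic gradient
    m = beta * m + (1.0 - beta) * g                       # momentum average
    eta_t = 0.5 / np.sqrt(t + 1.0)
    x = x - eta_t * m / (np.linalg.norm(m) + 1e-12)       # normalized momentum step

print("final ||x||:", round(float(np.linalg.norm(x)), 3))
```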
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence guarantees for adaptive gradient methods in non-convex optimization, in terms of the expected gradient norm.
Our analyses shed light on better understanding adaptive gradient methods for optimizing non-convex objectives.
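As a concrete instance of an adaptive gradient method in this family, here is a minimal AMSGrad-style sketch on a smooth non-convex toy problem; the hyperparameters and the toy objective are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal AMSGrad-style adaptive gradient method on the smooth non-convex toy
# objective f(x) = sum_i x_i^2 + 3*sin(x_i)^2 with Gaussian gradient noise.
rng = np.random.default_rng(6)

def noisy_grad(x):
    return 2.0 * x + 3.0 * np.sin(2.0 * x) + rng.normal(scale=0.5, size=x.shape)

x = np.full(10, 4.0)
m = np.zeros_like(x)        # first-moment (momentum) estimate
v = np.zeros_like(x)        # second-moment estimate
v_hat = np.zeros_like(x)    # running max of v (the AMSGrad modification)
eta, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(3000):
    g = noisy_grad(x)
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    v_hat = np.maximum(v_hat, v)
    x = x - eta * m / (np.sqrt(v_hat) + eps)

true_grad = 2.0 * x + 3.0 * np.sin(2.0 * x)
print("final noiseless gradient norm:", round(float(np.linalg.norm(true_grad)), 3))
```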
arXiv Detail & Related papers (2018-08-16T20:25:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences of its use.