Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity
- URL: http://arxiv.org/abs/2503.04712v1
- Date: Thu, 06 Mar 2025 18:57:34 GMT
- Title: Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity
- Authors: Daniel Yiming Cao, August Y. Chen, Karthik Sridharan, Benjamin Tang
- Abstract summary: We show that appropriate variants of first order methods converge to first and second order stationary points under generalized smoothness. To our knowledge, this is the first 'non-textbook' rate for non-convex optimization under generalized smoothness.
- Score: 7.075987963315121
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the problem of non-convex optimization on functions that are not necessarily smooth using first order methods. Smoothness (functions whose gradient and/or Hessian are Lipschitz) is not satisfied by many machine learning problems in both theory and practice, motivating a recent line of work studying the convergence of first order methods to first order stationary points under appropriate generalizations of smoothness. We develop a novel framework to study convergence of first order methods to first and \textit{second} order stationary points under generalized smoothness, under more general smoothness assumptions than the literature. Using our framework, we show appropriate variants of GD and SGD (e.g. with appropriate perturbations) can converge not just to first order but also \textit{second order stationary points} in runtime polylogarithmic in the dimension. To our knowledge, our work contains the first such result, as well as the first 'non-textbook' rate for non-convex optimization under generalized smoothness. We demonstrate that several canonical non-convex optimization problems fall under our setting and framework.
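The abstract does not spell out the exact algorithm, but the idea of "GD with appropriate perturbations" escaping strict saddle points can be illustrated with a minimal sketch. All parameter names and values below (e.g. `escape_eps`, `perturb_radius`, `perturb_interval`) are hypothetical placeholders for illustration only, not the paper's actual choices or guarantees:

```python
import numpy as np

def perturbed_gd(grad, x0, eta=1e-2, escape_eps=1e-3, perturb_radius=1e-3,
                 perturb_interval=50, max_iters=5000, seed=0):
    """Plain gradient descent, plus a small random perturbation whenever the
    gradient is tiny (a candidate saddle point), so strict saddles are escaped."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    last_perturb = -perturb_interval  # allow an immediate perturbation
    for t in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= escape_eps and t - last_perturb >= perturb_interval:
            x = x + perturb_radius * rng.normal(size=x.shape)  # random escape kick
            last_perturb = t
        else:
            x = x - eta * g  # standard gradient step
    return x

# Toy check: f(x, y) = x^2 - y^2 has a strict saddle at the origin;
# perturbed GD drifts away from it along the negative-curvature (y) direction.
saddle_grad = lambda z: np.array([2.0 * z[0], -2.0 * z[1]])
print(perturbed_gd(saddle_grad, x0=np.array([0.0, 0.0])))
```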
Related papers
- Mirror Descent Under Generalized Smoothness [23.5387392871236]
We introduce a new $\ell^*$-smoothness concept that measures the norm of the Hessian in terms of a general norm and its dual. We establish convergence for mirror-descent-type algorithms, matching the known rates under classic smoothness (an illustrative mirror-descent sketch appears after this list).
arXiv Detail & Related papers (2025-02-02T11:23:10Z) - Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization [52.61737731453222]
Non-convex machine learning problems typically do not adhere to the standard smoothness assumption. We propose and analyze new methods with local steps, partial participation of clients, and Random Reshuffling. Our theory is consistent with the known results for standard smooth problems.
arXiv Detail & Related papers (2024-12-03T19:20:56Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Strictly Low Rank Constraint Optimization -- An Asymptotically
$\mathcal{O}(\frac{1}{t^2})$ Method [5.770309971945476]
We propose a class of non-convex and non-smooth problems with rank regularization to promote sparsity in the optimal solution.
We show that our algorithms achieve an asymptotic convergence rate of $O(\frac{1}{t^2})$, which is exactly the same as Nesterov's optimal convergence rate for first-order methods on smooth convex problems.
arXiv Detail & Related papers (2023-07-04T16:55:41Z) - Oracle Complexity of Single-Loop Switching Subgradient Methods for
Non-Smooth Weakly Convex Functional Constrained Optimization [12.84152722535866]
We consider a non-smooth constrained optimization problem where the objective function is weakly convex and the constraint function is either convex or weakly convex.
To solve the problem, we consider the switching subgradient method, an intuitive and easily implementable first-order method.
arXiv Detail & Related papers (2023-01-30T22:13:14Z) - Low-Rank Mirror-Prox for Nonsmooth and Low-Rank Matrix Optimization
Problems [19.930021245647705]
Low-rank and nonsmooth matrix optimization problems capture many fundamental tasks in statistics and machine learning.
In this paper we consider standard convex relaxations for such problems.
We prove that, under a strict complementarity condition and under the relatively mild assumption that the nonsmooth objective can be written as a maximum of smooth functions, approximated variants of two popular mirror-prox methods converge while requiring only low-rank SVD computations per iteration.
arXiv Detail & Related papers (2022-06-23T08:10:54Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth stochastic convex optimization have complexity bounds with an undesirable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ).
The new definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - On the Oracle Complexity of Higher-Order Smooth Non-Convex Finite-Sum
Optimization [1.6973426830397942]
We prove lower bounds for higher-order methods in smooth non-convex finite-sum optimization.
We show that a $p$th-order regularized method cannot profit from the finite-sum objective.
We show that a new second-order smoothness assumption can be seen as an analogue to the first-order mean-squared smoothness.
arXiv Detail & Related papers (2021-03-08T23:33:58Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recently increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data analysis, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Adaptive First-and Zeroth-order Methods for Weakly Convex Stochastic
Optimization Problems [12.010310883787911]
We analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) optimization problems.
Experimental results indicate that the proposed algorithms empirically outperform stochastic gradient descent and its zeroth-order variant.
arXiv Detail & Related papers (2020-05-19T07:44:52Z)
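As referenced in the mirror-descent entry above, here is a minimal illustrative sketch of a mirror-descent-type update, assuming the negative-entropy mirror map on the probability simplex; it is not the algorithm or the $\ell^*$-smoothness analysis from that paper:

```python
import numpy as np

def entropic_mirror_descent(grad, x0, eta=0.1, num_iters=200):
    """Mirror descent with the negative-entropy mirror map on the simplex:
    the Bregman proximal step has the closed form x <- x * exp(-eta * grad),
    followed by renormalization back onto the simplex."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad(x)
        x = x * np.exp(-eta * (g - g.min()))  # shift for numerical stability
        x = x / x.sum()                       # renormalize onto the simplex
    return x

# Toy usage: minimize the linear cost <c, x> over the probability simplex;
# the iterates concentrate on the coordinate with the smallest cost.
c = np.array([0.3, 0.1, 0.6])
print(entropic_mirror_descent(lambda x: c, np.ones(3) / 3))
```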