Escape saddle points by a simple gradient-descent based algorithm
- URL: http://arxiv.org/abs/2111.14069v1
- Date: Sun, 28 Nov 2021 07:32:54 GMT
- Title: Escape saddle points by a simple gradient-descent based algorithm
- Authors: Chenyi Zhang, Tongyang Li
- Abstract summary: Escaping saddle points is a central research topic in nonconvex optimization.
We propose a simple gradient-based algorithm such that for a smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$, it outputs an $\epsilon$-approximate second-order stationary point in $\tilde{O}((\log n)^{2}/\epsilon^{4})$ iterations in the stochastic setting.
Technically, our main contribution is an idea of implementing a robust Hessian power method using only gradients.
- Score: 6.7298812735467095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Escaping saddle points is a central research topic in nonconvex optimization.
In this paper, we propose a simple gradient-based algorithm such that for a
smooth function $f\colon\mathbb{R}^n\to\mathbb{R}$, it outputs an
$\epsilon$-approximate second-order stationary point in $\tilde{O}(\log
n/\epsilon^{1.75})$ iterations. Compared to the previous state-of-the-art
algorithms by Jin et al. with $\tilde{O}((\log n)^{4}/\epsilon^{2})$ or
$\tilde{O}((\log n)^{6}/\epsilon^{1.75})$ iterations, our algorithm is
polynomially better in terms of $\log n$ and matches their complexities in
terms of $1/\epsilon$. For the stochastic setting, our algorithm outputs an
$\epsilon$-approximate second-order stationary point in $\tilde{O}((\log
n)^{2}/\epsilon^{4})$ iterations. Technically, our main contribution is an idea
of implementing a robust Hessian power method using only gradients, which can
find negative curvature near saddle points and achieve the polynomial speedup
in $\log n$ compared to the perturbed gradient descent methods. Finally, we
also perform numerical experiments that support our results.
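The gradient-only Hessian power method idea can be made concrete with a short sketch. Below is a minimal illustration, not the paper's exact procedure: it assumes gradient access `grad_f` and a bound `L` on the Hessian's spectral norm, approximates Hessian-vector products by finite differences of gradients, and runs power iteration on the shifted operator $LI - \nabla^2 f(x)$ to extract a negative-curvature direction.

```python
import numpy as np

def hvp(grad_f, x, v, r=1e-5):
    # Hessian-vector product using only gradients, via a central difference:
    # (grad f(x + r v) - grad f(x - r v)) / (2 r)  ~=  H(x) v.
    return (grad_f(x + r * v) - grad_f(x - r * v)) / (2.0 * r)

def negative_curvature(grad_f, x, L=1.0, iters=200, seed=0):
    # Power iteration on the shifted operator M = L*I - H(x).
    # If H(x)'s eigenvalues lie in [-L, L], the top eigenvector of M is the
    # direction of most negative curvature of H(x).
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = L * v - hvp(grad_f, x, v)  # apply M, touching f only through grad_f
        v /= np.linalg.norm(v)
    return v, v @ hvp(grad_f, x, v)    # direction and its Rayleigh quotient v^T H v

# At the saddle of f(x, y) = x^2 - y^2 (Hessian diag(2, -2)):
grad = lambda z: np.array([2.0 * z[0], -2.0 * z[1]])
v, lam = negative_curvature(grad, np.zeros(2), L=4.0)
print(v, lam)  # expect v ~ (0, ±1) and lam ~ -2
```

When the returned Rayleigh quotient is significantly negative, a step along $\pm v$ decreases $f$ and escapes the saddle; the paper's contribution is making this kind of power iteration robust to the errors introduced by using gradients alone.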
Related papers
- Quantum spectral method for gradient and Hessian estimation [4.193480001271463]
Gradient descent is one of the most basic algorithms for solving continuous optimization problems.
We propose a quantum algorithm that returns an $\varepsilon$-approximation of its gradient with query complexity $\widetilde{O}(1/\varepsilon)$.
We also propose two quantum algorithms for Hessian estimation, aiming to improve quantum analogs of Newton's method.
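For contrast, the classical baseline estimates a gradient from function queries alone and needs a number of queries growing with the dimension $n$; a minimal central-difference sketch (not from the paper; `f` and the step `h` are illustrative assumptions):

```python
import numpy as np

def central_difference_gradient(f, x, h=1e-6):
    # Classical baseline: 2n function queries for an n-dimensional gradient,
    # in contrast with the ~O(1/eps) quantum query complexity quoted above.
    n = x.shape[0]
    g = np.empty(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

f = lambda x: float(x @ x)  # gradient is 2x
print(central_difference_gradient(f, np.array([1.0, -2.0, 3.0])))  # ~ [2, -4, 6]
```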
arXiv Detail & Related papers (2024-07-04T11:03:48Z) - Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ first-order oracle (FO) calls that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Solving Dense Linear Systems Faster Than via Preconditioning [1.8854491183340518]
We show that our algorithm has an $\tilde{O}(n^{2})$ runtime when $k=O(n^{0.729})$.
Our main algorithm can be viewed as a randomized block coordinate descent method.
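To make the block coordinate descent view concrete, here is a minimal randomized block coordinate descent sketch on the least-squares objective $\frac{1}{2}\|Ax-b\|^2$; the paper's actual algorithm and sampling scheme are more sophisticated, and the block size and sweep count below are illustrative assumptions.

```python
import numpy as np

def block_coordinate_descent(A, b, block_size=5, sweeps=300, seed=0):
    # Minimize 0.5 * ||A x - b||^2 by exactly minimizing over a random block
    # of coordinates at each step, keeping the remaining coordinates fixed.
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    r = b - A @ x                                      # maintained residual b - A x
    for _ in range(sweeps):
        S = rng.choice(n, size=block_size, replace=False)
        As = A[:, S]
        delta = np.linalg.lstsq(As, r, rcond=None)[0]  # exact block minimizer
        x[S] += delta
        r -= As @ delta
    return x

A = np.random.default_rng(1).standard_normal((50, 20))
x = block_coordinate_descent(A, A @ np.ones(20))
print(np.linalg.norm(x - np.ones(20)))  # small after enough sweeps
```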
arXiv Detail & Related papers (2023-12-14T12:53:34Z) - Quantum speedups for linear programming via interior point methods [1.8434042562191815]
We describe a quantum algorithm for solving a linear program with $n$ inequality constraints on $d$ variables.
Our algorithm speeds up the Newton step in the state-of-the-art interior point method of Lee and Sidford.
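For reference, the Newton step being accelerated is the classical one on the log-barrier; a textbook sketch for $\min\, c^\top x$ subject to $Ax \le b$ (not the Lee-Sidford method itself, and omitting the line search a practical solver would add):

```python
import numpy as np

def barrier_newton_step(A, b, c, x, t):
    # One Newton step on the log-barrier objective
    #   t * c^T x - sum_i log(b_i - a_i^T x),
    # for the LP min c^T x subject to A x <= b; x must be strictly feasible.
    s = b - A @ x                       # slacks, all positive for feasible x
    grad = t * c + A.T @ (1.0 / s)      # gradient of the barrier objective
    H = A.T @ (A / s[:, None] ** 2)     # Hessian: A^T diag(1/s^2) A
    dx = np.linalg.solve(H, -grad)      # the Newton system -- the expensive
                                        # linear algebra that speedups target
    return x + dx
```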
arXiv Detail & Related papers (2023-11-06T16:00:07Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
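The quantity ReSQue estimates is the gradient of $f$ convolved with a Gaussian density; a plain Monte Carlo sketch of that smoothed gradient is below (the actual reweighted estimator is more refined, and `rho` and the sample count are illustrative assumptions):

```python
import numpy as np

def smoothed_gradient(grad_f, x, rho=0.1, samples=256, seed=0):
    # Monte Carlo estimate of the gradient of f convolved with N(0, rho^2 I):
    #   grad (f * gamma_rho)(x) = E_{z ~ N(0, I)} [ grad f(x + rho * z) ].
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((samples, x.shape[0]))
    return np.mean([grad_f(x + rho * zi) for zi in z], axis=0)
```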
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient
Oracle Complexity [15.18055488087588]
We give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}\left(\sum_{i=1}^{n} d_i \log(1/\epsilon)\right)$ gradient computations.
Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.
arXiv Detail & Related papers (2022-08-07T20:53:42Z) - The First Optimal Algorithm for Smooth and
Strongly-Convex-Strongly-Concave Minimax Optimization [88.91190483500932]
In this paper, we revisit the smooth and strongly-convex-strongly-concave minimax optimization problem.
Existing state-of-the-art methods do not match the lower bound $\Omega\left(\sqrt{\kappa_x\kappa_y}\log\frac{1}{\epsilon}\right)$, carrying extra $\log^{3}(\kappa_x\kappa_y)$ factors.
We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\sqrt{\kappa_x\kappa_y}\log\frac{1}{\epsilon}\right)$ gradient complexity, matching the lower bound.
arXiv Detail & Related papers (2022-05-11T17:33:07Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
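A generic sketch-and-precondition illustration for the ridge system $(A^\top A + \lambda I)x = A^\top b$, not the paper's algorithm: a Gaussian sketch of $A$ builds a preconditioner used inside iterative refinement, which converges when the sketch captures $A$'s column space well.

```python
import numpy as np

def sketched_ridge(A, b, lam, sketch_rows=None, iters=30, seed=0):
    # Solve (A^T A + lam I) x = A^T b by preconditioned iterative refinement;
    # the preconditioner comes from a small Gaussian sketch S A of A.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = sketch_rows or 4 * n                      # sketch size (an assumption)
    S = rng.standard_normal((k, m)) / np.sqrt(k)  # Gaussian sketching matrix
    SA = S @ A
    P = SA.T @ SA + lam * np.eye(n)               # P ~ A^T A + lam I w.h.p.
    x = np.zeros(n)
    for _ in range(iters):
        resid = A.T @ (b - A @ x) - lam * x       # normal-equation residual
        x += np.linalg.solve(P, resid)            # preconditioned correction
    return x
```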
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + \left(\log(d)/(\varepsilon n)\right)^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.