Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations
- URL: http://arxiv.org/abs/2006.13476v1
- Date: Wed, 24 Jun 2020 04:41:43 GMT
- Title: Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations
- Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush
Sekhari, Karthik Sridharan
- Abstract summary: We design an algorithm which finds an
$\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$)
using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, and
establish nearly matching bounds for finding $(\epsilon,\gamma)$-approximate
second-order stationary points.
Our lower bounds here are novel even in the noiseless case.
- Score: 54.42518331209581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We design an algorithm which finds an $\epsilon$-approximate stationary point
(with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic
gradient and Hessian-vector products, matching guarantees that were previously
available only under a stronger assumption of access to multiple queries with
the same random seed. We prove a lower bound which establishes that this rate
is optimal and---surprisingly---that it cannot be improved using stochastic
$p$th order methods for any $p\ge 2$, even when the first $p$ derivatives of
the objective are Lipschitz. Together, these results characterize the
complexity of non-convex stochastic optimization with second-order methods and
beyond. Expanding our scope to the oracle complexity of finding
$(\epsilon,\gamma)$-approximate second-order stationary points, we establish
nearly matching upper and lower bounds for stochastic second-order methods. Our
lower bounds here are novel even in the noiseless case.
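The $O(\epsilon^{-3})$ upper bound uses only stochastic gradient and Hessian-vector product queries. As a minimal sketch of what a Hessian-vector product oracle computes (a generic finite-difference approximation on a hypothetical quadratic objective, not the paper's algorithm):

```python
import numpy as np

def hessian_vector_product(grad_f, x, v, h=1e-5):
    # Approximate H(x) @ v via central differences of the gradient:
    # (grad F(x + h v) - grad F(x - h v)) / (2 h)
    return (grad_f(x + h * v) - grad_f(x - h * v)) / (2.0 * h)

# Hypothetical objective F(x) = 0.5 x^T A x, so grad F(x) = A x and H(x) = A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
grad_f = lambda x: A @ x

x = np.array([1.0, -1.0])
v = np.array([0.3, 0.7])
hv = hessian_vector_product(grad_f, x, v)
```

For a quadratic objective the approximation agrees with $Av$ up to floating-point error; in the stochastic setting of the paper, `grad_f` would instead return a noisy gradient estimate, with each query drawing an independent sample.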
Related papers
- Achieving ${O}(\epsilon^{-1.5})$ Complexity in Hessian/Jacobian-free
Stochastic Bilevel Optimization [21.661676550609876]
We show how to achieve an $O(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization.
As far as we know, this is the first Hessian/Jacobian-free method with an $O(\epsilon^{-1.5})$ sample complexity for this problem class.
arXiv Detail & Related papers (2023-12-06T16:34:58Z) - Accelerating Inexact HyperGradient Descent for Bilevel Optimization [84.00488779515206]
We present a method for solving general nonconvex-strongly-convex bilevel optimization problems.
Our results also improve upon the existing complexity for finding second-order stationary points in such problems.
arXiv Detail & Related papers (2023-06-30T20:36:44Z) - A Fully First-Order Method for Stochastic Bilevel Optimization [8.663726907303303]
We consider unconstrained bilevel optimization problems when only the first-order gradient oracles are available.
We propose a Fully First-order Approximation (F2SA) method, and study its non-asymptotic convergence properties.
We demonstrate even superior practical performance of the proposed method over existing second-order based approaches on MNIST data-hypercleaning experiments.
arXiv Detail & Related papers (2023-01-26T05:34:21Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) that minimizes a twice differentiable function subject to general nonconvex conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum
Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}((m+\sqrt{mn})L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and $\mathcal{O}(L\varepsilon^{-2}/\sqrt{1-\lambda_2(W)})$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of \emph{convex-concave} unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - A Newton-CG based barrier method for finding a second-order stationary
point of nonconvex conic optimization with complexity guarantees [0.5922488908114022]
We consider finding an approximate second-order stationary point (SOSP) that minimizes a twice differentiable function over the intersection of a convex cone and an affine subspace.
Our method is not only implementable, but also achieves an iteration complexity of $\mathcal{O}(\epsilon^{-3/2})$, which matches the best known complexity.
arXiv Detail & Related papers (2022-07-12T17:21:45Z) - A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for composition optimization.
We show that the number of calls to the stochastic oracle and the linear-minimization oracle required by the proposed algorithm are of order $\mathcal{O}_T(\epsilon^{-2})$ and $\mathcal{O}_T(\epsilon^{-3})$, respectively.
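Projection-free methods of this kind replace projections with a linear-minimization oracle (LMO). The paper's algorithm targets stochastic multi-level composition objectives; as a much simpler illustration of what an LMO call looks like, here is a vanilla Frank-Wolfe (conditional gradient) loop over the probability simplex with a hypothetical quadratic objective:

```python
import numpy as np

def lmo_simplex(grad):
    # Linear-minimization oracle over the probability simplex:
    # argmin_{s in simplex} <grad, s> puts all mass on the smallest coordinate.
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

# Minimize f(x) = ||x - c||^2 over the simplex with vanilla Frank-Wolfe.
c = np.array([0.2, 0.3, 0.5])      # hypothetical target inside the simplex
x = np.array([1.0, 0.0, 0.0])      # start at a vertex
for t in range(2000):
    grad = 2.0 * (x - c)           # gradient of the smooth objective
    s = lmo_simplex(grad)          # one LMO call per iteration
    gamma = 2.0 / (t + 2)          # classical step-size schedule
    x = x + gamma * (s - x)        # convex combination stays feasible
```

Each iterate is a convex combination of simplex vertices, so feasibility is maintained without ever projecting; this is the property the oracle-complexity bounds above count calls for.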
arXiv Detail & Related papers (2022-02-09T06:05:38Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimension $d = \frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions [84.49087114959872]
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions.
In particular, we study Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions.
arXiv Detail & Related papers (2020-02-10T23:23:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.