Achieving ${O}(\epsilon^{-1.5})$ Complexity in Hessian/Jacobian-free
Stochastic Bilevel Optimization
- URL: http://arxiv.org/abs/2312.03807v2
- Date: Wed, 20 Dec 2023 15:21:56 GMT
- Title: Achieving ${O}(\epsilon^{-1.5})$ Complexity in Hessian/Jacobian-free
Stochastic Bilevel Optimization
- Authors: Yifan Yang, Peiyao Xiao, Kaiyi Ji
- Abstract summary: We show how to achieve an $O(epsilon1.5)$ sample complexity for non-strongly-accurate stationary point gradient bilevel optimization.
As far as we know, this is the first Hessian/Jacobian-free method with an $O(epsilon1.5)$ sample complexity for non-strongly-accurate stationary point gradient optimization.
- Score: 21.661676550609876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we revisit the bilevel optimization problem, in which the
upper-level objective function is generally nonconvex and the lower-level
objective function is strongly convex. Although this type of problem has been
studied extensively, it still remains an open question how to achieve an
${O}(\epsilon^{-1.5})$ sample complexity in Hessian/Jacobian-free stochastic
bilevel optimization without any second-order derivative computation. To fill
this gap, we propose a novel Hessian/Jacobian-free bilevel optimizer named
FdeHBO, which features a simple fully single-loop structure, a projection-aided
finite-difference Hessian/Jacobian-vector approximation, and momentum-based
updates. Theoretically, we show that FdeHBO requires ${O}(\epsilon^{-1.5})$
iterations (each using ${O}(1)$ samples and only first-order gradient
information) to find an $\epsilon$-accurate stationary point. As far as we
know, this is the first Hessian/Jacobian-free method with an
${O}(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex
stochastic bilevel optimization.
Related papers
- Quantum Algorithms for Non-smooth Non-convex Optimization [30.576546266390714]
This paper considers the problem for finding the $(,epsilon)$-Goldstein stationary point of Lipschitz continuous objective.
We construct a zeroth-order quantum estimator for the surrogate oracle function.
arXiv Detail & Related papers (2024-10-21T16:52:26Z) - Projection-Free Methods for Stochastic Simple Bilevel Optimization with
Convex Lower-level Problem [16.9187409976238]
We study a class of convex bilevel optimization problems, also known as simple bilevel optimization.
We introduce novel bilevel optimization methods that approximate the solution set of the lower-level problem.
arXiv Detail & Related papers (2023-08-15T02:37:11Z) - Accelerating Inexact HyperGradient Descent for Bilevel Optimization [84.00488779515206]
We present a method for solving general non-strongly-concave bilevel optimization problems.
Our results also improve upon the existing complexity for finding second-order stationary points in non-strongly-concave problems.
arXiv Detail & Related papers (2023-06-30T20:36:44Z) - Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully
First-Order Oracles [14.697733592222658]
We show that a first-order method can converge at the near-optimal $tilde mathcalO(epsilon-2)$ rate as second-order methods.
Our analysis further leads to simple first-order algorithms that achieve similar convergence rates for finding second-order stationary points.
arXiv Detail & Related papers (2023-06-26T17:07:54Z) - A Fully First-Order Method for Stochastic Bilevel Optimization [8.663726907303303]
We consider unconstrained bilevel optimization problems when only the first-order gradient oracles are available.
We propose a Fully First-order Approximation (F2SA) method, and study its non-asymptotic convergence properties.
We demonstrate even superior practical performance of the proposed method over existing second-order based approaches on MNIST data-hypercleaning experiments.
arXiv Detail & Related papers (2023-01-26T05:34:21Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) that minimizes a twice different subject general non conic optimization.
In particular, we propose a Newton-CG based-augmentedconjugate method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - ReSQueing Parallel and Private Stochastic Convex Optimization [59.53297063174519]
We introduce a new tool for BFG convex optimization (SCO): a Reweighted Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density.
We develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings.
arXiv Detail & Related papers (2023-01-01T18:51:29Z) - A Newton-CG based barrier method for finding a second-order stationary
point of nonconvex conic optimization with complexity guarantees [0.5922488908114022]
We consider finding an approximate second-order stationary point (SOSP) that minimizes a twice differentiable function intersection.
Our method is not only iteration, but also achieves $mincal O(epsilon/2) which matches the best known complexity.
arXiv Detail & Related papers (2022-07-12T17:21:45Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum- Single-timescale Approximation (MSTSA) -- for tackling problems.
MSTSA allows us to control the error in iterations due to inaccurate solution to the lower level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations [54.42518331209581]
We find an algorithm which finds.
epsilon$-approximate stationary point (with $|nabla F(x)|le epsilon$) using.
$(epsilon,gamma)$surimate random random points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.