A Fully First-Order Method for Stochastic Bilevel Optimization
- URL: http://arxiv.org/abs/2301.10945v1
- Date: Thu, 26 Jan 2023 05:34:21 GMT
- Title: A Fully First-Order Method for Stochastic Bilevel Optimization
- Authors: Jeongyeol Kwon, Dohyun Kwon, Stephen Wright, Robert Nowak
- Abstract summary: We consider unconstrained bilevel optimization problems when only the first-order gradient oracles are available.
We propose a Fully First-order Stochastic Approximation (F2SA) method and study its non-asymptotic convergence properties.
We demonstrate superior practical performance of the proposed method over existing second-order-based approaches on MNIST data-hypercleaning experiments.
- Score: 8.663726907303303
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider stochastic unconstrained bilevel optimization problems when only
the first-order gradient oracles are available. While numerous optimization
methods have been proposed for tackling bilevel problems, existing methods
either tend to require possibly expensive calculations regarding Hessians of
lower-level objectives, or lack rigorous finite-time performance guarantees. In
this work, we propose a Fully First-order Stochastic Approximation (F2SA)
method, and study its non-asymptotic convergence properties. Specifically, we
show that F2SA converges to an $\epsilon$-stationary solution of the bilevel
problem after $\epsilon^{-7/2}, \epsilon^{-5/2}$, and $\epsilon^{-3/2}$
iterations (each iteration using $O(1)$ samples) when stochastic noises are in
both level objectives, only in the upper-level objective, and not present
(deterministic settings), respectively. We further show that if we employ
momentum-assisted gradient estimators, the iteration complexities can be
improved to $\epsilon^{-5/2}, \epsilon^{-4/2}$, and $\epsilon^{-3/2}$,
respectively. We demonstrate even superior practical performance of the
proposed method over existing second-order based approaches on MNIST
data-hypercleaning experiments.
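To make the fully first-order idea concrete, below is a minimal sketch of an F2SA-style update on a toy deterministic quadratic bilevel instance. The problem data (A, b), step sizes, and penalty schedule are illustrative assumptions, not taken from the paper; the point it demonstrates is that the hypergradient surrogate uses only gradients of $f$ and $g$, with no Hessians or Hessian-vector products.

```python
import numpy as np

# Toy bilevel instance (hypothetical, for illustration only):
#   upper level: f(x, y) = 0.5*||y - b||^2 + 0.5*||x||^2
#   lower level: g(x, y) = 0.5*||y - A @ x||^2, so y*(x) = A @ x
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

def grad_f(x, y):
    """Gradients of the upper-level objective w.r.t. x and y."""
    return x, y - b

def grad_g(x, y):
    """Gradients of the lower-level objective w.r.t. x and y."""
    r = y - A @ x
    return -A.T @ r, r

def f2sa_step(x, y_lam, y_g, lam, eta, alpha):
    """One F2SA-style iteration (deterministic sketch).

    y_lam tracks argmin_y [f(x, y) + lam * g(x, y)] and y_g tracks
    argmin_y g(x, y); the gap between their x-gradients of g gives a
    fully first-order surrogate for the hypergradient.
    """
    fx, fy = grad_f(x, y_lam)
    gx_lam, gy_lam = grad_g(x, y_lam)
    gx_g, gy_g = grad_g(x, y_g)

    # Single inner gradient steps on each lower-level variable,
    # as in single-loop schemes.
    y_lam = y_lam - alpha * (fy + lam * gy_lam)
    y_g = y_g - alpha * gy_g

    # Outer step along the first-order hypergradient surrogate.
    x = x - eta * (fx + lam * (gx_lam - gx_g))
    return x, y_lam, y_g

x, y_lam, y_g = np.zeros(3), np.zeros(5), np.zeros(5)
for t in range(1, 2001):
    lam = 0.5 * t ** 0.25  # slowly increasing penalty parameter
    x, y_lam, y_g = f2sa_step(x, y_lam, y_g, lam, eta=0.01, alpha=0.05)
print(x)  # approaches the bilevel solution as lam grows (bias is O(1/lam))
```

As $\lambda$ grows, the surrogate $\nabla_x f + \lambda(\nabla_x g(x, y_\lambda) - \nabla_x g(x, y_g))$ approaches the true hypergradient, which is why the penalty parameter must increase over the iterations.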
Related papers
- Adaptive Mirror Descent Bilevel Optimization [25.438298531555468]
We propose a class of efficient adaptive bilevel methods based on mirror descent for nonconvex bilevel optimization.
We provide a convergence analysis for these methods under some mild conditions, and prove that they achieve a fast convergence rate.
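For context, the generic mirror descent update such methods build on is $x_{t+1} = \arg\min_x \{ \eta_t \langle g_t, x \rangle + D_\psi(x, x_t) \}$, where $D_\psi$ is the Bregman divergence of a mirror map $\psi$ and $g_t$ is a (hyper)gradient estimate; this notation is generic rather than quoted from the cited abstract.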
arXiv Detail & Related papers (2023-11-08T08:17:09Z) - On Penalty Methods for Nonconvex Bilevel Optimization and First-Order
Stochastic Approximation [13.813242559935732]
We study first-order algorithms for solving nonconvex bilevel optimization problems.
In particular, we show a strong connection between the penalty function and the hyper-objective.
We show improved oracle complexities of $O(\epsilon^{-3})$ and $O(\epsilon^{-5})$, respectively.
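Concretely, writing $g^*(x) = \min_y g(x, y)$ for the lower-level value function, the penalty function takes the form $h_\sigma(x) = \min_y [ f(x, y) + \sigma (g(x, y) - g^*(x)) ]$, which approximates the hyper-objective $F(x) = f(x, y^*(x))$ as $\sigma \to \infty$; the notation here is generic, not quoted from the cited abstract.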
arXiv Detail & Related papers (2023-09-04T18:25:43Z) - On Momentum-Based Gradient Methods for Bilevel Optimization with
Nonconvex Lower-Level [25.438298531555468]
Bilevel optimization is a popular framework for many machine learning tasks.
In this paper, we investigate bilevel problems whose lower-level objective is nonconvex but satisfies the Polyak-Łojasiewicz (PL) condition.
We show that our method improves on the existing best results for finding an $\epsilon$-stationary point (i.e., a point with $\mathbb{E}\|\nabla F(x)\| \leq \epsilon$).
arXiv Detail & Related papers (2023-03-07T14:55:05Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization, i.e., minimizing a twice-differentiable function subject to conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted gap function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth
Nonconvex Optimization [94.19177623349947]
Nonsmooth nonconvex optimization problems emerge in machine learning and business decision making.
Two core challenges impede the development of efficient methods with finite-time convergence guarantees.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
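A typical building block of such gradient-free methods (GFM) is a two-point randomized estimator of the form $\hat{g} = \frac{d}{2\delta} [ f(x + \delta w) - f(x - \delta w) ] w$ with $w$ drawn uniformly from the unit sphere, which estimates the gradient of a randomized smoothing of $f$; the exact estimator used in the cited paper may differ in details.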
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - Accelerated first-order methods for convex optimization with locally
Lipschitz continuous gradient [0.0]
We first consider unconstrained convex optimization with locally Lipschitz continuous gradient (LLCG) and propose accelerated proximal gradient (APG) methods for solving it.
The proposed APG methods are equipped with a verifiable termination criterion and enjoy an operation complexity of $\mathcal{O}(\varepsilon^{-1/2} \log \varepsilon^{-1})$.
Preliminary numerical results are presented to demonstrate the performance of our proposed methods.
arXiv Detail & Related papers (2022-06-02T10:34:26Z) - Efficiently Escaping Saddle Points in Bilevel Optimization [48.925688192913]
Bilevel optimization is one of the fundamental problems in machine learning.
Recent developments in bilevel optimization culminate in the first nonasymptotic multi-step analysis for escaping saddle points.
arXiv Detail & Related papers (2022-02-08T07:10:06Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterates caused by inexact solutions to the lower-level subproblem.
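The momentum assistance here refers, generically, to a recursive variance-reduced gradient estimator of the form $d_t = \nabla f(x_t; \xi_t) + (1 - a_t)(d_{t-1} - \nabla f(x_{t-1}; \xi_t))$, which reuses the previous search direction to reduce variance; this is the standard template, and the exact MSTSA update is not reproduced here.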
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Second-Order Information in Non-Convex Stochastic Optimization: Power
and Limitations [54.42518331209581]
We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\| \le \epsilon$), as well as $(\epsilon, \gamma)$-approximate second-order stationary points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z)