On the Complexity of First-Order Methods in Stochastic Bilevel
Optimization
- URL: http://arxiv.org/abs/2402.07101v1
- Date: Sun, 11 Feb 2024 04:26:35 GMT
- Title: On the Complexity of First-Order Methods in Stochastic Bilevel
Optimization
- Authors: Jeongyeol Kwon, Dohyun Kwon, Hanbaek Lyu
- Abstract summary: We consider the problem of finding stationary points in Bilevel optimization when the lower-level problem is unconstrained and strongly convex.
Existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them.
We propose a simple first-order method that converges to an $\epsilon$-stationary point using $O(\epsilon^{-6})$ and $O(\epsilon^{-4})$ accesses to first-order $y^*$-aware oracles.
- Score: 9.649991673557167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of finding stationary points in Bilevel optimization
when the lower-level problem is unconstrained and strongly convex. The problem
has been extensively studied in recent years; the main technical challenge is
to keep track of lower-level solutions $y^*(x)$ in response to the changes in
the upper-level variables $x$. Subsequently, all existing approaches tie their
analyses to a genie algorithm that knows lower-level solutions and, therefore,
need not query any points far from them. We consider a dual question to such
approaches: suppose we have an oracle, which we call $y^*$-aware, that returns
an $O(\epsilon)$-estimate of the lower-level solution, in addition to
first-order gradient estimators {\it locally unbiased} within the
$\Theta(\epsilon)$-ball around $y^*(x)$. We study the complexity of finding
stationary points with such a $y^*$-aware oracle: we propose a simple
first-order method that converges to an $\epsilon$-stationary point using
$O(\epsilon^{-6}), O(\epsilon^{-4})$ access to first-order $y^*$-aware oracles.
Our upper bounds also apply to standard unbiased first-order oracles, improving
the best-known complexity of first-order methods by $O(\epsilon)$ with minimal
assumptions. We then provide the matching $\Omega(\epsilon^{-6})$,
$\Omega(\epsilon^{-4})$ lower bounds without and with an additional smoothness
assumption on $y^*$-aware oracles, respectively. Our results imply that any
approach that simulates an algorithm with a $y^*$-aware oracle must suffer the
same lower bounds.
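The setting above — tracking the lower-level solution $y^*(x)$ while taking first-order steps on the upper level — can be illustrated by a generic two-loop scheme: run a few gradient steps on the strongly convex lower-level problem to approximate $y^*(x)$, then take an outer step using a hypergradient built at that approximate solution. The sketch below is a toy illustration on a quadratic problem where $y^*(x)$ and the hypergradient have closed forms we can check against; it is not the paper's algorithm, and all functions and step sizes are hypothetical choices:

```python
import numpy as np

# Toy bilevel problem (all quantities illustrative, not from the paper):
#   lower level: g(x, y) = 0.5 * ||y - A x||^2   (strongly convex in y, y*(x) = A x)
#   upper level: f(x, y) = 0.5 * ||x||^2 + 0.5 * ||y - b||^2
# The hyper-objective F(x) = f(x, y*(x)) then has gradient x + A^T (A x - b).

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d)) / np.sqrt(d)
b = rng.standard_normal(d)

def grad_y_lower(x, y):   # ∇_y g(x, y)
    return y - A @ x

def hypergradient(x, y):
    # For this toy problem dy*/dx = A, so
    # ∇F(x) = ∇_x f + (dy*/dx)^T ∇_y f, evaluated at (x, y ≈ y*(x)).
    return x + A.T @ (y - b)

x = rng.standard_normal(d)
y = np.zeros(d)
inner_steps, outer_steps = 20, 200
alpha, beta = 0.5, 0.1          # inner / outer step sizes (hand-tuned)

for _ in range(outer_steps):
    for _ in range(inner_steps):          # approximately track y*(x)
        y -= alpha * grad_y_lower(x, y)
    x -= beta * hypergradient(x, y)       # outer first-order step

# Stationarity of F(x) = 0.5*||x||^2 + 0.5*||A x - b||^2:
grad_norm = np.linalg.norm(x + A.T @ (A @ x - b))
print(f"hyper-gradient norm: {grad_norm:.2e}")
```

The inner loop plays the role of the $O(\epsilon)$-estimate of $y^*(x)$ that the $y^*$-aware oracle returns for free; the complexity analyses in the paper concern how many such first-order accesses are needed in the stochastic, nonconvex upper-level case.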
Related papers
- First-Order Methods for Linearly Constrained Bilevel Optimization [38.19659447295665]
We present first-order methods for linearly constrained bilevel optimization.
For linear inequality constraints, we attain $(\delta,\epsilon)$-Goldstein stationarity in $\widetilde{O}(d\delta^{-1}\epsilon^{-3})$ gradient oracle calls.
arXiv Detail & Related papers (2024-06-18T16:41:21Z)
- The Computational Complexity of Finding Stationary Points in Non-Convex Optimization [53.86485757442486]
Finding approximate stationary points, i.e., points where the gradient is approximately zero, of nonconvex but smooth objective functions is a fundamental computational problem.
We show that finding approximate KKT points in constrained optimization is reducible to finding approximate stationary points in unconstrained optimization, but the converse is impossible.
arXiv Detail & Related papers (2023-10-13T14:52:46Z)
- On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation [13.813242559935732]
We propose first-order algorithms for solving bilevel optimization problems.
In particular, we show a strong connection between the penalty function and the hyper-objective.
We show an improved oracle complexity of $O(\epsilon^{-3})$ and $O(\epsilon^{-5})$, respectively.
arXiv Detail & Related papers (2023-09-04T18:25:43Z)
- Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles [14.697733592222658]
We show that a first-order method can converge at the near-optimal $\tilde{\mathcal{O}}(\epsilon^{-2})$ rate as second-order methods.
Our analysis further leads to simple first-order algorithms that achieve similar convergence rates for finding second-order stationary points.
arXiv Detail & Related papers (2023-06-26T17:07:54Z)
- The First Optimal Acceleration of High-Order Methods in Smooth Convex Optimization [88.91190483500932]
We study the fundamental open question of finding the optimal high-order algorithm for solving smooth convex minimization problems.
The reason for this is that these algorithms require performing a complex binary search procedure, which makes them neither optimal nor practical.
We fix this fundamental issue by providing the first algorithm with $\mathcal{O}\left(\epsilon^{-2/(p+1)}\right)$ $p$th-order oracle complexity.
arXiv Detail & Related papers (2022-05-19T16:04:40Z)
- Perseus: A Simple and Optimal High-Order Method for Variational Inequalities [81.32967242727152]
A VI involves finding $x^\star \in \mathcal{X}$ such that $\langle F(x), x - x^\star \rangle \geq 0$ for all $x \in \mathcal{X}$.
We propose a $p$th-order method that does not require any line search procedure and provably converges to a weak solution at a rate of $O(\epsilon^{-2/(p+1)})$.
arXiv Detail & Related papers (2022-05-06T13:29:14Z)
- Lifted Primal-Dual Method for Bilinearly Coupled Smooth Minimax Optimization [47.27237492375659]
We study the bilinearly coupled minimax problem $\min_x \max_y f(x) + y^\top A x - h(y)$, where $f$ and $h$ are both strongly convex smooth functions.
No known first-order algorithms have hitherto achieved the lower complexity bound of $\Omega\left(\left(\sqrt{\frac{L_x}{\mu_x}} + \frac{\|A\|}{\sqrt{\mu_x \mu_y}}\right) \log\left(\frac{1}{\varepsilon}\right)\right)$.
arXiv Detail & Related papers (2022-01-19T05:56:19Z)
- Thinking Inside the Ball: Near-Optimal Minimization of the Maximal Loss [41.17536985461902]
We prove an oracle complexity lower bound scaling as $\Omega(N\epsilon^{-2/3})$, showing that our dependence on $N$ is optimal up to polylogarithmic factors.
We develop methods with improved complexity bounds of $\tilde{O}(N\epsilon^{-2/3} + \sqrt{N}\epsilon^{-8/3})$ in the non-smooth case and $\tilde{O}(N\epsilon^{-2/3} + \sqrt{N}\epsilon^{-1})$ in the smooth case.
arXiv Detail & Related papers (2021-05-04T21:49:15Z)
- Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z)
- Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
- Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations [54.42518331209581]
We find an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\| \le \epsilon$) as well as $(\epsilon,\gamma)$-approximate second-order stationary points.
Our lower bounds here are novel even in the noiseless case.
arXiv Detail & Related papers (2020-06-24T04:41:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.