On the Complexity of First-Order Methods in Stochastic Bilevel
Optimization
- URL: http://arxiv.org/abs/2402.07101v1
- Date: Sun, 11 Feb 2024 04:26:35 GMT
- Title: On the Complexity of First-Order Methods in Stochastic Bilevel
Optimization
- Authors: Jeongyeol Kwon, Dohyun Kwon, Hanbaek Lyu
- Abstract summary: We consider the problem of finding stationary points in bilevel optimization when the lower-level problem is unconstrained and strongly convex.
Existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them.
We propose a simple first-order method that converges to an $\epsilon$-stationary point using $O(\epsilon^{-6})$, $O(\epsilon^{-4})$ access to first-order $y^*$-aware oracles.
- Score: 9.649991673557167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of finding stationary points in bilevel optimization
when the lower-level problem is unconstrained and strongly convex. The problem
has been extensively studied in recent years; the main technical challenge is
to keep track of lower-level solutions $y^*(x)$ in response to the changes in
the upper-level variables $x$. Subsequently, all existing approaches tie their
analyses to a genie algorithm that knows lower-level solutions and, therefore,
need not query any points far from them. We consider a dual question to such
approaches: suppose we have an oracle, which we call $y^*$-aware, that returns
an $O(\epsilon)$-estimate of the lower-level solution, in addition to
first-order gradient estimators that are locally unbiased within the
$\Theta(\epsilon)$-ball around $y^*(x)$. We study the complexity of finding
stationary points with such a $y^*$-aware oracle: we propose a simple
first-order method that converges to an $\epsilon$-stationary point using
$O(\epsilon^{-6}), O(\epsilon^{-4})$ access to first-order $y^*$-aware oracles.
Our upper bounds also apply to standard unbiased first-order oracles, improving
the best-known complexity of first-order methods by $O(\epsilon)$ with minimal
assumptions. We then provide the matching $\Omega(\epsilon^{-6})$,
$\Omega(\epsilon^{-4})$ lower bounds without and with an additional smoothness
assumption on $y^*$-aware oracles, respectively. Our results imply that any
approach that simulates an algorithm with a $y^*$-aware oracle must suffer the
same lower bounds.
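
For orientation, the setting described in the abstract can be written in the standard bilevel form sketched below. The symbols $f$, $g$, $F$, and $d_y$ do not appear in the abstract and are assumed here for illustration; the paper's own notation and its precise notion of stationarity may differ.

```latex
% Assumed notation: f is the upper-level objective, g the lower-level objective.
\min_{x} \; F(x) := f\bigl(x, y^*(x)\bigr),
\qquad
y^*(x) := \arg\min_{y \in \mathbb{R}^{d_y}} g(x, y),
% where g(x, \cdot) is strongly convex for every x, so y^*(x) is unique.

% One common reading of "\epsilon-stationary" (an assumption, not quoted from the paper):
\bigl\| \nabla F(x) \bigr\| \le \epsilon .
```

Under this reading, the difficulty named in the abstract is that evaluating $\nabla F(x)$ requires $y^*(x)$, which changes whenever $x$ is updated.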