Related papers: A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization

A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization

URL: http://arxiv.org/abs/2302.08766v4
Date: Tue, 20 Feb 2024 08:46:49 GMT
Title: A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization
Authors: Mathieu Dagr\'eou, Thomas Moreau, Samuel Vaiter, Pierre Ablin
Abstract summary: We propose a bilevel extension of the celebrated SARAH algorithm. We demonstrate that the algorithm requires $mathcalO((n+m)frac12varepsilon-1)$ oracle calls to achieve $varepsilon$-stationarity. We provide a lower bound on the number of oracle calls required to get an approximate stationary point of the objective function of the bilevel problem.
Score: 23.401092942765196
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Bilevel optimization problems, which are problems where two optimization problems are nested, have more and more applications in machine learning. In many practical cases, the upper and the lower objectives correspond to empirical risk minimization problems and therefore have a sum structure. In this context, we propose a bilevel extension of the celebrated SARAH algorithm. We demonstrate that the algorithm requires $\mathcal{O}((n+m)^{\frac12}\varepsilon^{-1})$ oracle calls to achieve $\varepsilon$-stationarity with $n+m$ the total number of samples, which improves over all previous bilevel algorithms. Moreover, we provide a lower bound on the number of oracle calls required to get an approximate stationary point of the objective function of the bilevel problem. This lower bound is attained by our algorithm, making it optimal in terms of sample complexity.

Related papers

Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization [25.438298531555468]
Bilevel optimization is widely applied in many machine learning tasks such as hyper learning, meta learning and reinforcement learning. We propose an efficient Hessian/BiO method with the optimal convergence $frac1TT) under some mild conditions. We conduct some some experiments on the bilevel game hyper-stationary numerical convergence.
arXiv Detail & Related papers (2024-07-25T07:25:06Z)
Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles [14.697733592222658]
We show that a first-order method can converge at the near-optimal $tilde mathcalO(epsilon-2)$ rate as second-order methods. Our analysis further leads to simple first-order algorithms that achieve similar convergence rates for finding second-order stationary points.
arXiv Detail & Related papers (2023-06-26T17:07:54Z)
Alternating Implicit Projected SGD and Its Efficient Variants for Equality-constrained Bilevel Optimization [41.10094500516342]
This paper considers the bilevel optimization problems both in the equality constraints and constrained upper-level problems. By leveraging the equality constraints approach, first, an alternating implicit projection SGD approach is used for unconstrained bilevel problems.
arXiv Detail & Related papers (2022-11-14T03:47:43Z)
A Fully Single Loop Algorithm for Bilevel Optimization without Hessian Inverse [121.54116938140754]
We propose a new Hessian inverse free Fully Single Loop Algorithm for bilevel optimization problems. We show that our algorithm converges with the rate of $O(epsilon-2)$.
arXiv Detail & Related papers (2021-12-09T02:27:52Z)
Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a bilevel optimization method based on Bregman Bregman functions. We also propose an accelerated version of SBiO-BreD method (ASBiO-BreD) by using the variance-reduced technique.
arXiv Detail & Related papers (2021-07-26T16:18:43Z)
Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications. We propose two new algorithms for bilevel optimization. We show that both algorithms achieve the complexity of $mathcalO(epsilon-1.5)$, which outperforms all existing algorithms by the order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z)
A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum- Single-timescale Approximation (MSTSA) -- for tackling problems. MSTSA allows us to control the error in iterations due to inaccurate solution to the lower level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z)
Lower Bounds and Accelerated Algorithms for Bilevel Optimization [62.192297758346484]
Bilevel optimization has recently attracted growing interests due to its wide applications in modern machine learning problems. We show that our results are more challenging than that of minimax applications.
arXiv Detail & Related papers (2021-02-07T21:46:29Z)
A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure. We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem. We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
Second-order Conditional Gradient Sliding [70.88478428882871]
We present the emphSecond-Order Conditional Gradient Sliding (SOCGS) algorithm.<n>The SOCGS algorithm converges quadratically in primal gap after a finite number of linearly convergent iterations.<n>It is useful when the feasible region can only be accessed efficiently through a linear optimization oracle.
arXiv Detail & Related papers (2020-02-20T17:52:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.