Alternating Implicit Projected SGD and Its Efficient Variants for
Equality-constrained Bilevel Optimization
- URL: http://arxiv.org/abs/2211.07096v1
- Date: Mon, 14 Nov 2022 03:47:43 GMT
- Title: Alternating Implicit Projected SGD and Its Efficient Variants for
Equality-constrained Bilevel Optimization
- Authors: Quan Xiao, Han Shen, Wotao Yin, Tianyi Chen
- Abstract summary: This paper considers stochastic bilevel optimization problems with equality constraints in both the upper- and lower-level problems.
By leveraging the special structure of the equality constraints, an alternating implicit projected SGD approach is developed whose sample complexity matches the state of the art for unconstrained bilevel problems.
- Score: 41.10094500516342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic bilevel optimization, which captures the inherent nested structure
of machine learning problems, is gaining popularity in many recent
applications. Existing works on bilevel optimization mostly consider either
unconstrained problems or constrained upper-level problems. This paper
considers stochastic bilevel optimization problems with equality constraints
in both the upper and lower levels. By leveraging the special structure of
the equality constraints, the paper first presents an
alternating implicit projected SGD approach and establishes the $\tilde{\cal
O}(\epsilon^{-2})$ sample complexity that matches the state-of-the-art
complexity of ALSET \citep{chen2021closing} for unconstrained bilevel problems.
To further save the cost of projection, the paper presents two alternating
implicit projection-efficient SGD approaches, where one algorithm enjoys the
$\tilde{\cal O}(\epsilon^{-2}/T)$ upper-level and ${\cal
O}(\epsilon^{-1.5}/T^{\frac{3}{4}})$ lower-level projection complexity with
${\cal O}(T)$ lower-level batch size, and the other one enjoys $\tilde{\cal
O}(\epsilon^{-1.5})$ upper-level and lower-level projection complexity with
${\cal O}(1)$ batch size. An application to federated bilevel optimization is
presented to showcase the empirical performance of our algorithms. Our
results demonstrate that equality-constrained bilevel optimization with
strongly-convex lower-level problems can be solved as efficiently as stochastic
single-level optimization problems.
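As a concrete illustration of the alternating template described in the abstract above, the following is a minimal sketch (not the authors' implementation) of alternating implicit projected SGD on a toy bilevel problem with affine equality constraints at both levels and a strongly convex quadratic lower level. The objectives, constraint data, step sizes, noise level, and the null-space construction of the implicit gradient are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of alternating implicit
# projected SGD for an equality-constrained bilevel problem
#     min_x  f(x, y*(x))   s.t.  A_u x = b_u,
#     where  y*(x) = argmin_y g(x, y)   s.t.  A_l y = b_l,
# with a toy strongly convex quadratic lower level. All problem data, step
# sizes, and iteration counts are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y = 4, 5

# Affine equality constraints for both levels (toy data).
A_u, b_u = rng.standard_normal((1, d_x)), np.array([1.0])
A_l, b_l = rng.standard_normal((2, d_y)), rng.standard_normal(2)

# Toy objectives: f(x, y) = 0.5*||y - c||^2 + 0.5*||x||^2,
#                 g(x, y) = 0.5*||y||^2 - x^T B y   (strongly convex in y).
c = rng.standard_normal(d_y)
B = rng.standard_normal((d_x, d_y))
sigma = 0.01  # stochastic-gradient noise level


def project_affine(v, A, b):
    """Euclidean projection onto the affine set {v : A v = b}."""
    return v - A.T @ np.linalg.solve(A @ A.T, A @ v - b)


def null_space(A):
    """Orthonormal basis (columns) of the null space of A."""
    _, s, Vt = np.linalg.svd(A)
    return Vt[s.size:].T


Z_l = null_space(A_l)  # feasible directions of the lower-level constraint


def lower_sgd_step(x, y, lr=0.1):
    """One projected SGD step on g(x, .) over {y : A_l y = b_l}."""
    grad_y_g = y - B.T @ x + sigma * rng.standard_normal(d_y)
    return project_affine(y - lr * grad_y_g, A_l, b_l)


def implicit_grad(x, y):
    """Stochastic implicit gradient of F(x) = f(x, y*(x)), with the
    lower-level Hessian restricted to the feasible subspace span(Z_l)."""
    grad_x_f = x + sigma * rng.standard_normal(d_x)
    grad_y_f = (y - c) + sigma * rng.standard_normal(d_y)
    H_xy = -B             # cross derivative d^2 g / dx dy
    H_yy = np.eye(d_y)    # lower-level Hessian d^2 g / dy dy
    reduced = Z_l @ np.linalg.solve(Z_l.T @ H_yy @ Z_l, Z_l.T @ grad_y_f)
    return grad_x_f - H_xy @ reduced


# Alternating loop: one projected SGD step on the lower level, then one
# projected step on the upper level along the stochastic implicit gradient.
x = project_affine(np.zeros(d_x), A_u, b_u)
y = project_affine(np.zeros(d_y), A_l, b_l)
for k in range(2000):
    y = lower_sgd_step(x, y)
    x = project_affine(x - 0.05 * implicit_grad(x, y), A_u, b_u)

print("upper-level feasibility, A_u x - b_u:", A_u @ x - b_u)
print("lower-level feasibility, A_l y - b_l:", A_l @ y - b_l)
```

In this toy example each affine projection is a cheap linear solve; the projection-efficient variants described in the abstract are aimed at the regime where such projections are costly and reduce how many of them (especially at the lower level) are needed.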
Related papers
- Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization [25.438298531555468]
Bilevel optimization is widely applied in many machine learning tasks such as hyperparameter learning, meta-learning and reinforcement learning.
We propose an efficient Hessian/Jacobian-free bilevel optimization method with an optimal convergence rate of $O(\frac{1}{T})$, where $T$ is the number of iterations, under some mild conditions.
We conduct numerical experiments on a bilevel PL game to demonstrate the convergence of our method.
arXiv Detail & Related papers (2024-07-25T07:25:06Z) - On Momentum-Based Gradient Methods for Bilevel Optimization with
Nonconvex Lower-Level [25.438298531555468]
Bilevel optimization is a popular hierarchical optimization framework in machine learning tasks.
In this paper, we investigate nonconvex-PL bilevel problems such as the bilevel PL game and hyper-representation learning.
We show that our method improves the existing best results for finding an $\epsilon$-stationary solution, i.e., a point with $\mathbb{E}\|\nabla F(x)\| \leq \epsilon$.
arXiv Detail & Related papers (2023-03-07T14:55:05Z) - On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis [18.08351275534193]
Bilevel optimization reveals the inner structure of otherwise oblique optimization problems.
A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level problem.
arXiv Detail & Related papers (2023-01-02T15:09:12Z) - A Constrained Optimization Approach to Bilevel Optimization with
Multiple Inner Minima [49.320758794766185]
We propose a new approach that converts the bilevel problem into an equivalent constrained optimization problem, which can then be solved with a primal-dual algorithm.
Such an approach enjoys a few advantages, including (a) addressing the multiple inner minima challenge and (b) fully first-order efficiency without Jacobian computations.
arXiv Detail & Related papers (2022-03-01T18:20:01Z) - Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a stochastic bilevel optimization method based on the Bregman distance (SBiO-BreD).
We also propose an accelerated version of the SBiO-BreD method (ASBiO-BreD) using a variance-reduction technique.
arXiv Detail & Related papers (2021-07-26T16:18:43Z) - Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel
Optimization [62.87181271021217]
We consider nonconvex stochastic bilevel optimization (SBO) problems that have many applications in machine learning.
This paper proposes fast randomized algorithms for nonconvex SBO problems.
arXiv Detail & Related papers (2021-05-05T18:28:42Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling stochastic bilevel optimization problems.
MSTSA allows us to control the error in the iterations caused by inexact solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - A Single-Timescale Stochastic Bilevel Optimization Method [34.868681953242934]
This paper develops a new method for a class of stochastic bilevel problems, termed the Single-Timescale stochAstic BiLevel optimization (STABLE) method.
To achieve an $\epsilon$-stationary point of the bilevel problem, STABLE requires ${\cal O}(\epsilon^{-2})$ samples in total; and to achieve an $\epsilon$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(\epsilon^{-1})$ samples.
arXiv Detail & Related papers (2021-02-09T06:35:30Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem; a minimal sketch of this kind of two-timescale update appears after this list.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
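To make the timescale separation in the last entry concrete, here is a minimal sketch (our illustration, not the TTSA paper's algorithm verbatim) of a two-timescale stochastic-approximation update on a toy quadratic bilevel problem: the lower-level variable is updated with a larger, slowly decaying step size so it tracks the lower-level solution, while the upper-level variable moves on a slower timescale. The objectives, noise level, and step-size exponents are illustrative assumptions.

```python
# Two-timescale stochastic-approximation sketch for a toy bilevel problem:
#   g(x, y) = 0.5*||y - x||^2   (so y*(x) = x),
#   f(x, y) = 0.5*||y - c||^2,  hence F(x) = f(x, y*(x)) = 0.5*||x - c||^2.
import numpy as np

rng = np.random.default_rng(1)
d = 3
c = rng.standard_normal(d)
sigma = 0.05  # stochastic-gradient noise level

x, y = np.zeros(d), np.zeros(d)
for k in range(1, 5001):
    alpha = 0.5 / k ** 0.75   # slow (upper-level) step size
    beta = 0.5 / k ** 0.5     # fast (lower-level) step size, decays more slowly
    # Fast timescale: y tracks the lower-level solution y*(x) = x.
    y = y - beta * ((y - x) + sigma * rng.standard_normal(d))
    # Slow timescale: since dy*/dx = I and f has no direct x-dependence,
    # the implicit gradient of F is grad_y f evaluated at the tracked y.
    x = x - alpha * ((y - c) + sigma * rng.standard_normal(d))

print("x - c:", x - c)  # should be small: x approaches the minimizer c of F
```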