On Momentum-Based Gradient Methods for Bilevel Optimization with
Nonconvex Lower-Level
- URL: http://arxiv.org/abs/2303.03944v4
- Date: Sat, 18 Nov 2023 23:50:34 GMT
- Title: On Momentum-Based Gradient Methods for Bilevel Optimization with
Nonconvex Lower-Level
- Authors: Feihu Huang
- Abstract summary: Bilevel optimization is a popular two-level hierarchical optimization widely used in machine learning tasks.
In this paper, we study nonconvex bilevel problems whose lower-level problem satisfies the Polyak-{\L}ojasiewicz (PL) condition, and propose momentum-based gradient methods (MGBiO, MSGBiO and VR-MSGBiO) for the deterministic and stochastic settings.
We show that our methods improve the existing best sample complexities, e.g., by a factor of $O(\epsilon^{-1})$ in the deterministic case, for finding an $\epsilon$-stationary solution (i.e., $\|\nabla F(x)\|\leq \epsilon$).
- Score: 25.438298531555468
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Bilevel optimization is a popular two-level hierarchical optimization, which
has been widely applied to many machine learning tasks such as hyperparameter
learning, meta learning and continual learning. Although many bilevel
optimization methods have recently been developed, these methods are not
well studied when the lower-level problem is nonconvex. To fill this gap, in
the paper, we study a class of nonconvex bilevel optimization problems, where
both upper-level and lower-level problems are nonconvex, and the lower-level
problem satisfies Polyak-{\L}ojasiewicz (PL) condition. We propose an efficient
momentum-based gradient bilevel method (MGBiO) to solve these deterministic
problems. Meanwhile, we propose a class of efficient momentum-based stochastic
gradient bilevel methods (MSGBiO and VR-MSGBiO) to solve these stochastic
problems. Moreover, we provide a useful convergence analysis framework for our
methods. Specifically, under some mild conditions, we prove that our MGBiO
method has a sample (or gradient) complexity of $O(\epsilon^{-2})$ for finding
an $\epsilon$-stationary solution of the deterministic bilevel problems (i.e.,
$\|\nabla F(x)\|\leq \epsilon$), which improves the existing best results by a
factor of $O(\epsilon^{-1})$. Meanwhile, we prove that our MSGBiO and VR-MSGBiO
methods have sample complexities of $\tilde{O}(\epsilon^{-4})$ and
$\tilde{O}(\epsilon^{-3})$, respectively, in finding an $\epsilon$-stationary
solution of the stochastic bilevel problems (i.e., $\mathbb{E}\|\nabla
F(x)\|\leq \epsilon$), which improves the existing best results by a factor of
$\tilde{O}(\epsilon^{-3})$. Extensive experimental results on bilevel PL game
and hyper-representation learning demonstrate the efficiency of our algorithms.
This paper commemorates the mathematician Boris Polyak (1935-2023).
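To make the setting concrete: the PL condition on the lower level, $\|\nabla_y g(x,y)\|^2 \geq 2\mu\,(g(x,y)-\min_{y'} g(x,y'))$, allows nonconvex $g$ with non-unique minimizers, so the implicit hypergradient must be formed with a pseudo-inverse rather than an inverse Hessian. Below is a minimal sketch of a momentum-based bilevel update in the spirit of MGBiO on a toy quadratic instance; the problem data, step sizes, and the single inner gradient step per iteration are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Toy bilevel instance (an assumption for illustration):
#   upper level:  f(x, y) = 0.5*||y - b||^2 + 0.5*||x||^2
#   lower level:  g(x, y) = 0.5*||A @ y - x||^2
# A is built rank-deficient, so g(x, .) is not strongly convex in y
# but still satisfies the PL condition, matching the paper's setting.
rng = np.random.default_rng(0)
d = 5
U, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = U @ np.diag([1.0, 0.8, 0.6, 0.4, 0.0]) @ U.T   # one zero singular value
b = rng.standard_normal(d)

def hypergrad(x, y):
    """Implicit hypergradient with a pseudo-inverse, since the PL
    condition does not guarantee an invertible lower-level Hessian."""
    grad_fx = x                            # df/dx
    grad_fy = y - b                        # df/dy
    hess_yy = A.T @ A                      # d^2 g / dy^2
    jac_xy = -A                            # d^2 g / dx dy
    return grad_fx - jac_xy @ np.linalg.pinv(hess_yy) @ grad_fy

def mgbio(T=500, lam=0.5, gamma=0.05, beta=0.9):
    """Single-loop momentum-based bilevel iteration (MGBiO-style sketch)."""
    x, y, m = np.zeros(d), np.zeros(d), np.zeros(d)
    for _ in range(T):
        y = y - lam * (A.T @ (A @ y - x))  # one lower-level gradient step
        w = hypergrad(x, y)                # hypergradient estimate at (x, y)
        m = (1 - beta) * m + beta * w      # momentum averaging
        x = x - gamma * m                  # upper-level descent step
    return x, y

x_hat, y_hat = mgbio()
print("approx. stationarity:", np.linalg.norm(hypergrad(x_hat, y_hat)))
```

For the stochastic variants, the exact gradients above would be replaced by mini-batch estimates; a STORM-style variance-reduced momentum, e.g. $w_t = \hat{\nabla} F(x_t;\xi_t) + (1-\beta_t)(w_{t-1} - \hat{\nabla} F(x_{t-1};\xi_t))$, is one plausible reading of how VR-MSGBiO reaches the improved $\tilde{O}(\epsilon^{-3})$ sample complexity, though the exact estimator is given in the paper.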
Related papers
- Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization [25.438298531555468]
Bilevel optimization is widely applied in many machine learning tasks such as hyperparameter learning, meta learning and reinforcement learning.
We propose an efficient Hessian/Jacobian-free bilevel method with an optimal convergence rate of $O(\frac{1}{T})$ under some mild conditions.
We conduct numerical experiments on the bilevel PL game and hyper-representation learning tasks.
arXiv Detail & Related papers (2024-07-25T07:25:06Z) - Adaptive Mirror Descent Bilevel Optimization [25.438298531555468]
We propose a class of efficient adaptive bilevel methods based on mirror descent for nonconvex bilevel optimization.
We provide a convergence analysis for these methods under some mild conditions, and prove that they achieve a fast convergence rate.
arXiv Detail & Related papers (2023-11-08T08:17:09Z) - Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual stochastic bilevel optimization (CSBO) -- a bilevel optimization framework where the lower-level problem minimizes an expectation conditioned on some contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI).
arXiv Detail & Related papers (2023-10-27T23:24:37Z) - Accelerating Inexact HyperGradient Descent for Bilevel Optimization [84.00488779515206]
We present a method for solving general nonconvex-strongly-convex bilevel optimization problems.
Our results also improve upon the existing complexity for finding second-order stationary points in nonconvex-strongly-concave minimax problems.
arXiv Detail & Related papers (2023-06-30T20:36:44Z) - A Generalized Alternating Method for Bilevel Learning under the
Polyak-{\L}ojasiewicz Condition [63.66516306205932]
Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields.
Recent results have shown that simple alternating gradient-based iterations can match the convergence of more involved methods when the lower-level objective is strongly convex.
arXiv Detail & Related papers (2023-06-04T17:54:11Z) - A Fully First-Order Method for Stochastic Bilevel Optimization [8.663726907303303]
We consider unconstrained stochastic bilevel optimization problems when only first-order gradient oracles are available.
We propose a Fully First-order Stochastic Approximation (F2SA) method, and study its non-asymptotic convergence properties (a minimal sketch of the fully first-order idea appears after this list).
We demonstrate the superior practical performance of the proposed method over existing second-order based approaches on MNIST data hyper-cleaning experiments.
arXiv Detail & Related papers (2023-01-26T05:34:21Z) - On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis [18.08351275534193]
Bilevel optimization reveals the inner structure of otherwise oblique optimization problems.
A common goal in bilevel optimization is to minimize a hyper-objective that implicitly depends on the solution set of the lower-level problem.
arXiv Detail & Related papers (2023-01-02T15:09:12Z) - A Conditional Gradient-based Method for Simple Bilevel Optimization with
Convex Lower-level Problem [18.15207779559351]
We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem via a cutting plane.
Our method achieves the best-known convergence guarantees for the considered class of bilevel problems.
arXiv Detail & Related papers (2022-06-17T16:12:47Z) - Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a bilevel optimization method (SBiO-BreD) based on Bregman distances.
We also propose an accelerated version of SBiO-BreD method (ASBiO-BreD) by using the variance-reduced technique.
arXiv Detail & Related papers (2021-07-26T16:18:43Z) - BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful convergence analysis framework for both constrained and unconstrained bilevel optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in its iterations caused by inexact solutions of the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z)
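As a complement to the momentum-based sketch above, the fully first-order idea behind the F2SA entry can be illustrated in a few lines: the second-order implicit term in the hypergradient is replaced by a gap between lower-level gradients at two tracked points, one following the penalized objective $f + \lambda g$ and one following $g$ alone. The toy objectives, penalty weight, and step sizes below are assumptions for illustration, not the algorithm's exact parameters.

```python
import numpy as np

# Toy objectives (assumptions): f(x, y) = 0.5*||y - b||^2 + 0.5*||x||^2,
# g(x, y) = 0.5*||A @ y - x||^2; only first-order gradients are used.
rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d))
A /= np.linalg.norm(A, 2)                  # spectral norm 1 keeps steps stable
b = rng.standard_normal(d)

grad_f_x = lambda x, y: x                  # df/dx
grad_f_y = lambda x, y: y - b              # df/dy
grad_g_x = lambda x, y: -(A @ y - x)       # dg/dx
grad_g_y = lambda x, y: A.T @ (A @ y - x)  # dg/dy

def f2sa_step(x, y, z, lam=10.0, alpha=0.05, gamma=0.005):
    """One outer iteration: y tracks argmin_y f + lam*g, z tracks
    argmin_y g, and their grad-x gap stands in for the implicit
    (second-order) term of the hypergradient."""
    y = y - alpha * (grad_f_y(x, y) + lam * grad_g_y(x, y))
    z = z - alpha * grad_g_y(x, z)
    v = grad_f_x(x, y) + lam * (grad_g_x(x, y) - grad_g_x(x, z))
    return x - gamma * v, y, z

x, y, z = np.zeros(d), np.zeros(d), np.zeros(d)
for _ in range(2000):
    x, y, z = f2sa_step(x, y, z)
print("upper-level iterate:", x)
```

In the deterministic limit the surrogate direction $\nabla_x f + \lambda(\nabla_x g(x, y) - \nabla_x g(x, z))$ approaches the true hypergradient as $\lambda$ grows, which is the trade-off these fully first-order methods manage.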