Adaptive Mirror Descent Bilevel Optimization
- URL: http://arxiv.org/abs/2311.04520v2
- Date: Sat, 18 Nov 2023 15:12:13 GMT
- Title: Adaptive Mirror Descent Bilevel Optimization
- Authors: Feihu Huang
- Abstract summary: We propose a class of efficient adaptive bilevel methods based on mirror descent for nonconvex bilevel optimization.
We provide a convergence analysis for our methods under some mild conditions, and prove that they achieve a fast convergence rate of $O(\frac{1}{T})$, where $T$ denotes the number of iterations.
- Score: 25.438298531555468
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In the paper, we propose a class of efficient adaptive bilevel methods based
on mirror descent for nonconvex bilevel optimization, where its upper-level
problem is nonconvex possibly with nonsmooth regularization, and its
lower-level problem is also nonconvex while satisfying the Polyak-{\L}ojasiewicz
(PL) condition. To solve these deterministic bilevel problems, we present an
efficient adaptive projection-aid gradient (i.e., AdaPAG) method based on
mirror descent, and prove that it obtains the best known gradient complexity of
$O(\epsilon^{-1})$ for finding an $\epsilon$-stationary solution of nonconvex
bilevel problems. To solve these stochastic bilevel problems, we propose an
efficient adaptive stochastic projection-aid gradient (i.e., AdaVSPAG) method
based on mirror descent and variance-reduced techniques, and prove that it
obtains the best known gradient complexity of $O(\epsilon^{-3/2})$ for finding
an $\epsilon$-stationary solution. Since the PL condition relaxes strong
convexity, our algorithms can also be applied to nonconvex-strongly-convex bilevel
optimization. Theoretically, we provide a useful convergence analysis framework
for our methods under some mild conditions, and prove that our methods have a
fast convergence rate of $O(\frac{1}{T})$, where $T$ denotes the number of
iterations.
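For concreteness, the two structural ingredients named in the abstract can be stated in their standard forms (our paraphrase, not formulas copied from the paper): the PL condition on the lower-level objective, and the mirror-descent (projection/proximal) step on the regularized upper level.

```latex
% PL condition on g(x, \cdot) with modulus \mu > 0 (implied by strong
% convexity, but also satisfied by many nonconvex functions):
\frac{1}{2}\,\|\nabla_y g(x,y)\|^2 \;\ge\; \mu \Bigl( g(x,y) - \min_{y'} g(x,y') \Bigr)
\quad \text{for all } y.

% Mirror-descent step on the upper level, with mirror map \psi, nonsmooth
% regularizer h, step size \gamma, and Bregman distance D_\psi:
x_{t+1} \;=\; \arg\min_{x} \Bigl\{ \langle \nabla F(x_t),\, x \rangle + h(x)
  + \tfrac{1}{\gamma}\, D_{\psi}(x, x_t) \Bigr\},
\qquad
D_{\psi}(x, x') \;=\; \psi(x) - \psi(x') - \langle \nabla \psi(x'),\, x - x' \rangle.
```

With the Euclidean mirror map $\psi(x) = \tfrac{1}{2}\|x\|^2$, this step reduces to a proximal gradient update, which is why adaptive step sizes and nonsmooth regularizers fit naturally into this template.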
Related papers
- Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization [25.438298531555468]
Bilevel optimization is widely applied in many machine learning tasks such as hyperparameter learning, meta learning and reinforcement learning.
We propose an efficient Hessian/Jacobian-free bilevel method with an optimal convergence rate of $O(\frac{1}{T})$ under some mild conditions.
We conduct numerical experiments on the bilevel PL game and hyper-representation learning to verify the convergence of our method.
arXiv Detail & Related papers (2024-07-25T07:25:06Z) - An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization [16.709026203727007]
We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem.
We measure the performance of our method in terms of suboptimality and infeasibility errors.
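In the simple-bilevel setting referenced here, $\min_x f(x)$ subject to $x$ minimizing $g$, "suboptimality" and "infeasibility" conventionally denote the two gaps below (standard definitions, paraphrased rather than quoted from the paper):

```latex
\text{suboptimality:}\;\; f(\hat{x}) - f^{\ast}, \qquad
\text{infeasibility:}\;\; g(\hat{x}) - g^{\ast},
```

where $g^{\ast} = \min_x g(x)$ and $f^{\ast}$ is the optimal upper-level value over the lower-level solution set.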
arXiv Detail & Related papers (2024-02-12T22:34:53Z) - A Generalized Alternating Method for Bilevel Learning under the
Polyak-{\L}ojasiewicz Condition [63.66516306205932]
Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields.
Recent results have shown that simple alternating gradient-based iterations can match the convergence guarantees previously available only for strongly convex lower-level objectives.
arXiv Detail & Related papers (2023-06-04T17:54:11Z) - On Momentum-Based Gradient Methods for Bilevel Optimization with
Nonconvex Lower-Level [25.438298531555468]
Bilevel optimization is a popular framework for many machine learning tasks.
In this paper, we investigate bilevel optimization with a nonconvex lower-level problem satisfying the PL condition, such as the bilevel PL game.
We show that our method improves the existing best results for finding a solution with $\mathbb{E}\|\nabla F(x)\| \leq \epsilon$.
arXiv Detail & Related papers (2023-03-07T14:55:05Z) - A Fully First-Order Method for Stochastic Bilevel Optimization [8.663726907303303]
We consider unconstrained bilevel optimization problems when only the first-order gradient oracles are available.
We propose a Fully First-order Approximation (F2SA) method, and study its non-asymptotic convergence properties.
We demonstrate even superior practical performance of the proposed method over existing second-order based approaches on MNIST data-hypercleaning experiments.
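The fully first-order idea can be illustrated on a toy quadratic: keep one inner variable for the penalized objective $f + \lambda g$ and one for $g$ alone, and combine their upper-level partial gradients into a Hessian/Jacobian-free hypergradient estimate whose bias shrinks as $\lambda$ grows. The sketch below is our simplified illustration of that idea on a made-up problem; it is not the paper's F2SA update or step-size schedule.

```python
# Toy bilevel problem (hypothetical, for illustration only):
#   upper level:  f(x, y) = 0.5 * (x - y)^2
#   lower level:  g(x, y) = 0.5 * (y - 2x)^2, minimized over y
# Here y*(x) = 2x, so the hyperobjective is F(x) = 0.5 * x^2, minimized at x = 0.

def grad_f(x, y):
    """Partial derivatives (df/dx, df/dy) of the upper-level objective."""
    return x - y, y - x

def grad_g(x, y):
    """Partial derivatives (dg/dx, dg/dy) of the lower-level objective."""
    return -2.0 * (y - 2.0 * x), y - 2.0 * x

def first_order_step(x, y, z, lam, alpha=0.05, beta=0.1, inner=10):
    """One outer iteration of a penalty-based fully first-order scheme (sketch).

    y chases a minimizer of f + lam * g, z chases a minimizer of g alone;
    the difference of their dg/dx values recovers the implicit-gradient
    correction without any Hessian or Jacobian evaluations.
    """
    for _ in range(inner):
        _, fy = grad_f(x, y)
        _, gy = grad_g(x, y)
        y -= beta * (fy + lam * gy)      # inner step on f + lam * g
        _, gz = grad_g(x, z)
        z -= beta * gz                   # inner step on g
    fx, _ = grad_f(x, y)
    gx_y, _ = grad_g(x, y)
    gx_z, _ = grad_g(x, z)
    x -= alpha * (fx + lam * (gx_y - gx_z))  # first-order hypergradient step
    return x, y, z

x, y, z = 1.0, 2.0, 2.0                  # warm-started at the lower-level solution
for _ in range(500):
    x, y, z = first_order_step(x, y, z, lam=10.0)
print(f"x after 500 outer steps: {x:.4f} (true minimizer: 0)")
```

On this toy problem the estimate equals $\frac{\lambda}{1+\lambda}\nabla F(x)$ at exact inner solutions, so a larger $\lambda$ trades inner-problem conditioning for outer accuracy.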
arXiv Detail & Related papers (2023-01-26T05:34:21Z) - A Conditional Gradient-based Method for Simple Bilevel Optimization with
Convex Lower-level Problem [18.15207779559351]
We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem via a cutting plane.
Our method achieves the best-known complexity guarantees for the considered class of bilevel problems.
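Paraphrasing the cutting-plane construction for intuition: when $g$ is convex, any gradient of $g$ defines a halfspace that contains the entire lower-level solution set, so the method can search over a cheap linear outer-approximation instead of the set itself (our rendering, not the paper's exact construction):

```latex
\mathcal{X}_k = \bigl\{ x \,:\, g(x_k) + \langle \nabla g(x_k),\, x - x_k \rangle \le \hat{g}_k \bigr\}
\;\supseteq\; \{\, x : g(x) \le g^{\ast} \,\},
```

where $\hat{g}_k \ge g^{\ast}$ is a running upper estimate of the optimal lower-level value $g^{\ast} = \min_x g(x)$.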
arXiv Detail & Related papers (2022-06-17T16:12:47Z) - Efficiently Escaping Saddle Points in Bilevel Optimization [48.925688192913]
Bilevel optimization is one of the fundamental problems in machine learning.
This work provides the first nonasymptotic analysis of multi-step methods that efficiently escape saddle points in bilevel optimization.
arXiv Detail & Related papers (2022-02-08T07:10:06Z) - Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a bilevel optimization method (SBiO-BreD) based on Bregman distances.
We also propose an accelerated version of SBiO-BreD method (ASBiO-BreD) by using the variance-reduced technique.
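The "variance-reduced technique" mentioned here is typically a recursive estimator in the STORM style; the sketch below shows the generic estimator on a made-up quadratic, not ASBiO-BreD's actual estimator or parameter choices.

```python
import numpy as np

def storm_estimator(d_prev, grad_new, grad_old, eta):
    """STORM-style variance-reduced gradient estimate (generic sketch).

    d_prev:   previous estimate d_{t-1}
    grad_new: stochastic gradient at the current iterate
    grad_old: stochastic gradient at the previous iterate, SAME sample
    eta:      momentum parameter in (0, 1]

    Relative to plain momentum, the correction (grad_new - grad_old)
    removes the bias that would otherwise build up as the iterates move.
    """
    return grad_new + (1.0 - eta) * (d_prev - grad_old)

# Toy usage on f(x) = 0.5 * ||x||^2 with additive Gaussian gradient noise.
rng = np.random.default_rng(0)
x = np.ones(5)
x_prev = x.copy()
d = np.zeros(5)
for t in range(200):
    noise = rng.normal(scale=0.1, size=5)          # one sample, two evaluations
    d = storm_estimator(d, x + noise, x_prev + noise, eta=0.2)
    x_prev, x = x, x - 0.05 * d
print(np.linalg.norm(x))  # settles near the (reduced) noise floor
```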
arXiv Detail & Related papers (2021-07-26T16:18:43Z) - BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both constrained and unconstrained bilevel optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in its iterates caused by inexact solutions of the lower-level subproblem.
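Read as pseudocode, the mechanism in this entry is: update both levels once per iteration (a single timescale) and keep a momentum average of the hypergradient estimate, so the error from the inexact lower-level solve is damped rather than amplified. Below is a generic sketch in that spirit on a made-up quadratic; the implicit-gradient term is hard-coded for the toy, and none of this is the paper's exact MSTSA update.

```python
def single_timescale_step(x, y, d, grad_gy, hyper_grad,
                          alpha=0.01, beta=0.1, eta=0.1):
    """One single-timescale iteration with a momentum-averaged hypergradient
    (generic sketch; the paper's estimator and step-size schedules differ)."""
    y = y - beta * grad_gy(x, y)     # one lower-level step: y stays inexact
    h = hyper_grad(x, y)             # hypergradient evaluated at inexact y
    d = (1.0 - eta) * d + eta * h    # momentum average damps the resulting error
    x = x - alpha * d                # one upper-level step
    return x, y, d

# Toy usage: g = 0.5*(y - 2x)^2 so y*(x) = 2x, and f = 0.5*(x^2 + y^2),
# giving F(x) = 2.5*x^2 with exact hypergradient x + 2*y*(x) = 5x.
ggy = lambda x, y: y - 2.0 * x
hg = lambda x, y: x + 2.0 * y        # implicit term dy*/dx = 2 hard-coded for the toy
x, y, d = 1.0, 0.0, 0.0
for _ in range(2000):
    x, y, d = single_timescale_step(x, y, d, ggy, hg)
print(f"x: {x:.4f}, y: {y:.4f}")     # both approach 0
```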
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave minmax problems.
Our experiments show that the speedup of adaptive gradient algorithms over non-adaptive ones can be observed empirically in GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated summaries and is not responsible for any consequences arising from their use.