Bilevel Optimization: Convergence Analysis and Enhanced Design
- URL: http://arxiv.org/abs/2010.07962v3
- Date: Fri, 27 Aug 2021 17:32:20 GMT
- Title: Bilevel Optimization: Convergence Analysis and Enhanced Design
- Authors: Kaiyi Ji, Junjie Yang and Yingbin Liang
- Abstract summary: Bilevel optimization is a powerful tool for many machine learning problems.
We propose a novel algorithm named stocBiO, featuring a sample-efficient hypergradient estimator.
- Score: 63.64636047748605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilevel optimization has arisen as a powerful tool for many machine learning
problems such as meta-learning, hyperparameter optimization, and reinforcement
learning. In this paper, we investigate the nonconvex-strongly-convex bilevel
optimization problem. For deterministic bilevel optimization, we provide a
comprehensive convergence rate analysis for two popular algorithms respectively
based on approximate implicit differentiation (AID) and iterative
differentiation (ITD). For the AID-based method, we orderwisely improve the
previous convergence rate analysis due to a more practical parameter selection
as well as a warm start strategy, and for the ITD-based method we establish the
first theoretical convergence rate. Our analysis also provides a quantitative
comparison between ITD and AID based approaches. For stochastic bilevel
optimization, we propose a novel algorithm named stocBiO, which features a
sample-efficient hypergradient estimator using efficient Jacobian- and
Hessian-vector product computations. We provide the convergence rate guarantee
for stocBiO, and show that stocBiO outperforms the best known computational
complexities orderwisely with respect to the condition number $\kappa$ and the
target accuracy $\epsilon$. We further validate our theoretical results and
demonstrate the efficiency of bilevel optimization algorithms by the
experiments on meta-learning and hyperparameter optimization.
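To make the AID-style hypergradient computation described in the abstract concrete, below is a minimal NumPy sketch (not the authors' code). It assumes a toy strongly convex quadratic lower-level problem, and it approximates the inverse-Hessian-vector product with a truncated Neumann series so that only Hessian- and Jacobian-vector products are needed, mirroring the estimator structure the abstract attributes to AID/stocBiO. All names (A, B, c, y_star, hypergradient) and the particular choices of f and g are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y = 5, 8
A = rng.standard_normal((d_y, d_y))
A = A @ A.T + d_y * np.eye(d_y)      # Hessian of g w.r.t. y (strongly convex)
B = rng.standard_normal((d_y, d_x))  # couples x into the lower-level problem
c = rng.standard_normal(d_y)         # target defining the upper-level objective f

def y_star(x):
    # Lower-level solution y*(x) = argmin_y 0.5*y^T A y - (B x)^T y = A^{-1} B x.
    # In practice this would be approximated by inner gradient steps (with warm start).
    return np.linalg.solve(A, B @ x)

def hypergradient(x, K=50):
    """Approximate d/dx of f(x, y*(x)) with f(x, y) = 0.5 * ||y - c||^2."""
    y = y_star(x)
    grad_y_f = y - c                          # partial gradient of f w.r.t. y
    eta = 1.0 / np.linalg.norm(A, 2)          # step size for the Neumann series
    v = grad_y_f.copy()
    partial_sum = grad_y_f.copy()
    for _ in range(K):                        # only Hessian-vector products A @ v
        v = v - eta * (A @ v)
        partial_sum += v
    ihvp = eta * partial_sum                  # approximates A^{-1} grad_y_f
    # Chain rule: f has no direct x-dependence here, and d(grad_y g)/dx = -B,
    # so the hypergradient is B^T [A^{-1} grad_y_f] (a Jacobian-vector product).
    return B.T @ ihvp

x = rng.standard_normal(d_x)
approx = hypergradient(x)
exact = B.T @ np.linalg.solve(A, y_star(x) - c)
print(np.linalg.norm(approx - exact))         # small for moderate K
```
For this quadratic lower-level problem the exact hypergradient has a closed form, which the last lines use to check the Neumann-series approximation; roughly speaking, a stochastic variant in the spirit of stocBiO would replace the exact Hessian- and Jacobian-vector products with sampled estimates.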
Related papers
- Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
Gradient-based algorithms are widely used in bilevel optimization.
We introduce a without-replacement sampling based algorithm which achieves a faster convergence rate.
We validate our algorithms over both synthetic and real-world applications.
arXiv Detail & Related papers (2024-11-07T17:05:31Z) - A Single-Loop Algorithm for Decentralized Bilevel Optimization [11.67135350286933]
We propose a novel single-loop algorithm for solving decentralized bilevel optimization with a strongly convex lower-level problem.
Our approach is a fully single-loop method that approximates the hypergradient using only two matrix-vector multiplications per iteration.
Our analysis demonstrates that the proposed algorithm achieves the best-known convergence rate for bilevel optimization algorithms.
arXiv Detail & Related papers (2023-11-15T13:29:49Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for federated conditional stochastic optimization.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Decentralized Multi-Level Compositional Optimization Algorithms with Level-Independent Convergence Rate [26.676582181833584]
Decentralized multi-level optimization is challenging because of the multilevel structure and decentralized communication.
We develop two novel decentralized optimization algorithms to optimize the multi-level compositional problem.
arXiv Detail & Related papers (2023-06-06T00:23:28Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Bilevel Optimization for Machine Learning: Algorithm Design and
Convergence Analysis [12.680169619392695]
This thesis provides a comprehensive convergence rate analysis for bilevel optimization algorithms.
For the problem-based formulation, we provide a convergence rate analysis for AID- and ITD-based bilevel algorithms.
We then develop accelerated bilevel algorithms, for which we provide a sharper convergence analysis under relaxed assumptions.
arXiv Detail & Related papers (2021-07-31T22:05:47Z) - Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasingly large momentum parameter for the first-order moment is sufficient for the convergence of adaptive scaling.
We also give insights for increasing the momentum in a stagewise manner, in accordance with a stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a careful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost (a generic sketch of such an estimator follows this list).
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences arising from its use.