Gradient-based Bi-level Optimization for Deep Learning: A Survey
- URL: http://arxiv.org/abs/2207.11719v4
- Date: Sun, 9 Jul 2023 21:53:45 GMT
- Title: Gradient-based Bi-level Optimization for Deep Learning: A Survey
- Authors: Can Chen, Xi Chen, Chen Ma, Zixuan Liu, Xue Liu
- Abstract summary: Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community.
We first give a formal definition of the gradient-based bi-level optimization.
We then discuss four bi-level optimization solvers to update the outer variable.
- Score: 14.39891675968109
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Bi-level optimization, especially the gradient-based category, has been
widely used in the deep learning community including hyperparameter
optimization and meta-knowledge extraction. Bi-level optimization embeds one
problem within another and the gradient-based category solves the outer-level
task by computing the hypergradient, which is much more efficient than
classical methods such as the evolutionary algorithm. In this survey, we first
give a formal definition of the gradient-based bi-level optimization. Next, we
delineate criteria to determine if a research problem is apt for bi-level
optimization and provide a practical guide on structuring such problems into a
bi-level optimization framework, a feature particularly beneficial for those
new to this domain. More specifically, there are two formulations: the
single-task formulation to optimize hyperparameters such as regularization
parameters and the distilled data, and the multi-task formulation to extract
meta-knowledge such as the model initialization. With a bi-level formulation,
we then discuss four bi-level optimization solvers to update the outer variable
including explicit gradient update, proxy update, implicit function update, and
closed-form update. Finally, we wrap up the survey by highlighting two
prospective future directions: (1) Effective Data Optimization for Science
examined through the lens of task formulation. (2) Accurate Explicit Proxy
Update analyzed from an optimization standpoint.
Related papers
- Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
gradient-based algorithms are widely used in bilevel optimization.
We introduce a without-replacement sampling based algorithm which achieves a faster convergence rate.
We validate our algorithms over both synthetic and real-world applications.
arXiv Detail & Related papers (2024-11-07T17:05:31Z) - Pareto Set Prediction Assisted Bilevel Multi-objective Optimization [2.3293561091456283]
We address problems with multiple objectives (BLMOP) at both levels.
The proposed approach is competitive across a range of problems, including both deceptive and non-deceptive.
arXiv Detail & Related papers (2024-09-05T08:04:11Z) - A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences [12.248793682283964]
optimizing discrete black-box functions is key in several domains, e.g. protein engineering and drug design.
We develop a unified framework to test a vast array of high-dimensional Bayesian optimization methods and a collection of standardized black-box functions.
These two components of the benchmark are each supported by flexible, scalable, and easily extendable software libraries.
arXiv Detail & Related papers (2024-06-07T08:39:40Z) - Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual bilevel optimization (CSBO) -- a bilevel optimization framework with the lower-level problem minimizing an expectation on some contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robustly optimization with side information (WDRO-SI)
arXiv Detail & Related papers (2023-10-27T23:24:37Z) - On Implicit Bias in Overparameterized Bilevel Optimization [38.11483853830913]
Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively.
We investigate the implicit bias of gradient-based algorithms for bilevel optimization.
We show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective.
arXiv Detail & Related papers (2022-12-28T18:57:46Z) - Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory.
We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures.
We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z) - Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a bilevel optimization method based on Bregman Bregman functions.
We also propose an accelerated version of SBiO-BreD method (ASBiO-BreD) by using the variance-reduced technique.
arXiv Detail & Related papers (2021-07-26T16:18:43Z) - Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasing large momentum parameter for the first-order moment is sufficient for adaptive scaling.
We also give insights for increasing the momentum in a stagewise manner in accordance with stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stoc-efficientgradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Finding Optimal Points for Expensive Functions Using Adaptive RBF-Based
Surrogate Model Via Uncertainty Quantification [11.486221800371919]
We propose a novel global optimization framework using adaptive Radial Basis Functions (RBF) based surrogate model via uncertainty quantification.
It first employs an RBF-based Bayesian surrogate model to approximate the true function, where the parameters of the RBFs can be adaptively estimated and updated each time a new point is explored.
It then utilizes a model-guided selection criterion to identify a new point from a candidate set for function evaluation.
arXiv Detail & Related papers (2020-01-19T16:15:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.