Related papers: Gradient-based Bi-level Optimization for Deep Learning: A Survey

Gradient-based Bi-level Optimization for Deep Learning: A Survey

URL: http://arxiv.org/abs/2207.11719v4
Date: Sun, 9 Jul 2023 21:53:45 GMT
Title: Gradient-based Bi-level Optimization for Deep Learning: A Survey
Authors: Can Chen, Xi Chen, Chen Ma, Zixuan Liu, Xue Liu
Abstract summary: Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community. We first give a formal definition of the gradient-based bi-level optimization. We then discuss four bi-level optimization solvers to update the outer variable.
Score: 14.39891675968109
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community including hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another and the gradient-based category solves the outer-level task by computing the hypergradient, which is much more efficient than classical methods such as the evolutionary algorithm. In this survey, we first give a formal definition of the gradient-based bi-level optimization. Next, we delineate criteria to determine if a research problem is apt for bi-level optimization and provide a practical guide on structuring such problems into a bi-level optimization framework, a feature particularly beneficial for those new to this domain. More specifically, there are two formulations: the single-task formulation to optimize hyperparameters such as regularization parameters and the distilled data, and the multi-task formulation to extract meta-knowledge such as the model initialization. With a bi-level formulation, we then discuss four bi-level optimization solvers to update the outer variable including explicit gradient update, proxy update, implicit function update, and closed-form update. Finally, we wrap up the survey by highlighting two prospective future directions: (1) Effective Data Optimization for Science examined through the lens of task formulation. (2) Accurate Explicit Proxy Update analyzed from an optimization standpoint.

Related papers

Efficient Curvature-Aware Hypergradient Approximation for Bilevel Optimization [10.939142192058004]
Bilevel optimization is a powerful tool for many machine learning problems.<n>We propose a technique for incorporating curvature information into the approximation of hypergradients.<n>We present a novel algorithmic framework based on the resulting enhanced hypergradient.
arXiv Detail & Related papers (2025-05-04T13:13:29Z)
Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
gradient-based algorithms are widely used in bilevel optimization. We introduce a without-replacement sampling based algorithm which achieves a faster convergence rate. We validate our algorithms over both synthetic and real-world applications.
arXiv Detail & Related papers (2024-11-07T17:05:31Z)
Pareto Set Prediction Assisted Bilevel Multi-objective Optimization [2.3293561091456283]
We address problems with multiple objectives (BLMOP) at both levels. The proposed approach is competitive across a range of problems, including both deceptive and non-deceptive.
arXiv Detail & Related papers (2024-09-05T08:04:11Z)
Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization [71.35604981129838]
Traditional gradient-based bi-level optimization algorithms are ill-suited to meet the demands of large-scale applications. We introduce $(textFG)2textU$, which achieves an unbiased approximation of the meta gradient for bi-level optimization. $(textFG)2textU$ is inherently designed to support parallel computing, enabling it to effectively leverage large-scale distributed computing systems.
arXiv Detail & Related papers (2024-06-20T08:21:52Z)
A survey and benchmark of high-dimensional Bayesian optimization of discrete sequences [12.248793682283964]
optimizing discrete black-box functions is key in several domains, e.g. protein engineering and drug design. We develop a unified framework to test a vast array of high-dimensional Bayesian optimization methods and a collection of standardized black-box functions. These two components of the benchmark are each supported by flexible, scalable, and easily extendable software libraries.
arXiv Detail & Related papers (2024-06-07T08:39:40Z)
Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual bilevel optimization (CSBO) -- a bilevel optimization framework with the lower-level problem minimizing an expectation on some contextual information and the upper-level variable. It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robustly optimization with side information (WDRO-SI)
arXiv Detail & Related papers (2023-10-27T23:24:37Z)
On Implicit Bias in Overparameterized Bilevel Optimization [38.11483853830913]
Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. We investigate the implicit bias of gradient-based algorithms for bilevel optimization. We show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective.
arXiv Detail & Related papers (2022-12-28T18:57:46Z)
Generalizing Bayesian Optimization with Decision-theoretic Entropies [102.82152945324381]
We consider a generalization of Shannon entropy from work in statistical decision theory. We first show that special cases of this entropy lead to popular acquisition functions used in BO procedures. We then show how alternative choices for the loss yield a flexible family of acquisition functions.
arXiv Detail & Related papers (2022-10-04T04:43:58Z)
Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a bilevel optimization method based on Bregman Bregman functions. We also propose an accelerated version of SBiO-BreD method (ASBiO-BreD) by using the variance-reduced technique.
arXiv Detail & Related papers (2021-07-26T16:18:43Z)
Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasing large momentum parameter for the first-order moment is sufficient for adaptive scaling. We also give insights for increasing the momentum in a stagewise manner in accordance with stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z)
Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems. We propose a novel stoc-efficientgradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
Finding Optimal Points for Expensive Functions Using Adaptive RBF-Based Surrogate Model Via Uncertainty Quantification [11.486221800371919]
We propose a novel global optimization framework using adaptive Radial Basis Functions (RBF) based surrogate model via uncertainty quantification. It first employs an RBF-based Bayesian surrogate model to approximate the true function, where the parameters of the RBFs can be adaptively estimated and updated each time a new point is explored. It then utilizes a model-guided selection criterion to identify a new point from a candidate set for function evaluation.
arXiv Detail & Related papers (2020-01-19T16:15:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.