Second-Order Sensitivity Analysis for Bilevel Optimization
- URL: http://arxiv.org/abs/2205.02329v1
- Date: Wed, 4 May 2022 21:16:05 GMT
- Title: Second-Order Sensitivity Analysis for Bilevel Optimization
- Authors: Robert Dyro, Edward Schmerling, Nikos Arechiga, Marco Pavone
- Abstract summary: We extend sensitivity analysis to provide second-order derivative information of the lower problem.
We show that much of the computation already used to produce the IFT gradient can be reused for the IFT Hessian.
- Score: 26.17470185675129
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we derive a second-order approach to bilevel optimization, a
type of mathematical programming in which the solution to a parameterized
optimization problem (the "lower" problem) is itself to be optimized (in the
"upper" problem) as a function of the parameters. Many existing approaches to
bilevel optimization employ first-order sensitivity analysis, based on the
implicit function theorem (IFT), for the lower problem to derive a gradient of
the lower problem solution with respect to its parameters; this IFT gradient is
then used in a first-order optimization method for the upper problem. This
paper extends this sensitivity analysis to provide second-order derivative
information of the lower problem (which we call the IFT Hessian), enabling the
usage of faster-converging second-order optimization methods at the upper
level. Our analysis shows that (i) much of the computation already used to
produce the IFT gradient can be reused for the IFT Hessian, (ii) errors bounds
derived for the IFT gradient readily apply to the IFT Hessian, (iii) computing
IFT Hessians can significantly reduce overall computation by extracting more
information from each lower level solve. We corroborate our findings and
demonstrate the broad range of applications of our method by applying it to
problem instances of least squares hyperparameter auto-tuning, multi-class SVM
auto-tuning, and inverse optimal control.
Related papers
- Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
gradient-based algorithms are widely used in bilevel optimization.
We introduce a without-replacement sampling based algorithm which achieves a faster convergence rate.
We validate our algorithms over both synthetic and real-world applications.
arXiv Detail & Related papers (2024-11-07T17:05:31Z) - Enhancing Hypergradients Estimation: A Study of Preconditioning and
Reparameterization [49.73341101297818]
Bilevel optimization aims to optimize an outer objective function that depends on the solution to an inner optimization problem.
The conventional method to compute the so-called hypergradient of the outer problem is to use the Implicit Function Theorem (IFT)
We study the error of the IFT method and analyze two strategies to reduce this error.
arXiv Detail & Related papers (2024-02-26T17:09:18Z) - Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed
Smoothness Conditions [9.518010235273785]
We present a novel fully Liploop Hessian-inversion-free algorithmic framework for bilevel optimization.
We show that by a slight modification of our approach our approach can handle a more general multi-objective robust bilevel optimization problem.
arXiv Detail & Related papers (2023-06-21T07:32:29Z) - Efficient Gradient Approximation Method for Constrained Bilevel
Optimization [2.0305676256390934]
Bilevel optimization has been developed with large-scale high-dimensional data.
This paper considers a constrained bilevel problem with convex and non-differentiable approximations.
arXiv Detail & Related papers (2023-02-03T19:34:56Z) - On Implicit Bias in Overparameterized Bilevel Optimization [38.11483853830913]
Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively.
We investigate the implicit bias of gradient-based algorithms for bilevel optimization.
We show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective.
arXiv Detail & Related papers (2022-12-28T18:57:46Z) - Versatile Single-Loop Method for Gradient Estimator: First and Second
Order Optimality, and its Application to Federated Learning [45.78238792836363]
We present a single-loop algorithm named SLEDGE (Single-Loop-E Gradient Estimator) for periodic convergence.
Unlike existing methods, SLEDGE has the advantage of versatility; (ii) second-order optimal, (ii) in the PL region, and (iii) smaller complexity under less of data.
arXiv Detail & Related papers (2022-09-01T11:05:26Z) - A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and
Nonsmooth Bi-level Optimization [26.68351521813062]
Existing bi-level algorithms cannot handle nonsmooth or hyper-smooth regularizers.
In this paper, we show that an implicit differentiation (AID) scheme can be used to speed up comprehensive machine learning applications.
arXiv Detail & Related papers (2022-03-30T18:53:04Z) - Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator [75.05106948314956]
We show that an increasing large momentum parameter for the first-order moment is sufficient for adaptive scaling.
We also give insights for increasing the momentum in a stagewise manner in accordance with stagewise decreasing step size.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stoc-efficientgradient estimator named stoc-BiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - A Primer on Zeroth-Order Optimization in Signal Processing and Machine
Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction, and solution update.
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
arXiv Detail & Related papers (2020-06-11T06:50:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.