On Implicit Bias in Overparameterized Bilevel Optimization
- URL: http://arxiv.org/abs/2212.14032v1
- Date: Wed, 28 Dec 2022 18:57:46 GMT
- Title: On Implicit Bias in Overparameterized Bilevel Optimization
- Authors: Paul Vicol, Jonathan Lorraine, Fabian Pedregosa, David Duvenaud, Roger Grosse
- Abstract summary: Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively.
We investigate the implicit bias of gradient-based algorithms for bilevel optimization.
We show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective.
- Score: 38.11483853830913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many problems in machine learning involve bilevel optimization (BLO),
including hyperparameter optimization, meta-learning, and dataset distillation.
Bilevel problems consist of two nested sub-problems, called the outer and inner
problems, respectively. In practice, often at least one of these sub-problems
is overparameterized. In this case, there are many ways to choose among optima
that achieve equivalent objective values. Inspired by recent studies of the
implicit bias induced by optimization algorithms in single-level optimization,
we investigate the implicit bias of gradient-based algorithms for bilevel
optimization. We delineate two standard BLO methods -- cold-start and
warm-start -- and show that the converged solution or long-run behavior depends
to a large degree on these and other algorithmic choices, such as the
hypergradient approximation. We also show that the inner solutions obtained by
warm-start BLO can encode a surprising amount of information about the outer
objective, even when the outer parameters are low-dimensional. We believe that
implicit bias deserves as central a role in the study of bilevel optimization
as it has attained in the study of single-level neural net optimization.
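
To make the cold-start/warm-start distinction concrete, here is a minimal runnable sketch on a toy problem with an overparameterized inner objective (a whole hyperplane of inner minimizers). The losses, step sizes, and the one-step truncated-unrolling hypergradient below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Toy overparameterized bilevel problem (illustrative, not from the paper):
#   inner: min_w 0.5 * (lam*w1 + w2 - 1)^2   -- every w on a hyperplane is optimal
#   outer: min_lam 0.5 * ||w - (1, 1)||^2    -- depends on lam only through w
ALPHA, ETA, K, T = 0.2, 0.5, 50, 200  # inner lr, outer lr, inner steps, outer steps

def inner_grad(w, lam):
    r = lam * w[0] + w[1] - 1.0
    return r * np.array([lam, 1.0])

def outer_grad_w(w):
    return w - np.array([1.0, 1.0])

def run(warm_start):
    lam, w = 0.5, np.zeros(2)
    for _ in range(T):
        if not warm_start:
            w = np.zeros(2)  # cold start: re-solve the inner problem from scratch
        for _ in range(K):
            w = w - ALPHA * inner_grad(w, lam)
        # Hypergradient approximated by differentiating the last inner step
        # (1-step truncated unrolling), one of the algorithmic choices at play:
        r = lam * w[0] + w[1] - 1.0
        dw_dlam = -ALPHA * (w[0] * np.array([lam, 1.0]) + r * np.array([1.0, 0.0]))
        lam = lam - ETA * outer_grad_w(w) @ dw_dlam
    return lam, w

for ws in (False, True):
    lam, w = run(ws)
    print("warm" if ws else "cold", "lam=%+.4f  w=%s" % (lam, np.round(w, 3)))
```

Both variants drive the inner loss to (near) zero, yet they end at different points on the hyperplane of inner minima and follow different outer trajectories: the warm-started inner solution retains components inherited from earlier outer iterates, which is the history-encoding effect the abstract describes.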
Related papers
- Provably Faster Algorithms for Bilevel Optimization via Without-Replacement Sampling [96.47086913559289]
Gradient-based algorithms are widely used in bilevel optimization.
We introduce a without-replacement-sampling-based algorithm that achieves a faster convergence rate.
We validate our algorithms over both synthetic and real-world applications.
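
For readers unfamiliar with the sampling scheme, the sketch below contrasts without-replacement (shuffled, epoch-based) sampling with with-replacement sampling on a plain least-squares problem; it illustrates the sampling idea only, not the paper's bilevel algorithm, and all data and step sizes are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(32, 4))          # toy least-squares data (assumed)
b = rng.normal(size=32)
w, lr, n = np.zeros(4), 0.05, 32

def grad_i(w, i):
    # per-sample gradient of 0.5 * (a_i . w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

for epoch in range(50):
    for i in rng.permutation(n):      # without replacement: each sample exactly once per epoch
        w -= lr * grad_i(w, i)
    # with-replacement sampling would instead draw i = rng.integers(n) each step

print(np.round(w, 3))
```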
arXiv Detail & Related papers (2024-11-07T17:05:31Z)
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as bi-level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution and the outer loss becomes an expected loss over the inner distribution.
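
A minimal sketch of this viewpoint on one-dimensional toy objectives, assuming a Gibbs-style smoothing (the paper's exact construction may differ): the inner loss induces a smooth distribution over inner candidates, and the outer objective becomes an expectation under it.

```python
import numpy as np

tau = 0.1                                # smoothing temperature (assumed)
W = np.linspace(-2.0, 2.0, 401)          # grid of candidate inner solutions

def inner_loss(w, lam):
    return (w - lam) ** 2                # toy inner objective

def outer_loss(w):
    return (w - 1.0) ** 2                # toy outer objective

def expected_outer(lam):
    logits = -inner_loss(W, lam) / tau   # Gibbs weights: p(w) ~ exp(-inner_loss / tau)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return p @ outer_loss(W)             # outer loss as an expectation over the inner distribution

# Crude outer minimization by grid search, just to show the smoothed objective:
lams = np.linspace(-2.0, 2.0, 81)
print("best lam ~", lams[np.argmin([expected_outer(l) for l in lams])])
```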
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- A Single-Loop Algorithm for Decentralized Bilevel Optimization [11.67135350286933]
We propose a novel single-loop algorithm for solving decentralized bilevel optimization with a strongly convex lower-level problem.
Our approach is a fully single-loop method that approximates the hypergradient using only two matrix-vector multiplications per iteration.
Our analysis demonstrates that the proposed algorithm achieves the best-known convergence rate for bilevel optimization algorithms.
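
The sketch below conveys the single-loop flavor on a centralized quadratic toy problem (the paper's method is decentralized and more general; all objectives, step sizes, and names here are assumptions). Each iteration takes one inner step, refreshes an auxiliary vector v with a Hessian-vector product, and forms the hypergradient from the cross-derivative, so no inner sub-loop or matrix inverse is needed.

```python
import numpy as np

H = np.array([[2.0, 0.3], [0.3, 1.0]])   # inner Hessian (strongly convex inner problem)
c = np.array([1.0, -0.5])
w_target = np.array([0.5, 0.5])
alpha, beta, eta = 0.2, 0.2, 0.1
w, v, lam = np.zeros(2), np.zeros(2), 0.0

# inner: g(w, lam) = 0.5 w'Hw - lam * c'w;  outer: f(w) = 0.5 ||w - w_target||^2
for _ in range(500):
    w = w - alpha * (H @ w - lam * c)     # one inner gradient step
    f_grad = w - w_target
    v = v - beta * (H @ v - f_grad)       # matvec 1: v tracks H^{-1} grad_w f
    # hypergradient = grad_lam f - (d/dlam grad_w g)^T v; grad_lam f = 0 and
    # d/dlam grad_w g = -c here, so the second "matvec" reduces to a dot product
    lam = lam - eta * (c @ v)
print("lam=%.3f w=%s" % (lam, np.round(w, 3)))
```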
arXiv Detail & Related papers (2023-11-15T13:29:49Z)
- Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual stochastic bilevel optimization (CSBO) -- a bilevel optimization framework where the lower-level problem minimizes an expectation conditioned on some contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI).
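
As a toy illustration of the CSBO structure (not the paper's algorithm; the objectives and the closed-form lower-level solution below are assumptions), each sampled context changes the lower-level solution, and the upper level minimizes the expected outer loss over contexts:

```python
import numpy as np

rng = np.random.default_rng(0)
x, lr = 0.0, 0.05
for _ in range(2000):
    c = rng.uniform(0.5, 1.5)          # contextual information for this sample
    y_star = c * x                     # closed-form lower-level solution given (x, c)
    # stochastic hypergradient of E_c[(y*(x, c) - 1)^2] via the chain rule:
    x -= lr * 2.0 * (y_star - 1.0) * c
print("x ~ %.3f (minimizes E_c[(c*x - 1)^2])" % x)
```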
arXiv Detail & Related papers (2023-10-27T23:24:37Z)
- Non-Convex Bilevel Optimization with Time-Varying Objective Functions [57.299128109226025]
We study online bilevel optimization, where the functions can be time-varying and the agent continuously updates its decisions with online data.
Compared to existing algorithms, the proposed method, SOBOW, is computationally efficient and does not need to know previous functions.
We show that SOBOW achieves sublinear bilevel local regret under mild conditions.
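
A toy online bilevel loop in the spirit of this setting (illustrative assumptions throughout, not SOBOW itself): the inner problem drifts over rounds, and the agent takes one inner and one outer step per round using only the current functions, never revisiting past ones.

```python
import numpy as np

T, alpha, eta = 500, 0.5, 0.1
w, lam = 0.0, 0.0
for t in range(T):
    d_t = 0.2 * np.sin(2 * np.pi * t / 100.0)   # slow drift in the inner problem
    target_t = 1.0 + d_t
    w = w - alpha * (w - lam - d_t)             # one step on g_t(w) = 0.5*(w - lam - d_t)^2
    # one outer step on f_t(w) = 0.5*(w - target_t)^2, using dw*/dlam = 1 (exact here)
    lam = lam - eta * (w - target_t)
print("lam=%.3f w=%.3f" % (lam, w))
```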
arXiv Detail & Related papers (2023-08-07T06:27:57Z)
- Efficient Gradient Approximation Method for Constrained Bilevel Optimization [2.0305676256390934]
Bilevel optimization has been developed for large-scale, high-dimensional data.
This paper considers a constrained bilevel problem with convex and non-differentiable approximations.
arXiv Detail & Related papers (2023-02-03T19:34:56Z)
- A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method [0.0]
We propose a gradient-based bilevel method for solving the hyperparameter optimization problem.
We show that the proposed method converges with lower computational cost and leads to models that generalize better on the test set.
arXiv Detail & Related papers (2022-08-25T14:25:16Z)
- A Constrained Optimization Approach to Bilevel Optimization with Multiple Inner Minima [49.320758794766185]
We propose a new approach that converts the bilevel problem into an equivalent constrained optimization problem, which can then be solved with a primal-dual algorithm.
Such an approach enjoys several advantages: (a) it addresses the multiple-inner-minima challenge; (b) it is fully first-order and efficient, requiring no Jacobian computations.
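
A minimal sketch of a value-function-style constrained reformulation with a gradient descent-ascent (primal-dual) loop on toy quadratics; the paper's precise reformulation and updates may differ, and everything below is an illustrative assumption. The lower-level optimality condition becomes the constraint g(x, w) - min_w' g(x, w') <= 0, enforced through a multiplier mu:

```python
def f(x, w):                 # outer objective (assumed)
    return (w - 1.0) ** 2 + 0.1 * x ** 2

def g(x, w):                 # inner objective; min over w is at w = x with value 0
    return (w - x) ** 2

x, w, mu = 0.0, 2.0, 0.0
lr, lr_mu = 0.05, 0.5
for _ in range(2000):
    # value function v(x) = min_w g(x, w) = 0 here, so the constraint is just g(x, w)
    fx, fw = 0.2 * x, 2.0 * (w - 1.0)
    gx, gw = -2.0 * (w - x), 2.0 * (w - x)
    x, w = x - lr * (fx + mu * gx), w - lr * (fw + mu * gw)  # primal descent on the Lagrangian
    mu = max(0.0, mu + lr_mu * g(x, w))                      # dual ascent on the multiplier
print("x=%.3f w=%.3f mu=%.2f" % (x, w, mu))
```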
arXiv Detail & Related papers (2022-03-01T18:20:01Z)
- Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a bilevel optimization method (SBiO-BreD) based on Bregman distances.
We also propose an accelerated version of the SBiO-BreD method (ASBiO-BreD) by using a variance-reduction technique.
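
For context on the building block (a generic mirror-descent proximal step, not the SBiO-BreD algorithm itself), here is a minimal sketch of a Bregman distance and the update it induces; the negative-entropy generator and the toy loss are assumptions:

```python
import numpy as np

def bregman_div(u, v):
    # D_phi(u, v) = phi(u) - phi(v) - <grad phi(v), u - v>; with negative-entropy
    # phi and u, v on the probability simplex this reduces to the KL divergence
    return np.sum(u * np.log(u / v))

def mirror_step(x, grad, lr):
    # argmin_u <grad, u> + (1/lr) * D_phi(u, x) over the simplex: exponentiated gradient
    y = x * np.exp(-lr * grad)
    return y / y.sum()

target = np.array([0.7, 0.2, 0.1])
x = np.ones(3) / 3.0
for _ in range(200):
    x = mirror_step(x, 2.0 * (x - target), lr=0.5)   # toy quadratic loss ||x - target||^2
print(np.round(x, 3), "D_phi to target:", round(bregman_div(target, x), 6))
```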
arXiv Detail & Related papers (2021-07-26T16:18:43Z)