Bilevel Optimization with a Lower-level Contraction: Optimal Sample
Complexity without Warm-start
- URL: http://arxiv.org/abs/2202.03397v4
- Date: Thu, 16 Nov 2023 11:13:55 GMT
- Title: Bilevel Optimization with a Lower-level Contraction: Optimal Sample
Complexity without Warm-start
- Authors: Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo
- Abstract summary: We study bilevel problems in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth map.
Several recent works have proposed algorithms which warm-start the lower-level problem.
We show that without warm-start, it is still possible to achieve order-wise optimal sample complexity.
- Score: 39.13249645102326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We analyse a general class of bilevel problems, in which the upper-level
problem consists in the minimization of a smooth objective function and the
lower-level problem is to find the fixed point of a smooth contraction map.
This class of problems includes instances of meta-learning, equilibrium models,
hyperparameter optimization and data poisoning adversarial attacks. Several
recent works have proposed algorithms which warm-start the lower-level problem,
i.e., they use the previous lower-level approximate solution as a starting point
for the lower-level solver. This warm-start procedure allows one to improve the
sample complexity in both the stochastic and deterministic settings, achieving
in some cases the order-wise optimal sample complexity. However, there are
situations, e.g., meta-learning and equilibrium models, in which the warm-start
procedure is not well-suited or ineffective. In this work we show that without
warm-start, it is still possible to achieve order-wise (near) optimal sample
complexity. In particular, we propose a simple method which uses (stochastic)
fixed point iterations at the lower-level and projected inexact gradient
descent at the upper-level, that reaches an $\epsilon$-stationary point using
$O(\epsilon^{-2})$ and $\tilde{O}(\epsilon^{-1})$ samples for the stochastic
and the deterministic setting, respectively. Finally, compared to methods using
warm-start, our approach yields a simpler analysis that does not need to study
the coupled interactions between the upper-level and lower-level iterates.
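To make the abstract's scheme concrete, here is a minimal, hypothetical sketch (not the authors' code) of a bilevel method with a lower-level contraction: fixed-point iterations at the lower level restarted from scratch at every upper-level step (no warm-start), an inexact hypergradient obtained by implicit differentiation (the adjoint linear system is also solved by fixed-point iterations, which is valid because the map is a contraction in its first argument), and projected inexact gradient descent at the upper level. The toy problem and the names Phi, E, project, as well as all iteration counts and step sizes, are illustrative assumptions.

```python
# Minimal, hypothetical sketch (not the authors' implementation) of the scheme
# described in the abstract. Toy problem, names, and constants are assumptions.
import numpy as np

# Toy bilevel problem:
#   lower level:  w*(x) solves  w = Phi(w, x),  with Phi(., x) a contraction,
#   upper level:  min_{x in X}  F(x) = E(x, w*(x)).
A = 0.5 * np.eye(2)              # spectral norm < 1, so Phi(., x) is a contraction
B = np.array([[1.0], [0.5]])

def Phi(w, x):                   # smooth contraction map (lower level)
    return A @ w + B @ x

def E(x, w):                     # smooth upper-level objective
    return 0.5 * float(np.sum(w ** 2) + np.sum(x ** 2))

def grad_E(x, w):                # partial gradients (d/dx E, d/dw E)
    return x, w

def project(x, radius=10.0):     # projection onto X (here a Euclidean ball)
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else radius * x / nrm

def inexact_hypergradient(x, n_lower=20, n_adjoint=20):
    # Lower level: fixed-point iterations from a cold start (no warm-start).
    w = np.zeros((2, 1))
    for _ in range(n_lower):
        w = Phi(w, x)            # a stochastic variant would use noisy evaluations
    gx, gw = grad_E(x, w)
    # Implicit differentiation: solve v = (d_w Phi)^T v + d_w E by fixed-point
    # iterations, which converge because d_w Phi (= A here) is a contraction.
    v = np.zeros_like(gw)
    for _ in range(n_adjoint):
        v = A.T @ v + gw
    # d_x Phi = B for this toy map, so the inexact total gradient is:
    return gx + B.T @ v, w

# Upper level: projected inexact gradient descent.
x = np.array([[2.0]])
for _ in range(100):
    g, w = inexact_hypergradient(x)
    x = project(x - 0.1 * g)

print("x ~", x.ravel(), " F(x) ~", E(x, w))
```

The point this sketch tries to reflect is the one made at the end of the abstract: because the lower-level iterations restart from the same initial point at every upper-level step, there is no coupling between consecutive upper-level iterates to track, which is what simplifies the analysis relative to warm-start methods.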
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization [16.709026203727007]
We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem.
We measure the performance of our method in terms of suboptimality and infeasibility errors.
arXiv Detail & Related papers (2024-02-12T22:34:53Z) - A Conditional Gradient-based Method for Simple Bilevel Optimization with
Convex Lower-level Problem [18.15207779559351]
We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem via a cutting plane.
Our method achieves the best-known complexity guarantees for the considered class of bilevel problems.
arXiv Detail & Related papers (2022-06-17T16:12:47Z) - Efficiently Escaping Saddle Points in Bilevel Optimization [48.925688192913]
Bilevel optimization is one of the fundamental problems in machine learning.
Recent developments in bilevel optimization converge on the first fundamental nonasymptotic multi-step analysis.
arXiv Detail & Related papers (2022-02-08T07:10:06Z) - Randomized Stochastic Variance-Reduced Methods for Stochastic Bilevel
Optimization [62.87181271021217]
We consider nonconvex stochastic bilevel optimization (SBO) problems that have many applications in machine learning.
This paper proposes fast randomized algorithms for nonconvex SBO problems.
arXiv Detail & Related papers (2021-05-05T18:28:42Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) algorithm -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterations caused by inexact solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Stochastic Multi-level Composition Optimization Algorithms with
Level-Independent Convergence Rates [12.783783498844022]
We study smooth multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions.
We show that the first algorithm, which is a generalization of [GhaRuswan20] to the $T$-level case, can achieve a sample complexity of $\mathcal{O}(1/\epsilon^6)$.
This is the first time that such an online algorithm designed for the (un)constrained multi-level setting obtains the same sample complexity as in the single-level setting, under standard assumptions.
arXiv Detail & Related papers (2020-08-24T15:57:50Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for non-concave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.