Bilevel Optimization under Unbounded Smoothness: A New Algorithm and
Convergence Analysis
- URL: http://arxiv.org/abs/2401.09587v1
- Date: Wed, 17 Jan 2024 20:28:15 GMT
- Title: Bilevel Optimization under Unbounded Smoothness: A New Algorithm and
Convergence Analysis
- Authors: Jie Hao, Xiaochuan Gong, Mingrui Liu
- Abstract summary: Current bilevel optimization algorithms assume that the gradient of the upper-level function is Lipschitz.
Recent studies reveal that certain neural networks exhibit potentially unbounded smoothness, rendering such algorithms unsuitable.
- Score: 17.596465452814883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilevel optimization is an important formulation for many machine learning
problems. Current bilevel optimization algorithms assume that the gradient of
the upper-level function is Lipschitz. However, recent studies reveal that
certain neural networks such as recurrent neural networks (RNNs) and
long-short-term memory networks (LSTMs) exhibit potential unbounded smoothness,
rendering conventional bilevel optimization algorithms unsuitable. In this
paper, we design a new bilevel optimization algorithm, namely BO-REP, to
address this challenge. This algorithm updates the upper-level variable using
normalized momentum and incorporates two novel techniques for updating the
lower-level variable: \textit{initialization refinement} and \textit{periodic
updates}. Specifically, once the upper-level variable is initialized, a
subroutine is invoked to obtain a refined estimate of the corresponding optimal
lower-level variable, and the lower-level variable is updated only after every
specific period instead of each iteration. When the upper-level problem is
nonconvex and unbounded smooth, and the lower-level problem is strongly convex,
we prove that our algorithm requires $\widetilde{\mathcal{O}}(1/\epsilon^4)$
iterations to find an $\epsilon$-stationary point in the stochastic setting,
where each iteration involves calling a stochastic gradient or Hessian-vector
product oracle. Notably, this result matches the state-of-the-art complexity
results under the bounded smoothness setting and without mean-squared
smoothness of the stochastic gradient, up to logarithmic factors. Our proof
relies on novel technical lemmas for the periodically updated lower-level
variable, which are of independent interest. Our experiments on
hyper-representation learning, hyperparameter optimization, and data
hyper-cleaning for text classification tasks demonstrate the effectiveness of
our proposed algorithm.
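To make the algorithmic description above concrete, the following minimal Python sketch illustrates the control flow the abstract describes: normalized-momentum updates for the upper-level variable, an initialization-refinement phase for the lower-level variable, and periodic (rather than per-iteration) lower-level updates. This is a sketch under simplifying assumptions, not the authors' implementation; the oracle functions `hypergrad` and `grad_g_y`, the step sizes, and the loop counts are illustrative placeholders.

```python
import numpy as np

def bo_rep_sketch(x0, y0, hypergrad, grad_g_y,
                  refine_steps=50, period=5, T=1000,
                  eta_x=0.01, eta_y=0.05, beta=0.9):
    """Illustrative sketch of the update structure described in the abstract.

    x: upper-level variable, updated with normalized momentum every iteration.
    y: lower-level variable, refined once after initialization, then updated
       only every `period` iterations (periodic updates).
    `hypergrad` and `grad_g_y` are hypothetical stochastic oracles supplied by
    the caller; all step sizes and loop counts are placeholder values.
    """
    x = np.asarray(x0, dtype=float).copy()
    y = np.asarray(y0, dtype=float).copy()

    # Initialization refinement: extra lower-level steps so that y is a good
    # estimate of the optimal lower-level solution for the initial x.
    for _ in range(refine_steps):
        y = y - eta_y * grad_g_y(x, y)

    m = np.zeros_like(x)  # momentum buffer for the upper-level variable
    for t in range(T):
        g = hypergrad(x, y)              # stochastic hypergradient estimate
        m = beta * m + (1.0 - beta) * g  # momentum update
        # Normalized step: the update length is at most eta_x regardless of
        # the gradient magnitude, which is what makes normalized momentum
        # suitable under unbounded smoothness.
        x = x - eta_x * m / (np.linalg.norm(m) + 1e-12)

        # Periodic update of the lower-level variable instead of every step.
        if (t + 1) % period == 0:
            y = y - eta_y * grad_g_y(x, y)
    return x, y


# Toy usage with a quadratic lower-level problem g(x, y) = ||y - x||^2, whose
# minimizer is y*(x) = x; the "hypergradient" here is a crude placeholder.
if __name__ == "__main__":
    hg = lambda x, y: 2.0 * (x - y)   # placeholder hypergradient oracle
    gy = lambda x, y: 2.0 * (y - x)   # lower-level gradient in y
    x_out, y_out = bo_rep_sketch(np.ones(3), np.zeros(3), hg, gy)
    print(x_out, y_out)
```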
Related papers
- An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex.
These problems have significant applications in sequential data learning, such as text classification using recurrent neural networks.
arXiv Detail & Related papers (2024-09-28T02:30:44Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed
Smoothness Conditions [9.518010235273785]
We present a novel fully single-loop and Hessian-inversion-free algorithmic framework for stochastic bilevel optimization.
We show that with a slight modification, our approach can handle a more general multi-objective robust bilevel optimization problem.
arXiv Detail & Related papers (2023-06-21T07:32:29Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and
Nonsmooth Bi-level Optimization [26.68351521813062]
Existing bi-level algorithms cannot handle nonsmooth or non-Lipschitz regularizers.
In this paper, we show that an approximate implicit differentiation (AID) scheme can be used to speed up a broad range of machine learning applications.
arXiv Detail & Related papers (2022-03-30T18:53:04Z) - Efficiently Escaping Saddle Points in Bilevel Optimization [48.925688192913]
Bilevel optimization is one of the fundamental problems in machine learning.
Recent developments in bilevel optimization provide the first nonasymptotic analysis for escaping saddle points in this setting.
arXiv Detail & Related papers (2022-02-08T07:10:06Z) - Enhanced Bilevel Optimization via Bregman Distance [104.96004056928474]
We propose a bilevel optimization method based on the Bregman distance.
We also propose an accelerated version of the SBiO-BreD method (ASBiO-BreD) using a variance-reduction technique.
arXiv Detail & Related papers (2021-07-26T16:18:43Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-timescale Stochastic Approximation (MSTSA) -- for tackling stochastic bilevel optimization problems.
MSTSA allows us to control the error in its iterations caused by inaccurate solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)