A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and
Nonsmooth Bi-level Optimization
- URL: http://arxiv.org/abs/2203.16615v1
- Date: Wed, 30 Mar 2022 18:53:04 GMT
- Title: A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and
Nonsmooth Bi-level Optimization
- Authors: Ziyi Chen, Bhavya Kailkhura, Yi Zhou
- Abstract summary: Existing gradient-based bi-level algorithms cannot handle nonconvex or nonsmooth regularizers.
In this paper, a proximal gradient-type algorithm adopts the approximate implicit differentiation (AID) scheme and uses Nesterov's momentum to speed up the computation of the implicit gradient.
- Score: 26.68351521813062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many important machine learning applications involve regularized nonconvex
bi-level optimization. However, the existing gradient-based bi-level
optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and
they suffer from a high computation complexity in nonconvex bi-level
optimization. In this work, we study a proximal gradient-type algorithm that
adopts the approximate implicit differentiation (AID) scheme for nonconvex
bi-level optimization with possibly nonconvex and nonsmooth regularizers. In
particular, the algorithm applies Nesterov's momentum to accelerate the
computation of the implicit gradient involved in AID. We provide a
comprehensive analysis of the global convergence properties of this algorithm
through identifying its intrinsic potential function. In particular, we
formally establish the convergence of the model parameters to a critical point
of the bi-level problem, and obtain an improved computation complexity
$\mathcal{O}(\kappa^{3.5}\epsilon^{-2})$ over the state-of-the-art result.
Moreover, we analyze the asymptotic convergence rates of this algorithm under a
class of local nonconvex geometries characterized by a {\L}ojasiewicz-type
gradient inequality. Experiments on hyper-parameter optimization demonstrate
the effectiveness of our algorithm.
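To make the described method concrete, below is a minimal, illustrative sketch (not the authors' implementation) of one outer iteration: a proximal gradient step on the upper-level variable using an AID hypergradient whose inner linear system is solved with Nesterov momentum. The oracle names (grad_x_f, hess_yy_g_mv, etc.), step sizes, the l1 regularizer, and the toy quadratic instantiation are all illustrative assumptions.

```python
import numpy as np

def nesterov_quadratic_solve(matvec, b, steps=100, lr=0.1, momentum=0.9):
    """Approximately solve A v = b (A symmetric positive definite, given via
    matvec) by momentum-accelerated gradient descent on 0.5 v^T A v - b^T v."""
    v = np.zeros_like(b)
    v_prev = np.zeros_like(b)
    for _ in range(steps):
        z = v + momentum * (v - v_prev)   # Nesterov extrapolation point
        grad = matvec(z) - b              # gradient of the quadratic at z
        v_prev, v = v, z - lr * grad
    return v

def aid_hypergradient(x, y, grad_x_f, grad_y_f, hess_yy_g_mv, jac_xy_g_mv,
                      inner_steps=100):
    """AID-style estimate of d/dx f(x, y*(x)), where y*(x) = argmin_y g(x, y),
    evaluated at an approximate lower-level solution y."""
    # Inner linear system [Hess_yy g] v = grad_y f, solved with momentum.
    v = nesterov_quadratic_solve(lambda u: hess_yy_g_mv(x, y, u),
                                 grad_y_f(x, y), steps=inner_steps)
    # Implicit-function-theorem correction: grad_x f - (cross derivative) v.
    return grad_x_f(x, y) - jac_xy_g_mv(x, y, v)

def prox_l1(x, thresh):
    """Proximal operator of thresh * ||x||_1 (one example of a possibly
    nonsmooth regularizer; the paper also allows nonconvex ones)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def proximal_bilevel_step(x, y, eta, lam, oracles):
    """One outer iteration: proximal gradient step along the AID hypergradient."""
    h = aid_hypergradient(x, y, **oracles)
    return prox_l1(x - eta * h, eta * lam)

if __name__ == "__main__":
    # Toy quadratic bi-level instance so the sketch runs end to end.
    rng = np.random.default_rng(0)
    dim = 5
    A = 2.0 * np.eye(dim)                      # lower-level Hessian (PSD)
    B = 0.1 * rng.standard_normal((dim, dim))  # cross-derivative block
    oracles = dict(
        grad_x_f=lambda x, y: x + 0.1 * y,     # toy upper-level gradients
        grad_y_f=lambda x, y: 0.1 * x + y,
        hess_yy_g_mv=lambda x, y, u: A @ u,    # Hessian-vector product
        jac_xy_g_mv=lambda x, y, v: B @ v,     # cross-derivative-vector product
    )
    x = rng.standard_normal(dim)
    for _ in range(50):
        y = np.linalg.solve(A, -B.T @ x)       # toy exact lower-level solve
        x = proximal_bilevel_step(x, y, eta=0.05, lam=0.01, oracles=oracles)
    print("x after 50 outer iterations:", x)
```

In practice the lower-level solve would itself be an inexact gradient loop rather than the exact solve used in this toy example.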
Related papers
- Efficient Gradient Approximation Method for Constrained Bilevel
Optimization [2.0305676256390934]
Bilevel optimization has been developed to handle large-scale, high-dimensional data.
This paper considers a constrained bilevel problem with convex and non-differentiable approximations.
arXiv Detail & Related papers (2023-02-03T19:34:56Z)
- Efficiently Escaping Saddle Points in Bilevel Optimization [48.925688192913]
Bilevel optimization is one of the fundamental problems in machine learning.
Recent developments in bilevel optimization provide the first nonasymptotic analysis of multi-step methods for escaping saddle points.
arXiv Detail & Related papers (2022-02-08T07:10:06Z)
- Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex
Minimax Machine Learning [12.069630105460766]
Alternating gradient descent-ascent (AltGDA) is an optimization algorithm that has been widely used for training in various machine learning applications.
In this paper, we develop a single-loop fast AltGDA-type algorithm to solve nonconvex minimax optimization problems (a minimal sketch of the basic alternating update appears after this list).
arXiv Detail & Related papers (2021-12-22T04:33:27Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows that these algorithms match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- Bilevel Optimization for Machine Learning: Algorithm Design and
Convergence Analysis [12.680169619392695]
This thesis provides a comprehensive convergence rate analysis for bilevel optimization algorithms.
For the problem-based formulation, we provide a convergence rate analysis for AID- and ITD-based bilevel algorithms.
We then develop accelerated bilevel algorithms, for which we provide a sharper convergence analysis with relaxed assumptions.
arXiv Detail & Related papers (2021-07-31T22:05:47Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient in terms of both complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- Bilevel Optimization: Convergence Analysis and Enhanced Design [63.64636047748605]
Bilevel optimization is a tool for many machine learning problems.
We propose a novel stochastic bilevel algorithm with a sample-efficient hypergradient estimator, named stocBiO.
arXiv Detail & Related papers (2020-10-15T18:09:48Z)
- Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart
for Nonconvex Optimization [73.38702974136102]
Various types of parameter restart schemes have been proposed for accelerated algorithms to facilitate their fast convergence in practice.
In this paper, we propose a proximal gradient algorithm with momentum and flexible parameter restart for solving nonconvex and nonsmooth problems.
arXiv Detail & Related papers (2020-02-26T16:06:27Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic AdaGrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments compare adaptive and non-adaptive gradient algorithms empirically on GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
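As referenced in the Accelerated Proximal Alternating Gradient-Descent-Ascent entry above, here is a minimal sketch of the basic alternating gradient descent-ascent template on a toy saddle problem. The proximal terms, momentum, and the specific single-loop acceleration of that paper are not reproduced; all names, step sizes, and the toy objective are illustrative assumptions.

```python
import numpy as np

def altgda(grad_x, grad_y, x, y, lr_x=0.01, lr_y=0.05, iters=2000):
    """Plain alternating gradient descent-ascent for min_x max_y f(x, y):
    a descent step on x, then an ascent step on y that uses the updated x."""
    for _ in range(iters):
        x = x - lr_x * grad_x(x, y)   # descent on the minimization variable
        y = y + lr_y * grad_y(x, y)   # ascent on the maximization variable
    return x, y

if __name__ == "__main__":
    # Toy saddle problem f(x, y) = 0.5*||x||^2 + x^T y - 0.5*||y||^2,
    # whose unique saddle point is (0, 0).
    grad_x = lambda x, y: x + y
    grad_y = lambda x, y: x - y
    x, y = altgda(grad_x, grad_y, np.ones(3), -np.ones(3))
    print("x:", x, "y:", y)   # both should be close to zero
```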