A framework for bilevel optimization that enables stochastic and global
variance reduction algorithms
- URL: http://arxiv.org/abs/2201.13409v1
- Date: Mon, 31 Jan 2022 18:17:25 GMT
- Title: A framework for bilevel optimization that enables stochastic and global
variance reduction algorithms
- Authors: Mathieu Dagréou, Pierre Ablin, Samuel Vaiter, Thomas Moreau
- Abstract summary: Bilevel optimization is a problem of minimizing a value function which involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac{1}{T})$ convergence rate, and that it achieves linear convergence under the Polyak-Łojasiewicz assumption.
- Score: 17.12280360174073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bilevel optimization, the problem of minimizing a value function which
involves the arg-minimum of another function, appears in many areas of machine
learning. In a large scale setting where the number of samples is huge, it is
crucial to develop stochastic methods, which only use a few samples at a time
to progress. However, computing the gradient of the value function involves
solving a linear system, which makes it difficult to derive unbiased stochastic
estimates. To overcome this problem we introduce a novel framework, in which
the solution of the inner problem, the solution of the linear system, and the
main variable evolve at the same time. These directions are written as a sum,
making it straightforward to derive unbiased estimates. The simplicity of our
approach allows us to develop global variance reduction algorithms, where the
dynamics of all variables is subject to variance reduction. We demonstrate that
SABA, an adaptation of the celebrated SAGA algorithm in our framework, has
an $O(\frac{1}{T})$ convergence rate, and that it achieves linear convergence under
the Polyak-Łojasiewicz assumption. This is the first stochastic algorithm for
bilevel optimization that verifies either of these properties. Numerical
experiments validate the usefulness of our method.
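To make the abstract concrete, here is the standard bilevel setup and the linear system it refers to, written in generic notation chosen for this summary (the paper's own symbols may differ):
$$
\min_{x}\; h(x) = F\bigl(z^*(x), x\bigr), \qquad z^*(x) = \arg\min_{z} G(z, x),
$$
$$
\nabla h(x) = \nabla_x F\bigl(z^*(x), x\bigr) - \nabla^2_{xz} G\bigl(z^*(x), x\bigr)\, v^*, \qquad \nabla^2_{zz} G\bigl(z^*(x), x\bigr)\, v^* = \nabla_z F\bigl(z^*(x), x\bigr),
$$
where $\nabla^2_{xz} G = \partial^2 G / \partial x \partial z$ is the cross second derivative. The gradient of the value function therefore requires both $z^*(x)$ and the solution $v^*$ of a linear system, which is what makes unbiased stochastic estimation difficult.
The sketch below is an illustrative reconstruction, not the authors' code: it shows the joint-evolution idea on a toy finite-sum quadratic problem, where $z$, $v$ and $x$ are updated simultaneously and each direction is an average over samples, so a single-sample estimate is unbiased. The objectives, variable names and step sizes are assumptions made for this example; plain single-sample updates are used for brevity, whereas SABA would additionally apply SAGA-style variance reduction to each of the three directions.
```python
# Minimal, illustrative sketch (not the authors' code) of the joint-evolution idea:
# the inner variable z, the linear-system variable v and the outer variable x are
# updated at the same time, and every direction is an average over samples, so a
# single-sample estimate is unbiased.  Toy objectives, names and step sizes are
# assumptions made for this example only.
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 5                      # number of samples, dimension
mu, lam = 1e-1, 1e-2              # regularization strengths

# Inner objective  G(z, x) = (1/n) sum_i [ 0.5*||A_i z - x||^2 + 0.5*mu*||z||^2 ]
# Outer objective  F(z, x) = (1/n) sum_i [ 0.5*||z - b_i||^2 + 0.5*lam*||x||^2 ]
A = rng.standard_normal((n, p, p)) / np.sqrt(p)
b = rng.standard_normal((n, p))

def grad_z_G(i, z, x):            # per-sample gradient of G w.r.t. z
    return A[i].T @ (A[i] @ z - x) + mu * z

def hess_zz_G_v(i, z, x, v):      # per-sample Hessian-vector product (d^2G/dz^2) v
    return A[i].T @ (A[i] @ v) + mu * v

def cross_xz_G_v(i, z, x, v):     # per-sample cross term (d^2G/dxdz) v = -A_i v here
    return -A[i] @ v

def grad_z_F(i, z, x):
    return z - b[i]

def grad_x_F(i, z, x):
    return lam * x

z, v, x = np.zeros(p), np.zeros(p), np.zeros(p)
rho, gamma = 1e-2, 1e-2           # inner and outer step sizes

for t in range(20000):
    i = rng.integers(n)           # one sample per step; SABA would replace these
                                  # per-sample directions by SAGA-style
                                  # variance-reduced estimates of the same sums
    D_z = grad_z_G(i, z, x)                               # drives z toward argmin_z G(z, x)
    D_v = hess_zz_G_v(i, z, x, v) - grad_z_F(i, z, x)     # drives v toward the linear-system solution
    D_x = grad_x_F(i, z, x) - cross_xz_G_v(i, z, x, v)    # approximate gradient of the value function
    z = z - rho * D_z
    v = v - rho * D_v
    x = x - gamma * D_x

print("final outer variable x:", x)
```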
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimization (BLO) problems.
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Efficient Gradient Approximation Method for Constrained Bilevel
Optimization [2.0305676256390934]
Bilevel optimization has been developed with large-scale high-dimensional data.
This paper considers a constrained bilevel problem with convex and non-differentiable approximations.
arXiv Detail & Related papers (2023-02-03T19:34:56Z) - Continuation Newton methods with deflation techniques for global
optimization problems [3.705839280172101]
A global minimum point of an optimization problem is of interest in engineering.
In this article, we consider a new memetic algorithm for this nonlinear large-scale problem.
According to our numerical experiments, the new algorithm works well for unconstrained problems.
arXiv Detail & Related papers (2021-07-29T09:53:49Z) - Adaptive Importance Sampling for Finite-Sum Optimization and Sampling
with Decreasing Step-Sizes [4.355567556995855]
We propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes.
Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $\mathcal{O}(T^{5/6})$ step sizes.
arXiv Detail & Related papers (2021-03-23T00:28:15Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-timescale Stochastic Approximation (MSTSA) -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterations caused by inaccurate solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Aligning Partially Overlapping Point Sets: an Inner Approximation
Algorithm [80.15123031136564]
We propose a robust method to align point sets when there is no prior information about the value of the transformation.
Our algorithm does not need regularization on the transformation, and can thus handle the situation where nothing is known in advance about the transformation values.
Experimental results demonstrate the better robustness of the proposed method over state-of-the-art algorithms.
arXiv Detail & Related papers (2020-07-05T15:23:33Z) - Hybrid Variance-Reduced SGD Algorithms For Nonconvex-Concave Minimax
Problems [26.24895953952318]
We develop an algorithm to solve a class of nonconvex-concave minimax problems.
It can also work with either single or two mini-batch derivatives.
arXiv Detail & Related papers (2020-06-27T03:05:18Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study gradient-free minimax optimization in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Optimizing generalization on the train set: a novel gradient-based
framework to train parameters and hyperparameters simultaneously [0.0]
Generalization is a central problem in Machine Learning.
We present a novel approach based on a new measure of risk that allows us to develop novel fully automatic procedures for generalization.
arXiv Detail & Related papers (2020-06-11T18:04:36Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for non-concave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)