Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence
- URL: http://arxiv.org/abs/2006.09361v3
- Date: Mon, 22 Mar 2021 17:09:20 GMT
- Title: Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence
- Authors: Tengyu Xu, Zhe Wang, Yingbin Liang, H. Vincent Poor
- Abstract summary: In this paper, we consider the gradient-free nonconvex-strongly-concave minimax stochastic optimization problem.
We show that a novel zeroth-order variance reduced gradient descent ascent (ZO-VRGDA) algorithm achieves the best known query complexity.
- Score: 120.9336529957224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many important machine learning applications amount to solving minimax
optimization problems, and in many cases there is no access to the gradient
information, but only the function values. In this paper, we focus on such a
gradient-free setting, and consider the nonconvex-strongly-concave minimax
stochastic optimization problem. In the literature, various zeroth-order (i.e.,
gradient-free) minimax methods have been proposed, but none of them achieve the
potentially feasible computational complexity of $\mathcal{O}(\epsilon^{-3})$
suggested by the stochastic nonconvex minimization theorem. In this paper, we
adopt the variance reduction technique to design a novel zeroth-order variance
reduced gradient descent ascent (ZO-VRGDA) algorithm. We show that the ZO-VRGDA
algorithm achieves the best known query complexity of $\mathcal{O}(\kappa(d_1 +
d_2)\epsilon^{-3})$, which outperforms all previous complexity bounds by orders
of magnitude, where $d_1$ and $d_2$ denote the dimensions of the optimization
variables and $\kappa$ denotes the condition number. In particular, with a new
analysis technique that we develop, our result does not rely on a diminishing
or accuracy-dependent stepsize usually required in the existing methods. To our
best knowledge, this is the first study of zeroth-order minimax optimization
with variance reduction. Experimental results on the black-box distributional
robust optimization problem demonstrate the advantageous performance of our
new algorithm.
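To make the variance-reduction idea concrete, here is a minimal, hypothetical Python sketch of the general recipe the abstract describes: a two-point zeroth-order gradient estimator combined with a SPIDER-style variance-reduced update, descending on the minimization variable x and ascending on the maximization variable y. The toy objective, smoothing radius mu, direction counts, epoch length q, and step sizes are illustrative assumptions; this is a sketch of the technique, not the authors' ZO-VRGDA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_grad(f, x, y, dirs, mu=1e-3):
    """Two-point zeroth-order estimate of (grad_x f, grad_y f) along the
    given random directions; only function evaluations are used."""
    gx, gy = np.zeros_like(x), np.zeros_like(y)
    for ux, uy in dirs:
        delta = (f(x + mu * ux, y + mu * uy) - f(x, y)) / mu
        gx += delta * ux
        gy += delta * uy
    return gx / len(dirs), gy / len(dirs)

def sample_dirs(dx, dy, n):
    """Draw n pairs of Gaussian directions for the x- and y-blocks."""
    return [(rng.standard_normal(dx), rng.standard_normal(dy)) for _ in range(n)]

def zo_vr_gda(f, x, y, eta_x=0.02, eta_y=0.1, T=300, q=20, big=50, small=5):
    """SPIDER-style variance-reduced zeroth-order gradient descent ascent:
    a large-batch reference estimate every q steps, cheap recursive
    corrections in between, descent on x and ascent on y."""
    x, y = x.copy(), y.copy()
    for t in range(T):
        if t % q == 0:
            vx, vy = zo_grad(f, x, y, sample_dirs(x.size, y.size, big))
        else:
            dirs = sample_dirs(x.size, y.size, small)
            gx, gy = zo_grad(f, x, y, dirs)           # estimate at new iterate
            hx, hy = zo_grad(f, x_old, y_old, dirs)   # same directions at old iterate
            vx, vy = vx + gx - hx, vy + gy - hy       # recursive correction
        x_old, y_old = x, y
        x = x - eta_x * vx                            # descent step on x
        y = y + eta_y * vy                            # ascent step on y
    return x, y

# Toy nonconvex-strongly-concave example (illustrative only):
# f(x, y) = 0.5 * sum(sin(x)^2) + x @ y - ||y||^2
f = lambda x, y: 0.5 * np.sum(np.sin(x) ** 2) + x @ y - y @ y
x_sol, y_sol = zo_vr_gda(f, np.ones(5), np.zeros(5))
```

The query-saving step is the recursive correction: between the periodic large-batch reference estimates, the gradient estimator is refreshed with cheap estimates at the new and previous iterates sharing the same random directions, which keeps the estimation variance small and is the mechanism that lets this family of methods avoid diminishing step sizes.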
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) estimators for lightweight ZO proximal gradient algorithms.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left(\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right)\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z)
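The entry above combines zeroth-order estimation with a proximal step. Below is a hypothetical sketch of that general pattern for an $\ell_1$-regularized problem: a random-direction function-difference estimate of the smooth part's gradient, followed by a soft-thresholding proximal step. The least-squares objective, smoothing radius, regularization weight, and step size are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def zo_gradient(f, x, mu=1e-3, num_dirs=20):
    """Random-direction two-point estimate of grad f(x) from function values."""
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / num_dirs

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def zo_proximal_gradient(f, x0, lam=0.1, eta=0.05, T=200):
    """Zeroth-order proximal gradient: ZO estimate of the smooth part,
    then a prox step that handles the nonsmooth l1 term exactly."""
    x = x0.copy()
    for _ in range(T):
        x = soft_threshold(x - eta * zo_gradient(f, x), eta * lam)
    return x

# Toy smooth part: least squares with a fixed random design (illustrative).
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
x_hat = zo_proximal_gradient(f, np.zeros(10))
```

Only the smooth part is estimated from function values; the nonsmooth regularizer is handled exactly by the prox operator.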
- Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum [30.01198677588252]
First-order algorithms require at least $\mathcal{O}(\varepsilon^{-4})$ complexity to find an $\varepsilon$-stationary point.
We introduce novel momentum algorithms utilizing bias-corrected momentum to achieve improved complexity.
The effectiveness of the method is validated through robust logistic regression using real-world datasets.
arXiv Detail & Related papers (2024-06-18T20:14:52Z)
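For intuition, here is a hypothetical sketch of a bias-corrected (STORM-style) momentum estimator, the kind of recursive estimator named in the entry above, applied to a toy stochastic min-max problem. The quadratic objective, noise model, momentum parameter a, and step size are illustrative assumptions rather than the paper's method.

```python
import numpy as np

rng = np.random.default_rng(2)

def stoch_grads(x, y, noise):
    """Noisy gradients of the toy objective f(x, y) = 0.5*||x||^2 + x @ y - ||y||^2;
    `noise` stands in for the stochastic sample so it can be reused at two points."""
    nx, ny = noise
    return x + y + nx, x - 2.0 * y + ny

def draw_noise(d, scale=0.1):
    return scale * rng.standard_normal(d), scale * rng.standard_normal(d)

def storm_minmax(x, y, eta=0.05, a=0.2, T=500):
    """Bias-corrected (STORM-style) momentum for min over x, max over y:
    v_t = g(z_t; xi_t) + (1 - a) * (v_{t-1} - g(z_{t-1}; xi_t)),
    where both terms in the correction use the SAME sample xi_t."""
    x, y = x.copy(), y.copy()
    vx, vy = stoch_grads(x, y, draw_noise(x.size))   # plain estimate to start
    for _ in range(T):
        x_old, y_old = x, y
        x, y = x - eta * vx, y + eta * vy            # descent on x, ascent on y
        noise = draw_noise(x.size)                   # one fresh sample xi_t
        gx, gy = stoch_grads(x, y, noise)
        hx, hy = stoch_grads(x_old, y_old, noise)
        vx = gx + (1.0 - a) * (vx - hx)              # bias-corrected momentum
        vy = gy + (1.0 - a) * (vy - hy)
    return x, y

x_sol, y_sol = storm_minmax(np.ones(5), np.zeros(5))
```

Because both gradients in the correction term share the same stochastic sample, the estimator's error contracts over time without requiring large batches.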
- Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization [6.896308995955336]
Gradient-based minimax optimal algorithms have promoted the development of continuous optimization and machine learning.
In this paper, we open up a new way to design and analyze gradient-based algorithms with direct applications in machine learning.
arXiv Detail & Related papers (2023-12-06T01:16:10Z) - Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex
Minimax Machine Learning [12.069630105460766]
Alternating gradient descent ascent (AltGDA) is an optimization algorithm that has been widely used for training in various machine learning applications.
In this paper, we develop a single-loop fast proximal alternating gradient descent ascent algorithm to solve nonconvex minimax optimization problems.
arXiv Detail & Related papers (2021-12-22T04:33:27Z) - Derivative-free Alternating Projection Algorithms for General
Nonconvex-Concave Minimax Problems [9.173866646584031]
In this paper, we propose a zeroth-order algorithm for nonsmooth minimax problems.
We show that it can be used to solve general nonconvex-concave minimax problems.
arXiv Detail & Related papers (2021-08-01T15:23:49Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterations caused by inaccurate solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - On Riemannian Gradient-Based Methods for Minimax Problems [24.199289678553896]
We propose a class of Riemannian gradient-based methods to solve minimax problems.
We prove that our RGDA has a sample complexity of $\tilde{O}(\kappa^4\epsilon^{-3})$.
We also show that our Acc-RSGDA achieves a sample complexity of $\tilde{O}(\kappa^4\epsilon^{-3})$.
arXiv Detail & Related papers (2020-10-13T00:54:00Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
- A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
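The two-timescale idea from the entry above is easiest to see in code: the lower-level (inner) variable is updated with a much larger step size so that it tracks its current optimum, while the upper-level (outer) variable moves slowly. The hypothetical sketch below uses a simple min-max instance, a special case of the bilevel structure; the objective and step sizes are illustrative assumptions, not the paper's TTSA algorithm.

```python
import numpy as np

def two_timescale(grad_outer, grad_inner, x, y, beta=0.2, alpha=0.005, T=2000):
    """Two-timescale update pattern:
    fast inner updates (step beta) track y*(x) = argmax_y f(x, y),
    slow outer updates (step alpha << beta) descend on x."""
    x, y = x.copy(), y.copy()
    for _ in range(T):
        y = y + beta * grad_inner(x, y)    # fast timescale: inner ascent
        x = x - alpha * grad_outer(x, y)   # slow timescale: outer descent
    return x, y

# Toy instance f(x, y) = 0.5*||x||^2 + x @ y - ||y||^2, where the inner
# problem max_y f(x, y) is strongly concave with solution y*(x) = x / 2.
grad_inner = lambda x, y: x - 2.0 * y          # d f / d y
grad_outer = lambda x, y: x + y                # d f / d x
x_sol, y_sol = two_timescale(grad_outer, grad_inner, np.ones(4), np.zeros(4))
```

The actor-critic connection mentioned in the entry follows the same pattern: the critic typically plays the fast inner role and the actor the slow outer role.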
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)