Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence
- URL: http://arxiv.org/abs/2006.09361v3
- Date: Mon, 22 Mar 2021 17:09:20 GMT
- Title: Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence
- Authors: Tengyu Xu, Zhe Wang, Yingbin Liang, H. Vincent Poor
- Abstract summary: In this paper, we consider the gradient-free nonconvex-strongly-concave minimax stochastic optimization problem.
We show that a novel zeroth-order variance reduced gradient descent ascent (ZO-VRGDA) algorithm achieves the best known query complexity.
- Score: 120.9336529957224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many important machine learning applications amount to solving minimax
optimization problems, and in many cases there is no access to the gradient
information, but only the function values. In this paper, we focus on such a
gradient-free setting, and consider the nonconvex-strongly-concave minimax
stochastic optimization problem. In the literature, various zeroth-order (i.e.,
gradient-free) minimax methods have been proposed, but none of them achieve the
potentially feasible computational complexity of $\mathcal{O}(\epsilon^{-3})$
suggested by the stochastic nonconvex minimization theorem. In this paper, we
adopt the variance reduction technique to design a novel zeroth-order variance
reduced gradient descent ascent (ZO-VRGDA) algorithm. We show that the ZO-VRGDA
algorithm achieves the best known query complexity of $\mathcal{O}(\kappa(d_1 +
d_2)\epsilon^{-3})$, which outperforms all previous complexity bounds by orders
of magnitude, where $d_1$ and $d_2$ denote the dimensions of the optimization
variables and $\kappa$ denotes the condition number. In particular, with a new
analysis technique that we develop, our result does not rely on a diminishing
or accuracy-dependent stepsize usually required in the existing methods. To our
best knowledge, this is the first study of zeroth-order minimax optimization
with variance reduction. Experimental results on the black-box distributional
robust optimization problem demonstrate the advantageous performance of our
new algorithm.
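To make the variance-reduction idea concrete, here is a minimal, hypothetical Python sketch of the general recipe the abstract describes: a two-point zeroth-order gradient estimator combined with a SPIDER-style variance-reduced update, descending on the minimization variable x and ascending on the maximization variable y. The toy objective, smoothing radius mu, direction counts, epoch length q, and step sizes are illustrative assumptions; this is a sketch of the technique, not the authors' ZO-VRGDA implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_grad(f, x, y, dirs, mu=1e-3):
    """Two-point zeroth-order estimate of (grad_x f, grad_y f) along the
    given random directions; only function evaluations are used."""
    gx, gy = np.zeros_like(x), np.zeros_like(y)
    for ux, uy in dirs:
        delta = (f(x + mu * ux, y + mu * uy) - f(x, y)) / mu
        gx += delta * ux
        gy += delta * uy
    return gx / len(dirs), gy / len(dirs)

def sample_dirs(dx, dy, n):
    """Draw n pairs of Gaussian directions for the x- and y-blocks."""
    return [(rng.standard_normal(dx), rng.standard_normal(dy)) for _ in range(n)]

def zo_vr_gda(f, x, y, eta_x=0.02, eta_y=0.1, T=300, q=20, big=50, small=5):
    """SPIDER-style variance-reduced zeroth-order gradient descent ascent:
    a large-batch reference estimate every q steps, cheap recursive
    corrections in between, descent on x and ascent on y."""
    x, y = x.copy(), y.copy()
    for t in range(T):
        if t % q == 0:
            vx, vy = zo_grad(f, x, y, sample_dirs(x.size, y.size, big))
        else:
            dirs = sample_dirs(x.size, y.size, small)
            gx, gy = zo_grad(f, x, y, dirs)           # estimate at new iterate
            hx, hy = zo_grad(f, x_old, y_old, dirs)   # same directions at old iterate
            vx, vy = vx + gx - hx, vy + gy - hy       # recursive correction
        x_old, y_old = x, y
        x = x - eta_x * vx                            # descent step on x
        y = y + eta_y * vy                            # ascent step on y
    return x, y

# Toy nonconvex-strongly-concave example (illustrative only):
# f(x, y) = 0.5 * sum(sin(x)^2) + x @ y - ||y||^2
f = lambda x, y: 0.5 * np.sum(np.sin(x) ** 2) + x @ y - y @ y
x_sol, y_sol = zo_vr_gda(f, np.ones(5), np.zeros(5))
```

The query-saving step is the recursive correction: between the periodic large-batch reference estimates, the gradient estimator is refreshed with cheap estimates at the new and previous iterates sharing the same random directions, which keeps the estimation variance small and is the mechanism that lets this family of methods avoid diminishing step sizes.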
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) estimators for lightweight ZO proximal gradient algorithms.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left(\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right)\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z)
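The entry above combines zeroth-order estimation with a proximal step. Below is a hypothetical sketch of that general pattern for an $\ell_1$-regularized problem: a random-direction function-difference estimate of the smooth part's gradient, followed by a soft-thresholding proximal step. The least-squares objective, smoothing radius, regularization weight, and step size are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def zo_gradient(f, x, mu=1e-3, num_dirs=20):
    """Random-direction two-point estimate of grad f(x) from function values."""
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / num_dirs

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def zo_proximal_gradient(f, x0, lam=0.1, eta=0.05, T=200):
    """Zeroth-order proximal gradient: ZO estimate of the smooth part,
    then a prox step that handles the nonsmooth l1 term exactly."""
    x = x0.copy()
    for _ in range(T):
        x = soft_threshold(x - eta * zo_gradient(f, x), eta * lam)
    return x

# Toy smooth part: least squares with a fixed random design (illustrative).
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
x_hat = zo_proximal_gradient(f, np.zeros(10))
```

Only the smooth part is estimated from function values; the nonsmooth regularizer is handled exactly by the prox operator.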
- Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum [30.01198677588252]
First-order algorithms require at least $\mathcal{O}(\varepsilon^{-4})$ complexity to find an $\varepsilon$-stationary point.
We introduce novel momentum algorithms utilizing bias-corrected momentum to achieve improved complexity.
The effectiveness of the method is validated through robust logistic regression using real-world datasets.
arXiv Detail & Related papers (2024-06-18T20:14:52Z)
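For intuition, here is a hypothetical sketch of a bias-corrected (STORM-style) momentum estimator, the kind of recursive estimator named in the entry above, applied to a toy stochastic min-max problem. The quadratic objective, noise model, momentum parameter a, and step size are illustrative assumptions rather than the paper's method.

```python
import numpy as np

rng = np.random.default_rng(2)

def stoch_grads(x, y, noise):
    """Noisy gradients of the toy objective f(x, y) = 0.5*||x||^2 + x @ y - ||y||^2;
    `noise` stands in for the stochastic sample so it can be reused at two points."""
    nx, ny = noise
    return x + y + nx, x - 2.0 * y + ny

def draw_noise(d, scale=0.1):
    return scale * rng.standard_normal(d), scale * rng.standard_normal(d)

def storm_minmax(x, y, eta=0.05, a=0.2, T=500):
    """Bias-corrected (STORM-style) momentum for min over x, max over y:
    v_t = g(z_t; xi_t) + (1 - a) * (v_{t-1} - g(z_{t-1}; xi_t)),
    where both terms in the correction use the SAME sample xi_t."""
    x, y = x.copy(), y.copy()
    vx, vy = stoch_grads(x, y, draw_noise(x.size))   # plain estimate to start
    for _ in range(T):
        x_old, y_old = x, y
        x, y = x - eta * vx, y + eta * vy            # descent on x, ascent on y
        noise = draw_noise(x.size)                   # one fresh sample xi_t
        gx, gy = stoch_grads(x, y, noise)
        hx, hy = stoch_grads(x_old, y_old, noise)
        vx = gx + (1.0 - a) * (vx - hx)              # bias-corrected momentum
        vy = gy + (1.0 - a) * (vy - hy)
    return x, y

x_sol, y_sol = storm_minmax(np.ones(5), np.zeros(5))
```

Because both gradients in the correction term share the same stochastic sample, the estimator's error contracts over time without requiring large batches.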
- Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization [6.896308995955336]
Gradient-based minimax optimal algorithms have promoted the development of continuous optimization and machine learning.
In this paper, we open up a new way to design and analyze gradient-based algorithms with direct applications in machine learning.
arXiv Detail & Related papers (2023-12-06T01:16:10Z) - Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex
Minimax Machine Learning [12.069630105460766]
Alternating gradient descent ascent (AltGDA) is an optimization algorithm that has been widely used for training in various machine learning applications.
In this paper, we develop a single-loop fast proximal alternating gradient descent ascent algorithm to solve nonconvex minimax optimization problems.
arXiv Detail & Related papers (2021-12-22T04:33:27Z) - Derivative-free Alternating Projection Algorithms for General
Nonconvex-Concave Minimax Problems [9.173866646584031]
In this paper, we propose a zeroth-order algorithm for nonsmooth minimax problems.
We show that it can be used to solve general nonconvex-concave minimax problems.
arXiv Detail & Related papers (2021-08-01T15:23:49Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterations caused by inaccurate solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - On Riemannian Gradient-Based Methods for Minimax Problems [24.199289678553896]
We propose a class of Riemannian gradient-based methods to solve minimax problems.
We prove that our RGDA has a sample complexity of $\tilde{O}(\kappa^4\epsilon^{-3})$.
We also show that our Acc-RSGDA achieves a sample complexity of $\tilde{O}(\kappa^4\epsilon^{-3})$.
arXiv Detail & Related papers (2020-10-13T00:54:00Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
- A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
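The two-timescale idea from the entry above is easiest to see in code: the lower-level (inner) variable is updated with a much larger step size so that it tracks its current optimum, while the upper-level (outer) variable moves slowly. The hypothetical sketch below uses a simple min-max instance, a special case of the bilevel structure; the objective and step sizes are illustrative assumptions, not the paper's TTSA algorithm.

```python
import numpy as np

def two_timescale(grad_outer, grad_inner, x, y, beta=0.2, alpha=0.005, T=2000):
    """Two-timescale update pattern:
    fast inner updates (step beta) track y*(x) = argmax_y f(x, y),
    slow outer updates (step alpha << beta) descend on x."""
    x, y = x.copy(), y.copy()
    for _ in range(T):
        y = y + beta * grad_inner(x, y)    # fast timescale: inner ascent
        x = x - alpha * grad_outer(x, y)   # slow timescale: outer descent
    return x, y

# Toy instance f(x, y) = 0.5*||x||^2 + x @ y - ||y||^2, where the inner
# problem max_y f(x, y) is strongly concave with solution y*(x) = x / 2.
grad_inner = lambda x, y: x - 2.0 * y          # d f / d y
grad_outer = lambda x, y: x + y                # d f / d x
x_sol, y_sol = two_timescale(grad_outer, grad_inner, np.ones(4), np.zeros(4))
```

The actor-critic connection mentioned in the entry follows the same pattern: the critic typically plays the fast inner role and the actor the slow outer role.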
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)