Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex
Minimax Machine Learning
- URL: http://arxiv.org/abs/2112.11663v1
- Date: Wed, 22 Dec 2021 04:33:27 GMT
- Title: Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex
Minimax Machine Learning
- Authors: Ziyi Chen, Shaocong Ma, Yi Zhou
- Abstract summary: Alternating gradient-descent-ascent (AltGDA) is an optimization algorithm that has been widely used for model training in various machine learning applications.
In this paper, we develop a single-loop and fast AltGDA-type algorithm to solve nonconvex minimax optimization problems.
- Score: 12.069630105460766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Alternating gradient-descent-ascent (AltGDA) is an optimization algorithm
that has been widely used for model training in various machine learning
applications, which aim to solve a nonconvex minimax optimization problem.
However, the existing studies show that it suffers from a high computation
complexity in nonconvex minimax optimization. In this paper, we develop a
single-loop and fast AltGDA-type algorithm that leverages proximal gradient
updates and momentum acceleration to solve regularized nonconvex minimax
optimization problems. By identifying the intrinsic Lyapunov function of this
algorithm, we prove that it converges to a critical point of the nonconvex
minimax optimization problem and achieves a computation complexity
$\mathcal{O}(\kappa^{1.5}\epsilon^{-2})$, where $\epsilon$ is the desired level
of accuracy and $\kappa$ is the problem's condition number. Such a computation
complexity improves the state-of-the-art complexities of single-loop GDA and
AltGDA algorithms (see the summary of comparison in Table 1). We demonstrate
the effectiveness of our algorithm via an experiment on adversarial deep
learning.
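For intuition about the algorithmic template described in the abstract, the following is a minimal NumPy sketch of a single-loop proximal alternating GDA iteration with momentum acceleration on the minimization variable, applied to a regularized problem of the form $\min_x \max_y f(x,y) + \lambda\|x\|_1$. The step sizes, momentum parameter, $\ell_1$ regularizer, and the toy objective in the usage example are illustrative assumptions, not the paper's exact updates or tuning.

```python
import numpy as np

def prox_l1(v, thresh):
    # Soft-thresholding: proximal operator of thresh * ||.||_1 (illustrative regularizer).
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def proximal_alt_gda(grad_x, grad_y, x0, y0, eta_x=0.01, eta_y=0.5,
                     beta=0.5, lam=1e-3, iters=2000):
    """Single-loop proximal AltGDA sketch with momentum on x.

    Approximately solves min_x max_y f(x, y) + lam * ||x||_1, where
    grad_x and grad_y return the partial gradients of f.
    Hyperparameters are illustrative placeholders.
    """
    x, y = x0.copy(), y0.copy()
    x_prev = x.copy()
    for _ in range(iters):
        # Momentum extrapolation on the min variable.
        x_tilde = x + beta * (x - x_prev)
        x_prev = x.copy()
        # Proximal gradient-descent step on x, using the current y (alternating order).
        x = prox_l1(x_tilde - eta_x * grad_x(x_tilde, y), eta_x * lam)
        # Gradient-ascent step on y, using the freshly updated x.
        y = y + eta_y * grad_y(x, y)
    return x, y

# Toy usage: f(x, y) = x^T A y - 0.5 * ||y||^2 is strongly concave in y,
# so the regularized outer problem is minimized at x = 0.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    gx = lambda x, y: A @ y            # partial gradient of f in x
    gy = lambda x, y: A.T @ x - y      # partial gradient of f in y
    x_star, y_star = proximal_alt_gda(gx, gy, rng.standard_normal(5), rng.standard_normal(5))
    print(np.linalg.norm(x_star), np.linalg.norm(y_star))  # the norm of x should shrink toward 0
```

Note that the ascent step size is taken larger than the descent step size in this sketch, mirroring the usual timescale separation governed by the condition number $\kappa$ in nonconvex-strongly-concave analyses.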
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) gradient estimators for nonconvex problems (a generic two-point ZO estimator is sketched after the related-papers list below).
We improve the state-of-the-art function complexities from $\mathcal{O}\left(\min\left\{\frac{d n^{1/2}}{\epsilon^{2}}, \frac{d}{\epsilon^{3}}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^{2}}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Accelerated Gradient Algorithms with Adaptive Subspace Search for
Instance-Faster Optimization [6.896308995955336]
Gradient-based minimax optimal algorithms have promoted the development of continuous optimization and machine learning.
In this paper, we open up a new way to design and analyze gradient-based algorithms with direct applications in machine learning.
arXiv Detail & Related papers (2023-12-06T01:16:10Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Train simultaneously, generalize better: Stability of gradient-based
minimax learners [12.691047660244331]
We show that the optimization algorithm plays a key role in the generalization performance of the trained minimax model under both convex-concave and nonconvex-nonconcave minimax settings.
We discuss several numerical results indicating the role of optimization algorithms in the generalization of learned minimax models.
arXiv Detail & Related papers (2020-10-23T17:44:43Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the nonconvex strongly-concave setting of gradient-free minimax optimization problems.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved
Complexities [21.13934071954103]
We present a deterministic algorithm for problems that are nonconvex in one variable and strongly-concave in the other.
We show that under the SGC assumption, the complexities of the algorithms match those of existing algorithms.
Results are presented in terms of oracle complexity for ZO-GDMSA, and numerical experiments are presented to support the theoretical results.
arXiv Detail & Related papers (2020-01-22T00:05:14Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconvex-nonconcave minimax problems.
Our experiments empirically show that adaptive gradient algorithms can outperform non-adaptive gradient algorithms in GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
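Several of the entries above (the lightweight zeroth-order proximal gradient work, the gradient-free minimax paper, and ZO-GDMSA) build on zeroth-order gradient estimation. As generic background rather than any single paper's estimator, here is a minimal NumPy sketch of the standard two-point random-direction estimator; the smoothing radius and the number of sampled directions are illustrative assumptions.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, num_dirs=10, rng=None):
    """Two-point zeroth-order gradient estimator (generic sketch).

    Approximates grad f(x) from function values only:
        g ~ (1/m) * sum_i  d * (f(x + mu*u_i) - f(x - mu*u_i)) / (2*mu) * u_i,
    where the u_i are uniform on the unit sphere in R^d.
    mu (smoothing radius) and num_dirs are illustrative hyperparameters.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)          # uniform direction on the unit sphere
        g += d * (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs

# Quick check on f(z) = ||z||^2, whose true gradient is 2*z.
if __name__ == "__main__":
    x = np.array([1.0, -2.0, 0.5])
    est = zo_gradient(lambda z: float(z @ z), x, num_dirs=2000,
                      rng=np.random.default_rng(1))
    print(est, 2 * x)  # the estimate should be close to the true gradient
```

Variance-reduction schemes such as those referenced above reuse estimates across iterations or sample points to cut the number of function queries; their details differ per paper and are not reproduced here.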