Solve Minimax Optimization by Anderson Acceleration
- URL: http://arxiv.org/abs/2110.02457v1
- Date: Wed, 6 Oct 2021 02:08:54 GMT
- Title: Solve Minimax Optimization by Anderson Acceleration
- Authors: Huan He, Shifan Zhao, Yuanzhe Xi, Joyce C Ho, Yousef Saad
- Abstract summary: Gradient descent ascent (GDA) is the most commonly used algorithm due to its simplicity.
We propose a new minimax optimization framework, GDA-AM, that views the GDA dynamics as a fixed-point iteration.
We show theoretically that the algorithm can achieve global convergence for bilinear problems under mild conditions.
- Score: 5.900781483345349
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many modern machine learning algorithms such as generative adversarial
networks (GANs) and adversarial training can be formulated as minimax
optimization. Gradient descent ascent (GDA) is the most commonly used algorithm
due to its simplicity. However, GDA can converge to non-optimal minimax points.
We propose a new minimax optimization framework, GDA-AM, that views the
GDA dynamics as a fixed-point iteration and solves it using Anderson Mixing to
converge to the local minimax. It addresses the diverging issue of
simultaneous GDA and accelerates the convergence of alternating GDA. We show
theoretically that the algorithm can achieve global convergence for bilinear
problems under mild conditions. We also empirically show that GDA-AM solves a
variety of minimax problems and improves GAN training on several datasets.
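The abstract casts the GDA dynamics as a fixed-point iteration z -> G(z) and accelerates it with Anderson Mixing. The sketch below is only an illustration of that idea, not the authors' implementation: it applies plain (type-II) Anderson acceleration to an alternating-GDA map on a random bilinear problem, and the objective f(x, y) = x^T A y, the step size eta, the memory size m, and all function names are assumptions made for this example.

import numpy as np

def alt_gda_map(z, A, eta=0.1):
    # One alternating-GDA step on the bilinear problem f(x, y) = x^T A y,
    # viewed as a fixed-point map z -> G(z) with z = (x, y).
    n = A.shape[0]
    x, y = z[:n], z[n:]
    x = x - eta * (A @ y)        # descent on x: grad_x f = A y
    y = y + eta * (A.T @ x)      # ascent on y uses the freshly updated x
    return np.concatenate([x, y])

def anderson_gda(z0, A, iters=30, m=10, eta=0.1):
    # Anderson acceleration of the fixed-point iteration z_{k+1} = G(z_k).
    z = z0.copy()
    Z_hist, R_hist = [], []                      # recent iterates and residuals
    for _ in range(iters):
        g = alt_gda_map(z, A, eta)
        r = g - z                                # fixed-point residual G(z) - z
        Z_hist.append(z); R_hist.append(r)
        if len(R_hist) > m + 1:                  # keep at most m difference columns
            Z_hist.pop(0); R_hist.pop(0)
        if len(R_hist) == 1:
            z = g                                # plain step until history exists
            continue
        dR = np.stack([R_hist[i + 1] - R_hist[i] for i in range(len(R_hist) - 1)], axis=1)
        dZ = np.stack([Z_hist[i + 1] - Z_hist[i] for i in range(len(Z_hist) - 1)], axis=1)
        gamma, *_ = np.linalg.lstsq(dR, r, rcond=None)  # small least-squares solve
        z = g - (dZ + dR) @ gamma                # Anderson-mixed next iterate
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
z = anderson_gda(rng.standard_normal(10), A)
# For invertible A the unique saddle point of x^T A y is (0, 0); plain GDA
# cycles or diverges here, while the accelerated iteration should approach it.
print("distance to the saddle point:", np.linalg.norm(z))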
Related papers
- Fundamental Benefit of Alternating Updates in Minimax Optimization [15.170534286366744]
The Gradient Descent-Ascent (GDA) algorithm is designed to solve minimax optimization problems.
While alternating GDA (Alt-GDA) is commonly observed to converge faster than simultaneous GDA (Sim-GDA) in terms of iterations, the performance gap between the two is not yet well understood (see the comparison sketch after this list).
We present fine-grained convergence analyses of both algorithms for strongly-convex-strongly-concave and Lipschitz-gradient objectives.
arXiv Detail & Related papers (2024-02-16T06:41:35Z)
- Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
Minimax algorithms for neural networks have been developed to solve many problems.
In this paper, we propose two types of decentralized Riemannian minimax algorithms.
For the stochastic setting, we propose DRSGDA and prove that our method achieves a gradient complexity guarantee.
arXiv Detail & Related papers (2023-02-08T01:42:45Z)
- Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z)
- Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity [30.97847679999505]
Gradient descent ascent (GDA) is the simplest single-loop algorithm for nonconvex minimax optimization.
Recent work shows that GDA has inferior convergence rates in settings such as generative adversarial network (GAN) training.
Two alternative single-loop algorithms -- alternating GDA and smoothed GDA -- are presented.
arXiv Detail & Related papers (2021-12-10T15:35:42Z)
- Minimax Optimization: The Case of Convex-Submodular [50.03984152441271]
Minimax problems extend beyond the continuous domain to mixed continuous-discrete domains or even fully discrete domains.
We introduce the class of convex-submodular minimax problems, where the objective is convex with respect to the continuous variable and submodular with respect to the discrete variable.
Our proposed algorithms are iterative and combine tools from both discrete and continuous optimization.
arXiv Detail & Related papers (2021-11-01T21:06:35Z)
- Don't Fix What ain't Broke: Near-optimal Local Convergence of Alternating Gradient Descent-Ascent for Minimax Optimization [24.236441934607154]
Alternating gradient descent-ascent (Alt-GDA) is superior to its simultaneous counterpart (Sim-GDA) in many settings.
In particular, we prove that Alt-GDA achieves a near-optimal local convergence rate for strongly convex-strongly concave problems.
arXiv Detail & Related papers (2021-02-18T16:39:35Z)
- Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry [49.65455534654459]
The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax optimization problems.
This paper fills a gap in the literature by studying the variable convergence of proximal GDA under Kurdyka-Łojasiewicz (KŁ) geometry.
arXiv Detail & Related papers (2021-02-09T05:35:53Z)
- Train simultaneously, generalize better: Stability of gradient-based minimax learners [12.691047660244331]
We show that the optimization algorithm plays a key role in the generalization performance of the trained minimax model under both convex-concave and nonconvex-nonconcave minimax settings.
We discuss several numerical results indicating the role of optimization algorithms in the generalization of learned minimax models.
arXiv Detail & Related papers (2020-10-23T17:44:43Z)
- Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we study gradient-free (zeroth-order) minimax optimization in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance-reduced gradient descent ascent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
- Learning to Optimize Non-Rigid Tracking [54.94145312763044]
We employ learnable optimizations to improve robustness and speed up solver convergence.
First, we upgrade the tracking objective by integrating an alignment data term on deep features that are learned end-to-end through a CNN.
Second, we bridge the gap between the preconditioning technique and learning method by introducing a ConditionNet which is trained to generate a preconditioner.
arXiv Detail & Related papers (2020-03-27T04:40:57Z)
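Several of the entries above compare simultaneous GDA (Sim-GDA) with alternating GDA (Alt-GDA). As referenced in the first entry, the sketch below contrasts the two update rules on a small quadratic saddle problem; the objective f(x, y) = 0.5*x**2 + 2*x*y - 0.5*y**2, the step size, and the iteration count are illustrative choices, not taken from any of the listed papers.

import math

def grads(x, y):
    # Gradients of f(x, y) = 0.5*x**2 + 2*x*y - 0.5*y**2, whose saddle point is (0, 0).
    return x + 2.0 * y, 2.0 * x - y      # (df/dx, df/dy)

def sim_gda(x, y, eta=0.1, iters=200):
    # Simultaneous GDA: both players update from the same old iterate.
    for _ in range(iters):
        gx, gy = grads(x, y)
        x, y = x - eta * gx, y + eta * gy
    return x, y

def alt_gda(x, y, eta=0.1, iters=200):
    # Alternating GDA: the ascent step on y sees the freshly updated x.
    for _ in range(iters):
        gx, _ = grads(x, y)
        x = x - eta * gx
        _, gy = grads(x, y)
        y = y + eta * gy
    return x, y

# Both variants converge on this strongly-convex-strongly-concave objective,
# with Alt-GDA contracting faster per iteration, consistent with the analyses
# summarized above.
print("Sim-GDA distance to saddle:", math.hypot(*sim_gda(1.0, 1.0)))
print("Alt-GDA distance to saddle:", math.hypot(*alt_gda(1.0, 1.0)))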
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.