On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems
- URL: http://arxiv.org/abs/1906.00331v9
- Date: Thu, 14 Sep 2023 04:59:49 GMT
- Title: On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems
- Authors: Tianyi Lin, Chi Jin, Michael I. Jordan
- Abstract summary: We consider non-con minimax problems, $min_mathbfx max_mathhidoty f(mathbfdoty)$ efficiently.
- Score: 86.92205445270427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}}
\max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is
nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$ and $\mathcal{Y}$ is a
convex and bounded set. One of the most popular algorithms for solving this
problem is the celebrated gradient descent ascent (GDA) algorithm, which has
been widely used in machine learning, control theory and economics. Despite the
extensive convergence results for the convex-concave setting, GDA with equal
stepsize can converge to limit cycles or even diverge in a general setting. In
this paper, we present the complexity results on two-time-scale GDA for solving
nonconvex-concave minimax problems, showing that the algorithm can find a
stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in
\mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. To the best our knowledge, this
is the first nonasymptotic analysis for two-time-scale GDA in this setting,
shedding light on its superior practical performance in training generative
adversarial networks (GANs) and other real applications.
Related papers
- Learning a Single Neuron Robustly to Distributional Shifts and Adversarial Label Noise [38.551072383777594]
We study the problem of learning a single neuron with respect to the $L2$ loss in the presence of adversarial distribution shifts.
A new algorithm is developed to approximate the vector vector squared loss with respect to the worst distribution that is in the $chi2$divergence to the $mathcalp_0$.
arXiv Detail & Related papers (2024-11-11T03:43:52Z) - Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient ascent (TTGDA) for solving structured non minimax optimization problems.
Our contribution is to design TTGDA algorithms are effective beyond the setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Nearly Optimal Regret for Decentralized Online Convex Optimization [53.433398074919]
Decentralized online convex optimization (D-OCO) aims to minimize a sequence of global loss functions using only local computations and communications.
We develop novel D-OCO algorithms that can respectively reduce the regret bounds for convex and strongly convex functions.
Our algorithms are nearly optimal in terms of $T$, $n$, and $rho$.
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - Provably learning a multi-head attention layer [55.2904547651831]
Multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Randomized Block-Coordinate Optimistic Gradient Algorithms for
Root-Finding Problems [8.0153031008486]
We develop two new algorithms to approximate a solution of nonlinear equations in large-scale settings.
We apply our methods to a class of large-scale finite-sum inclusions, which covers prominent applications in machine learning.
arXiv Detail & Related papers (2023-01-08T21:46:27Z) - On Convergence of Gradient Descent Ascent: A Tight Local Analysis [30.206431787797776]
Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs)
We prove the convergence of GDA with a stepsize ratioeta_mathbfx$/eta_mathbfx=Theta(kappa2)
We further extend the convergence guarantees to GDA and extra-gradient methods (EG)
arXiv Detail & Related papers (2022-07-03T05:04:46Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Near-Optimal Algorithms for Minimax Optimization [115.21519161773287]
The paper presents the first with $tildeO(sqrtkappa_mathbf xkappa_mathbf)$, matching the design on logarithmic factors.
The paper also presents algorithms that match or outperform all existing methods in these settings in terms of complexity.
arXiv Detail & Related papers (2020-02-05T16:49:09Z) - Stochastic Recursive Gradient Descent Ascent for Stochastic
Nonconvex-Strongly-Concave Minimax Problems [36.645753881826955]
In this paper, we propose a novel method called RecurEnti Ascent (SREDA), which estimates more efficiently using variance.
This method achieves the best known for this problem.
arXiv Detail & Related papers (2020-01-11T09:05:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.