On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems
- URL: http://arxiv.org/abs/1906.00331v9
- Date: Thu, 14 Sep 2023 04:59:49 GMT
- Title: On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems
- Authors: Tianyi Lin, Chi Jin, Michael I. Jordan
- Abstract summary: We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, and show that two-time-scale GDA can find a stationary point of $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently.
- Score: 86.92205445270427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}}
\max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is
nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$ and $\mathcal{Y}$ is a
convex and bounded set. One of the most popular algorithms for solving this
problem is the celebrated gradient descent ascent (GDA) algorithm, which has
been widely used in machine learning, control theory and economics. Despite the
extensive convergence results for the convex-concave setting, GDA with equal
stepsize can converge to limit cycles or even diverge in a general setting. In
this paper, we present the complexity results on two-time-scale GDA for solving
nonconvex-concave minimax problems, showing that the algorithm can find a
stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in
\mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. To the best of our knowledge, this
is the first nonasymptotic analysis for two-time-scale GDA in this setting,
shedding light on its superior practical performance in training generative
adversarial networks (GANs) and other real applications.
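To make the setup concrete, here is a minimal sketch of two-time-scale GDA in Python: a slow gradient descent step on $\mathbf{x}$ paired with a faster projected gradient ascent step on $\mathbf{y}$ over the bounded convex set $\mathcal{Y}$. The gradient oracles, projection, stepsizes, and the toy objective below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def two_time_scale_gda(grad_x, grad_y, project_Y, x0, y0,
                       eta_x=1e-3, eta_y=1e-1, num_iters=10_000):
    """Two-time-scale GDA sketch: descend slowly on x, ascend quickly on y.

    grad_x, grad_y : callables returning the partial gradients of f(x, y)
    project_Y      : Euclidean projection onto the bounded convex set Y
    eta_x < eta_y  : the two time scales (illustrative values only)
    """
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    for _ in range(num_iters):
        gx = grad_x(x, y)                 # gradient of f w.r.t. x
        gy = grad_y(x, y)                 # gradient of f w.r.t. y
        x = x - eta_x * gx                # slow descent step on x
        y = project_Y(y + eta_y * gy)     # fast projected ascent step on y
    return x, y

# Toy convex-concave instance, for illustration only:
# f(x, y) = 0.5 * ||x||^2 + x^T A y, with Y the unit Euclidean ball.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
grad_x = lambda x, y: x + A @ y
grad_y = lambda x, y: A.T @ x
project_Y = lambda y: y / max(1.0, np.linalg.norm(y))
x_out, y_out = two_time_scale_gda(grad_x, grad_y, project_Y,
                                  x0=np.ones(2), y0=np.zeros(2))
```

In the paper's analysis the stepsize ratio is tied to the smoothness and concavity constants of $f$; the fixed values above are placeholders chosen only so the sketch runs.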
Related papers
- Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems.
Our contribution is to design TTGDA algorithms that are effective beyond the convex-concave setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Nearly Optimal Regret for Decentralized Online Convex Optimization [53.433398074919]
Decentralized online convex optimization (D-OCO) aims to minimize a sequence of global loss functions using only local computations and communications.
We develop novel D-OCO algorithms that reduce the regret bounds for convex and strongly convex functions, respectively.
Our algorithms are nearly optimal in terms of $T$, $n$, and $\rho$.
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Randomized Block-Coordinate Optimistic Gradient Algorithms for
Root-Finding Problems [8.0153031008486]
We develop two new algorithms to approximate a solution of nonlinear equations in large-scale settings.
We apply our methods to a class of large-scale finite-sum inclusions, which covers prominent applications in machine learning.
arXiv Detail & Related papers (2023-01-08T21:46:27Z) - Primal Dual Alternating Proximal Gradient Algorithms for Nonsmooth Nonconvex Minimax Problems with Coupled Linear Constraints [4.70696854954921]
Nonconvex minimax problems have attracted wide attention in machine learning, signal processing, and many other fields in recent years.
We propose a DAP algorithm for solving nonsmooth nonconvex-strongly concave minimax problems.
arXiv Detail & Related papers (2022-12-09T05:32:30Z) - On Convergence of Gradient Descent Ascent: A Tight Local Analysis [30.206431787797776]
Gradient Descent Ascent (GDA) methods are the mainstream algorithms for minimax optimization in generative adversarial networks (GANs).
We prove the convergence of GDA with a stepsize ratio $\eta_{\mathbf{y}}/\eta_{\mathbf{x}} = \Theta(\kappa^2)$.
We further extend the convergence guarantees to stochastic GDA and extra-gradient methods (EG).
arXiv Detail & Related papers (2022-07-03T05:04:46Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Near-Optimal Algorithms for Minimax Optimization [115.21519161773287]
The paper presents the first algorithm with gradient complexity $\tilde{O}(\sqrt{\kappa_{\mathbf{x}} \kappa_{\mathbf{y}}})$, matching the lower bound up to logarithmic factors.
The paper also presents algorithms that match or outperform all existing methods in these settings in terms of complexity.
arXiv Detail & Related papers (2020-02-05T16:49:09Z) - Stochastic Recursive Gradient Descent Ascent for Stochastic
Nonconvex-Strongly-Concave Minimax Problems [36.645753881826955]
In this paper, we propose a novel method called Stochastic Recursive gradiEnt Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction.
This method achieves the best known stochastic gradient complexity for this problem.
arXiv Detail & Related papers (2020-01-11T09:05:03Z)