Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave
Minimax Optimization
- URL: http://arxiv.org/abs/2212.12978v5
- Date: Mon, 30 Oct 2023 08:56:50 GMT
- Title: Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave
Minimax Optimization
- Authors: Taoli Zheng, Linglingzhi Zhu, Anthony Man-Cho So, Jose Blanchet,
Jiajin Li
- Abstract summary: Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning.
We propose a novel, universally applicable single-loop algorithm, the doubly smoothed gradient descent ascent method (DS-GDA), which naturally balances the primal and dual updates.
Specifically, under the one-sided KL condition with exponent $\theta\in(0,1)$, DS-GDA converges with an iteration complexity of $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$, matching the corresponding best results in the literature.
- Score: 22.392563619845212
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nonconvex-nonconcave minimax optimization has received intense attention over
the last decade due to its broad applications in machine learning. Most
existing algorithms rely on one-sided information, such as the convexity (resp.
concavity) of the primal (resp. dual) functions, or other specific structures,
such as the Polyak-\L{}ojasiewicz (P\L{}) and Kurdyka-\L{}ojasiewicz (K\L{})
conditions. However, verifying these regularity conditions is challenging in
practice. To meet this challenge, we propose a novel universally applicable
single-loop algorithm, the doubly smoothed gradient descent ascent method
(DS-GDA), which naturally balances the primal and dual updates. That is, DS-GDA
with the same hyperparameters is able to uniformly solve nonconvex-concave,
convex-nonconcave, and nonconvex-nonconcave problems with one-sided K\L{}
properties, achieving convergence with $\mathcal{O}(\epsilon^{-4})$ complexity.
Sharper (even optimal) iteration complexity can be obtained when the K\L{}
exponent is known. Specifically, under the one-sided K\L{} condition with
exponent $\theta\in(0,1)$, DS-GDA converges with an iteration complexity of
$\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$. They all match the corresponding
best results in the literature. Moreover, we show that DS-GDA is practically
applicable to general nonconvex-nonconcave problems even without any regularity
conditions, such as the P\L{} condition, K\L{} condition, or weak Minty
variational inequalities condition. For various challenging
nonconvex-nonconcave examples in the literature, including ``Forsaken'',
``Bilinearly-coupled minimax'', ``Sixth-order polynomial'', and ``PolarGame'',
the proposed DS-GDA can all get rid of limit cycles. To the best of our
knowledge, this is the first first-order algorithm to achieve convergence on
all of these formidable problems.
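The abstract describes DS-GDA only at a high level. The Python sketch below illustrates the general shape of a doubly smoothed single-loop descent-ascent update, in which both the primal and dual variables are regularized toward exponentially averaged anchor points. The surrogate form, the toy objective, and all step sizes and smoothing weights are illustrative assumptions, not the paper's exact scheme or tuned constants.

```python
# Minimal sketch of a doubly smoothed GDA-style single-loop update (assumption:
# surrogate F(x, y, z, v) = f(x, y) + (r1/2)||x - z||^2 - (r2/2)||y - v||^2,
# with z and v tracking x and y via exponential averaging).
import numpy as np

def ds_gda_sketch(grad_x, grad_y, x0, y0, alpha=0.05, beta=0.05,
                  r1=1.0, r2=1.0, mu=0.1, nu=0.1, iters=5000):
    """Run a doubly smoothed descent-ascent loop and return the final iterates."""
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    z, v = x.copy(), y.copy()          # smoothing (proximal) centers
    for _ in range(iters):
        # descent step on the primal-smoothed surrogate
        x = x - alpha * (grad_x(x, y) + r1 * (x - z))
        # ascent step on the dual-smoothed surrogate
        y = y + beta * (grad_y(x, y) - r2 * (y - v))
        # exponential-averaging updates of both smoothing centers
        z = z + mu * (x - z)
        v = v + nu * (y - v)
    return x, y

# Toy nonconvex-nonconcave objective (illustrative only): f(x, y) = xy + x^2 y - x y^2
f_gx = lambda x, y: y + 2 * x * y - y ** 2   # df/dx
f_gy = lambda x, y: x + x ** 2 - 2 * x * y   # df/dy
x_star, y_star = ds_gda_sketch(f_gx, f_gy, x0=[0.5], y0=[0.5])
```

The symmetric treatment of the two smoothing centers is what the abstract refers to as balancing the primal and dual updates, which is why a single hyperparameter setting can be applied uniformly to nonconvex-concave and convex-nonconcave instances.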
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced zeroth-order (ZO) estimators for nonconvex composite optimization problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{dn^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization [77.3396841985172]
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems.
Our contribution is to design TTGDA algorithms that are effective beyond the classical convex-concave setting.
arXiv Detail & Related papers (2024-08-21T20:14:54Z) - Zeroth-Order primal-dual Alternating Projection Gradient Algorithms for
Nonconvex Minimax Problems with Coupled linear Constraints [3.898056433020865]
We propose two zeroth-order alternating projection gradient algorithms for nonconvex minimax problems with coupled linear constraints.
To the best of our knowledge, they are the first two zeroth-order algorithms with the best-known iteration complexity for this class of problems.
arXiv Detail & Related papers (2024-01-26T11:22:13Z) - A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization, which minimizes a twice differentiable function subject to conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z) - Primal Dual Alternating Proximal Gradient Algorithms for Nonsmooth Nonconvex Minimax Problems with Coupled Linear Constraints [4.70696854954921]
Nonsmooth nonconvex minimax problems have attracted wide attention in machine learning, signal processing, and many other fields in recent years.
We propose a primal-dual alternating proximal gradient algorithm for solving nonsmooth nonconvex-(strongly) concave minimax problems with coupled linear constraints.
arXiv Detail & Related papers (2022-12-09T05:32:30Z) - The Complexity of Nonconvex-Strongly-Concave Minimax Optimization [43.07732143522183]
This paper establishes the complexity for finding approximate stationary points of nonconvex-strongly-concave (NC-SC) smooth minimax problems.
We establish lower complexity bounds by constructing a sequence of nonconvex-strongly-concave subproblems, in both the general and the averaged smooth settings.
In the finite-sum setting, our proposed algorithm achieves a nearly-tight dependence on the condition number.
arXiv Detail & Related papers (2021-03-29T18:53:57Z) - Proximal Gradient Descent-Ascent: Variable Convergence under K{\L}
Geometry [49.65455534654459]
The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax optimization problems.
This paper fills a gap in the literature by studying the variable convergence of proximal GDA under the Kurdyka-Łojasiewicz (KŁ) geometry.
arXiv Detail & Related papers (2021-02-09T05:35:53Z) - A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for
Nonconvex-Concave Min-Max Problems [33.83671897619922]
Nonconvex-concave min-max problems arise in many applications, including minimizing a pointwise maximum of a finite collection of nonconvex functions, as in robust optimization.
The proposed smoothed gradient descent-ascent algorithm achieves an iteration complexity of $\mathcal{O}(\epsilon^{-2})$ for minimizing a pointwise maximum of a finite collection of nonconvex functions.
arXiv Detail & Related papers (2020-10-29T17:13:46Z) - Hybrid Stochastic-Deterministic Minibatch Proximal Gradient:
Less-Than-Single-Pass Optimization with Nearly Optimal Generalization [83.80460802169999]
We prove that HSDMPG can attain an optimization error of order $\mathcal{O}\big(1/\sqrt{n}\big)$, which is at the order of the intrinsic excess error of a learning model.
arXiv Detail & Related papers (2020-09-18T02:18:44Z) - A Unified Single-loop Alternating Gradient Projection Algorithm for
Nonconvex-Concave and Convex-Nonconcave Minimax Problems [8.797831153231664]
We develop an efficient single-loop alternating gradient projection algorithm for solving minimax problems with theoretical guarantees for general nonconvex-concave and convex-nonconcave objectives.
We show that the proposed algorithm can be used to solve both nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems.
arXiv Detail & Related papers (2020-06-03T04:00:52Z) - On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems [86.92205445270427]
We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}}\max_{\mathbf{y}\in\mathcal{Y}} f(\mathbf{x},\mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$, and study how to solve them efficiently.
arXiv Detail & Related papers (2019-06-02T03:03:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.