Faster Single-loop Algorithms for Minimax Optimization without Strong
Concavity
- URL: http://arxiv.org/abs/2112.05604v1
- Date: Fri, 10 Dec 2021 15:35:42 GMT
- Title: Faster Single-loop Algorithms for Minimax Optimization without Strong
Concavity
- Authors: Junchi Yang, Antonio Orvieto, Aurelien Lucchi and Niao He
- Abstract summary: Gradient descent ascent (GDA) is the simplest single-loop algorithm for nonconvex minimax optimization.
Recent work shows that GDA has inferior convergence rates in theory, even when the objective is strongly concave in one variable.
New convergence results are established for two alternative single-loop algorithms -- alternating GDA and smoothed GDA.
- Score: 30.97847679999505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient descent ascent (GDA), the simplest single-loop algorithm for
nonconvex minimax optimization, is widely used in practical applications such
as generative adversarial networks (GANs) and adversarial training. Despite its
desirable simplicity, recent work shows inferior convergence rates of GDA in
theory even assuming strong concavity of the objective on one side. This paper
establishes new convergence results for two alternative single-loop algorithms
-- alternating GDA and smoothed GDA -- under the mild assumption that the
objective satisfies the Polyak-Lojasiewicz (PL) condition about one variable.
We prove that, to find an $\epsilon$-stationary point, (i) alternating GDA and
its stochastic variant (without mini-batch) respectively require $O(\kappa^{2}
\epsilon^{-2})$ and $O(\kappa^{4} \epsilon^{-4})$ iterations, while (ii)
smoothed GDA and its stochastic variant (without mini-batch) respectively
require $O(\kappa \epsilon^{-2})$ and $O(\kappa^{2} \epsilon^{-4})$ iterations.
The latter greatly improves over the vanilla GDA and gives the hitherto best
known complexity results among single-loop algorithms under similar settings.
We further showcase the empirical efficiency of these algorithms in training
GANs and robust nonlinear regression.
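To make the two schemes concrete: recall that $f(x,\cdot)$ satisfies the PL condition in $y$ with constant $\mu>0$ (in its standard one-sided form) if $\frac{1}{2}\|\nabla_y f(x,y)\|^{2} \ge \mu\,(\max_{y'} f(x,y') - f(x,y))$. Below is a minimal Python sketch of alternating GDA and smoothed GDA on a toy objective $f(x,y)=\frac{1}{2}x^{2}+xy-\frac{1}{2}y^{2}$, which is strongly concave (hence PL) in $y$. The toy objective, step sizes, smoothing parameter p, and averaging weight beta are illustrative choices rather than the tuned values analyzed in the paper; smoothed GDA is written in one common form, with an auxiliary anchor z and an extra quadratic term $\frac{p}{2}\|x-z\|^{2}$ in the primal update.

    def grad_x(x, y):
        # df/dx for the toy objective f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
        return x + y

    def grad_y(x, y):
        # df/dy; f is strongly concave in y, hence PL in y
        return x - y

    def alternating_gda(x, y, eta_x=0.05, eta_y=0.5, iters=2000):
        # Alternating GDA: the ascent step uses the freshly updated x.
        for _ in range(iters):
            x = x - eta_x * grad_x(x, y)
            y = y + eta_y * grad_y(x, y)
        return x, y

    def smoothed_gda(x, y, eta_x=0.05, eta_y=0.5, p=1.0, beta=0.1, iters=2000):
        # Smoothed GDA: descend on f(x, y) + (p/2)*(x - z)**2, where the
        # auxiliary anchor z tracks x through exponential averaging.
        z = x
        for _ in range(iters):
            x = x - eta_x * (grad_x(x, y) + p * (x - z))
            y = y + eta_y * grad_y(x, y)
            z = z + beta * (x - z)
        return x, y

    print(alternating_gda(1.0, 1.0))  # both runs approach the stationary point (0, 0)
    print(smoothed_gda(1.0, 1.0))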
Related papers
- MGDA Converges under Generalized Smoothness, Provably [27.87166415148172]
Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning.
Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions.
We study a more general and realistic class of generalized $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of the gradient norm.
arXiv Detail & Related papers (2024-05-29T18:36:59Z) - On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffle SGD are popular because they minimize the number of cache misses and exhibit good cache behavior.
This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z) - Stochastic Policy Gradient Methods: Improved Sample Complexity for
Fisher-non-degenerate Policies [19.779044926914704]
We develop improved global convergence guarantees for a general class of Fisher-non-degenerate parameterized policies.
In this work, we propose a Normalized Policy Gradient method with Implicit Gradient Transport (N-PG-IGT) and derive a $\tilde{\mathcal{O}}(\varepsilon^{-2.5})$ sample complexity for this method.
We further improve this complexity to $\tilde{\mathcal{O}}(\varepsilon^{-2})$ by considering a Hessian-Aided Recursive Policy Gradient method.
arXiv Detail & Related papers (2023-02-03T13:50:23Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve these distributed minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - A Fully Single Loop Algorithm for Bilevel Optimization without Hessian
Inverse [121.54116938140754]
We propose a new Hessian inverse free Fully Single Loop Algorithm for bilevel optimization problems.
We show that our algorithm converges at a rate of $O(\epsilon^{-2})$.
arXiv Detail & Related papers (2021-12-09T02:27:52Z) - Solve Minimax Optimization by Anderson Acceleration [5.900781483345349]
Gradient descent ascent (GDA) is the most commonly used algorithm due to its simplicity.
We propose a new minimax optimization framework, GDA-AM, that views the GDA dynamics as a fixed-point iteration (a toy sketch of this view appears after this list).
We show theoretically that the algorithm can achieve global convergence for bilinear problems under mild conditions.
arXiv Detail & Related papers (2021-10-06T02:08:54Z) - AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax
Optimization [104.96004056928474]
We propose a class of faster adaptive gradient descent ascent methods for nonconvex strongly-concave minimax problems.
We show that our method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with a mini-batch size of $O(\kappa)$.
arXiv Detail & Related papers (2021-06-30T14:47:09Z) - A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for
Nonconvex-Concave Min-Max Problems [33.83671897619922]
Nonconvex-concave min-max problems arise in many applications, including robust optimization problems that minimize a pointwise maximum of nonconvex functions.
The proposed single-loop smoothed gradient descent-ascent algorithm achieves an improved iteration complexity when the inner maximum is over a finite collection of nonconvex functions.
arXiv Detail & Related papers (2020-10-29T17:13:46Z) - On Riemannian Gradient-Based Methods for Minimax Problems [24.199289678553896]
We propose a class of Riemannian gradient-based methods to solve minimax problems.
We prove that our RGDA has a sample complexity of $\tilde{O}(\kappa^{4}\epsilon^{-3})$.
We also show that our Acc-RSGDA achieves a lower sample complexity of $\tilde{O}(\kappa^{4}\epsilon^{-3})$.
arXiv Detail & Related papers (2020-10-13T00:54:00Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal-side and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study gradient-free (zeroth-order) minimax optimization in the nonconvex strongly-concave setting.
We show that a novel zeroth-order variance-reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
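As a companion to the GDA-AM entry above, the following toy sketch illustrates the fixed-point view of simultaneous GDA combined with Anderson mixing. The quadratic toy objective, step size, and memory length are illustrative choices, and this is not the authors' GDA-AM implementation; it only shows how extrapolating over recent GDA iterates can recover a saddle point that plain GDA with the same step size fails to reach.

    import numpy as np

    ETA = 1.2  # deliberately large step: plain simultaneous GDA diverges here

    def gda_map(w):
        # One simultaneous GDA step on f(x, y) = 0.5*x**2 + x*y - 0.5*y**2,
        # written as a fixed-point map w -> G(w).
        x, y = w
        return np.array([x - ETA * (x + y), y + ETA * (x - y)])

    def anderson_gda(w0, memory=5, iters=20):
        w = np.asarray(w0, dtype=float)
        W, GW = [w], [gda_map(w)]
        for _ in range(iters):
            F = np.column_stack([g - v for g, v in zip(GW, W)])  # residuals G(w) - w
            if F.shape[1] > 1:
                dF = np.diff(F, axis=1)                     # residual differences
                dG = np.diff(np.column_stack(GW), axis=1)   # image differences
                gamma, *_ = np.linalg.lstsq(dF, F[:, -1], rcond=None)
                w_next = GW[-1] - dG @ gamma                # Anderson (type-II) mixing
            else:
                w_next = GW[-1]                             # plain GDA step to start
            W.append(w_next)
            GW.append(gda_map(w_next))
            W, GW = W[-memory:], GW[-memory:]
        return W[-1]

    # Plain simultaneous GDA with ETA = 1.2 spirals away from the saddle point;
    # the Anderson-accelerated iterates approach the stationary point (0, 0).
    print(anderson_gda([1.0, 1.0]))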