Faster Single-loop Algorithms for Minimax Optimization without Strong
Concavity
- URL: http://arxiv.org/abs/2112.05604v1
- Date: Fri, 10 Dec 2021 15:35:42 GMT
- Title: Faster Single-loop Algorithms for Minimax Optimization without Strong
Concavity
- Authors: Junchi Yang, Antonio Orvieto, Aurelien Lucchi and Niao He
- Abstract summary: Gradient descent ascent (GDA) is the simplest single-loop algorithm for nonconvex minimax optimization.
Recent work shows that GDA has inferior convergence rates in theory, even when the objective is strongly concave in one variable.
New convergence results are established for two alternative single-loop algorithms -- alternating GDA and smoothed GDA.
- Score: 30.97847679999505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient descent ascent (GDA), the simplest single-loop algorithm for
nonconvex minimax optimization, is widely used in practical applications such
as generative adversarial networks (GANs) and adversarial training. Despite its
desirable simplicity, recent work shows inferior convergence rates of GDA in
theory even assuming strong concavity of the objective on one side. This paper
establishes new convergence results for two alternative single-loop algorithms
-- alternating GDA and smoothed GDA -- under the mild assumption that the
objective satisfies the Polyak-Lojasiewicz (PL) condition about one variable.
We prove that, to find an $\epsilon$-stationary point, (i) alternating GDA and
its stochastic variant (without mini-batch) respectively require $O(\kappa^{2}
\epsilon^{-2})$ and $O(\kappa^{4} \epsilon^{-4})$ iterations, while (ii)
smoothed GDA and its stochastic variant (without mini-batch) respectively
require $O(\kappa \epsilon^{-2})$ and $O(\kappa^{2} \epsilon^{-4})$ iterations.
The latter greatly improves over the vanilla GDA and gives the hitherto best
known complexity results among single-loop algorithms under similar settings.
We further showcase the empirical efficiency of these algorithms in training
GANs and robust nonlinear regression.
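To make the two schemes concrete: recall that $f(x,\cdot)$ satisfies the PL condition in $y$ with constant $\mu>0$ (in its standard one-sided form) if $\frac{1}{2}\|\nabla_y f(x,y)\|^{2} \ge \mu\,(\max_{y'} f(x,y') - f(x,y))$. Below is a minimal Python sketch of alternating GDA and smoothed GDA on a toy objective $f(x,y)=\frac{1}{2}x^{2}+xy-\frac{1}{2}y^{2}$, which is strongly concave (hence PL) in $y$. The toy objective, step sizes, smoothing parameter p, and averaging weight beta are illustrative choices rather than the tuned values analyzed in the paper; smoothed GDA is written in one common form, with an auxiliary anchor z and an extra quadratic term $\frac{p}{2}\|x-z\|^{2}$ in the primal update.

    def grad_x(x, y):
        # df/dx for the toy objective f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
        return x + y

    def grad_y(x, y):
        # df/dy; f is strongly concave in y, hence PL in y
        return x - y

    def alternating_gda(x, y, eta_x=0.05, eta_y=0.5, iters=2000):
        # Alternating GDA: the ascent step uses the freshly updated x.
        for _ in range(iters):
            x = x - eta_x * grad_x(x, y)
            y = y + eta_y * grad_y(x, y)
        return x, y

    def smoothed_gda(x, y, eta_x=0.05, eta_y=0.5, p=1.0, beta=0.1, iters=2000):
        # Smoothed GDA: descend on f(x, y) + (p/2)*(x - z)**2, where the
        # auxiliary anchor z tracks x through exponential averaging.
        z = x
        for _ in range(iters):
            x = x - eta_x * (grad_x(x, y) + p * (x - z))
            y = y + eta_y * grad_y(x, y)
            z = z + beta * (x - z)
        return x, y

    print(alternating_gda(1.0, 1.0))  # both runs approach the stationary point (0, 0)
    print(smoothed_gda(1.0, 1.0))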
Related papers
- MGDA Converges under Generalized Smoothness, Provably [27.87166415148172]
Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning.
Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions.
We study a more general and realistic class of generalized $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of the gradient norm.
arXiv Detail & Related papers (2024-05-29T18:36:59Z) - On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffle SGD are popular because they minimize the number of cache misses and exhibit good cache behavior.
This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z) - Stochastic Policy Gradient Methods: Improved Sample Complexity for
Fisher-non-degenerate Policies [19.779044926914704]
We develop improved global convergence guarantees for a general class of Fisher-non-degenerate parameterized policies.
In this work, we propose a Normalized Policy Gradient method with Implicit Gradient Transport (N-PG-IGT) and derive a $\tilde{\mathcal{O}}(\varepsilon^{-2.5})$ sample complexity for this method.
We further improve this complexity to $\tilde{\mathcal{O}}(\varepsilon^{-2})$ by considering a Hessian-Aided Recursive Policy Gradient method.
arXiv Detail & Related papers (2023-02-03T13:50:23Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve these distributed minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - A Fully Single Loop Algorithm for Bilevel Optimization without Hessian
Inverse [121.54116938140754]
We propose a new Hessian inverse free Fully Single Loop Algorithm for bilevel optimization problems.
We show that our algorithm converges at a rate of $O(\epsilon^{-2})$.
arXiv Detail & Related papers (2021-12-09T02:27:52Z) - Solve Minimax Optimization by Anderson Acceleration [5.900781483345349]
Gradient descent ascent (GDA) is the most commonly used algorithm due to its simplicity.
We propose a new minimax optimization framework, GDA-AM, that views the GDA dynamics as a fixed-point iteration (a toy sketch of this view appears after this list).
We show theoretically that the algorithm can achieve global convergence for bilinear problems under mild conditions.
arXiv Detail & Related papers (2021-10-06T02:08:54Z) - AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax
Optimization [104.96004056928474]
We propose a class of faster adaptive gradient descent ascent methods for nonconvex strongly-concave minimax problems.
We show that our method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with a mini-batch size of $O(\kappa)$.
arXiv Detail & Related papers (2021-06-30T14:47:09Z) - A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for
Nonconvex-Concave Min-Max Problems [33.83671897619922]
Nonconvex-concave min-max problems arise in many applications, including robust optimization problems that minimize a pointwise maximum of nonconvex functions.
The proposed single-loop smoothed gradient descent-ascent algorithm achieves an improved iteration complexity when the inner maximum is over a finite collection of nonconvex functions.
arXiv Detail & Related papers (2020-10-29T17:13:46Z) - On Riemannian Gradient-Based Methods for Minimax Problems [24.199289678553896]
We propose a class of Riemannian gradient-based methods to solve minimax problems.
We prove that our RGDA has a sample complexity of $\tilde{O}(\kappa^{4}\epsilon^{-3})$.
We also show that our Acc-RSGDA achieves a lower sample complexity of $\tilde{O}(\kappa^{4}\epsilon^{-3})$.
arXiv Detail & Related papers (2020-10-13T00:54:00Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal-side and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study gradient-free (zeroth-order) minimax optimization in the nonconvex strongly-concave setting.
We show that a novel zeroth-order variance-reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
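As a companion to the GDA-AM entry above, the following toy sketch illustrates the fixed-point view of simultaneous GDA combined with Anderson mixing. The quadratic toy objective, step size, and memory length are illustrative choices, and this is not the authors' GDA-AM implementation; it only shows how extrapolating over recent GDA iterates can recover a saddle point that plain GDA with the same step size fails to reach.

    import numpy as np

    ETA = 1.2  # deliberately large step: plain simultaneous GDA diverges here

    def gda_map(w):
        # One simultaneous GDA step on f(x, y) = 0.5*x**2 + x*y - 0.5*y**2,
        # written as a fixed-point map w -> G(w).
        x, y = w
        return np.array([x - ETA * (x + y), y + ETA * (x - y)])

    def anderson_gda(w0, memory=5, iters=20):
        w = np.asarray(w0, dtype=float)
        W, GW = [w], [gda_map(w)]
        for _ in range(iters):
            F = np.column_stack([g - v for g, v in zip(GW, W)])  # residuals G(w) - w
            if F.shape[1] > 1:
                dF = np.diff(F, axis=1)                     # residual differences
                dG = np.diff(np.column_stack(GW), axis=1)   # image differences
                gamma, *_ = np.linalg.lstsq(dF, F[:, -1], rcond=None)
                w_next = GW[-1] - dG @ gamma                # Anderson (type-II) mixing
            else:
                w_next = GW[-1]                             # plain GDA step to start
            W.append(w_next)
            GW.append(gda_map(w_next))
            W, GW = W[-memory:], GW[-memory:]
        return W[-1]

    # Plain simultaneous GDA with ETA = 1.2 spirals away from the saddle point;
    # the Anderson-accelerated iterates approach the stationary point (0, 0).
    print(anderson_gda([1.0, 1.0]))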