Min-Max Optimization under Delays
- URL: http://arxiv.org/abs/2307.06886v3
- Date: Fri, 25 Aug 2023 03:14:32 GMT
- Title: Min-Max Optimization under Delays
- Authors: Arman Adibi, Aritra Mitra, and Hamed Hassani
- Abstract summary: Delays and asynchrony are inevitable in large-scale machine-learning problems.
No analogous theory is available for min-max optimization.
We show that even small delays can cause prominent algorithms like Extra-gradient to diverge.
- Score: 26.830212508878162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Delays and asynchrony are inevitable in large-scale machine-learning problems
where communication plays a key role. As such, several works have extensively
analyzed stochastic optimization with delayed gradients. However, as far as we
are aware, no analogous theory is available for min-max optimization, a topic
that has gained recent popularity due to applications in adversarial
robustness, game theory, and reinforcement learning. Motivated by this gap, we
examine the performance of standard min-max optimization algorithms with
delayed gradient updates. First, we show (empirically) that even small delays
can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on
simple instances for which \texttt{EG} guarantees convergence in the absence of
delays. Our empirical study thus suggests the need for a careful analysis of
delayed versions of min-max optimization algorithms. Accordingly, under
suitable technical assumptions, we prove that Gradient Descent-Ascent
(\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee
convergence to saddle points for convex-concave and strongly convex-strongly
concave settings. Our complexity bounds reveal, in a transparent manner, the
slow-down in convergence caused by delays.
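To make the delayed-update setting concrete, the following is a minimal Python sketch (not the authors' code or experimental setup) of Gradient Descent-Ascent and Extra-gradient under a simple fixed-delay model, run on an illustrative strongly convex-strongly concave quadratic; the objective, step size, and delay values are assumptions chosen purely for illustration.

    import numpy as np

    # Illustrative strongly convex-strongly concave objective (an assumption,
    # not taken from the paper): f(x, y) = 0.5*x**2 + x*y - 0.5*y**2,
    # whose unique saddle point is (0, 0).
    def grad_x(x, y): return x + y   # df/dx
    def grad_y(x, y): return x - y   # df/dy

    def delayed_gda(x0, y0, eta=0.1, delay=0, steps=500):
        """GDA where the gradients used at step t are evaluated at the
        iterate from step t - delay (one simple fixed-delay model)."""
        xs, ys = [x0], [y0]
        x, y = x0, y0
        for t in range(steps):
            xd, yd = xs[max(t - delay, 0)], ys[max(t - delay, 0)]  # stale iterate
            x, y = x - eta * grad_x(xd, yd), y + eta * grad_y(xd, yd)
            xs.append(x); ys.append(y)
        return np.hypot(x, y)  # distance to the saddle point (0, 0)

    def delayed_eg(x0, y0, eta=0.1, delay=0, steps=500):
        """Extra-gradient whose extrapolation step uses a gradient that is
        `delay` iterations old; the look-ahead gradient is taken without
        extra staleness (a modeling simplification)."""
        xs, ys = [x0], [y0]
        x, y = x0, y0
        for t in range(steps):
            xd, yd = xs[max(t - delay, 0)], ys[max(t - delay, 0)]  # stale iterate
            xh, yh = x - eta * grad_x(xd, yd), y + eta * grad_y(xd, yd)  # look-ahead
            x, y = x - eta * grad_x(xh, yh), y + eta * grad_y(xh, yh)
            xs.append(x); ys.append(y)
        return np.hypot(x, y)

    if __name__ == "__main__":
        for d in (0, 2, 5):
            print(f"delay={d}: GDA dist={delayed_gda(1.0, 1.0, delay=d):.2e}, "
                  f"EG dist={delayed_eg(1.0, 1.0, delay=d):.2e}")

Varying the delay and step size in this sketch is a quick way to probe the sensitivity the abstract describes; the paper's formal results quantify the corresponding slow-down under its own delay model and assumptions.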
Related papers
- Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z) - Ordering for Non-Replacement SGD [7.11967773739707]
We seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm.
We develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions.
In addition, we are able to combine the ordering with mini-batch and further apply it to more complex neural networks.
arXiv Detail & Related papers (2023-06-28T00:46:58Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Accelerated Algorithms for Constrained Nonconvex-Nonconcave Min-Max Optimization and Comonotone Inclusion [9.551565016483833]
We study comonotone min-max optimization, a structured class of nonconvex-nonconcave min-max optimization problems.
In our first contribution, we extend the Extra Anchored Gradient (EAG) algorithm to constrained comonotone min-max optimization.
In our second contribution, we extend the Fast Extra Gradient (FEG) algorithm to constrained comonotone min-max optimization.
arXiv Detail & Related papers (2022-06-10T17:44:06Z) - Convergence Rates of Two-Time-Scale Gradient Descent-Ascent Dynamics for
Solving Nonconvex Min-Max Problems [2.0305676256390934]
We characterize the finite-time performance of the continuous-time variant of the simultaneous gradient descent-ascent algorithm.
Our results on the behavior of the continuous-time algorithm may be used to enhance the convergence properties of its discrete-time counterpart.
arXiv Detail & Related papers (2021-12-17T15:51:04Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under suitable assumptions on the delays.
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave
Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Optimal Epoch Stochastic Gradient Descent Ascent Methods for Min-Max
Optimization [61.66927778694731]
Epoch gradient descent method (a.k.a. Epoch-GD) proposed by Hazan and Kale (2011) was deemed a breakthrough for stochastic strongly convex minimization.
Our result is the first to show that Epoch-GDA can achieve the optimal rate of $O(1/T)$ for the duality gap of strongly-convex strongly-concave (SCSC) min-max problems.
arXiv Detail & Related papers (2020-02-13T02:16:57Z)