A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints
- URL: http://arxiv.org/abs/2009.11359v4
- Date: Tue, 27 Apr 2021 00:53:41 GMT
- Title: A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints
- Authors: Guodong Zhang, Xuchan Bao, Laurent Lessard, Roger Grosse
- Abstract summary: In this work, we adapt integral quadratic constraints (IQC) theory to first-order methods for smooth and strongly-monotone games.
We provide for the first time a global convergence rate for the negative momentum method (NM), with an iteration complexity $\mathcal{O}(\kappa^{1.5})$ that matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
- Score: 10.578409461429626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The theory of integral quadratic constraints (IQCs) allows the certification
of exponential convergence of interconnected systems containing nonlinear or
uncertain elements. In this work, we adapt the IQC theory to study first-order
methods for smooth and strongly-monotone games and show how to design tailored
quadratic constraints to get tight upper bounds of convergence rates. Using
this framework, we recover the existing bound for the gradient method~(GD),
derive sharper bounds for the proximal point method~(PPM) and optimistic
gradient method~(OG), and provide \emph{for the first time} a global
convergence rate for the negative momentum method~(NM) with an iteration
complexity $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound. In
addition, for time-varying systems, we prove that the gradient method with
optimal step size achieves the fastest provable worst-case convergence rate
with quadratic Lyapunov functions. Finally, we further extend our analysis to
stochastic games and study the impact of multiplicative noise on different
algorithms. We show that it is impossible for an algorithm with one step of
memory to achieve acceleration if it only queries the gradient once per batch
(in contrast with the stochastic strongly-convex optimization setting, where
such acceleration has been demonstrated). However, we exhibit an algorithm
which achieves acceleration with two gradient queries per batch.
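The methods the abstract compares can be illustrated on a toy game. The sketch below (not the paper's IQC machinery) simulates the gradient method (GD) and the optimistic gradient method (OG) on a small smooth, strongly-monotone game; the game, modulus `mu`, and step size `eta` are illustrative choices, not values from the paper.

```python
import numpy as np

# Toy smooth strongly-monotone game: min_x max_y  (mu/2)x^2 + x*y - (mu/2)y^2.
# Its vector field is F(x, y) = (mu*x + y, -x + mu*y), i.e. F(z) = A z below;
# the unique equilibrium is z* = 0.
mu = 0.1
A = np.array([[mu, 1.0],
              [-1.0, mu]])

def run(method, steps=2000, eta=0.05):
    z = z_prev = np.array([1.0, 1.0])
    for _ in range(steps):
        if method == "gd":
            # GD: z_{k+1} = z_k - eta * F(z_k)
            z_new = z - eta * (A @ z)
        else:
            # OG: z_{k+1} = z_k - 2*eta*F(z_k) + eta*F(z_{k-1})
            z_new = z - 2 * eta * (A @ z) + eta * (A @ z_prev)
        z_prev, z = z, z_new
    return np.linalg.norm(z)   # distance to the equilibrium

print(run("gd"), run("og"))    # both decay linearly toward zero
```

For these (illustrative) parameters both iterates contract linearly to the equilibrium, which is the kind of exponential-convergence certificate the IQC framework formalizes.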
Related papers
- An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root.
We show how to apply the Gauss-Newton method or the Levenberg-Marquardt method to minimize the average of smooth but possibly non-convex functions.
arXiv Detail & Related papers (2024-07-05T08:53:06Z) - Formal guarantees for heuristic optimization algorithms used in machine
learning [6.978625807687497]
Stochastic Gradient Descent (SGD) and its variants have become the dominant methods for large-scale machine learning (ML) optimization problems.
We provide formal guarantees for a few convex optimization methods and propose improved algorithms.
arXiv Detail & Related papers (2022-07-31T19:41:22Z) - Communication Acceleration of Local Gradient Methods via an Accelerated
Primal-Dual Algorithm with Inexact Prox [11.564643329398438]
We propose an alternative algorithm which obtains the same communication acceleration as Mishchenko et al. (2022).
Our approach is based on the celebrated method of Chambolle and Pock (2011), with several nontrivial modifications.
Our method can be applied to optimization over a connected network, and we obtain theoretical improvements here as well.
arXiv Detail & Related papers (2022-07-08T15:24:13Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
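The projection-free template that such conditional gradient methods build on is classical Frank-Wolfe. Below is a minimal deterministic sketch on a toy problem (objective and domain are illustrative, not from the paper): minimize f(x) = 0.5 ||x - b||^2 over the probability simplex.

```python
import numpy as np

b = np.array([0.1, 0.2, 0.7])  # target; lies in the simplex, so it is the minimizer
x = np.array([1.0, 0.0, 0.0])  # start at a simplex vertex

for k in range(200):
    grad = x - b                     # gradient of f at x
    s = np.zeros_like(x)
    s[np.argmin(grad)] = 1.0         # linear minimization oracle: best vertex
    gamma = 2.0 / (k + 2)            # classic Frank-Wolfe step-size schedule
    x = (1 - gamma) * x + gamma * s  # convex combination keeps x feasible

print(x)  # approaches b at the standard O(1/k) rate
```

The stochastic one-sample method in the paper replaces the exact gradient above with a cheap estimator while keeping this projection-free structure.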
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Nesterov Accelerated Shuffling Gradient Method for Convex Optimization [15.908060383231371]
We show that our algorithm has an improved rate of $\mathcal{O}(1/T)$ using unified shuffling schemes.
Our convergence analysis does not require an assumption on bounded domain or a bounded gradient condition.
Numerical simulations demonstrate the efficiency of our algorithm.
arXiv Detail & Related papers (2022-02-07T21:23:17Z) - A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning [13.908826484332282]
We study a new two-time-scale gradient method for solving optimization problems.
Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale gradient algorithm.
We apply our framework to gradient-based policy evaluation algorithms in reinforcement learning.
arXiv Detail & Related papers (2021-09-29T23:15:23Z) - Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NŁ).
These new definitions inspire new geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z) - Acceleration Methods [57.202881673406324]
We first use quadratic optimization problems to introduce two key families of acceleration methods.
We discuss momentum methods in detail, starting with the seminal work of Nesterov.
We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates.
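The gap between plain gradient descent and Nesterov's momentum that this survey covers is easy to see on a quadratic. A hedged sketch: for f(x) = 0.5 x^T H x with condition number kappa, momentum converges in roughly O(sqrt(kappa)) iterations versus O(kappa) for gradient descent. The matrix, step size, and momentum coefficient below are textbook choices for this toy problem, not values from the survey.

```python
import numpy as np

kappa = 100.0
H = np.diag([1.0, kappa])      # Hessian with condition number kappa
L, mu = kappa, 1.0

def gd(steps):
    x = np.ones(2)
    for _ in range(steps):
        x = x - (1.0 / L) * (H @ x)          # plain gradient step
    return np.linalg.norm(x)

def nesterov(steps):
    x = y = np.ones(2)
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # momentum coefficient
    for _ in range(steps):
        x_new = y - (1.0 / L) * (H @ y)      # gradient step at extrapolated point
        y = x_new + beta * (x_new - x)       # momentum extrapolation
        x = x_new
    return np.linalg.norm(x)

print(gd(200), nesterov(200))  # momentum reaches a far smaller residual
```

With kappa = 100, 200 iterations shrink the residual by only a modest factor under gradient descent, while the momentum iterate is orders of magnitude closer to the optimum.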
arXiv Detail & Related papers (2021-01-23T17:58:25Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.