Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization
- URL: http://arxiv.org/abs/2206.08573v1
- Date: Fri, 17 Jun 2022 06:10:20 GMT
- Title: Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization
- Authors: Simon S. Du, Gauthier Gidel, Michael I. Jordan, Chris Junchi Li
- Abstract summary: We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
- Score: 116.89941263390769
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the smooth convex-concave bilinearly-coupled saddle-point
problem, $\min_{\mathbf{x}}\max_{\mathbf{y}}~F(\mathbf{x}) +
H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to stochastic
first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
Building upon standard stochastic extragradient analysis for variational
inequalities, we present a stochastic \emph{accelerated gradient-extragradient
(AG-EG)} descent-ascent algorithm that combines extragradient and Nesterov's
acceleration in general stochastic settings. This algorithm leverages scheduled
restarting to admit a fine-grained nonasymptotic convergence rate that matches
known lower bounds by both \citet{ibrahim2020linear} and \citet{zhang2021lower}
in their corresponding settings, plus an additional statistical error term for
bounded stochastic noise that is optimal up to a constant prefactor. This is
the first result that achieves such a relatively mature characterization of
optimality in saddle-point optimization.
Related papers
- An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is non- unbounded smoothness and the lower-level problem is strongly convex.
These problems have significant applications in data learning, such as text classification using neural networks.
arXiv Detail & Related papers (2024-09-28T02:30:44Z) - Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems [56.86067111855056]
We consider clipped optimization problems with heavy-tailed noise with structured density.
We show that it is possible to get faster rates of convergence than $mathcalO(K-(alpha - 1)/alpha)$, when the gradients have finite moments of order.
We prove that the resulting estimates have negligible bias and controllable variance.
arXiv Detail & Related papers (2023-11-07T17:39:17Z) - Strictly Low Rank Constraint Optimization -- An Asymptotically
$\mathcal{O}(\frac{1}{t^2})$ Method [5.770309971945476]
We propose a class of non-text and non-smooth problems with textitrank regularization to promote sparsity in optimal solution.
We show that our algorithms are able to achieve a singular convergence of $Ofrac(t2)$, which is exactly same as Nesterov's optimal convergence for first-order methods on smooth convex problems.
arXiv Detail & Related papers (2023-07-04T16:55:41Z) - Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises:
High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation [22.758674468435302]
In a heavy-tailed noise regime, the difference between the gradient and the true rate is assumed to have a finite $p-th moment.
This paper provides a comprehensive analysis of nonsmooth convex optimization with heavy-tailed noises.
arXiv Detail & Related papers (2023-03-22T03:05:28Z) - Randomized Coordinate Subgradient Method for Nonsmooth Composite
Optimization [11.017632675093628]
Coordinate-type subgradient methods for addressing nonsmooth problems are relatively underexplored due to the set of properties of the Lipschitz-type assumption.
arXiv Detail & Related papers (2022-06-30T02:17:11Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non- Hilbert optimization using first-order algorithms for which the gradient estimates may have tails.
We show that a combination of gradient, momentum, and normalized gradient descent convergence to critical points in high-probability with best-known iteration for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used estimator based on moving average (SEMA) problems.
For all these-the-art results, we also present the results for all these-the-art problems.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel, referred to as Recursive One-Over-T SGD, based on an easily implementable, averaging of past gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - S-ADDOPT: Decentralized stochastic first-order optimization over
directed graphs [16.96562173221624]
Decentralized convex optimization is proposed to minimize a sum of smooth and strongly cost functions when the functions are distributed over a directed network nodes.
In particular, we propose thetextbftextttS-ADDOPT algorithm that assumes a first-order oracle at each node.
For decaying step-sizes$mathcalO (1/k)$, we show thattextbfttS-ADDOPT reaches the exact solution subly at$mathcalO (1/k)$ and its convergence is networkally-independent
arXiv Detail & Related papers (2020-05-15T21:14:22Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.