Unified Analysis of Stochastic Gradient Methods for Composite Convex and
Smooth Optimization
- URL: http://arxiv.org/abs/2006.11573v1
- Date: Sat, 20 Jun 2020 13:40:27 GMT
- Title: Unified Analysis of Stochastic Gradient Methods for Composite Convex and
Smooth Optimization
- Authors: Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter
Richtárik
- Abstract summary: We present a unified theorem for the convergence analysis of gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer.
We do this by extending the unified analysis of Gorbunov, Hanzely & Richtárik (2020) and dropping the requirement that the loss function be strongly convex.
Our unified analysis applies to a host of existing algorithms such as proximal SGD, variance reduced methods, quantization and some coordinate descent type methods.
- Score: 15.82816385434718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a unified theorem for the convergence analysis of stochastic
gradient algorithms for minimizing a smooth and convex loss plus a convex
regularizer. We do this by extending the unified analysis of Gorbunov, Hanzely
& Richtárik (2020) and dropping the requirement that the loss function be
strongly convex. Instead, we only rely on convexity of the loss function. Our
unified analysis applies to a host of existing algorithms such as proximal SGD,
variance reduced methods, quantization and some coordinate descent type
methods. For the variance reduced methods, we recover the best known
convergence rates as special cases. For proximal SGD and for the quantization and
coordinate descent type methods, we uncover new state-of-the-art convergence rates. Our
analysis also includes any form of sampling and minibatching. As such, we are
able to determine the minibatch size that optimizes the total complexity of
variance reduced methods. We showcase this by obtaining a simple formula for
the optimal minibatch size of two variance reduced methods (L-SVRG and
SAGA). This optimal minibatch size not only improves the theoretical
total complexity of the methods but also improves their convergence in
practice, as we show in several experiments.
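To make the setting concrete, here is a minimal sketch of a proximal, loopless SVRG (L-SVRG) step applied to an $\ell_1$-regularized least-squares problem. The objective, step size, minibatch size, and reference-point refresh probability are illustrative assumptions, not the settings analyzed in the paper; in particular, the paper's optimal minibatch size formula is not reproduced here.

```python
import numpy as np

def prox_l1(x, thresh):
    """Proximal operator of thresh * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def prox_lsvrg(A, b, lam, step=0.1, batch=8, p=None, iters=1000, seed=0):
    """Proximal L-SVRG sketch for min_x (1/2n)||Ax - b||^2 + lam * ||x||_1.

    Loopless SVRG variant: the full gradient at the reference point is
    refreshed with a small probability p each iteration instead of in an
    outer loop.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    p = p if p is not None else batch / n      # illustrative refresh probability
    x = np.zeros(d)
    w = x.copy()                               # reference point
    full_grad = A.T @ (A @ w - b) / n          # full gradient at the reference point

    def grad_batch(z, idx):
        Ai = A[idx]
        return Ai.T @ (Ai @ z - b[idx]) / len(idx)

    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        # variance-reduced gradient estimate of the smooth part
        g = grad_batch(x, idx) - grad_batch(w, idx) + full_grad
        # proximal step handles the nonsmooth regularizer
        x = prox_l1(x - step * g, step * lam)
        # with probability p, refresh the reference point and its full gradient
        if rng.random() < p:
            w = x.copy()
            full_grad = A.T @ (A @ w - b) / n
    return x
```

For instance, `prox_lsvrg(A, b, lam=0.01)` returns an approximate minimizer; the refresh probability `p` plays the role of the expected inner-loop length of standard SVRG, and the `prox_l1` step is what distinguishes the composite (regularized) setting from plain smooth minimization.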
Related papers
- Variance reduction techniques for stochastic proximal point algorithms [5.374800961359305]
In this work, we propose the first unified study of variance reduction techniques for proximal point algorithms.
We introduce a generic proximal-based algorithm that can be specified to give the proximal version of SVRG, SAGA, and some of their variants.
Our experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts.
arXiv Detail & Related papers (2023-08-18T05:11:50Z)
- SignSVRG: fixing SignSGD via variance reduction [3.3504365823045044]
We propose a simple, yet practical, way to incorporate variance reduction techniques into SignSGD.
We show that for smooth functions our method gives an $\mathcal{O}(1/\sqrt{T})$ rate for the expected norm of the gradient and an $\mathcal{O}(1/T)$ rate in the case of smooth convex functions.
arXiv Detail & Related papers (2023-05-22T16:14:53Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most existing methods can only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement; a minimal sketch of this sampling pattern appears after this list.
For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization [0.0]
We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) method.
A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance by SVRG.
VCSG requires $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in time, up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we consider gradient-free (zeroth-order) minimax optimization problems.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity; a generic two-point zeroth-order gradient estimator is sketched after this list.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
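As referenced in the entry on minibatch and local SGD with shuffling, the following is a minimal sketch of minibatch Random Reshuffling, i.e., sampling without replacement: each epoch permutes the data once and then consumes it in consecutive minibatches. The least-squares objective, step size, batch size, and epoch count are illustrative assumptions, not settings from that paper.

```python
import numpy as np

def minibatch_random_reshuffling(A, b, step=0.05, batch=8, epochs=20, seed=0):
    """Minibatch SGD with Random Reshuffling for min_x (1/2n)||Ax - b||^2.

    Each epoch draws a fresh permutation of the n samples and consumes it
    in consecutive minibatches, so every sample is used exactly once per
    epoch (without-replacement sampling), unlike with-replacement SGD.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)                        # one shuffle per epoch
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            Ai = A[idx]
            g = Ai.T @ (Ai @ x - b[idx]) / len(idx)      # minibatch gradient
            x -= step * g
    return x
```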
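As referenced in the entry on gradient-free minimax optimization, the sketch below shows a generic two-point zeroth-order gradient estimator, which builds a gradient estimate from function values only. It is not the variance-reduced algorithm of that paper; the smoothing radius, number of random directions, and the quadratic test function are illustrative assumptions.

```python
import numpy as np

def two_point_grad_estimate(f, x, mu=1e-4, num_dirs=1, seed=None):
    """Two-point zeroth-order gradient estimator.

    For a random unit direction u, d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u
    is an unbiased estimate of the gradient of a smoothed version of f.
    Averaging over num_dirs directions reduces the variance of the estimate.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                 # unit direction on the sphere
        g += d * (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs

# Illustrative use: zeroth-order gradient descent on a simple quadratic.
if __name__ == "__main__":
    A = np.diag([1.0, 2.0, 3.0])
    f = lambda z: 0.5 * z @ A @ z
    x = np.ones(3)
    for _ in range(500):
        x -= 0.1 * two_point_grad_estimate(f, x, num_dirs=5)
    print(x)   # should approach the minimizer at the origin
```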
This list is automatically generated from the titles and abstracts of the papers on this site.