Unified Analysis of Stochastic Gradient Methods for Composite Convex and
Smooth Optimization
- URL: http://arxiv.org/abs/2006.11573v1
- Date: Sat, 20 Jun 2020 13:40:27 GMT
- Title: Unified Analysis of Stochastic Gradient Methods for Composite Convex and
Smooth Optimization
- Authors: Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter
Richtárik
- Abstract summary: We present a unified theorem for the convergence analysis of gradient algorithms for minimizing a smooth and convex loss plus a convex regularizer.
We do this by extending the unified analysis of Gorbunov, Hanzely & Richtárik (2020) and dropping the requirement that the loss function be strongly convex.
Our unified analysis applies to a host of existing algorithms such as proximal SGD, variance reduced methods, quantization and some coordinate descent type methods.
- Score: 15.82816385434718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a unified theorem for the convergence analysis of stochastic
gradient algorithms for minimizing a smooth and convex loss plus a convex
regularizer. We do this by extending the unified analysis of Gorbunov, Hanzely
& Richtárik (2020) and dropping the requirement that the loss function be
strongly convex. Instead, we only rely on convexity of the loss function. Our
unified analysis applies to a host of existing algorithms such as proximal SGD,
variance reduced methods, quantization and some coordinate descent type
methods. For the variance reduced methods, we recover the best known
convergence rates as special cases. For proximal SGD and for the quantization and
coordinate descent type methods, we uncover new state-of-the-art convergence rates. Our
analysis also includes any form of sampling and minibatching. As such, we are
able to determine the minibatch size that optimizes the total complexity of
variance reduced methods. We showcase this by obtaining a simple formula for
the optimal minibatch size of two variance reduced methods (L-SVRG and
SAGA). This optimal minibatch size not only improves the theoretical
total complexity of the methods but also improves their convergence in
practice, as we show in several experiments.
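To make the setting concrete, here is a minimal sketch of a proximal, loopless SVRG (L-SVRG) step applied to an $\ell_1$-regularized least-squares problem. The objective, step size, minibatch size, and reference-point refresh probability are illustrative assumptions, not the settings analyzed in the paper; in particular, the paper's optimal minibatch size formula is not reproduced here.

```python
import numpy as np

def prox_l1(x, thresh):
    """Proximal operator of thresh * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def prox_lsvrg(A, b, lam, step=0.1, batch=8, p=None, iters=1000, seed=0):
    """Proximal L-SVRG sketch for min_x (1/2n)||Ax - b||^2 + lam * ||x||_1.

    Loopless SVRG variant: the full gradient at the reference point is
    refreshed with a small probability p each iteration instead of in an
    outer loop.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    p = p if p is not None else batch / n      # illustrative refresh probability
    x = np.zeros(d)
    w = x.copy()                               # reference point
    full_grad = A.T @ (A @ w - b) / n          # full gradient at the reference point

    def grad_batch(z, idx):
        Ai = A[idx]
        return Ai.T @ (Ai @ z - b[idx]) / len(idx)

    for _ in range(iters):
        idx = rng.choice(n, size=batch, replace=False)
        # variance-reduced gradient estimate of the smooth part
        g = grad_batch(x, idx) - grad_batch(w, idx) + full_grad
        # proximal step handles the nonsmooth regularizer
        x = prox_l1(x - step * g, step * lam)
        # with probability p, refresh the reference point and its full gradient
        if rng.random() < p:
            w = x.copy()
            full_grad = A.T @ (A @ w - b) / n
    return x
```

For instance, `prox_lsvrg(A, b, lam=0.01)` returns an approximate minimizer; the refresh probability `p` plays the role of the expected inner-loop length of standard SVRG, and the `prox_l1` step is what distinguishes the composite (regularized) setting from plain smooth minimization.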
Related papers
- Variance reduction techniques for stochastic proximal point algorithms [5.374800961359305]
In this work, we propose the first unified study of variance reduction techniques for proximal point algorithms.
We introduce a generic proximal-based algorithm that can be specified to give the proximal version of SVRG, SAGA, and some of their variants.
Our experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts.
arXiv Detail & Related papers (2023-08-18T05:11:50Z)
- SignSVRG: fixing SignSGD via variance reduction [3.3504365823045044]
We propose a simple, yet practical, way to incorporate variance reduction techniques into SignSGD.
We show that for smooth functions our method gives an $\mathcal{O}(1/\sqrt{T})$ rate for the expected norm of the gradient and an $\mathcal{O}(1/T)$ rate in the case of smooth convex functions.
arXiv Detail & Related papers (2023-05-22T16:14:53Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most existing methods can only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement; a minimal sketch of this sampling pattern appears after this list.
For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization [0.0]
We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) method.
A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance by SVRG.
VCSG requires $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in time, up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence [120.9336529957224]
In this paper, we consider gradient-free (zeroth-order) minimax optimization problems.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity; a generic two-point zeroth-order gradient estimator is sketched after this list.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)
- Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
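As referenced in the entry on minibatch and local SGD with shuffling, the following is a minimal sketch of minibatch Random Reshuffling, i.e., sampling without replacement: each epoch permutes the data once and then consumes it in consecutive minibatches. The least-squares objective, step size, batch size, and epoch count are illustrative assumptions, not settings from that paper.

```python
import numpy as np

def minibatch_random_reshuffling(A, b, step=0.05, batch=8, epochs=20, seed=0):
    """Minibatch SGD with Random Reshuffling for min_x (1/2n)||Ax - b||^2.

    Each epoch draws a fresh permutation of the n samples and consumes it
    in consecutive minibatches, so every sample is used exactly once per
    epoch (without-replacement sampling), unlike with-replacement SGD.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)                        # one shuffle per epoch
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            Ai = A[idx]
            g = Ai.T @ (Ai @ x - b[idx]) / len(idx)      # minibatch gradient
            x -= step * g
    return x
```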
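As referenced in the entry on gradient-free minimax optimization, the sketch below shows a generic two-point zeroth-order gradient estimator, which builds a gradient estimate from function values only. It is not the variance-reduced algorithm of that paper; the smoothing radius, number of random directions, and the quadratic test function are illustrative assumptions.

```python
import numpy as np

def two_point_grad_estimate(f, x, mu=1e-4, num_dirs=1, seed=None):
    """Two-point zeroth-order gradient estimator.

    For a random unit direction u, d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u
    is an unbiased estimate of the gradient of a smoothed version of f.
    Averaging over num_dirs directions reduces the variance of the estimate.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                 # unit direction on the sphere
        g += d * (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs

# Illustrative use: zeroth-order gradient descent on a simple quadratic.
if __name__ == "__main__":
    A = np.diag([1.0, 2.0, 3.0])
    f = lambda z: 0.5 * z @ A @ z
    x = np.ones(3)
    for _ in range(500):
        x -= 0.1 * two_point_grad_estimate(f, x, num_dirs=5)
    print(x)   # should approach the minimizer at the origin
```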
This list is automatically generated from the titles and abstracts of the papers on this site.