SGD with shuffling: optimal rates without component convexity and large
epoch requirements
- URL: http://arxiv.org/abs/2006.06946v2
- Date: Mon, 22 Jun 2020 03:42:32 GMT
- Title: SGD with shuffling: optimal rates without component convexity and large
epoch requirements
- Authors: Kwangjun Ahn, Chulhee Yun, Suvrit Sra
- Abstract summary: We consider the RandomShuffle (shuffle at the beginning of each epoch) and SingleShuffle (shuffle only once) algorithms.
We establish minimax optimal convergence rates of these algorithms up to poly-log factor gaps.
We further sharpen the tight convergence results for RandomShuffle by removing the drawbacks common to all prior arts.
- Score: 60.65928290219793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study without-replacement SGD for solving finite-sum optimization
problems. Specifically, depending on how the indices of the finite-sum are
shuffled, we consider the RandomShuffle (shuffle at the beginning of each
epoch) and SingleShuffle (shuffle only once) algorithms. First, we establish
minimax optimal convergence rates of these algorithms up to poly-log factors.
Notably, our analysis is general enough to cover gradient dominated nonconvex
costs, and does not rely on the convexity of individual component functions
unlike existing optimal convergence results. Secondly, assuming convexity of
the individual components, we further sharpen the tight convergence results for
RandomShuffle by removing the drawbacks common to all prior arts: large number
of epochs required for the results to hold, and extra poly-log factor gaps to
the lower bound.
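For concreteness, here is a minimal sketch (not the authors' code) of the two orderings described in the abstract; the gradient list `grads`, step size `lr`, and starting point `x` are hypothetical placeholders.

```python
import random

def shuffled_sgd(x, grads, lr, epochs, scheme="RandomShuffle", seed=0):
    """Without-replacement SGD on the finite sum (1/n) * sum_i f_i(x).

    RandomShuffle: draw a fresh permutation of the components at the start
    of every epoch. SingleShuffle: draw one permutation before the first
    epoch and reuse it in all later epochs.
    """
    rng = random.Random(seed)
    order = list(range(len(grads)))
    rng.shuffle(order)  # SingleShuffle keeps reusing this one permutation
    for _ in range(epochs):
        if scheme == "RandomShuffle":
            rng.shuffle(order)  # reshuffle at the beginning of each epoch
        for i in order:  # one pass visits every component exactly once
            x = x - lr * grads[i](x)
    return x

# Toy usage: quadratic components f_i(x) = 0.5 * (x - i)**2, so the average
# is minimized at x = 2; both schemes converge to a neighborhood of it.
grads = [(lambda i: (lambda x: x - i))(i) for i in range(5)]
print(shuffled_sgd(0.0, grads, lr=0.1, epochs=50, scheme="SingleShuffle"))
```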
Related papers
- Random Scaling and Momentum for Non-smooth Non-convex Optimization [38.443430569753026]
Training neural networks requires a loss function that may be highly irregular, and in particular neither convex nor smooth.
Popular training algorithms are based on stochastic gradient descent with momentum (SGDM), for which analysis applies only if the loss is either convex or smooth.
arXiv Detail & Related papers (2024-05-16T00:52:03Z) - Private optimization in the interpolation regime: faster rates and
hardness results [9.405458160620533]
In non-private convex optimization, gradient methods converge much faster on interpolation problems (problems where there exists a solution that simultaneously minimizes all of the sample losses) than on non-interpolating ones.
We show (near) exponential improvements in the private sample complexity.
arXiv Detail & Related papers (2022-10-31T05:18:27Z) - Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax
Optimization [12.794526174053134]
We analyze convergence rates of gradient algorithms for smooth finite-sum minimax optimization.
We show that, for many such algorithms, sampling points without replacement leads to faster convergence compared to sampling with replacement.
arXiv Detail & Related papers (2022-06-07T00:37:37Z) - Accelerated SGD for Non-Strongly-Convex Least Squares [14.010916616909743]
We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting.
We present the first practical algorithm that achieves the optimal prediction error rates in terms of dependence on the noise of the problem.
arXiv Detail & Related papers (2022-03-03T14:39:33Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random order online optimization model of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - Tight Lower Complexity Bounds for Strongly Convex Finite-Sum
Optimization [21.435973899949285]
We derive tight lower complexity bounds of randomized incremental gradient methods, including SAG, SAGA, SVRG, and SARAH, for two typical cases of finite-sum optimization.
Our results tightly match the upper complexity of Katyusha or VRADA when each component function is strongly convex and smooth, and tightly match the upper complexity of SDCA without duality and of KatyushaX when the finite-sum function is strongly convex and the component functions are average smooth.
arXiv Detail & Related papers (2020-10-17T11:19:07Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove an $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)