SGD with shuffling: optimal rates without component convexity and large
epoch requirements
- URL: http://arxiv.org/abs/2006.06946v2
- Date: Mon, 22 Jun 2020 03:42:32 GMT
- Title: SGD with shuffling: optimal rates without component convexity and large
epoch requirements
- Authors: Kwangjun Ahn, Chulhee Yun, Suvrit Sra
- Abstract summary: We consider the RandomShuffle (shuffle at the beginning of each epoch) and SingleShuffle (shuffle only once) algorithms.
We establish minimax optimal convergence rates of these algorithms up to poly-log factor gaps.
We further sharpen the tight convergence results for RandomShuffle by removing the drawbacks common to all prior arts.
- Score: 60.65928290219793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study without-replacement SGD for solving finite-sum optimization
problems. Specifically, depending on how the indices of the finite-sum are
shuffled, we consider the RandomShuffle (shuffle at the beginning of each
epoch) and SingleShuffle (shuffle only once) algorithms. First, we establish
minimax optimal convergence rates of these algorithms up to poly-log factors.
Notably, our analysis is general enough to cover gradient dominated nonconvex
costs, and does not rely on the convexity of individual component functions
unlike existing optimal convergence results. Secondly, assuming convexity of
the individual components, we further sharpen the tight convergence results for
RandomShuffle by removing the drawbacks common to all prior arts: large number
of epochs required for the results to hold, and extra poly-log factor gaps to
the lower bound.
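For concreteness, here is a minimal sketch (not the authors' code) of the two orderings described in the abstract; the gradient list `grads`, step size `lr`, and starting point `x` are hypothetical placeholders.

```python
import random

def shuffled_sgd(x, grads, lr, epochs, scheme="RandomShuffle", seed=0):
    """Without-replacement SGD on the finite sum (1/n) * sum_i f_i(x).

    RandomShuffle: draw a fresh permutation of the components at the start
    of every epoch. SingleShuffle: draw one permutation before the first
    epoch and reuse it in all later epochs.
    """
    rng = random.Random(seed)
    order = list(range(len(grads)))
    rng.shuffle(order)  # SingleShuffle keeps reusing this one permutation
    for _ in range(epochs):
        if scheme == "RandomShuffle":
            rng.shuffle(order)  # reshuffle at the beginning of each epoch
        for i in order:  # one pass visits every component exactly once
            x = x - lr * grads[i](x)
    return x

# Toy usage: quadratic components f_i(x) = 0.5 * (x - i)**2, so the average
# is minimized at x = 2; both schemes converge to a neighborhood of it.
grads = [(lambda i: (lambda x: x - i))(i) for i in range(5)]
print(shuffled_sgd(0.0, grads, lr=0.1, epochs=50, scheme="SingleShuffle"))
```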
Related papers
- Random Scaling and Momentum for Non-smooth Non-convex Optimization [38.443430569753026]
Training neural networks requires a loss function that may be highly irregular, and in particular neither convex nor smooth.
Popular training algorithms are based on stochastic gradient descent with momentum (SGDM), for which analysis applies only if the loss is either convex or smooth.
arXiv Detail & Related papers (2024-05-16T00:52:03Z) - Private optimization in the interpolation regime: faster rates and
hardness results [9.405458160620533]
In non-private convex optimization, gradient methods converge much faster on interpolation problems (problems where there exists a solution that simultaneously minimizes all of the sample losses) than on non-interpolating ones.
We show (near) exponential improvements in the private sample complexity.
arXiv Detail & Related papers (2022-10-31T05:18:27Z) - Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax
Optimization [12.794526174053134]
We analyze convergence rates of gradient algorithms for smooth finite-sum minimax optimization.
We show that, for many such algorithms, sampling points without replacement leads to faster convergence compared to sampling with replacement.
arXiv Detail & Related papers (2022-06-07T00:37:37Z) - Accelerated SGD for Non-Strongly-Convex Least Squares [14.010916616909743]
We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting.
We present the first practical algorithm that achieves the optimal prediction error rates in terms of dependence on the noise of the problem.
arXiv Detail & Related papers (2022-03-03T14:39:33Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the random order online optimization model of Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that the algorithms of Garber et al. (2020) achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - Tight Lower Complexity Bounds for Strongly Convex Finite-Sum
Optimization [21.435973899949285]
We derive tight lower complexity bounds of randomized incremental gradient methods, including SAG, SAGA, SVRG, and SARAH, for two typical cases of finite-sum optimization.
Our results tightly match the upper complexity of Katyusha or VRADA when each component function is strongly convex and smooth, and tightly match the upper complexity of SDCA without duality and of KatyushaX when the finite-sum function is strongly convex and the component functions are average smooth.
arXiv Detail & Related papers (2020-10-17T11:19:07Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove an $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms both achieving the optimal bound on excess loss and using $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)