Random Reshuffling: Simple Analysis with Vast Improvements
- URL: http://arxiv.org/abs/2006.05988v3
- Date: Mon, 5 Apr 2021 11:40:26 GMT
- Title: Random Reshuffling: Simple Analysis with Vast Improvements
- Authors: Konstantin Mishchenko and Ahmed Khaled and Peter Richtárik
- Abstract summary: Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling.
- Score: 9.169947558498535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions
that utilizes iterative gradient descent steps in conjunction with data
reshuffling. Often contrasted with its sibling Stochastic Gradient Descent
(SGD), RR is usually faster in practice and enjoys significant popularity in
convex and non-convex optimization. The convergence rate of RR has attracted
substantial attention recently and, for strongly convex and smooth functions,
it was shown to converge faster than SGD if 1) the stepsize is small, 2) the
gradients are bounded, and 3) the number of epochs is large. We remove these 3
assumptions, improve the dependence on the condition number from $\kappa^2$ to
$\kappa$ (resp. from $\kappa$ to $\sqrt{\kappa}$) and, in addition, show that
RR has a different type of variance. We argue through theory and experiments
that the new variance type gives an additional justification of the superior
performance of RR. To go beyond strong convexity, we present several results
for non-strongly convex and non-convex objectives. We show that in all cases,
our theory improves upon existing literature. Finally, we prove fast
convergence of the Shuffle-Once (SO) algorithm, which shuffles the data only
once, at the beginning of the optimization process. Our theory for
strongly-convex objectives tightly matches the known lower bounds for both RR
and SO and substantiates the common practical heuristic of shuffling once or
only a few times. As a byproduct of our analysis, we also get new results for
the Incremental Gradient algorithm (IG), which does not shuffle the data at
all.
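The abstract describes closely related methods that differ only in how the pass order over the data is chosen: RR reshuffles every epoch, SO shuffles once up front, IG never shuffles, and SGD samples with replacement. A minimal NumPy sketch of these sampling schemes (the function names and toy problem are illustrative, not taken from the paper's code):
```python
import numpy as np

def run_epochs(grads, x0, lr, n_epochs, scheme, rng):
    """Epoch-based incremental gradient steps on a finite sum.

    grads  : list of per-example gradient functions g_i(x)
    scheme : 'RR'  - reshuffle at the start of every epoch
             'SO'  - shuffle once before the first epoch, reuse that order
             'IG'  - fixed order 0, 1, ..., n-1, never shuffled
             'SGD' - sample with replacement at every step
    """
    n, x = len(grads), x0.astype(float).copy()
    fixed_perm = rng.permutation(n)  # used only by SO
    for _ in range(n_epochs):
        if scheme == 'RR':
            order = rng.permutation(n)
        elif scheme == 'SO':
            order = fixed_perm
        elif scheme == 'IG':
            order = np.arange(n)
        else:  # 'SGD': n independent with-replacement samples per "epoch"
            order = rng.integers(0, n, size=n)
        for i in order:
            x -= lr * grads[i](x)
    return x

# Toy strongly convex finite sum: f_i(x) = 0.5 * (a_i^T x - b_i)^2
rng = np.random.default_rng(0)
A, b = rng.standard_normal((10, 3)), rng.standard_normal(10)
grads = [lambda x, a=A[i], bi=b[i]: a * (a @ x - bi) for i in range(10)]
x_rr = run_epochs(grads, np.zeros(3), lr=0.05, n_epochs=200, scheme='RR', rng=rng)
```
Each permutation-based scheme visits every data point exactly once per epoch, in contrast to SGD's with-replacement sampling.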
Related papers
- On the Last-Iterate Convergence of Shuffling Gradient Methods [21.865728815935665]
We prove the first last-iterate convergence rates for shuffling gradient methods with respect to the objective value.
Our results either (nearly) match the existing last-iterate lower bounds or are as fast as the previous best upper bounds for the average iterate.
arXiv Detail & Related papers (2024-03-12T15:01:17Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization [6.314057999212246]
Random reshuffling techniques are used in large-scale applications, such as neural networks.
In this paper, we show that the random reshuffling-type iterates generated by norm-PRR converge to stationary points.
Finally, we derive last-iterate convergence rates for the proposed approach.
arXiv Detail & Related papers (2023-12-02T07:12:00Z) - Gradient-free optimization of highly smooth functions: improved analysis
and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
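For context on what a zero-order projected gradient method looks like, here is a minimal sketch using a standard two-point sphere-smoothing gradient estimate and projection onto a Euclidean ball; the estimator and the constraint set are generic stand-ins, not necessarily the exact algorithms analyzed in the cited paper:
```python
import numpy as np

def zo_projected_gd(f, x0, radius, lr, h, n_iters, rng):
    """Zero-order projected gradient descent on the ball ||x|| <= radius.

    Gradient estimate: d * (f(x + h*u) - f(x - h*u)) / (2*h) * u,
    with u drawn uniformly from the unit sphere (two-point smoothing).
    """
    x, d = x0.astype(float).copy(), x0.size
    for _ in range(n_iters):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        g = d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u
        x -= lr * g
        norm = np.linalg.norm(x)
        if norm > radius:            # Euclidean projection back onto the ball
            x *= radius / norm
    return x

rng = np.random.default_rng(1)
f = lambda x: np.sum((x - 0.3) ** 2)   # smooth convex test function
x_hat = zo_projected_gd(f, np.ones(5), radius=1.0, lr=0.05, h=1e-4, n_iters=2000, rng=rng)
```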
arXiv Detail & Related papers (2023-06-03T17:05:13Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local superlinear convergence at an optimal stage.
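A rough sketch of the Hessian-averaging idea, with uniform weights as a simple stand-in for the paper's universal weighted scheme (the oracle interface and toy problem are assumptions for illustration):
```python
import numpy as np

def averaged_newton(grad, hess_estimate, x0, n_iters, rng):
    """Stochastic Newton steps driven by a running average of noisy Hessian
    estimates (uniform weights, as a simple stand-in for weighted schemes)."""
    x = x0.astype(float).copy()
    H_bar = np.zeros((x0.size, x0.size))
    for k in range(n_iters):
        H_k = hess_estimate(x, rng)               # noisy Hessian estimate
        H_bar = (k * H_bar + H_k) / (k + 1)       # running average of Hessians
        x -= np.linalg.solve(H_bar, grad(x))      # Newton step with averaged Hessian
    return x

# Toy usage: quadratic objective with an artificially noisy Hessian oracle
rng = np.random.default_rng(2)
A, b = np.diag(np.arange(1.0, 6.0)), np.ones(5)
grad = lambda x: A @ x - b
hess_estimate = lambda x, rng: A + 0.1 * np.diag(rng.standard_normal(5))
x_hat = averaged_newton(grad, hess_estimate, np.zeros(5), n_iters=50, rng=rng)
```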
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin Dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
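As a generic illustration of variance-reduced Langevin dynamics (SVRG-style control variates plus injected Gaussian noise), with details such as the snapshot schedule and index sampling chosen for simplicity rather than taken from the cited paper:
```python
import numpy as np

def svrg_langevin(grads, x0, lr, inv_temp, n_epochs, rng):
    """Variance-reduced Langevin dynamics sketch: SVRG-style control variates
    on the stochastic gradient plus injected Gaussian noise at every step.

    grads : list of per-sample gradient functions g_i(x)
    """
    n, x, d = len(grads), x0.astype(float).copy(), x0.size
    for _ in range(n_epochs):
        w = x.copy()
        full_grad = sum(g(w) for g in grads) / n          # snapshot gradient
        for _ in range(n):
            i = rng.integers(0, n)                        # with-replacement sampling
            g = grads[i](x) - grads[i](w) + full_grad     # control variate
            noise = np.sqrt(2 * lr / inv_temp) * rng.standard_normal(d)
            x += -lr * g + noise
    return x
```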
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Distributed Random Reshuffling over Networks [7.013052033764372]
A distributed random reshuffling (D-RR) algorithm is proposed to solve distributed optimization problems with smooth and convex objective functions.
In particular, for smooth convex objective functions, D-RR achieves a sublinear convergence rate in $T$ (where $T$ counts the epoch number) in terms of the distance between the iterates and the global minimizer.
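One common template for decentralized random reshuffling combines local RR steps with a consensus (mixing) round over the network; the sketch below follows that template and is not necessarily the exact D-RR update:
```python
import numpy as np

def decentralized_rr(grads_per_agent, W, x0, lr, n_epochs, rng):
    """Sketch of decentralized random reshuffling: each agent takes a local
    gradient step on its next shuffled sample, then mixes with its neighbours.

    grads_per_agent[a] : list of per-sample gradient functions held by agent a
    W                  : doubly stochastic mixing matrix of the network
    """
    n_agents, n_local = len(grads_per_agent), len(grads_per_agent[0])
    X = np.tile(x0.astype(float), (n_agents, 1))      # one iterate per agent (rows)
    for _ in range(n_epochs):
        perms = [rng.permutation(n_local) for _ in range(n_agents)]
        for t in range(n_local):
            G = np.stack([grads_per_agent[a][perms[a][t]](X[a])
                          for a in range(n_agents)])
            X = W @ (X - lr * G)                      # local step + consensus mixing
    return X.mean(axis=0)
```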
arXiv Detail & Related papers (2021-12-31T03:59:37Z) - Random Reshuffling with Variance Reduction: New Analysis and Better
Rates [0.8594140167290099]
We provide the first analysis of SVRG under Random Reshuffling (RR-SVRG) for general finite-sum problems.
We show that RR-SVRG converges linearly with an $\mathcal{O}(\kappa^{3/2})$ rate in the strongly convex case.
This improves upon the previous best known rates for variance-reduced methods under random reshuffling.
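A generic sketch of variance reduction on top of random reshuffling, in the spirit of SVRG: one full snapshot gradient per epoch plus control variates along a fresh permutation (a textbook-style template, not necessarily the exact RR-SVRG variant analyzed in the paper):
```python
import numpy as np

def rr_svrg(grads, x0, lr, n_epochs, rng):
    """SVRG-style variance reduction on top of random reshuffling.

    grads : list of per-sample gradient functions g_i(x)
    """
    n, x = len(grads), x0.astype(float).copy()
    for _ in range(n_epochs):
        w = x.copy()
        full_grad = sum(g(w) for g in grads) / n      # snapshot gradient at w
        for i in rng.permutation(n):                  # fresh shuffle each epoch
            x -= lr * (grads[i](x) - grads[i](w) + full_grad)
    return x
```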
arXiv Detail & Related papers (2021-04-19T14:30:10Z) - Proximal and Federated Random Reshuffling [11.83842808044211]
We propose two new algorithms for Random Reshuffling.
ProxRR and FedRR solve composite convex finite-sum minimization problems.
ProxRR is faster than algorithms that evaluate the proximal operator in every iteration.
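The claim about avoiding a proximal evaluation at every iteration suggests the following structure: run plain RR steps on the smooth losses and apply the regularizer's prox once per epoch with an epoch-scaled stepsize. A hedged sketch of that structure, with an $\ell_1$ regularizer as a stand-in:
```python
import numpy as np

def soft_threshold(x, tau):
    """Prox of tau * ||x||_1, used here as an example regularizer."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_rr(grads, x0, lr, lam, n_epochs, rng):
    """Proximal random reshuffling sketch: plain RR steps on the smooth losses,
    then a single prox of the regularizer per epoch with an epoch-scaled stepsize."""
    n, x = len(grads), x0.astype(float).copy()
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            x -= lr * grads[i](x)                 # RR steps on the smooth part
        x = soft_threshold(x, lr * n * lam)       # one prox per epoch
    return x
```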
arXiv Detail & Related papers (2021-02-12T18:59:24Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study the nonconvex-strongly-concave setting of gradient-free minimax optimization problems.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)