Proximal and Federated Random Reshuffling
- URL: http://arxiv.org/abs/2102.06704v1
- Date: Fri, 12 Feb 2021 18:59:24 GMT
- Title: Proximal and Federated Random Reshuffling
- Authors: Konstantin Mishchenko and Ahmed Khaled and Peter Richtárik
- Abstract summary: We propose two new algorithms for Random Reshuffling.
ProxRR and FedRR solve composite convex finite-sum minimization problems.
ProxRR is faster than algorithms that evaluate the proximal operator in every iteration.
- Score: 11.83842808044211
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD)
without replacement, is a popular and theoretically grounded method for
finite-sum minimization. We propose two new algorithms: Proximal and Federated
Random Reshuffling (ProxRR and FedRR). The first algorithm, ProxRR, solves
composite convex finite-sum minimization problems in which the objective is the
sum of a (potentially non-smooth) convex regularizer and an average of $n$
smooth objectives. We obtain the second algorithm, FedRR, as a special case of
ProxRR applied to a reformulation of distributed problems with either
homogeneous or heterogeneous data. We study the algorithms' convergence
properties with constant and decreasing stepsizes, and show that they have
considerable advantages over Proximal and Local SGD. In particular, our methods
have superior complexities and ProxRR evaluates the proximal operator once per
epoch only. When the proximal operator is expensive to compute, this small
difference makes ProxRR up to $n$ times faster than algorithms that evaluate
the proximal operator in every iteration. We give examples of practical
optimization tasks where the proximal operator is difficult to compute and
ProxRR has a clear advantage. Finally, we corroborate our results with
experiments on real data sets.
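The once-per-epoch proximal step is the main algorithmic difference from Proximal SGD, which applies the prox after every gradient step. Below is a minimal NumPy sketch of this idea, assuming a user-supplied component gradient `grad_i`, a proximal operator `prox_R` for the regularizer, and a `gamma * n` prox stepsize (all illustrative choices, not the authors' reference implementation):

```python
import numpy as np

def prox_rr(grad_i, prox_R, x0, n, gamma, epochs, rng=None):
    """Sketch of ProxRR: one pass over a fresh random permutation of the n
    components per epoch, followed by a single proximal step."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    for _ in range(epochs):
        for i in rng.permutation(n):          # sampling without replacement
            x = x - gamma * grad_i(x, i)      # gradient step on component f_i
        x = prox_R(x, gamma * n)              # prox evaluated once per epoch
    return x

# Example regularizer: l1 penalty, whose prox is soft-thresholding.
prox_l1 = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

Moving the `prox_R` call inside the inner loop recovers the per-iteration proximal evaluations that, per the abstract, can make other methods up to $n$ times slower when the prox is expensive.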
Related papers
- SOREL: A Stochastic Algorithm for Spectral Risks Minimization [1.6574413179773761]
Spectral risk has wide applications in machine learning, especially in real-world decision-making.
By assigning different weights to the losses of different sample points, it allows the model's performance to lie between the average performance and the worst-case performance.
We propose SOREL, the first gradient-based algorithm with convergence guarantees for spectral risk minimization.
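For illustration (my own example, not part of SOREL), a spectral risk is a weighted average of the sorted losses with nondecreasing weights summing to one; uniform weights give the average loss, while weights concentrated on the largest losses approach the worst case:

```python
import numpy as np

def spectral_risk(losses, weights):
    """Weighted average of the losses sorted in ascending order; `weights`
    should be nonnegative, nondecreasing, and sum to one."""
    return float(np.dot(np.sort(weights), np.sort(losses)))

losses = np.array([0.2, 1.5, 0.7, 3.0])
uniform = np.full(4, 0.25)                  # average performance
cvar_like = np.array([0.0, 0.0, 0.5, 0.5])  # focus on the worst half
print(spectral_risk(losses, uniform), spectral_risk(losses, cvar_like))
```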
arXiv Detail & Related papers (2024-07-19T18:20:53Z)
- A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization [6.314057999212246]
Random reshuffling techniques are used in large-scale applications, such as neural networks.
In this paper, we show that the random reshuffling-type iterations generated by norm-PRR converge in the nonsmooth nonconvex setting.
Finally, we derive last-iterate convergence rates for the proposed approach.
arXiv Detail & Related papers (2023-12-02T07:12:00Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously accommodate inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Fast Optimal Locally Private Mean Estimation via Random Projections [58.603579803010796]
We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball.
We propose a new algorithmic framework, ProjUnit, for private mean estimation.
Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an optimal algorithm.
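A minimal sketch of that recipe, assuming a Gaussian projection and a caller-supplied low-dimensional randomizer (both illustrative stand-ins; the paper also considers structured projections and applies an optimal local randomizer in the projected space):

```python
import numpy as np

def projunit_randomize(x, k, local_randomizer, rng=None):
    """Sketch of the ProjUnit recipe: project the input to a random
    k-dimensional subspace, normalize the result, then apply a
    low-dimensional locally private randomizer supplied by the caller."""
    rng = np.random.default_rng() if rng is None else rng
    W = rng.standard_normal((k, x.shape[0])) / np.sqrt(k)  # random projection
    y = W @ x
    y /= np.linalg.norm(y)                                 # normalize the result
    return W, local_randomizer(y)   # W is shared so the server can map back
```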
arXiv Detail & Related papers (2023-06-07T14:07:35Z)
- Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z)
- Distributed Random Reshuffling over Networks [7.013052033764372]
A distributed random reshuffling (D-RR) algorithm is proposed for problems with smooth and convex objective functions over networks.
In particular, for smooth convex objective functions, D-RR converges at a sublinear rate in the epoch count $T$, measured by the distance between the iterates and the global minimizers.
arXiv Detail & Related papers (2021-12-31T03:59:37Z)
- Complexity of Inexact Proximal Point Algorithm for minimizing convex functions with Holderian Growth [1.9643748953805935]
We derive nonasymptotic complexity of exact and inexact PPA to minimize convex functions under $\gamma$-Hölderian growth.
Our numerical tests show improvements over existing restarting versions of the Subgradient Method.
arXiv Detail & Related papers (2021-08-10T07:15:07Z)
- Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
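For intuition, here is a generic randomized two-point gradient estimate of the kind such zero-order methods feed into projected gradient descent (a simplified stand-in; the estimator analyzed in the paper additionally uses a smoothing kernel to exploit higher-order smoothness):

```python
import numpy as np

def two_point_gradient_estimate(f, x, h, rng=None):
    """Estimate the gradient of f at x from two function evaluations along
    a random direction; h is the query radius."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape[0])
    u /= np.linalg.norm(u)                 # uniform direction on the sphere
    return x.shape[0] * (f(x + h * u) - f(x - h * u)) / (2 * h) * u
```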
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
- Random Reshuffling: Simple Analysis with Vast Improvements [9.169947558498535]
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling.
arXiv Detail & Related papers (2020-06-10T17:57:21Z)
- Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms that both achieve the optimal bound on excess loss and use $O(\min\{n, n^2/d\})$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
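To illustrate the role of such embeddings, here is a generic sketch-and-precondition routine under my own assumptions (Gaussian embedding only, with a direct solve standing in for the paper's first-order iterations):

```python
import numpy as np

def gaussian_precondition_solve(A, b, m, rng=None):
    """Solve min ||Ax - b|| using a Gaussian embedding S as a preconditioner:
    the R factor of a QR decomposition of S @ A makes A @ inv(R) well
    conditioned.  A direct least-squares solve stands in here for the
    iterative first-order method a real solver would run."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)       # Gaussian embedding, m >= d
    _, R = np.linalg.qr(S @ A)                         # R is d x d, upper triangular
    R_inv = np.linalg.solve(R, np.eye(d))
    y, *_ = np.linalg.lstsq(A @ R_inv, b, rcond=None)  # well-conditioned problem
    return R_inv @ y                                   # map back: x = inv(R) @ y
```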
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.