Distributed Random Reshuffling over Networks
- URL: http://arxiv.org/abs/2112.15287v5
- Date: Thu, 23 Mar 2023 06:44:25 GMT
- Title: Distributed Random Reshuffling over Networks
- Authors: Kun Huang, Xiao Li, Andre Milzarek, Shi Pu, and Junwen Qiu
- Abstract summary: A distributed resh-upr (D-RR) algorithm is proposed to solve the problem of convex and smooth objective functions.
In particular, for smooth convex objective functions, D-RR achieves D-T convergence rate (where $T counts epoch number) in terms of distance between the global drives.
- Score: 7.013052033764372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider distributed optimization problems where $n$
agents, each possessing a local cost function, collaboratively minimize the
average of the local cost functions over a connected network. To solve the
problem, we propose a distributed random reshuffling (D-RR) algorithm that
invokes the random reshuffling (RR) update in each agent. We show that D-RR
inherits favorable characteristics of RR for both smooth strongly convex and
smooth nonconvex objective functions. In particular, for smooth strongly convex
objective functions, D-RR achieves $\mathcal{O}(1/T^2)$ rate of convergence
(where $T$ counts epoch number) in terms of the squared distance between the
iterate and the global minimizer. When the objective function is assumed to be
smooth nonconvex, we show that D-RR drives the squared norm of gradient to $0$
at a rate of $\mathcal{O}(1/T^{2/3})$. These convergence results match those of
centralized RR (up to constant factors) and outperform the distributed
stochastic gradient descent (DSGD) algorithm if we run a relatively large
number of epochs. Finally, we conduct a set of numerical experiments to
illustrate the efficiency of the proposed D-RR method on both strongly convex
and nonconvex distributed optimization problems.
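To make the update pattern concrete, below is a minimal sketch of a D-RR-style epoch under simplifying assumptions: equal local sample counts, a fixed step size, a doubly stochastic mixing matrix W, and a gradient-step-then-average ordering. The exact recursion, step-size schedule, and mixing scheme are specified in the paper; this is only an illustration.

```python
import numpy as np

def d_rr(grads, x0, W, num_epochs, step_size):
    """Sketch of a distributed random reshuffling (D-RR) style method.

    grads[i][l] is a callable returning the gradient of agent i's l-th
    local component function; W is a doubly stochastic mixing matrix
    matching the communication graph; x0 has shape (n_agents, dim).
    """
    n_agents, _ = x0.shape
    m = len(grads[0])          # local sample count (assumed equal across agents)
    x = x0.copy()
    for _ in range(num_epochs):
        # Each agent reshuffles its own local samples once per epoch.
        perms = [np.random.permutation(m) for _ in range(n_agents)]
        for k in range(m):
            # Local incremental gradient step on the k-th reshuffled sample ...
            local = np.stack([
                x[i] - step_size * grads[i][perms[i][k]](x[i])
                for i in range(n_agents)
            ])
            # ... followed by one round of neighbor averaging (consensus).
            x = W @ local
    return x.mean(axis=0)      # average of the agents' final iterates
```

Here `W @ local` stands in for one gossip round; in a real decentralized implementation each agent only combines the iterates of its neighbors, as encoded by the sparsity pattern of W.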
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance-reduced ZO estimators for complex gradient problems.
We improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{d n^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
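For readers unfamiliar with the term, a zeroth-order (ZO) estimator approximates gradients from function values alone. The snippet below is a generic two-point Gaussian-smoothing estimator, not the variance-reduced estimators proposed in that paper; the smoothing radius and number of directions are illustrative choices.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-4, num_dirs=10):
    """Two-point zeroth-order estimate of the gradient of f at x.

    Averages finite differences along random Gaussian directions,
    using only evaluations of f (no derivatives).
    """
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = np.random.randn(*x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs
```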
arXiv Detail & Related papers (2024-10-03T15:04:01Z)
- A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization [6.314057999212246]
Random reshuffling techniques are widely used in large-scale applications, such as training neural networks.
In this paper, we show that the random reshuffling-type iterations generated by norm-PRR converge to stationary points.
Finally, we derive last-iterate convergence rates for the proposed approach.
arXiv Detail & Related papers (2023-12-02T07:12:00Z)
- Distributed Random Reshuffling Methods with Improved Convergence [8.112170817124444]
This paper proposes two distributed random reshuffling methods, namely Gradient Tracking with Random Reshuffling (GT-RR) and Exact Diffusion with Random Reshuffling (ED-RR).
arXiv Detail & Related papers (2023-06-21T06:05:34Z)
- Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We give a polynomial-time algorithm that finds the RUM that best approximates the given distribution on average.
Our theoretical result can also be made practical: we obtain an algorithm that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust minimization problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to 10%.
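The reduction referred to above can be written out explicitly. With $n$ local losses $\ell_i(\theta)$, a uniform reference distribution, and regularization weight $\lambda > 0$ (notation assumed here, not taken from the paper), the KL-regularized inner maximization has the standard closed form

$$\max_{p \in \Delta_n} \; \sum_{i=1}^{n} p_i \, \ell_i(\theta) \;-\; \lambda \, \mathrm{KL}\!\left(p \,\middle\|\, \tfrac{1}{n}\mathbf{1}\right) \;=\; \lambda \log\!\left( \frac{1}{n} \sum_{i=1}^{n} \exp\!\big( \ell_i(\theta)/\lambda \big) \right),$$

so the min-max problem collapses to minimizing a smooth log-sum-exp surrogate of the losses, which is what makes a decentralized SGD-type method applicable.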
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED).
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy rewards.
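As a rough illustration of least-squares reward redistribution (the feature map, regularization, and linear function class below are illustrative assumptions, not PARTED's actual construction), one can fit a per-step proxy reward so that its sum along each trajectory matches that trajectory's return:

```python
import numpy as np

def redistribute_rewards(features, returns, reg=1e-3):
    """Fit per-step proxy rewards r_hat = phi @ w by ridge regression so
    that the proxy rewards summed over each trajectory approximate that
    trajectory's total return.

    features: list of (T_i, d) arrays of per-step features, one per trajectory
    returns:  (num_trajectories,) array of trajectory returns
    """
    # Each trajectory contributes one linear equation: sum_t phi_t @ w = return.
    A = np.stack([f.sum(axis=0) for f in features])                    # (N, d)
    w = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ returns)
    return [f @ w for f in features]   # per-step proxy rewards per trajectory
```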
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDP with trajectory-wise reward.
arXiv Detail & Related papers (2022-06-13T19:11:22Z)
- Non-convex Distributionally Robust Optimization: Non-asymptotic Analysis [16.499651513178012]
Distributionally robust optimization (DRO) is a widely-used approach to learn models that are robust against distribution shift.
We provide non-asymptotic convergence guarantees for a normalized gradient descent method, even though the objective function is possibly nonconvex and nonsmooth.
arXiv Detail & Related papers (2021-10-24T14:56:38Z)
- Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimum under the same delay assumptions.
arXiv Detail & Related papers (2021-07-06T21:59:49Z)
- Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method [3.860616339202303]
This paper investigates the distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions.
Agents use only approximated zeroth-order (ZO) gradient information and a distributed ZO coordinate method to solve the problem.
arXiv Detail & Related papers (2021-03-24T03:07:46Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
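For context, the generic mirror descent step that such variants build on replaces the Euclidean projection with a Bregman divergence $D_\psi$ induced by a mirror map $\psi$ (a textbook form, not the paper's specific parallel update):

$$x_{k+1} \;=\; \arg\min_{x \in X} \Big\{ \gamma_k \, \langle g_k, x \rangle \;+\; D_\psi(x, x_k) \Big\}, \qquad D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla\psi(y), x - y \rangle,$$

where $g_k$ is a (stochastic) subgradient estimate and $\gamma_k$ the step size.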
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Proximal and Federated Random Reshuffling [11.83842808044211]
We propose two new algorithms for Random Reshuffling.
ProxRR and FedRR solve composite convex finite-sum minimization problems.
ProxRR is faster than algorithms that evaluate the proximal operator in every iteration.
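A minimal sketch of that idea, applying the proximal operator once per epoch instead of once per step, is shown below; the step size, the prox stepsize scaling, and the prox oracle interface are assumptions rather than the paper's exact recursion.

```python
import numpy as np

def prox_rr_epoch(x, grads, prox, step_size):
    """One ProxRR-style epoch on a composite problem (1/n) sum_i f_i(x) + psi(x).

    grads[i] returns the gradient of the smooth component f_i at x;
    prox(v, s) is the proximal operator of s * psi evaluated at v.
    """
    n = len(grads)
    for idx in np.random.permutation(n):   # reshuffled pass over the smooth part
        x = x - step_size * grads[idx](x)
    return prox(x, n * step_size)          # single prox call per epoch (assumed scaling)
```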
arXiv Detail & Related papers (2021-02-12T18:59:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.