Epidemic Learning: Boosting Decentralized Learning with Randomized
Communication
- URL: http://arxiv.org/abs/2310.01972v2
- Date: Fri, 27 Oct 2023 09:52:12 GMT
- Title: Epidemic Learning: Boosting Decentralized Learning with Randomized
Communication
- Authors: Martijn de Vos, Sadegh Farhadkhani, Rachid Guerraoui, Anne-Marie
Kermarrec, Rafael Pires, Rishi Sharma
- Abstract summary: We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm.
Our results illustrate that EL converges up to $1.7\times$ quicker than baseline DL algorithms and attains $2.2\%$ higher accuracy for the same communication volume.
- Score: 8.685687713892193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Epidemic Learning (EL), a simple yet powerful decentralized
learning (DL) algorithm that leverages changing communication topologies to
achieve faster model convergence compared to conventional DL approaches. At
each round of EL, each node sends its model updates to a random sample of $s$
other nodes (in a system of $n$ nodes). We provide an extensive theoretical
analysis of EL, demonstrating that its changing topology culminates in superior
convergence properties compared to the state-of-the-art (static and dynamic)
topologies. Considering smooth non-convex loss functions, the number of
transient iterations for EL, i.e., the rounds required to achieve asymptotic
linear speedup, is in $O(n^3/s^2)$ which outperforms the best-known bound
$O(n^3)$ by a factor of $s^2$, indicating the benefit of randomized
communication for DL. We empirically evaluate EL in a 96-node network and
compare its performance with state-of-the-art DL approaches. Our results
illustrate that EL converges up to $1.7\times$ quicker than baseline DL
algorithms and attains $2.2\%$ higher accuracy for the same communication
volume.
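To make the protocol concrete, below is a minimal sketch of one EL round as described in the abstract: every node performs a local update, pushes its new model to $s$ peers drawn uniformly at random, and averages whatever it received. This is an illustrative simplification (NumPy vectors as models, a caller-supplied `local_update`), not the authors' implementation.

```python
import random
import numpy as np

def el_round(models, s, local_update):
    """One round of an Epidemic-Learning-style protocol (illustrative sketch).

    models:       list of n parameter vectors (np.ndarray), one per node
    s:            fan-out -- each node pushes its update to s random peers
    local_update: function (model, node_id) -> updated model, e.g. local SGD steps
    """
    n = len(models)

    # 1) Each node performs its local update.
    updated = [local_update(m, i) for i, m in enumerate(models)]

    # 2) Each node sends its updated model to s other nodes chosen uniformly at random.
    inboxes = [[u] for u in updated]  # every node also keeps its own update
    for i in range(n):
        for j in random.sample([p for p in range(n) if p != i], s):
            inboxes[j].append(updated[i])

    # 3) Each node averages the models it holds after this round's exchange.
    return [np.mean(inbox, axis=0) for inbox in inboxes]
```

Because the set of receivers is re-sampled every round, the effective communication topology changes constantly, which is the property the paper's $O(n^3/s^2)$ transient-iteration bound exploits.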
Related papers
- Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time [3.1789549088190414]
We study a distributed learning problem in which $n$ agents, each with potentially heterogeneous local data, collaboratively minimize the sum of their local cost functions via peer-to-peer communication.
We propose a novel algorithm, Spanning Tree Push-Pull (STPP), which employs two spanning trees extracted from a general communication graph to distribute both model parameters and stochastic gradients.
arXiv Detail & Related papers (2025-03-20T13:11:44Z) - An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks [11.925232472331494]
We develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network.
New proof techniques are developed and an improved sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-1})$ is derived.
arXiv Detail & Related papers (2024-05-07T05:29:55Z) - Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation [11.925232472331494]
A Gauss-Newton Temporal Difference (GNTD) learning method is proposed to solve the Q-learning problem with nonlinear function approximation.
In each iteration, our method takes one Gauss-Newton (GN) step to optimize a variant of the Mean-Squared Bellman Error (MSBE).
We validate our method via extensive experiments in several RL benchmarks, where GNTD exhibits both higher rewards and faster convergence than TD-type methods.
arXiv Detail & Related papers (2023-02-25T14:14:01Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization [0.0]
DADAO is the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $\mu$-strongly convex functions distributed over a given network of size $n$.
We show that our algorithm requires $\mathcal{O}(n\sqrt{\chi}\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$ local and only $\mathcal{O}(n\sqrt{\chi}\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$ ...
arXiv Detail & Related papers (2022-07-26T08:47:54Z) - Provably Efficient Convergence of Primal-Dual Actor-Critic with
Nonlinear Function Approximation [15.319335698574932]
We show the first efficient convergence result for primal-dual actor-critic with a convergence rate of $\mathcal{O}\left(\sqrt{\ln(N)/N}\right)$ under Markovian sampling.
We report results on OpenAI Gym continuous control tasks.
arXiv Detail & Related papers (2022-02-28T15:16:23Z) - DASHA: Distributed Nonconvex Optimization with Communication
Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems.
Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for federated learning.
arXiv Detail & Related papers (2022-02-02T20:10:40Z) - Near-Optimal Sparse Allreduce for Distributed Deep Learning [18.99898181586806]
Communication overhead is one of the major obstacles to train large deep learning models at scale.
This paper proposes O$k$-Top$k$, a scheme for distributed training with sparse gradients.
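For intuition, the sketch below shows a generic top-$k$ gradient compressor of the kind such sparse-allreduce schemes build on; it illustrates only the sparsification step, not the O$k$-Top$k$ algorithm itself, and the helper names are ours.

```python
import numpy as np

def topk_compress(grad: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries; return (indices, values) to communicate."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def topk_decompress(idx: np.ndarray, vals: np.ndarray, dim: int) -> np.ndarray:
    """Rebuild a dense gradient from the communicated (indices, values) pairs."""
    dense = np.zeros(dim)
    dense[idx] = vals
    return dense
```

Exchanging only the (index, value) pairs reduces per-worker traffic from the full gradient dimension to $O(k)$; the cited paper addresses how to allreduce such sparse vectors efficiently across workers.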
arXiv Detail & Related papers (2022-01-19T13:56:57Z) - Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear
Speedup [56.27526702716774]
This paper revisits the A3C algorithm with TD(0) for the critic update, termed A3C-TD(0), with provable convergence guarantees.
Under i.i.d. sampling, A3C-TD(0) obtains a sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ per worker to achieve $\epsilon$ accuracy, where $N$ is the number of workers.
Compared to the best-known sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ for two...
arXiv Detail & Related papers (2020-12-31T09:07:09Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds in theory.
Our experiments on several benchmark datasets show the effectiveness of our method and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Stochastic Flows and Geometric Optimization on the Orthogonal Group [52.50121190744979]
We present a new class of geometrically-driven optimization algorithms on the orthogonal group $O(d)$.
We show that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, flows and metric learning.
arXiv Detail & Related papers (2020-03-30T15:37:50Z)