Epidemic Learning: Boosting Decentralized Learning with Randomized
Communication
- URL: http://arxiv.org/abs/2310.01972v2
- Date: Fri, 27 Oct 2023 09:52:12 GMT
- Title: Epidemic Learning: Boosting Decentralized Learning with Randomized
Communication
- Authors: Martijn de Vos, Sadegh Farhadkhani, Rachid Guerraoui, Anne-Marie
Kermarrec, Rafael Pires, Rishi Sharma
- Abstract summary: We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm.
Our results illustrate that EL converges up to $1.7\times$ quicker than baseline DL algorithms and attains $2.2\%$ higher accuracy for the same communication volume.
- Score: 8.685687713892193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Epidemic Learning (EL), a simple yet powerful decentralized
learning (DL) algorithm that leverages changing communication topologies to
achieve faster model convergence compared to conventional DL approaches. At
each round of EL, each node sends its model updates to a random sample of $s$
other nodes (in a system of $n$ nodes). We provide an extensive theoretical
analysis of EL, demonstrating that its changing topology culminates in superior
convergence properties compared to the state-of-the-art (static and dynamic)
topologies. Considering smooth non-convex loss functions, the number of
transient iterations for EL, i.e., the rounds required to achieve asymptotic
linear speedup, is in $O(n^3/s^2)$ which outperforms the best-known bound
$O(n^3)$ by a factor of $s^2$, indicating the benefit of randomized
communication for DL. We empirically evaluate EL in a 96-node network and
compare its performance with state-of-the-art DL approaches. Our results
illustrate that EL converges up to $1.7\times$ quicker than baseline DL
algorithms and attains $2.2\%$ higher accuracy for the same communication
volume.
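To make the protocol concrete, below is a minimal sketch of one EL round as described in the abstract: every node performs a local update, pushes its new model to $s$ peers drawn uniformly at random, and averages whatever it received. This is an illustrative simplification (NumPy vectors as models, a caller-supplied `local_update`), not the authors' implementation.

```python
import random
import numpy as np

def el_round(models, s, local_update):
    """One round of an Epidemic-Learning-style protocol (illustrative sketch).

    models:       list of n parameter vectors (np.ndarray), one per node
    s:            fan-out -- each node pushes its update to s random peers
    local_update: function (model, node_id) -> updated model, e.g. local SGD steps
    """
    n = len(models)

    # 1) Each node performs its local update.
    updated = [local_update(m, i) for i, m in enumerate(models)]

    # 2) Each node sends its updated model to s other nodes chosen uniformly at random.
    inboxes = [[u] for u in updated]  # every node also keeps its own update
    for i in range(n):
        for j in random.sample([p for p in range(n) if p != i], s):
            inboxes[j].append(updated[i])

    # 3) Each node averages the models it holds after this round's exchange.
    return [np.mean(inbox, axis=0) for inbox in inboxes]
```

Because the set of receivers is re-sampled every round, the effective communication topology changes constantly, which is the property the paper's $O(n^3/s^2)$ transient-iteration bound exploits.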
Related papers
- Distributed Learning over Arbitrary Topology: Linear Speed-Up with Polynomial Transient Time [3.1789549088190414]
We study a distributed learning problem in which $n$ agents, each with potentially heterogeneous local data, collaboratively minimize the sum of their local cost functions via peer-to-peer communication.
We propose a novel algorithm, Spanning Tree Push-Pull (STPP), which employs two spanning trees extracted from a general communication graph to distribute both model parameters and stochastic gradients.
arXiv Detail & Related papers (2025-03-20T13:11:44Z) - An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks [11.925232472331494]
We develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network.
New proof techniques are developed and an improved sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-1})$ is derived.
arXiv Detail & Related papers (2024-05-07T05:29:55Z) - Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation [11.925232472331494]
A Gauss-Newton Temporal Difference (GNTD) learning method is proposed to solve the Q-learning problem with nonlinear function approximation.
In each iteration, our method takes one Gauss-Newton (GN) step to optimize a variant of the Mean-Squared Bellman Error (MSBE).
We validate our method via extensive experiments in several RL benchmarks, where GNTD exhibits both higher rewards and faster convergence than TD-type methods.
arXiv Detail & Related papers (2023-02-25T14:14:01Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization [0.0]
DADAO is the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $\mu$-strongly convex functions distributed over a given network of size $n$.
We show that our algorithm requires $\mathcal{O}(n\sqrt{\chi}\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$ local and only $\mathcal{O}(n\sqrt{\chi}\sqrt{\frac{L}{\mu}}\log(\frac{1}{\epsilon}))$ ...
arXiv Detail & Related papers (2022-07-26T08:47:54Z) - Provably Efficient Convergence of Primal-Dual Actor-Critic with
Nonlinear Function Approximation [15.319335698574932]
We show the first efficient convergence result for primal-dual actor-critic with a convergence rate of $\mathcal{O}\left(\sqrt{\ln(N)/N}\right)$ under Markovian sampling.
We report results on OpenAI Gym continuous control tasks.
arXiv Detail & Related papers (2022-02-28T15:16:23Z) - DASHA: Distributed Nonconvex Optimization with Communication
Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems.
Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for federated learning.
arXiv Detail & Related papers (2022-02-02T20:10:40Z) - Near-Optimal Sparse Allreduce for Distributed Deep Learning [18.99898181586806]
Communication overhead is one of the major obstacles to train large deep learning models at scale.
This paper proposes O$k$-Top$k$, a scheme for distributed training with sparse gradients.
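For intuition, the sketch below shows a generic top-$k$ gradient compressor of the kind such sparse-allreduce schemes build on; it illustrates only the sparsification step, not the O$k$-Top$k$ algorithm itself, and the helper names are ours.

```python
import numpy as np

def topk_compress(grad: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries; return (indices, values) to communicate."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def topk_decompress(idx: np.ndarray, vals: np.ndarray, dim: int) -> np.ndarray:
    """Rebuild a dense gradient from the communicated (indices, values) pairs."""
    dense = np.zeros(dim)
    dense[idx] = vals
    return dense
```

Exchanging only the (index, value) pairs reduces per-worker traffic from the full gradient dimension to $O(k)$; the cited paper addresses how to allreduce such sparse vectors efficiently across workers.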
arXiv Detail & Related papers (2022-01-19T13:56:57Z) - Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear
Speedup [56.27526702716774]
This paper revisits the A3C algorithm with TD(0) for the critic update, termed A3C-TD(0), with provable convergence guarantees.
Under i.i.d. sampling, A3C-TD(0) obtains a sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ per worker to achieve $\epsilon$ accuracy, where $N$ is the number of workers.
Compared to the best-known sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ for two...
arXiv Detail & Related papers (2020-12-31T09:07:09Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds in theory.
Our experiments on several benchmark datasets show the effectiveness of our method and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Stochastic Flows and Geometric Optimization on the Orthogonal Group [52.50121190744979]
We present a new class of geometrically-driven optimization algorithms on the orthogonal group $O(d)$.
We show that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, flows and metric learning.
arXiv Detail & Related papers (2020-03-30T15:37:50Z)