Communication-efficient Decentralized Local SGD over Undirected Networks
- URL: http://arxiv.org/abs/2011.03255v1
- Date: Fri, 6 Nov 2020 09:34:00 GMT
- Title: Communication-efficient Decentralized Local SGD over Undirected Networks
- Authors: Tiancheng Qin, S. Rasoul Etesami, César A. Uribe
- Abstract summary: We consider the distributed learning problem where a network of $n$ agents seeks to minimize a global function $F$.
We analyze the trade-off between the number of communication rounds and the computational effort of each agent.
Our results show that by using only $R=\Omega(n)$ communication rounds, one can achieve an error that scales as $O(1/nT)$.
- Score: 2.3572498744567123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the distributed learning problem where a network of $n$ agents
seeks to minimize a global function $F$. Agents have access to $F$ through
noisy gradients, and they can locally communicate with their neighbors over a
network. We study the Decentralized Local SGD method, where agents perform a
number of local gradient steps and occasionally exchange information with their
neighbors. Previous algorithmic analysis efforts have focused on the specific
network topology (star topology) where a leader node aggregates all agents'
information. We generalize that setting to an arbitrary network by analyzing
the trade-off between the number of communication rounds and the computational
effort of each agent. We bound the expected optimality gap in terms of the
number of iterates $T$, the number of workers $n$, and the spectral gap of the
underlying network. Our main results show that by using only $R=\Omega(n)$
communication rounds, one can achieve an error that scales as $O({1}/{nT})$,
where the number of communication rounds is independent of $T$ and only depends
on the number of agents. Finally, we provide numerical evidence of our
theoretical results through experiments on real and synthetic data.
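To make the algorithmic setting concrete, below is a minimal Python sketch of decentralized local SGD with periodic gossip averaging over a doubly stochastic mixing matrix. The names (grad_fn, W, H, lr) and the toy quadratic objective are illustrative assumptions, not the paper's exact algorithm, step-size schedule, or experimental setup.

```python
import numpy as np

def decentralized_local_sgd(grad_fn, W, x0, T, H, lr):
    """Sketch of Decentralized Local SGD (assumed interface).

    Each of the n agents keeps a local iterate, takes stochastic gradient
    steps, and every H iterations averages with its neighbors using the
    gossip (mixing) matrix W, for T iterations in total.

    grad_fn(i, x): stochastic gradient of agent i's local loss at x.
    W: n x n doubly stochastic mixing matrix of the communication graph.
    """
    n = W.shape[0]
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))  # one row per agent
    for t in range(1, T + 1):
        # local stochastic gradient step at every agent
        for i in range(n):
            X[i] -= lr * grad_fn(i, X[i])
        # communication round every H iterations: gossip averaging with neighbors
        if t % H == 0:
            X = W @ X
    return X.mean(axis=0)  # average model across agents

# Toy usage: n agents, each with a quadratic loss centered at a private target.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 5
    B = rng.normal(size=(n, d))
    # ring-topology mixing matrix (symmetric, doubly stochastic)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    grad = lambda i, x: (x - B[i]) + 0.01 * rng.normal(size=d)  # noisy gradient
    x_hat = decentralized_local_sgd(grad, W, np.zeros(d), T=2000, H=50, lr=0.05)
    print(np.linalg.norm(x_hat - B.mean(axis=0)))  # distance to the global minimizer
```

In this sketch the number of communication rounds is T/H, so keeping H large relative to T corresponds to the paper's regime where communication rounds are few compared to local gradient steps.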
Related papers
- Cooperative Multi-Agent Reinforcement Learning: Asynchronous
Communication and Linear Function Approximation [77.09836892653176]
We study multi-agent reinforcement learning in the setting of episodic Markov decision processes.
We propose a provably efficient algorithm based on value iteration that enables asynchronous communication.
We show that a minimal $\Omega(dM)$ communication complexity is required to improve the performance through collaboration.
arXiv Detail & Related papers (2023-05-10T20:29:29Z) - Convergence Rates for Localized Actor-Critic in Networked Markov
Potential Games [12.704529528199062]
We introduce a class of networked Markov potential games in which agents are associated with nodes in a network.
Each agent has its own local potential function, and the reward of each agent depends only on the states and actions of the agents within a neighborhood.
This is the first finite-sample bound for multi-agent competitive games that does not depend on the number of agents.
arXiv Detail & Related papers (2023-03-08T20:09:58Z) - Communication-Efficient Topologies for Decentralized Learning with
$O(1)$ Consensus Rate [35.698182247676414]
Decentralized optimization is an emerging paradigm in distributed learning in which agents achieve network-wide solutions by peer-to-peer communication without the central server.
We show that the total number of iterations to reach a network-wide solution is affected by the speed at which the agents' information is "mixed" by communication.
We propose a new family of topologies, EquiTopo, which has an (almost) constant degree and a network-size-independent consensus rate.
arXiv Detail & Related papers (2022-10-14T15:02:01Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - On the Performance of Gradient Tracking with Local Updates [10.14746251086461]
We show that LU-GT has the same communication complexity but allows arbitrary network topologies.
Numerical examples reveal that local updates can lower communication costs in certain regimes.
arXiv Detail & Related papers (2022-10-10T15:13:23Z) - A Simple and Provably Efficient Algorithm for Asynchronous Federated
Contextual Linear Bandits [77.09836892653176]
We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server.
We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication.
We prove that the regret of FedLinUCB is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^{M} T_m})$ and the communication complexity is $\tilde{O}(dM^2)$.
arXiv Detail & Related papers (2022-07-07T06:16:19Z) - High-Dimensional Inference over Networks: Linear Convergence and
Statistical Guarantees [20.701475313495884]
We study sparse linear regression over a network of agents, modeled as an undirected graph with no server node.
We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm.
arXiv Detail & Related papers (2022-01-21T01:26:08Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\log(1/\varepsilon)\big)$ communication steps.
This rate matches, for the first time (up to poly-log factors), the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Distributed Saddle-Point Problems Under Similarity [173.19083235638104]
We show that a given suboptimality $\epsilon > 0$ is achieved over master/workers networks in $\Omega\big(\frac{\delta}{\mu}\log(1/\epsilon)\big)$ rounds of communication.
We then propose algorithms matching the lower bounds for either type of network (up to log factors).
We assess the effectiveness of the proposed algorithms on a robust logistic regression problem.
arXiv Detail & Related papers (2021-07-22T14:25:16Z) - Learning Connectivity for Data Distribution in Robot Teams [96.39864514115136]
We propose a task-agnostic, decentralized, low-latency method for data distribution in ad-hoc networks using Graph Neural Networks (GNNs).
Our approach enables multi-agent algorithms based on global state information to function by ensuring it is available at each robot.
We train the distributed GNN communication policies via reinforcement learning using the average Age of Information as the reward function and show that it improves training stability compared to task-specific reward functions.
arXiv Detail & Related papers (2021-03-08T21:48:55Z) - Multi-Level Local SGD for Heterogeneous Hierarchical Networks [11.699472346137739]
We propose Multi-Level Local SGD, a distributed gradient method for learning a smooth, non-convex objective in a heterogeneous, multi-level network.
We first provide a unified mathematical framework that describes the Multi-Level Local SGD algorithm.
We then present a theoretical analysis of the algorithm.
arXiv Detail & Related papers (2020-07-27T19:14:23Z)