Faster Convergence with Less Communication: Broadcast-Based Subgraph
Sampling for Decentralized Learning over Wireless Networks
- URL: http://arxiv.org/abs/2401.13779v1
- Date: Wed, 24 Jan 2024 20:00:23 GMT
- Title: Faster Convergence with Less Communication: Broadcast-Based Subgraph
Sampling for Decentralized Learning over Wireless Networks
- Authors: Daniel Pérez Herrera, Zheng Chen, and Erik G. Larsson
- Abstract summary: $\texttt{BASS}$ is a broadcast-based subgraph sampling method designed to accelerate the convergence of D-SGD.
We show that $\texttt{BASS}$ enables faster convergence with fewer transmission slots compared to existing link-based scheduling methods.
- Score: 32.914407967052114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consensus-based decentralized stochastic gradient descent (D-SGD) is a widely
adopted algorithm for decentralized training of machine learning models across
networked agents. A crucial part of D-SGD is the consensus-based model
averaging, which heavily relies on information exchange and fusion among the
nodes. Specifically, for consensus averaging over wireless networks,
communication coordination is necessary to determine when and how a node can
access the channel and transmit (or receive) information to (or from) its
neighbors. In this work, we propose $\texttt{BASS}$, a broadcast-based subgraph
sampling method designed to accelerate the convergence of D-SGD while
considering the actual communication cost per iteration. $\texttt{BASS}$
creates a set of mixing matrix candidates that represent sparser subgraphs of
the base topology. In each consensus iteration, one mixing matrix is sampled,
leading to a specific scheduling decision that activates multiple
collision-free subsets of nodes. The sampling occurs in a probabilistic manner,
and the elements of the mixing matrices, along with their sampling
probabilities, are jointly optimized. Simulation results demonstrate that
$\texttt{BASS}$ enables faster convergence with fewer transmission slots
compared to existing link-based scheduling methods. In conclusion, the inherent
broadcasting nature of wireless channels offers intrinsic advantages in
accelerating the convergence of decentralized optimization and learning.
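As a concrete illustration of the update described above, the following is a minimal Python/NumPy sketch of one D-SGD iteration in which the mixing matrix is sampled from a finite set of precomputed candidates. It is not the authors' implementation: all names are illustrative, and the construction of the candidate subgraphs and the joint optimization of the matrix entries and sampling probabilities, which are the core of $\texttt{BASS}$, are assumed to have been done beforehand.

    import numpy as np

    def dsgd_step(models, grads, candidates, probs, lr, rng):
        """One D-SGD iteration with a randomly sampled mixing matrix.

        models:     (n_nodes, dim) array, current model of each node
        grads:      (n_nodes, dim) array, local stochastic gradients
        candidates: list of (n_nodes, n_nodes) mixing matrices, each
                    corresponding to a sparser subgraph of the base
                    topology (assumed to be precomputed)
        probs:      sampling probability of each candidate matrix
        """
        # Local stochastic gradient step at every node.
        local = models - lr * grads
        # Sample one mixing matrix, i.e., one scheduling decision that
        # activates collision-free subsets of broadcasting nodes.
        W = candidates[rng.choice(len(candidates), p=probs)]
        # Consensus averaging: each node fuses its own model with the
        # models received from the scheduled neighbors.
        return W @ local

Intuitively, a sparser sampled matrix means fewer broadcasting nodes and hence fewer transmission slots in that iteration; the joint optimization balances this saving against how quickly the averaged models reach consensus.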
Related papers
- DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common Stochastic Approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose $\texttt{DASA}$, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z)
- Decentralized Learning over Wireless Networks with Broadcast-Based Subgraph Sampling [36.99249604183772]
This work centers on the communication aspects of decentralized learning over wireless networks, using consensus-based decentralized stochastic gradient descent (D-SGD).
Considering the actual communication cost or delay caused by in-network information exchange in an iterative process, our goal is to achieve fast convergence of the algorithm measured by improvement per transmission slot.
We propose BASS, an efficient communication framework for D-SGD over wireless networks with broadcast transmission and probabilistic subgraph sampling.
arXiv Detail & Related papers (2023-10-24T18:15:52Z)
- Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
- DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates [4.3707341422218215]
Two widely considered decentralized learning algorithms are Gossip and random walk-based learning.
We design a fast and communication-efficient asynchronous decentralized learning mechanism DIGEST.
We evaluate the performance of single- and multi-stream DIGEST for logistic regression and a deep neural network (ResNet20).
arXiv Detail & Related papers (2023-07-14T22:58:20Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
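As a rough illustration of the sketching idea in the entry above, the snippet below compresses a $d$-dimensional gradient into $k \ll d$ bucket sums with a count sketch before communication, and the receiver forms a coarse, unbiased estimate from those $k$ values. This is a generic sketch operator, not necessarily the one used by SketchedAMSGrad, and all names are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 10_000, 256                       # gradient dimension, sketch size
    bucket = rng.integers(0, k, size=d)      # shared hash: coordinate -> bucket
    sign = rng.choice([-1.0, 1.0], size=d)   # shared random signs

    def compress(grad):
        """Count sketch: k numbers are communicated instead of d."""
        sketch = np.zeros(k)
        np.add.at(sketch, bucket, sign * grad)
        return sketch

    def decompress(sketch):
        """Coarse estimate of the original gradient (unbiased over the
        random hash and signs)."""
        return sign * sketch[bucket]

    g = rng.standard_normal(d)
    g_hat = decompress(compress(g))          # what the receiver works with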
- Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\log\frac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time and up to poly-log factors, the communication complexity lower bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z)
- Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks [3.075766050800645]
Training Graph Convolutional Networks (GCNs) is expensive as it needs to aggregate data from neighboring nodes.
Previous works have proposed various neighbor sampling methods that estimate the aggregation result based on a small number of sampled neighbors.
We present an algorithm that determines the local sampling probabilities and ensures that the skewed neighbor sampling does not significantly affect the convergence of training.
arXiv Detail & Related papers (2021-01-19T16:12:44Z)
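To make the neighbor-sampling idea in the entry above concrete, the sketch below estimates a GCN neighborhood aggregation from a few neighbors drawn with non-uniform (skewed) probabilities, reweighting the samples so that the estimate stays unbiased. How the paper actually chooses the local sampling probabilities is not reproduced here; the function and its arguments are illustrative.

    import numpy as np

    def sampled_aggregate(features, neighbors, probs, k, rng):
        """Estimate sum_{j in N(v)} h_j from k sampled neighbors.

        features:  (n_nodes, dim) array of node features/embeddings
        neighbors: 1-D integer array with the neighbors of the target node
        probs:     sampling probability of each neighbor (sums to 1); skewing
                   these, e.g. toward locally stored neighbors, reduces
                   cross-machine communication
        """
        pos = rng.choice(len(neighbors), size=k, p=probs, replace=True)
        # Reweight each sample by 1 / (k * p_j) so the estimator remains
        # unbiased even though the sampling distribution is skewed.
        return (features[neighbors[pos]] / (k * probs[pos][:, None])).sum(axis=0)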
- Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z)
- A Compressive Sensing Approach for Federated Learning over Massive MIMO Communication Systems [82.2513703281725]
Federated learning is a privacy-preserving approach to train a global model at a central server by collaborating with wireless devices.
We present a compressive sensing approach for federated learning over massive multiple-input multiple-output communication systems.
arXiv Detail & Related papers (2020-03-18T05:56:27Z)
- Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD [32.03967072200476]
We propose an algorithmic approach named Overlap-Local-SGD (and its momentum variant).
We achieve this by adding an anchor model on each node.
After multiple local updates, locally trained models will be pulled back towards the anchor model rather than communicating with others.
arXiv Detail & Related papers (2020-02-21T20:33:49Z)
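The anchor mechanism described in the Overlap Local-SGD entry above can be sketched as follows. This is a simplified, synchronous view with illustrative names only; the actual method overlaps the anchor communication with the local computation, which is not modeled here.

    def overlap_local_sgd_round(x, anchor, grad_fn, lr, tau, pull, average):
        """One simplified round on a single node.

        x:       local model parameters (e.g., a NumPy array)
        anchor:  this node's copy of the anchor model
        grad_fn: callable returning a stochastic gradient at x
        tau:     number of local SGD steps per round
        pull:    pull-back strength toward the anchor, in (0, 1]
        average: callable averaging the anchor models across nodes; this
                 communication can run concurrently with the local steps
        """
        for _ in range(tau):                  # local updates, no communication
            x = x - lr * grad_fn(x)
        anchor = average(anchor)              # (possibly stale) shared anchor
        x = (1.0 - pull) * x + pull * anchor  # pull back toward the anchor
        return x, anchor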