Communication-Efficient Decentralized Learning with Sparsification and
Adaptive Peer Selection
- URL: http://arxiv.org/abs/2002.09692v1
- Date: Sat, 22 Feb 2020 12:31:57 GMT
- Title: Communication-Efficient Decentralized Learning with Sparsification and
Adaptive Peer Selection
- Authors: Zhenheng Tang, Shaohuai Shi, Xiaowen Chu
- Abstract summary: We introduce a novel decentralized training algorithm with the following key features.
Each worker only needs to communicate with a single peer at each communication round with a highly compressed model.
Experimental results show that our algorithm significantly reduces the communication traffic and generally selects relatively high bandwidth peers.
- Score: 13.963329236804586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed learning techniques such as federated learning have enabled
multiple workers to train machine learning models together to reduce the
overall training time. However, current distributed training algorithms
(centralized or decentralized) suffer from the communication bottleneck on
multiple low-bandwidth workers (also on the server under the centralized
architecture). Although decentralized algorithms generally have lower
communication complexity than the centralized counterpart, they still suffer
from the communication bottleneck for workers with low network bandwidth. To
deal with the communication problem while being able to preserve the
convergence performance, we introduce a novel decentralized training algorithm
with the following key features: 1) It does not require a parameter server to
maintain the model during training, which avoids the communication pressure on
any single peer. 2) Each worker only needs to communicate with a single peer at
each communication round with a highly compressed model, which can
significantly reduce the communication traffic on the worker. We theoretically
prove that our sparsification algorithm still preserves convergence properties.
3) Each worker dynamically selects its peer at different communication rounds
to better utilize the bandwidth resources. We conduct experiments with
convolutional neural networks on 32 workers to verify the effectiveness of our
proposed algorithm compared to seven existing methods. Experimental results
show that our algorithm significantly reduces the communication traffic and
generally selects relatively high-bandwidth peers.
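The abstract describes the algorithm only at a high level and includes no code. As a rough illustration, the following sketch shows what one communication round could look like, assuming top-k magnitude sparsification and bandwidth-weighted random peer selection; the function names, the compression ratio, and the exchange_fn transport stub are hypothetical stand-ins, not details taken from the paper.

```python
import numpy as np

def topk_sparsify(tensor, ratio=0.01):
    """Keep only the largest-magnitude entries, returned as (indices, values).

    Illustrative top-k compressor; the paper's exact sparsification scheme
    and compression ratio are not reproduced here.
    """
    flat = tensor.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def select_peer(worker_id, bandwidths, rng):
    """Pick a single peer for this round, biased toward higher bandwidth.

    bandwidths maps peer id -> estimated bandwidth; the paper's actual
    selection rule may differ, this is only a plausible stand-in.
    """
    peers = [p for p in bandwidths if p != worker_id]
    weights = np.array([bandwidths[p] for p in peers], dtype=float)
    return rng.choice(peers, p=weights / weights.sum())

def gossip_round(params, worker_id, bandwidths, exchange_fn, rng):
    """One round: sparsify the local model, swap with one peer, average."""
    idx, vals = topk_sparsify(params)
    peer = select_peer(worker_id, bandwidths, rng)
    # exchange_fn stands in for the point-to-point transport: it sends our
    # sparse payload to the chosen peer and returns the peer's payload.
    peer_idx, peer_vals = exchange_fn(peer, (idx, vals))
    merged = params.copy()
    flat = merged.ravel()
    flat[peer_idx] = 0.5 * (flat[peer_idx] + peer_vals)
    return merged
```

Because only (index, value) pairs are exchanged, the per-round traffic scales with k rather than with the full model size, which is the source of the communication savings the abstract claims.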
Related papers
- Communication-Efficient Federated Learning with Adaptive Compression under Dynamic Bandwidth [6.300376113680886]
Federated learning can train models without directly providing local data to the server.
Recent work has improved the communication efficiency of federated learning mainly through model compression.
We show the performance of the AdapComFL algorithm and compare it with existing algorithms.
arXiv Detail & Related papers (2024-05-06T08:00:43Z)
- Communication-Efficient Decentralized Federated Learning via One-Bit
Compressive Sensing [52.402550431781805]
Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications.
Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging.
We develop a novel algorithm based on the framework of the inexact alternating direction method (iADM).
arXiv Detail & Related papers (2023-08-31T12:22:40Z)
- Provably Doubly Accelerated Federated Learning: The First Theoretically
Successful Combination of Local Training and Compressed Communication [7.691755449724637]
We propose the first algorithm for distributed optimization and federated learning that provably combines local training with compressed communication.
Our algorithm converges linearly to an exact solution, with a doubly accelerated rate.
arXiv Detail & Related papers (2022-10-24T14:13:54Z)
- Optimising Communication Overhead in Federated Learning Using NSGA-II [6.635754265968436]
This work aims at optimising communication overhead in federated learning by (I) modelling it as a multi-objective problem and (II) applying a multi-objective optimization algorithm (NSGA-II) to solve it.
Experiments have shown that our proposal could reduce communication by 99% and maintain an accuracy equal to the one obtained by the FedAvg Algorithm that uses 100% of communications.
arXiv Detail & Related papers (2022-04-01T18:06:20Z)
- RelaySum for Decentralized Deep Learning on Heterogeneous Data [71.36228931225362]
In decentralized machine learning, workers compute model updates on their local data.
Because the workers only communicate with few neighbors without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers.
arXiv Detail & Related papers (2021-10-08T14:55:32Z)
- Secure Distributed Training at Scale [65.7538150168154]
Training in the presence of potentially malicious peers requires specialized distributed training algorithms with Byzantine tolerance.
We propose a novel protocol for secure (Byzantine-tolerant) decentralized training that emphasizes communication efficiency.
arXiv Detail & Related papers (2021-06-21T17:00:42Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- A Low Complexity Decentralized Neural Net with Centralized Equivalence
using Layer-wise Learning [49.15799302636519]
We design a low-complexity decentralized learning algorithm to train a recently proposed large neural network on distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve equivalent learning performance as if the data is available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
- PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
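PowerGossip is summarized above in only two sentences. As a hedged illustration of the PowerSGD-style power step it builds on, the sketch below forms a low-rank approximation of the parameter difference between two neighbors; the matrix shapes, the warm-started factor q, and the NumPy implementation are assumptions for illustration, not PowerGossip's actual protocol.

```python
import numpy as np

def power_step(diff, q):
    """One power-iteration step on the difference matrix diff (n x m).

    Returns factors (p, q_new) with diff ~= p @ q_new.T; q_new can be
    reused to warm-start the next round. This mirrors the PowerSGD-style
    power step only; PowerGossip's exact bookkeeping is not reproduced.
    """
    p = diff @ q                    # (n x r): project onto the right factor
    p, _ = np.linalg.qr(p)          # orthogonalize the left factor
    q_new = diff.T @ p              # (m x r)
    return p, q_new

# Toy example: two neighbors compress the difference of a parameter block
# (reshaped into a matrix) down to rank-1 factors.
rng = np.random.default_rng(0)
x_i = rng.standard_normal((256, 128))   # worker i's parameter block
x_j = rng.standard_normal((256, 128))   # neighbor j's parameter block
q = rng.standard_normal((128, 1))       # warm-started right factor
p, q = power_step(x_i - x_j, q)
approx_diff = p @ q.T                   # what both peers reconstruct
```

Sending the factors p and q instead of the full difference reduces the payload from n*m values to roughly (n + m) values per rank, which is where the gain in information transferred per bit comes from.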