AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression
- URL: http://arxiv.org/abs/2404.05919v1
- Date: Tue, 9 Apr 2024 00:43:45 GMT
- Title: AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression
- Authors: Sai Aparna Aketi, Abolfazl Hashemi, Kaushik Roy
- Abstract summary: AdaGossip is a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents.
Our experiments show that the proposed method achieves superior performance compared to the current state-of-the-art method for decentralized learning with communication compression.
- Score: 11.290935303784208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized learning is crucial in supporting on-device learning over large distributed datasets, eliminating the need for a central server. However, the communication overhead remains a major bottleneck for the practical realization of such decentralized setups. To tackle this issue, several algorithms for decentralized training with compressed communication have been proposed in the literature. Most of these algorithms introduce an additional hyper-parameter referred to as consensus step-size which is tuned based on the compression ratio at the beginning of the training. In this work, we propose AdaGossip, a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents. We demonstrate the effectiveness of the proposed method through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, Imagenette, and ImageNet), model architectures, and network topologies. Our experiments show that the proposed method achieves superior performance ($0-2\%$ improvement in test accuracy) compared to the current state-of-the-art method for decentralized learning with communication compression.
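The abstract does not spell out AdaGossip's update rule. As a rough illustration only, the Python sketch below pairs a CHOCO-Gossip-style compressed gossip round with a per-parameter consensus step-size. In that style of round, each agent keeps a public copy `x_hat` that its neighbours update from compressed differences. The step-size is scaled Adam-style by the inverse square root of a moving average of the squared gossip error, in place of a fixed step-size tuned to the compression ratio. The scaling rule, the top-k compressor, and every name here (`adaptive_gossip_step`, `top_k`, `gamma0`, `beta`, `eps`) are hypothetical assumptions, not the paper's algorithm.

```python
import math


def top_k(vec, k):
    """Biased top-k compressor: keep the k largest-magnitude entries, zero the rest."""
    keep = sorted(range(len(vec)), key=lambda idx: abs(vec[idx]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for idx in keep:
        out[idx] = vec[idx]
    return out


def adaptive_gossip_step(x, x_hat, v, W, k, gamma0=0.1, beta=0.9, eps=1e-8):
    """One compressed gossip round with a per-parameter adaptive consensus step-size.

    All state is updated in place:
      x[i]     private model of agent i (list of floats)
      x_hat[i] public copy of x[i] that every neighbour tracks (CHOCO-Gossip style)
      v[i]     moving average of the squared gossip error (hypothetical Adam-style state)
      W        symmetric, doubly stochastic mixing matrix; W[i][j] > 0 only for neighbours
    """
    n, d = len(x), len(x[0])
    # Step 1: each agent compresses x_i - x_hat_i and broadcasts it; every copy of
    # x_hat_i is then updated with the same compressed message.
    for i in range(n):
        q = top_k([x[i][p] - x_hat[i][p] for p in range(d)], k)
        x_hat[i] = [x_hat[i][p] + q[p] for p in range(d)]
    # Step 2: consensus update. The gossip error uses only the compressed copies x_hat,
    # so updating x[i] in place does not affect the other agents in this loop.
    for i in range(n):
        err = [sum(W[i][j] * (x_hat[j][p] - x_hat[i][p]) for j in range(n)) for p in range(d)]
        # Hypothetical Adam-style scaling (an assumption, not the paper's exact rule):
        # coordinates with a large recent gossip error get a smaller effective step-size.
        v[i] = [beta * v[i][p] + (1 - beta) * err[p] ** 2 for p in range(d)]
        x[i] = [x[i][p] + gamma0 * err[p] / (math.sqrt(v[i][p]) + eps) for p in range(d)]
```

Consider two agents with `x = [[0.0], [4.0]]`, zero-initialized `x_hat` and `v`, `W = [[0.5, 0.5], [0.5, 0.5]]` and `k=1`. One call moves the two models toward each other by about 0.316 each (0.1 · 2 / √0.4) and preserves their mean. In a training loop, such a round would be interleaved with local SGD steps.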
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated decentralized federated learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for reducing both the communication cost and the per-node memory constraints that limit model size in distributed training.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Global Update Tracking: A Decentralized Learning Algorithm for Heterogeneous Data [14.386062807300666]
In this paper, we focus on designing a decentralized learning algorithm that is less susceptible to variations in data distribution across devices.
We propose Global Update Tracking (GUT), a novel tracking-based method that aims to mitigate the impact of heterogeneous data in decentralized learning without introducing any communication overhead.
Our experiments show that the proposed method achieves state-of-the-art performance for decentralized learning on heterogeneous data, with a $1-6\%$ improvement in test accuracy over existing techniques.
arXiv Detail & Related papers (2023-05-08T15:48:53Z)
- Communication-Compressed Adaptive Gradient Method for Distributed Nonconvex Optimization [21.81192774458227]
One of the major bottlenecks in distributed nonconvex optimization is the large communication cost between the central server and the local workers.
Our proposed distributed learning framework features an effective gradient compression strategy.
arXiv Detail & Related papers (2021-11-01T04:54:55Z)
- Consensus Control for Decentralized Deep Learning [72.50487751271069]
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart.
Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
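The consensus distance is commonly defined as the root-mean-square deviation of the local models from their average, $\Xi_t^2 = \frac{1}{n}\sum_{i=1}^{n} \|\bar{x}_t - x_i^t\|^2$, and the sketch below assumes that definition. It computes the quantity in plain Python and applies one exact (uncompressed) gossip-averaging round to show how mixing shrinks it. The function names and the two-agent mixing matrix are illustrative assumptions.

```python
import math


def consensus_distance(models):
    """Root-mean-square distance of the local models from their average."""
    n, d = len(models), len(models[0])
    mean = [sum(m[p] for m in models) / n for p in range(d)]
    sq = sum(sum((m[p] - mean[p]) ** 2 for p in range(d)) for m in models)
    return math.sqrt(sq / n)


def gossip_average(models, W):
    """One round of exact gossip averaging: x_i <- sum_j W[i][j] * x_j."""
    n, d = len(models), len(models[0])
    return [[sum(W[i][j] * models[j][p] for j in range(n)) for p in range(d)] for i in range(n)]


models = [[0.0, 0.0], [2.0, 0.0]]
W = [[0.75, 0.25], [0.25, 0.75]]  # symmetric, doubly stochastic
print(consensus_distance(models))                     # 1.0
print(consensus_distance(gossip_average(models, W)))  # 0.5
```

A training scheme in the spirit of the summary would monitor this value and communicate more (or mix more strongly) when it exceeds some threshold; the threshold itself is not given here.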
arXiv Detail & Related papers (2021-02-09T13:58:33Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve these problems without any increase in the number of communication rounds compared to the uncompressed baseline.
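The summary does not name the compression operator. A standard example of a randomized compressor of this kind is random-k sparsification, which is unbiased ($\mathbb{E}[Q(x)] = x$). The Python sketch below shows that generic operator; it is not necessarily the one used in the paper, and `rand_k` is a hypothetical name.

```python
import random


def rand_k(vec, k, rng):
    """Unbiased random-k sparsifier: keep k random coordinates, rescale by d/k."""
    d = len(vec)
    keep = rng.sample(range(d), k)  # k distinct indices chosen uniformly at random
    out = [0.0] * d
    for idx in keep:
        out[idx] = vec[idx] * d / k  # rescaling makes the expectation equal to vec
    return out


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
draws = [rand_k(x, 2, rng) for _ in range(20000)]
avg = [sum(dr[p] for dr in draws) / len(draws) for p in range(len(x))]
print(avg)  # approximately equal to x
```

Each message carries only k of the d coordinates (plus their indices). The d/k rescaling trades this saving for extra variance, which the convergence analysis of such methods has to absorb.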
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- Decentralized Deep Learning using Momentum-Accelerated Consensus [15.333413663982874]
We consider the problem of decentralized deep learning where multiple agents collaborate to learn from a distributed dataset.
We propose and analyze a novel decentralized deep learning algorithm where the agents interact over a fixed communication topology.
Our algorithm is based on the heavy-ball acceleration method used in gradient-based optimization.
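To illustrate how heavy-ball momentum can be combined with gossip averaging, the Python sketch below uses a generic update. It is an assumption for illustration, not necessarily the paper's exact rule. Each agent mixes its neighbours' models, takes a local gradient step, and adds a momentum term $\beta (x_i^t - x_i^{t-1})$. The names `heavy_ball_gossip_step`, `lr` and `beta` are hypothetical.

```python
def heavy_ball_gossip_step(x, x_prev, grads, W, lr=0.1, beta=0.5):
    """Generic heavy-ball gossip step for scalar models (illustrative assumption).

    x_i <- sum_j W[i][j] * x_j - lr * grad_i + beta * (x_i - x_prev_i)
    """
    n = len(x)
    new_x = []
    for i in range(n):
        mixed = sum(W[i][j] * x[j] for j in range(n))  # consensus (gossip) part
        new_x.append(mixed - lr * grads[i] + beta * (x[i] - x_prev[i]))
    return new_x


# Two agents minimising f(x) = 0.5 * (x - 3)^2 from different starting points.
W = [[0.5, 0.5], [0.5, 0.5]]
x_prev = [0.0, 10.0]
x = [0.0, 10.0]
for _ in range(200):
    grads = [xi - 3.0 for xi in x]
    x, x_prev = heavy_ball_gossip_step(x, x_prev, grads, W), x
print(x)  # both agents end up very close to 3.0
```

With identical local objectives both agents converge to the common minimiser. With heterogeneous data, a constant-step decentralized method of this form generally reaches only a neighbourhood of the global optimum.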
arXiv Detail & Related papers (2020-10-21T17:39:52Z)
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by PowerSGD for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
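The Python sketch below assumes the following reading of the summary. A power-iteration step on the model difference $X_i - X_j$ is linear, so each worker can multiply its own weight matrix by a shared vector, and the two workers exchange only the resulting vectors rather than the full matrices. The sketch shows one rank-1 step on a single weight matrix. The function names, the rank-1 restriction, and reusing the returned `q` as a warm start for the next round are illustrative assumptions, not the paper's full algorithm.

```python
import math


def matvec(A, v):
    """A @ v for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def matTvec(A, v):
    """A^T @ v for a list-of-rows matrix A."""
    return [sum(A[r][c] * v[r] for r in range(len(A))) for c in range(len(A[0]))]


def power_gossip_rank1(X_i, X_j, q):
    """One rank-1 power step on the difference X_i - X_j, using only vector exchanges."""
    # Each worker computes X q locally; subtracting the exchanged vectors gives (X_i - X_j) q.
    p = [a - b for a, b in zip(matvec(X_i, q), matvec(X_j, q))]
    norm = math.sqrt(sum(a * a for a in p)) or 1.0  # guard against identical models
    p = [a / norm for a in p]
    # Second exchange: (X_i - X_j)^T p, again computed locally and subtracted.
    q_new = [a - b for a, b in zip(matTvec(X_i, p), matTvec(X_j, p))]
    approx = [[pr * qc for qc in q_new] for pr in p]  # rank-1 estimate of X_i - X_j
    return approx, q_new  # q_new can warm-start the next round


X_i = [[4.0, 5.0, 6.0], [7.0, 9.0, 11.0]]
X_j = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
approx, q = power_gossip_rank1(X_i, X_j, [1.0, 1.0, 1.0])
```

Here the difference `X_i - X_j` is exactly rank 1, so a single step recovers it, up to floating-point error, as `[[3, 4, 5], [6, 8, 10]]`. The exchange costs 2 + 3 numbers per worker instead of the 6 in the full matrix.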