Consensus Control for Decentralized Deep Learning
- URL: http://arxiv.org/abs/2102.04828v1
- Date: Tue, 9 Feb 2021 13:58:33 GMT
- Title: Consensus Control for Decentralized Deep Learning
- Authors: Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian
U. Stich
- Abstract summary: Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart.
Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
- Score: 72.50487751271069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized training of deep learning models enables on-device learning
over networks, as well as efficient scaling to large compute clusters.
Experiments in earlier works reveal that, even in a data-center setup,
decentralized training often suffers from the degradation in the quality of the
model: the training and test performance of models trained in a decentralized
fashion is in general worse than that of models trained in a centralized
fashion, and this performance drop is impacted by parameters such as network
size, communication topology and data partitioning.
We identify the changing consensus distance between devices as a key
parameter to explain the gap between centralized and decentralized training. We
show in theory that when the training consensus distance is lower than a
critical quantity, decentralized training converges as fast as the centralized
counterpart. We empirically validate that the relation between generalization
performance and consensus distance is consistent with this theoretical
observation. Our empirical insights allow the principled design of better
decentralized training schemes that mitigate the performance drop. To this end,
we propose practical training guidelines for the data-center setup as the
important first step.
Related papers
- From promise to practice: realizing high-performance decentralized training [8.955918346078935]
Decentralized training of deep neural networks has attracted significant attention for its theoretically superior scalability over synchronous data-parallel methods like All-Reduce.
This paper identifies three key factors that can lead to speedups over All-Reduce training and constructs a runtime model to determine when, how, and to what degree decentralization can yield shorter per-it runtimes.
arXiv Detail & Related papers (2024-10-15T19:04:56Z) - Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an acceleration Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression [11.290935303784208]
AdaGossip is a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents.
Our experiments show that the proposed method achieves superior performance compared to the current state-of-the-art method for decentralized learning with communication compression.
arXiv Detail & Related papers (2024-04-09T00:43:45Z) - Initialisation and Network Effects in Decentralised Federated Learning [1.5961625979922607]
Decentralised federated learning enables collaborative training of individual machine learning models on a distributed network of communicating devices.
This approach avoids central coordination, enhances data privacy and eliminates the risk of a single point of failure.
We propose a strategy for uncoordinated initialisation of the artificial neural networks based on the distribution of eigenvector centralities of the underlying communication network.
arXiv Detail & Related papers (2024-03-23T14:24:36Z) - Scheduling and Communication Schemes for Decentralized Federated
Learning [0.31410859223862103]
A decentralized federated learning (DFL) model with the gradient descent (SGD) algorithm has been introduced.
Three scheduling policies for DFL have been proposed for communications between the clients and the parallel servers.
Results show that the proposed scheduling polices have an impact both on the speed of convergence and in the final global model.
arXiv Detail & Related papers (2023-11-27T17:35:28Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST)
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - Towards More Suitable Personalization in Federated Learning via
Decentralized Partial Model Training [67.67045085186797]
Almost all existing systems have to face large communication burdens if the central FL server fails.
It personalizes the "right" in the deep models by alternately updating the shared and personal parameters.
To further promote the shared parameters aggregation process, we propose DFed integrating the local Sharpness Miniization.
arXiv Detail & Related papers (2023-05-24T13:52:18Z) - Event-Triggered Decentralized Federated Learning over
Resource-Constrained Edge Devices [12.513477328344255]
Federated learning (FL) is a technique for distributed machine learning (ML)
In traditional FL algorithms, trained models at the edge are periodically sent to a central server for aggregation.
We develop a novel methodology for fully decentralized FL, where devices conduct model aggregation via cooperative consensus formation.
arXiv Detail & Related papers (2022-11-23T00:04:05Z) - Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication-efficient overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z) - Quantized Decentralized Stochastic Learning over Directed Graphs [52.94011236627326]
We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As the model size gets large, decentralized learning faces a major bottleneck that is the communication load due to each node transmitting messages (model updates) to its neighbors.
We propose the quantized decentralized learning algorithm over directed graphs that is based on the push-sum algorithm in decentralized consensus optimization.
arXiv Detail & Related papers (2020-02-23T18:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.