Consensus Control for Decentralized Deep Learning
- URL: http://arxiv.org/abs/2102.04828v1
- Date: Tue, 9 Feb 2021 13:58:33 GMT
- Title: Consensus Control for Decentralized Deep Learning
- Authors: Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich
- Abstract summary: Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart.
Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
- Score: 72.50487751271069
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized training of deep learning models enables on-device learning
over networks, as well as efficient scaling to large compute clusters.
Experiments in earlier works reveal that, even in a data-center setup,
decentralized training often suffers from the degradation in the quality of the
model: the training and test performance of models trained in a decentralized
fashion is in general worse than that of models trained in a centralized
fashion, and this performance drop is impacted by parameters such as network
size, communication topology and data partitioning.
We identify the changing consensus distance between devices as a key
parameter to explain the gap between centralized and decentralized training. We
show in theory that when the training consensus distance is lower than a
critical quantity, decentralized training converges as fast as the centralized
counterpart. We empirically validate that the relation between generalization
performance and consensus distance is consistent with this theoretical
observation. Our empirical insights allow the principled design of better
decentralized training schemes that mitigate the performance drop. To this end,
we propose practical training guidelines for the data-center setup as the
important first step.
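To make the central quantity concrete, here is a minimal sketch, not the authors' implementation. It assumes the consensus distance is the root-mean-square distance between each worker's parameters and the network-wide average, and that workers mix parameters by uniform gossip averaging on a ring. The function names `consensus_distance` and `ring_gossip_step`, the 1/3 mixing weights, and the toy parameter vectors are all illustrative assumptions.

```python
import math


def consensus_distance(models):
    """Root-mean-square distance of the workers' parameters from their average.

    `models` is a list of equal-length parameter vectors, one per worker.
    This definition is an assumption made for illustration.
    """
    n = len(models)
    dim = len(models[0])
    mean = [sum(m[k] for m in models) / n for k in range(dim)]
    squared = sum((m[k] - mean[k]) ** 2 for m in models for k in range(dim))
    return math.sqrt(squared / n)


def ring_gossip_step(models):
    """One gossip round on a ring topology (hypothetical mixing scheme).

    Each worker replaces its parameters with the uniform average of its own
    vector and those of its two ring neighbours (weight 1/3 each).
    """
    n = len(models)
    dim = len(models[0])
    return [
        [
            (models[(i - 1) % n][k] + models[i][k] + models[(i + 1) % n][k]) / 3.0
            for k in range(dim)
        ]
        for i in range(n)
    ]


# Toy example: 8 workers whose 2-dimensional parameters have drifted apart.
models = [[float(i), -float(i)] for i in range(8)]
before = consensus_distance(models)
after = consensus_distance(ring_gossip_step(models))
print(f"consensus distance before gossip: {before:.4f}")
print(f"consensus distance after gossip:  {after:.4f}")
```

In this sketch a gossip round leaves the network average unchanged and shrinks the consensus distance, so additional communication rounds are one way to keep the distance below a chosen threshold during training.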
Related papers
- AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression [11.290935303784208]
AdaGossip is a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents.
Our experiments show that the proposed method achieves superior performance compared to the current state-of-the-art method for decentralized learning with communication compression.
arXiv Detail & Related papers (2024-04-09T00:43:45Z)
- Initialisation and Topology Effects in Decentralised Federated Learning [1.5961625979922607]
Decentralised federated learning enables collaborative training of individual machine learning models on distributed devices on a communication network.
This approach enhances data privacy and eliminates both the single point of failure and the necessity for central coordination.
We propose a strategy for uncoordinated initialisation of the artificial neural networks.
arXiv Detail & Related papers (2024-03-23T14:24:36Z)
- Scheduling and Communication Schemes for Decentralized Federated Learning [0.31410859223862103]
A decentralized federated learning (DFL) model with the stochastic gradient descent (SGD) algorithm has been introduced.
Three scheduling policies for DFL have been proposed for communications between the clients and the parallel servers.
Results show that the proposed scheduling policies have an impact both on the speed of convergence and on the final global model.
arXiv Detail & Related papers (2023-11-27T17:35:28Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Towards More Suitable Personalization in Federated Learning via Decentralized Partial Model Training [67.67045085186797]
Almost all existing systems have to face large communication burdens if the central FL server fails.
It personalizes the "right" in the deep models by alternately updating the shared and personal parameters.
To further promote the shared parameters aggregation process, we propose DFed integrating the local Sharpness Minimization.
arXiv Detail & Related papers (2023-05-24T13:52:18Z)
- Event-Triggered Decentralized Federated Learning over Resource-Constrained Edge Devices [12.513477328344255]
Federated learning (FL) is a technique for distributed machine learning (ML).
In traditional FL algorithms, trained models at the edge are periodically sent to a central server for aggregation.
We develop a novel methodology for fully decentralized FL, where devices conduct model aggregation via cooperative consensus formation.
arXiv Detail & Related papers (2022-11-23T00:04:05Z)
- Decentralized Training of Foundation Models in Heterogeneous Environments [77.47261769795992]
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive.
We present the first study of training large foundation models with model parallelism in a decentralized regime over a heterogeneous network.
arXiv Detail & Related papers (2022-06-02T20:19:51Z)
- Decentralized Deep Learning using Momentum-Accelerated Consensus [15.333413663982874]
We consider the problem of decentralized deep learning where multiple agents collaborate to learn from a distributed dataset.
We propose and analyze a novel decentralized deep learning algorithm where the agents interact over a fixed communication topology.
Our algorithm is based on the heavy-ball acceleration method used in gradient-based protocols.
arXiv Detail & Related papers (2020-10-21T17:39:52Z)
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
- Quantized Decentralized Stochastic Learning over Directed Graphs [52.94011236627326]
We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph.
As the model size grows, decentralized learning faces a major bottleneck: the communication load incurred as each node transmits messages (model updates) to its neighbors.
We propose the quantized decentralized learning algorithm over directed graphs that is based on the push-sum algorithm in decentralized consensus optimization.
arXiv Detail & Related papers (2020-02-23T18:25:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.