Related papers: Consensus Control for Decentralized Deep Learning

Consensus Control for Decentralized Deep Learning

URL: http://arxiv.org/abs/2102.04828v1
Date: Tue, 9 Feb 2021 13:58:33 GMT
Title: Consensus Control for Decentralized Deep Learning
Authors: Lingjing Kong, Tao Lin, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich
Abstract summary: Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters. We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart. Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
Score: 72.50487751271069
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters. Experiments in earlier works reveal that, even in a data-center setup, decentralized training often suffers from the degradation in the quality of the model: the training and test performance of models trained in a decentralized fashion is in general worse than that of models trained in a centralized fashion, and this performance drop is impacted by parameters such as network size, communication topology and data partitioning. We identify the changing consensus distance between devices as a key parameter to explain the gap between centralized and decentralized training. We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart. We empirically validate that the relation between generalization performance and consensus distance is consistent with this theoretical observation. Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop. To this end, we propose practical training guidelines for the data-center setup as the important first step.

Related papers

Scaling Up Data Parallelism in Decentralized Deep Learning [6.059539855453347]
Decentralized learning is not yet green-lighted for production use, largely due to a lack of stability, scalability, and generality in large scale DNN training.<n>We propose Ada, a decentralized adaptive approach that performs large scale DNN training following a decentralized SGD method and adapting the communication graph in use dynamically throughout training.
arXiv Detail & Related papers (2025-08-31T17:34:52Z)
A Single Merging Suffices: Recovering Server-based Learning Performance in Decentralized Learning [17.386971981099588]
We study how communication should be scheduled over time, including determining when and how frequently devices synchronize.<n>We find that fully connected communication at the final step, implemented by a single global merging, is sufficient to match the performance of server-based training.
arXiv Detail & Related papers (2025-07-09T04:56:56Z)
From promise to practice: realizing high-performance decentralized training [8.955918346078935]
Decentralized training of deep neural networks has attracted significant attention for its theoretically superior scalability over synchronous data-parallel methods like All-Reduce. This paper identifies three key factors that can lead to speedups over All-Reduce training and constructs a runtime model to determine when, how, and to what degree decentralization can yield shorter per-it runtimes.
arXiv Detail & Related papers (2024-10-15T19:04:56Z)
Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an acceleration Decentralized Federated Learning algorithm called DFedCata. DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase. Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression [11.290935303784208]
AdaGossip is a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents. Our experiments show that the proposed method achieves superior performance compared to the current state-of-the-art method for decentralized learning with communication compression.
arXiv Detail & Related papers (2024-04-09T00:43:45Z)
Initialisation and Network Effects in Decentralised Federated Learning [1.5961625979922607]
Decentralised federated learning enables collaborative training of individual machine learning models on a distributed network of communicating devices. This approach avoids central coordination, enhances data privacy and eliminates the risk of a single point of failure. We propose a strategy for uncoordinated initialisation of the artificial neural networks based on the distribution of eigenvector centralities of the underlying communication network.
arXiv Detail & Related papers (2024-03-23T14:24:36Z)
Scheduling and Communication Schemes for Decentralized Federated Learning [0.31410859223862103]
A decentralized federated learning (DFL) model with the gradient descent (SGD) algorithm has been introduced. Three scheduling policies for DFL have been proposed for communications between the clients and the parallel servers. Results show that the proposed scheduling polices have an impact both on the speed of convergence and in the final global model.
arXiv Detail & Related papers (2023-11-27T17:35:28Z)
Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST) IST is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
Towards More Suitable Personalization in Federated Learning via Decentralized Partial Model Training [67.67045085186797]
Almost all existing systems have to face large communication burdens if the central FL server fails. It personalizes the "right" in the deep models by alternately updating the shared and personal parameters. To further promote the shared parameters aggregation process, we propose DFed integrating the local Sharpness Miniization.
arXiv Detail & Related papers (2023-05-24T13:52:18Z)
Event-Triggered Decentralized Federated Learning over Resource-Constrained Edge Devices [12.513477328344255]
Federated learning (FL) is a technique for distributed machine learning (ML) In traditional FL algorithms, trained models at the edge are periodically sent to a central server for aggregation. We develop a novel methodology for fully decentralized FL, where devices conduct model aggregation via cooperative consensus formation.
arXiv Detail & Related papers (2022-11-23T00:04:05Z)
Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically. Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers. To reduce the communication-efficient overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
Quantized Decentralized Stochastic Learning over Directed Graphs [52.94011236627326]
We consider a decentralized learning problem where data points are distributed among computing nodes communicating over a directed graph. As the model size gets large, decentralized learning faces a major bottleneck that is the communication load due to each node transmitting messages (model updates) to its neighbors. We propose the quantized decentralized learning algorithm over directed graphs that is based on the push-sum algorithm in decentralized consensus optimization.
arXiv Detail & Related papers (2020-02-23T18:25:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.