Decentralized Learning Made Easy with DecentralizePy
- URL: http://arxiv.org/abs/2304.08322v1
- Date: Mon, 17 Apr 2023 14:42:33 GMT
- Title: Decentralized Learning Made Easy with DecentralizePy
- Authors: Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Milos
Vujasinovic
- Abstract summary: Decentralized learning (DL) has gained prominence for its potential benefits in terms of scalability, privacy, and fault tolerance.
We propose DecentralizePy, a distributed framework for decentralized ML, which allows for the emulation of large-scale learning networks in arbitrary topologies.
We demonstrate the capabilities of DecentralizePy by deploying techniques such as sparsification and secure aggregation on top of several topologies.
- Score: 3.1848820580333737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized learning (DL) has gained prominence for its potential benefits
in terms of scalability, privacy, and fault tolerance. It consists of many
nodes that coordinate without a central server and exchange millions of
parameters in the inherently iterative process of machine learning (ML)
training. In addition, these nodes are connected in complex and potentially
dynamic topologies. Assessing the intricate dynamics of such networks is
clearly not an easy task. Often in the literature, researchers resort to simulated
environments that do not scale and fail to capture practical and crucial
behaviors, including those associated with parallelism, data transfer, network
delays, and wall-clock time. In this paper, we propose DecentralizePy, a
distributed framework for decentralized ML, which allows for the emulation of
large-scale learning networks in arbitrary topologies. We demonstrate the
capabilities of DecentralizePy by deploying techniques such as sparsification
and secure aggregation on top of several topologies, including dynamic networks
with more than one thousand nodes.
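To make the round structure such a framework emulates concrete, below is a minimal, framework-agnostic sketch of one round of decentralized learning with top-k sparsification over an arbitrary neighbor graph. The function names and signatures are illustrative only and do not reflect DecentralizePy's actual API; in an emulation (as opposed to a simulation), each node would run as its own process and exchange these messages over real network channels, which is what exposes parallelism, transfer costs, and wall-clock behavior.

```python
# Minimal sketch of one decentralized-learning round: every node trains locally,
# exchanges (optionally sparsified) parameters with its topology neighbours, and
# averages what it receives. Names are illustrative, not DecentralizePy's API.
import numpy as np

def local_step(params: np.ndarray, grad_fn, lr: float = 0.01) -> np.ndarray:
    """One SGD step on the node's private data; grad_fn returns the local gradient."""
    return params - lr * grad_fn(params)

def sparsify(params: np.ndarray, keep_ratio: float = 0.1) -> np.ndarray:
    """Top-k sparsification: keep the largest-magnitude entries, zero out the rest."""
    k = max(1, int(keep_ratio * params.size))
    out = np.zeros_like(params)
    idx = np.argpartition(np.abs(params), -k)[-k:]
    out[idx] = params[idx]
    return out

def decentralized_round(models, grad_fns, topology, keep_ratio=0.1):
    """topology: dict node_id -> list of neighbour ids (arbitrary, possibly dynamic)."""
    # 1) Local training on each node's private shard.
    models = {i: local_step(m, grad_fns[i]) for i, m in models.items()}
    # 2) Exchange sparsified models and average over the neighbourhood.
    shared = {i: sparsify(m, keep_ratio) for i, m in models.items()}
    return {i: np.mean([m] + [shared[j] for j in topology[i]], axis=0)
            for i, m in models.items()}
```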
Related papers
- Impact of Network Topology on Byzantine Resilience in Decentralized Federated Learning [0.0]
This work investigates the effects of state-of-the-art Byzantine-robust aggregation methods in complex, large-scale network structures.
We find that state-of-the-art Byzantine-robust aggregation strategies are not resilient within large, non-fully-connected networks.
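For context, Byzantine-robust aggregation typically replaces the plain neighbor average with a robust statistic. The sketch below shows a coordinate-wise trimmed mean, one standard rule of this family; it is illustrative and not necessarily one of the aggregators evaluated in the paper.

```python
# Coordinate-wise trimmed mean: a standard Byzantine-robust alternative to plain
# averaging. Illustrative only; not necessarily one of the aggregators evaluated
# in the paper. Requires trim < n_neighbors / 2.
import numpy as np

def trimmed_mean(neighbor_models: np.ndarray, trim: int) -> np.ndarray:
    """neighbor_models: (n_neighbors, n_params); trim: values cut per side, per coordinate."""
    sorted_vals = np.sort(neighbor_models, axis=0)             # sort each coordinate
    kept = sorted_vals[trim: neighbor_models.shape[0] - trim]  # drop the extremes
    return kept.mean(axis=0)
```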
arXiv Detail & Related papers (2024-07-06T17:47:44Z)
- DRACO: Decentralized Asynchronous Federated Learning over Continuous Row-Stochastic Network Matrices [7.389425875982468]
We propose DRACO, a novel method for decentralized asynchronous Stochastic Gradient Descent (SGD) over row-stochastic gossip wireless networks.
Our approach enables edge devices within decentralized networks to perform local training and model exchange along a continuous timeline.
Our numerical experiments corroborate the efficacy of the proposed technique.
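As a rough illustration of gossip over a row-stochastic mixing matrix, each node combines neighbor models with weights that sum to one per row and then takes a local gradient step. The sketch below is a synchronous simplification with assumed notation, not DRACO's exact asynchronous update.

```python
# One synchronous gossip-SGD step with a row-stochastic mixing matrix W
# (every row sums to 1). A simplification: DRACO itself operates asynchronously
# along a continuous timeline rather than in lock-step rounds.
import numpy as np

def gossip_sgd_step(X: np.ndarray, W: np.ndarray, grads: np.ndarray, lr: float = 0.01):
    """X: (n_nodes, n_params) stacked models; W: (n_nodes, n_nodes) row-stochastic weights."""
    mixed = W @ X                # each node averages neighbours per its row of W
    return mixed - lr * grads    # then takes its local gradient step
```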
arXiv Detail & Related papers (2024-06-19T13:17:28Z)
- Initialisation and Network Effects in Decentralised Federated Learning [1.5961625979922607]
Decentralised federated learning enables collaborative training of individual machine learning models on a distributed network of communicating devices.
This approach avoids central coordination, enhances data privacy and eliminates the risk of a single point of failure.
We propose a strategy for uncoordinated initialisation of the artificial neural networks based on the distribution of eigenvector centralities of the underlying communication network.
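One way to read this: compute each node's eigenvector centrality from the communication graph's adjacency matrix and use it to set per-node initialisation scales. A hedged sketch follows; the centrality computation is standard, but the mapping from centrality to initialisation is purely hypothetical and not taken from the paper.

```python
# Eigenvector centrality via power iteration on the adjacency matrix A
# (assumes a connected communication graph). How centralities translate into
# per-node initialisation scales is the paper's contribution; the mapping
# below is purely hypothetical.
import numpy as np

def eigenvector_centrality(A: np.ndarray, iters: int = 1000) -> np.ndarray:
    v = np.ones(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return v

def init_weights(shape, centrality_i: float, rng=None) -> np.ndarray:
    # Hypothetical mapping: scale a standard Gaussian initialisation by centrality.
    rng = rng or np.random.default_rng()
    return rng.standard_normal(shape) * centrality_i
```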
arXiv Detail & Related papers (2024-03-23T14:24:36Z)
- Decentralized Training of Foundation Models in Heterogeneous Environments [77.47261769795992]
Training foundation models, such as GPT-3 and PaLM, can be extremely expensive.
We present the first study of training large foundation models with model parallelism in a decentralized regime over a heterogeneous network.
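Model parallelism over heterogeneous devices ultimately reduces to assigning contiguous blocks of layers to devices of different speeds. The toy partitioner below illustrates that idea under assumed inputs; it is not the paper's scheduler, which also has to account for inter-device bandwidth and latency.

```python
# Toy partitioner: split a model's layers into contiguous pipeline stages sized
# roughly in proportion to each device's throughput. The paper's scheduling
# problem also accounts for inter-device bandwidth and latency; this does not.
def partition_layers(layer_costs, device_speeds):
    """layer_costs: compute cost per layer; device_speeds: relative throughput per device."""
    total_cost, total_speed = sum(layer_costs), sum(device_speeds)
    stages, i = [], 0
    for speed in device_speeds:
        budget = total_cost * speed / total_speed
        stage, used = [], 0.0
        while i < len(layer_costs) and (used < budget or not stage):
            stage.append(i)
            used += layer_costs[i]
            i += 1
        stages.append(stage)
    stages[-1].extend(range(i, len(layer_costs)))   # any leftover layers go last
    return stages

# e.g. partition_layers([1, 1, 2, 2, 4, 4], [1, 1, 2]) -> [[0, 1, 2], [3, 4], [5]]
```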
arXiv Detail & Related papers (2022-06-02T20:19:51Z)
- Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning [55.198301429316125]
Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT)-based intelligent and ubiquitous computing.
For fast-growing applications and data volumes, distributed learning is a promising emerging paradigm, since it is often impractical or inefficient to share/aggregate data.
This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices.
arXiv Detail & Related papers (2022-02-07T15:04:15Z)
- Clustered Federated Learning via Generalized Total Variation Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
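A GTV formulation couples per-node losses through a penalty on parameter differences across network edges. A schematic version of the objective is shown below; the notation (node set V, edge set E, edge weights A_ij, regularizer λ) is assumed here, not copied from the paper.

```latex
\min_{\{w^{(i)}\}} \;\; \sum_{i \in \mathcal{V}} L_i\!\left(w^{(i)}\right)
\;+\; \lambda \sum_{(i,j) \in \mathcal{E}} A_{ij} \left\lVert w^{(i)} - w^{(j)} \right\rVert
```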
arXiv Detail & Related papers (2021-05-26T18:07:19Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
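The general flavour of such momentum corrections is to drive each worker's momentum buffer with the displacement of its gossip-averaged model rather than with raw local gradients, so the buffer tracks a more "global" direction under heterogeneous data. The sketch below is a loose illustration with assumed names, not the paper's exact Quasi-Global Momentum update.

```python
# Loose sketch: drive the momentum buffer with the model's displacement (which
# reflects neighbour information) instead of raw local gradients. This captures
# the flavour of the idea, not the paper's exact Quasi-Global Momentum update.
def node_update(x, x_prev, m, grad, w_row, neighbor_xs, lr=0.01, mu=0.9):
    """w_row: mixing weights (self first, then one per neighbour), summing to 1."""
    x_mixed = sum(w * xj for w, xj in zip(w_row, [x] + neighbor_xs))  # gossip average
    d = (x_prev - x_mixed) / lr        # pseudo-gradient from the last round's displacement
    m = mu * m + (1.0 - mu) * d        # momentum tracks a more "global" direction
    x_new = x_mixed - lr * (grad + m)  # local gradient plus momentum correction
    return x_new, m
```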
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
- Decentralized Deep Learning using Momentum-Accelerated Consensus [15.333413663982874]
We consider the problem of decentralized deep learning where multiple agents collaborate to learn from a distributed dataset.
We propose and analyze a novel decentralized deep learning algorithm where the agents interact over a fixed communication topology.
Our algorithm is based on the heavy-ball acceleration method used in gradient-based optimization.
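Schematically, a heavy-ball term adds a multiple of the previous iterate difference to each agent's consensus-plus-gradient update, as in the expression below (W is the mixing matrix, η the step size, β the momentum coefficient; notation assumed, not copied from the paper).

```latex
x_i^{t+1} \;=\; \sum_{j \in \mathcal{N}_i \cup \{i\}} W_{ij}\, x_j^{t}
\;-\; \eta\, \nabla f_i\!\left(x_i^{t}\right)
\;+\; \beta \left(x_i^{t} - x_i^{t-1}\right)
```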
arXiv Detail & Related papers (2020-10-21T17:39:52Z)
- A Low Complexity Decentralized Neural Net with Centralized Equivalence using Layer-wise Learning [49.15799302636519]
We design a low-complexity decentralized learning algorithm to train a recently proposed large neural network on distributed processing nodes (workers).
In our setup, the training data is distributed among the workers but is not shared in the training process due to privacy and security concerns.
We show that it is possible to achieve learning performance equivalent to having the data available in a single place.
arXiv Detail & Related papers (2020-09-29T13:08:12Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible, fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Federated Learning with Cooperating Devices: A Consensus Approach for Massive IoT Networks [8.456633924613456]
Federated learning (FL) is emerging as a new paradigm to train machine learning models in distributed systems.
The paper proposes a fully distributed (or server-less) learning approach: the proposed FL algorithms leverage the cooperation of devices that perform data operations inside the network.
The approach lays the groundwork for the integration of FL within 5G-and-beyond networks characterized by decentralized connectivity and computing.
arXiv Detail & Related papers (2019-12-27T15:16:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.