Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging
- URL: http://arxiv.org/abs/2405.20988v3
- Date: Fri, 18 Oct 2024 08:05:18 GMT
- Title: Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging
- Authors: Michail Theologitis, Georgios Frangias, Georgios Anestis, Vasilis Samoladas, Antonios Deligiannakis
- Abstract summary: Federated Dynamic Averaging (FDA) is a communication-efficient DDL strategy.
FDA reduces communication cost by orders of magnitude compared to both traditional and cutting-edge algorithms.
- Score: 1.4748100900619232
- License:
- Abstract: The ever-growing volume and decentralized nature of data, coupled with the need to harness this data and generate knowledge from it, have led to the extensive use of distributed deep learning (DDL) techniques for training. These techniques rely on local training performed at the distributed nodes based on locally collected data, followed by a periodic synchronization process that combines these models to create a global model. However, frequent synchronization of DL models, which encompass millions to many billions of parameters, creates a communication bottleneck that severely hinders scalability. Worse yet, by relying on overly simplistic, periodic, and rigid synchronization schedules, DDL algorithms typically waste valuable bandwidth and become less practical in bandwidth-constrained federated settings. These drawbacks also directly inflate training time, since excessive time is spent on data communication. To address these shortcomings, we propose Federated Dynamic Averaging (FDA), a communication-efficient DDL strategy that dynamically triggers synchronization based on the value of the model variance. In essence, the costly synchronization step is triggered only if the local models, which are initialized from a common global model after each synchronization, have significantly diverged. This decision is facilitated by the communication of a small local state from each distributed node/worker. Through extensive experiments across a wide range of learning tasks, we demonstrate that FDA reduces communication cost by orders of magnitude compared to both traditional and cutting-edge communication-efficient algorithms. Additionally, we show that FDA maintains robust performance across diverse data heterogeneity settings.
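The abstract describes the FDA trigger only at a high level; the sketch below is a minimal, hedged NumPy illustration of variance-triggered synchronization in that spirit. The choice of scalar local state (squared drift from the last global model), the variance proxy built from those scalars, and names such as `fda_round`, `grad_fns`, and `variance_threshold` are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of variance-triggered synchronization in the spirit of FDA.
# Assumption: the per-worker "small local state" is its squared drift from the
# last global model, and the coordinator uses the mean drift as a variance proxy.
import numpy as np

def fda_round(local_models, w_global, grad_fns, lr, local_steps, variance_threshold):
    """One monitoring round: local training on every worker, then a cheap check
    that decides whether the expensive full-model synchronization is triggered."""
    # Local training on each worker's own data.
    for _ in range(local_steps):
        for i, grad_fn in enumerate(grad_fns):
            local_models[i] = local_models[i] - lr * grad_fn(local_models[i])

    # Each worker ships only a small scalar state: its squared drift.
    drift_sq = [float(np.sum((w - w_global) ** 2)) for w in local_models]

    # Cheap proxy for model variance, computed from the scalars alone.
    variance_estimate = sum(drift_sq) / len(local_models)

    if variance_estimate > variance_threshold:
        # Models have diverged enough: pay for a full synchronization (averaging)
        # and restart every worker from the new common global model.
        w_global = np.mean(local_models, axis=0)
        local_models = [w_global.copy() for _ in local_models]

    return local_models, w_global

# Toy usage: two workers minimizing quadratics with different optima (non-IID-like).
grad_fns = [lambda w: 2 * (w - 1.0), lambda w: 2 * (w + 1.0)]
w_global = np.zeros(3)
local_models = [w_global.copy() for _ in grad_fns]
for _ in range(5):
    local_models, w_global = fda_round(local_models, w_global, grad_fns,
                                       lr=0.05, local_steps=10,
                                       variance_threshold=0.5)
```

The point the abstract makes is captured in the final branch: the expensive averaging step happens only when the cheap, scalar-based divergence check indicates that the local models have drifted apart enough.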
Related papers
- FedECADO: A Dynamical System Model of Federated Learning [15.425099636035108]
Federated learning harnesses the power of distributed optimization to train a unified machine learning model across separate clients.
This work proposes FedECADO, a new algorithm inspired by a dynamical system representation of the federated learning process.
Compared to prominent techniques, including FedProx and FedNova, FedECADO achieves higher classification accuracies in numerous heterogeneous scenarios.
arXiv Detail & Related papers (2024-10-13T17:26:43Z)
- High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates [50.406127962933915]
We develop methods that enable us to learn a communication-efficient distributed logistic regression model.
In our experiments, we demonstrate a large improvement in accuracy over distributed algorithms, with only a few distributed update steps needed.
arXiv Detail & Related papers (2024-07-08T19:34:39Z)
- Ravnest: Decentralized Asynchronous Training on Heterogeneous Devices [0.0]
Ravnest facilitates decentralized training by efficiently organizing compute nodes into clusters.
We frame our asynchronous SGD loss function as a block-structured optimization problem with delayed updates.
arXiv Detail & Related papers (2024-01-03T13:07:07Z)
- EvoFed: Leveraging Evolutionary Strategies for Communication-Efficient Federated Learning [15.124439914522693]
Federated Learning (FL) is a decentralized machine learning paradigm that enables collaborative model training across dispersed nodes.
This paper presents EvoFed, a novel approach that integrates Evolutionary Strategies (ES) with FL to address these challenges.
arXiv Detail & Related papers (2023-11-13T17:25:06Z)
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specific, auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients; a minimal sketch of a client-local AMSGrad step follows this entry.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
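The FedLALR entry above mentions a client-local variant of AMSGrad in which each client adjusts its own learning rate. Below is a minimal sketch of a standard AMSGrad step driven by a client-specific learning rate; the actual FedLALR scheduling rule is not reproduced here, so `client_lr` and the state layout are placeholder assumptions.

```python
import numpy as np

def amsgrad_step(w, grad, state, client_lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard AMSGrad update with a client-specific learning rate.
    `state` holds the first moment m, second moment v, and running max v_hat.
    Bias correction is omitted for brevity."""
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    state["v_hat"] = np.maximum(state["v_hat"], state["v"])
    w = w - client_lr * state["m"] / (np.sqrt(state["v_hat"]) + eps)
    return w, state

# Usage: each client keeps its own optimizer state and its own learning rate.
w = np.zeros(4)
state = {"m": np.zeros(4), "v": np.zeros(4), "v_hat": np.zeros(4)}
w, state = amsgrad_step(w, grad=np.ones(4), state=state, client_lr=0.01)
```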
- FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z)
- Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning [55.198301429316125]
Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT)-based intelligent and ubiquitous computing.
For fast-growing applications and data volumes, distributed learning is a promising emerging paradigm, since it is often impractical or inefficient to share or aggregate data.
This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices.
arXiv Detail & Related papers (2022-02-07T15:04:15Z)
- RelaySum for Decentralized Deep Learning on Heterogeneous Data [71.36228931225362]
In decentralized machine learning, workers compute model updates on their local data.
Because the workers only communicate with a few neighbors, without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers; a minimal neighbor-averaging sketch of this paradigm follows this entry.
arXiv Detail & Related papers (2021-10-08T14:55:32Z)
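The RelaySum entry above describes the general decentralized paradigm in which workers exchange updates only with a few neighbors and information spreads hop by hop. The sketch below illustrates that paradigm with plain neighbor (gossip) averaging on a fixed ring; it is not the RelaySum mechanism itself, and the uniform mixing weights are an assumption.

```python
import numpy as np

def gossip_average(models, neighbors):
    """One round of neighbor averaging on a fixed undirected graph.
    `models[i]` is worker i's parameter vector; `neighbors[i]` lists its neighbors.
    Each worker mixes uniformly with itself and its neighbors (assumed weighting)."""
    new_models = []
    for i, w in enumerate(models):
        group = [w] + [models[j] for j in neighbors[i]]
        new_models.append(np.mean(group, axis=0))
    return new_models

# Usage: a ring of 4 workers; each round, information travels one hop further.
models = [np.full(3, float(i)) for i in range(4)]
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
models = gossip_average(models, ring)
```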
- Sparse-Push: Communication- & Energy-Efficient Decentralized Distributed Learning over Directed & Time-Varying Graphs with non-IID Datasets [2.518955020930418]
We propose Sparse-Push, a communication-efficient decentralized distributed training algorithm.
The proposed algorithm enables a 466x reduction in communication with only a 1% degradation in performance.
We also demonstrate how communication compression can lead to significant performance degradation in the case of non-IID datasets; a generic top-k sparsification sketch follows this entry.
arXiv Detail & Related papers (2021-02-10T19:41:11Z)
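The Sparse-Push entry above refers to communication compression. As a generic illustration of one common compression primitive, the sketch below implements top-k gradient sparsification with a locally retained residual; it is not the Sparse-Push algorithm itself (which additionally handles directed, time-varying communication graphs), and `k_frac` is an illustrative parameter.

```python
import numpy as np

def topk_sparsify(grad, k_frac=0.01):
    """Keep only the k largest-magnitude entries of `grad` and zero out the rest.
    Returns the sparse gradient plus the residual that a sender would typically
    accumulate locally (error feedback) and add back in later rounds."""
    k = max(1, int(k_frac * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]   # indices of the top-k entries
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    residual = grad - sparse
    return sparse, residual

# Usage: transmit `sparse` (as indices + values); keep `residual` locally.
sparse, residual = topk_sparsify(np.random.randn(1000), k_frac=0.01)
```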
- Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity [57.275753974812666]
Federated learning involves learning from data samples distributed across a network of clients while the data remains local.
In this paper, we propose a novel straggler-resilient federated learning method that incorporates statistical characteristics of the clients' data to adaptively select the clients in order to speed up the learning procedure.
arXiv Detail & Related papers (2020-12-28T19:21:14Z)
- FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers [22.59875034596411]
We present FedAT, a novel Federated learning method with Asynchronous Tiers under Non-i.i.d. data.
FedAT minimizes the straggler effect with improved convergence speed and test accuracy.
Results show that FedAT improves the prediction performance by up to 21.09%, and reduces the communication cost by up to 8.5x, compared to state-of-the-art FL methods.
arXiv Detail & Related papers (2020-10-12T18:38:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.