Communication-Efficient Federated Distillation
- URL: http://arxiv.org/abs/2012.00632v1
- Date: Tue, 1 Dec 2020 16:57:25 GMT
- Title: Communication-Efficient Federated Distillation
- Authors: Felix Sattler and Arturo Marban and Roman Rischke and Wojciech Samek
- Abstract summary: Communication constraints are one of the major challenges preventing the wide-spread adoption of Federated Learning systems.
Recently, Federated Distillation (FD), a new algorithmic paradigm for Federated Learning, emerged.
FD methods leverage ensemble distillation techniques and exchange model outputs, presented as soft labels on an unlabeled public data set.
- Score: 14.10627556244287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Communication constraints are one of the major challenges preventing the
wide-spread adoption of Federated Learning systems. Recently, Federated
Distillation (FD), a new algorithmic paradigm for Federated Learning with
fundamentally different communication properties, emerged. FD methods leverage
ensemble distillation techniques and exchange model outputs, presented as soft
labels on an unlabeled public data set, between the central server and the
participating clients. While for conventional Federated Learning algorithms,
like Federated Averaging (FA), communication scales with the size of the
jointly trained model, in FD communication scales with the distillation data
set size, resulting in advantageous communication properties, especially when
large models are trained. In this work, we investigate FD from the perspective
of communication efficiency by analyzing the effects of active
distillation-data curation, soft-label quantization and delta-coding
techniques. Based on the insights gathered from this analysis, we present
Compressed Federated Distillation (CFD), an efficient Federated Distillation
method. Extensive experiments on Federated image classification and language
modeling problems demonstrate that our method can reduce the amount of
communication necessary to achieve fixed performance targets by more than two
orders of magnitude, when compared to FD and by more than four orders of
magnitude when compared with FA.
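The compression steps named in the abstract (soft-label quantization and delta-coding against the previous round) can be illustrated with a minimal sketch. The uniform quantizer, the round-to-round difference coding, and all function names below are assumptions made for illustration; they are not the authors' reference implementation of Compressed Federated Distillation.

```python
import numpy as np

def quantize_soft_labels(probs, num_bits=2):
    """Uniformly quantize soft labels (client predictions on the public
    distillation set) to num_bits per entry. Illustrative only; the paper's
    quantizer may differ."""
    levels = 2 ** num_bits - 1
    q = np.round(probs * levels).astype(np.uint8)   # integer indices in [0, levels]
    return q, levels

def dequantize(q, levels):
    return q.astype(np.float32) / levels

def delta_encode(current_q, previous_q):
    """Delta-code the quantized soft labels against the previous round.
    Unchanged entries become zeros, which an entropy or run-length coder
    (not shown here) could compress cheaply."""
    return current_q.astype(np.int16) - previous_q.astype(np.int16)

def delta_decode(delta, previous_q):
    return (previous_q.astype(np.int16) + delta).astype(np.uint8)

# Toy round-trip on hypothetical client predictions over two rounds.
rng = np.random.default_rng(0)
prev = rng.random((1000, 10)); prev /= prev.sum(1, keepdims=True)
curr = np.clip(prev + 0.01 * rng.standard_normal(prev.shape), 0, None)
curr /= curr.sum(1, keepdims=True)

prev_q, levels = quantize_soft_labels(prev)
curr_q, _ = quantize_soft_labels(curr)
delta = delta_encode(curr_q, prev_q)              # mostly zeros -> cheap to send
restored = dequantize(delta_decode(delta, prev_q), levels)
print("fraction of unchanged entries:", np.mean(delta == 0))
```

In a full CFD-style pipeline, the delta array (mostly zeros once client predictions stabilize between rounds) would additionally be entropy-coded before transmission, which is where much of the communication saving is assumed to come from.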
Related papers
- FedFT: Improving Communication Performance for Federated Learning with Frequency Space Transformation [0.361593752383807]
We introduce FedFT (federated frequency-space transformation), a simple yet effective methodology for communicating model parameters in a Federated Learning setting.
FedFT uses Discrete Cosine Transform (DCT) to represent model parameters in frequency space, enabling efficient compression and reducing communication overhead.
We demonstrate the generalisability of the FedFT methodology on four datasets using comparative studies with three state-of-the-art FL baselines. (A frequency-space compression sketch follows this list.)
arXiv Detail & Related papers (2024-09-08T23:05:35Z) - An Aggregation-Free Federated Learning for Tackling Data Heterogeneity [50.44021981013037]
Federated Learning (FL) relies on effectively utilizing knowledge from distributed datasets.
Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round.
We introduce FedAF, a novel aggregation-free FL algorithm.
arXiv Detail & Related papers (2024-04-29T05:55:23Z) - Federated Distillation: A Survey [73.08661634882195]
Federated Learning (FL) seeks to train a model collaboratively without sharing private training data from individual clients.
The integration of knowledge distillation into FL has been proposed, forming what is known as Federated Distillation (FD).
FD enables more flexible knowledge transfer between clients and the server, surpassing the mere sharing of model parameters.
arXiv Detail & Related papers (2024-04-02T03:42:18Z) - Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on Federated Learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z) - FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning [37.96957782129352]
We propose a finetuning framework tailored to heterogeneous multi-modal foundation models, called Federated Dual-Adapter Teacher (FedDAT).
FedDAT addresses data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for efficient knowledge transfer.
To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity.
arXiv Detail & Related papers (2023-08-21T21:57:01Z) - Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance reduction technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z) - FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z) - Communication-Efficient Federated Distillation with Active Data Sampling [6.516631577963641]
Federated learning (FL) is a promising paradigm to enable privacy-preserving deep learning from distributed data.
Federated Distillation (FD) is a recently proposed alternative to enable communication-efficient and robust FL.
This paper presents a generic meta-algorithm for FD and investigates the influence of key parameters through empirical experiments.
We propose a communication-efficient FD algorithm with active data sampling to improve the model performance and reduce the communication overhead. (An active-sampling sketch follows this list.)
arXiv Detail & Related papers (2022-03-14T07:50:55Z) - CosSGD: Nonlinear Quantization for Communication-efficient Federated Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning.
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z) - Communication-Efficient Federated Learning with Compensated Overlap-FedAvg [22.636184975591004]
Federated learning was proposed to perform model training on multiple clients' combined data without sharing datasets within the cluster.
We propose Overlap-FedAvg, a framework that runs the model training phase in parallel with the model uploading and downloading phases.
Overlap-FedAvg is further developed with a hierarchical computing strategy, a data compensation mechanism, and a Nesterov accelerated gradient (NAG) algorithm.
arXiv Detail & Related papers (2020-12-12T02:50:09Z) - Federated Knowledge Distillation [42.87991207898215]
Federated distillation (FD) is a distributed learning solution that only exchanges the model outputs whose dimensions are commonly much smaller than the model sizes.
This chapter provides a deep understanding of FD while demonstrating its communication efficiency and applicability to a variety of tasks.
The second part elaborates on a baseline implementation of FD for a classification task, and illustrates its performance in terms of accuracy and communication efficiency compared to FL.
The third part presents two selected applications, namely FD over asymmetric uplink-and-downlink wireless channels and FD for reinforcement learning.
arXiv Detail & Related papers (2020-11-04T15:56:13Z)
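As referenced in the "Communication-Efficient Federated Distillation with Active Data Sampling" entry above, the following is a minimal sketch of one plausible active-sampling rule: keep only the public samples whose averaged client soft labels have the highest entropy, so fewer soft-label rows need to be exchanged per round. The entropy criterion, the budget parameter, and the function names are assumptions for illustration, not the cited paper's exact procedure.

```python
import numpy as np

def select_distillation_samples(avg_soft_labels, budget):
    """Pick the `budget` public samples whose averaged client predictions are
    most uncertain (highest entropy). Higher-entropy samples are assumed to
    carry more distillation signal per transmitted soft label; this is one
    plausible criterion, not the cited paper's exact rule."""
    eps = 1e-12
    p = np.clip(avg_soft_labels, eps, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)      # per-sample entropy
    return np.argsort(entropy)[::-1][:budget]   # indices of the top-`budget` samples

# Toy usage: 5000 public samples, 10 classes, keep 500 soft-label rows per round.
rng = np.random.default_rng(1)
soft = rng.random((5000, 10)); soft /= soft.sum(1, keepdims=True)
keep = select_distillation_samples(soft, budget=500)
print("soft-label traffic reduced to", len(keep) / soft.shape[0], "of the full set")
```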
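As referenced in the FedFT entry above, the sketch below shows the general idea of frequency-space compression: transform a flattened parameter vector with the DCT and transmit only the leading low-frequency coefficients. The truncation rule and function names are illustrative assumptions, not FedFT's actual implementation.

```python
import numpy as np
from scipy.fft import dct, idct

def compress_params(flat_params, keep_ratio=0.1):
    """DCT-transform a flattened parameter vector and keep only the leading
    (low-frequency) coefficients. Illustrative sketch of frequency-space
    compression, not FedFT's exact scheme."""
    coeffs = dct(flat_params, norm="ortho")
    k = max(1, int(keep_ratio * coeffs.size))
    return coeffs[:k], flat_params.size

def decompress_params(kept_coeffs, original_size):
    full = np.zeros(original_size)
    full[:kept_coeffs.size] = kept_coeffs
    return idct(full, norm="ortho")

# Toy round-trip; a random vector has a flat spectrum, so the error is high here.
# Real parameter vectors are assumed to concentrate energy in fewer coefficients.
rng = np.random.default_rng(2)
w = rng.standard_normal(10_000)
kept, n = compress_params(w, keep_ratio=0.1)    # send ~10% of the values
w_hat = decompress_params(kept, n)
print("relative reconstruction error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```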