Communication-Efficient Federated Distillation
- URL: http://arxiv.org/abs/2012.00632v1
- Date: Tue, 1 Dec 2020 16:57:25 GMT
- Title: Communication-Efficient Federated Distillation
- Authors: Felix Sattler and Arturo Marban and Roman Rischke and Wojciech Samek
- Abstract summary: Communication constraints are one of the major challenges preventing the wide-spread adoption of Federated Learning systems.
Recently, Federated Distillation (FD), a new algorithmic paradigm for Federated Learning, emerged.
FD methods leverage ensemble distillation techniques and exchange model outputs, presented as soft labels on an unlabeled public data set.
- Score: 14.10627556244287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Communication constraints are one of the major challenges preventing the
wide-spread adoption of Federated Learning systems. Recently, Federated
Distillation (FD), a new algorithmic paradigm for Federated Learning with
fundamentally different communication properties, emerged. FD methods leverage
ensemble distillation techniques and exchange model outputs, presented as soft
labels on an unlabeled public data set, between the central server and the
participating clients. While for conventional Federated Learning algorithms,
like Federated Averaging (FA), communication scales with the size of the
jointly trained model, in FD communication scales with the distillation data
set size, resulting in advantageous communication properties, especially when
large models are trained. In this work, we investigate FD from the perspective
of communication efficiency by analyzing the effects of active
distillation-data curation, soft-label quantization and delta-coding
techniques. Based on the insights gathered from this analysis, we present
Compressed Federated Distillation (CFD), an efficient Federated Distillation
method. Extensive experiments on Federated image classification and language
modeling problems demonstrate that our method can reduce the amount of
communication necessary to achieve fixed performance targets by more than two
orders of magnitude, when compared to FD and by more than four orders of
magnitude when compared with FA.
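The compression steps named in the abstract (soft-label quantization and delta-coding against the previous round) can be illustrated with a minimal sketch. The uniform quantizer, the round-to-round difference coding, and all function names below are assumptions made for illustration; they are not the authors' reference implementation of Compressed Federated Distillation.

```python
import numpy as np

def quantize_soft_labels(probs, num_bits=2):
    """Uniformly quantize soft labels (client predictions on the public
    distillation set) to num_bits per entry. Illustrative only; the paper's
    quantizer may differ."""
    levels = 2 ** num_bits - 1
    q = np.round(probs * levels).astype(np.uint8)   # integer indices in [0, levels]
    return q, levels

def dequantize(q, levels):
    return q.astype(np.float32) / levels

def delta_encode(current_q, previous_q):
    """Delta-code the quantized soft labels against the previous round.
    Unchanged entries become zeros, which an entropy or run-length coder
    (not shown here) could compress cheaply."""
    return current_q.astype(np.int16) - previous_q.astype(np.int16)

def delta_decode(delta, previous_q):
    return (previous_q.astype(np.int16) + delta).astype(np.uint8)

# Toy round-trip on hypothetical client predictions over two rounds.
rng = np.random.default_rng(0)
prev = rng.random((1000, 10)); prev /= prev.sum(1, keepdims=True)
curr = np.clip(prev + 0.01 * rng.standard_normal(prev.shape), 0, None)
curr /= curr.sum(1, keepdims=True)

prev_q, levels = quantize_soft_labels(prev)
curr_q, _ = quantize_soft_labels(curr)
delta = delta_encode(curr_q, prev_q)              # mostly zeros -> cheap to send
restored = dequantize(delta_decode(delta, prev_q), levels)
print("fraction of unchanged entries:", np.mean(delta == 0))
```

In a full CFD-style pipeline, the delta array (mostly zeros once client predictions stabilize between rounds) would additionally be entropy-coded before transmission, which is where much of the communication saving is assumed to come from.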
Related papers
- FedFT: Improving Communication Performance for Federated Learning with Frequency Space Transformation [0.361593752383807]
We introduce FedFT (federated frequency-space transformation), a simple yet effective methodology for communicating model parameters in a Federated Learning setting.
FedFT uses Discrete Cosine Transform (DCT) to represent model parameters in frequency space, enabling efficient compression and reducing communication overhead.
We demonstrate the generalisability of the FedFT methodology on four datasets using comparative studies with three state-of-the-art FL baselines. (A frequency-space compression sketch follows this list.)
arXiv Detail & Related papers (2024-09-08T23:05:35Z) - An Aggregation-Free Federated Learning for Tackling Data Heterogeneity [50.44021981013037]
Federated Learning (FL) relies on effectively utilizing knowledge from distributed datasets.
Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round.
We introduce FedAF, a novel aggregation-free FL algorithm.
arXiv Detail & Related papers (2024-04-29T05:55:23Z) - Federated Distillation: A Survey [73.08661634882195]
Federated Learning (FL) seeks to train a model collaboratively without sharing private training data from individual clients.
The integration of knowledge distillation into FL has been proposed, forming what is known as Federated Distillation (FD).
FD enables more flexible knowledge transfer between clients and the server, surpassing the mere sharing of model parameters.
arXiv Detail & Related papers (2024-04-02T03:42:18Z) - Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on Federated Learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z) - FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning [37.96957782129352]
We propose a finetuning framework tailored to heterogeneous multi-modal foundation models, called Federated Dual-Adapter Teacher (FedDAT).
FedDAT addresses data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for efficient knowledge transfer.
To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity.
arXiv Detail & Related papers (2023-08-21T21:57:01Z) - Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance reduction technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z) - FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z) - Communication-Efficient Federated Distillation with Active Data Sampling [6.516631577963641]
Federated learning (FL) is a promising paradigm to enable privacy-preserving deep learning from distributed data.
Federated Distillation (FD) is a recently proposed alternative to enable communication-efficient and robust FL.
This paper presents a generic meta-algorithm for FD and investigates the influence of key parameters through empirical experiments.
We propose a communication-efficient FD algorithm with active data sampling to improve the model performance and reduce the communication overhead. (An active-sampling sketch follows this list.)
arXiv Detail & Related papers (2022-03-14T07:50:55Z) - CosSGD: Nonlinear Quantization for Communication-efficient Federated Learning [62.65937719264881]
Federated learning facilitates learning across clients without transferring local data on these clients to a central server.
We propose a nonlinear quantization for compressed gradient descent, which can be easily utilized in federated learning.
Our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process.
arXiv Detail & Related papers (2020-12-15T12:20:28Z) - Communication-Efficient Federated Learning with Compensated Overlap-FedAvg [22.636184975591004]
Federated learning was proposed to perform model training on multiple clients' combined data without sharing datasets within the cluster.
We propose Overlap-FedAvg, a framework that runs the model training phase in parallel with the model uploading and downloading phases.
Overlap-FedAvg is further developed with a hierarchical computing strategy, a data compensation mechanism, and a Nesterov accelerated gradient (NAG) algorithm.
arXiv Detail & Related papers (2020-12-12T02:50:09Z) - Federated Knowledge Distillation [42.87991207898215]
Federated distillation (FD) is a distributed learning solution that only exchanges the model outputs whose dimensions are commonly much smaller than the model sizes.
This chapter provides a deep understanding of FD while demonstrating its communication efficiency and applicability to a variety of tasks.
The second part elaborates on a baseline implementation of FD for a classification task, and illustrates its performance in terms of accuracy and communication efficiency compared to FL.
The third part presents two selected applications, namely FD over asymmetric uplink-and-downlink wireless channels and FD for reinforcement learning.
arXiv Detail & Related papers (2020-11-04T15:56:13Z)
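As referenced in the "Communication-Efficient Federated Distillation with Active Data Sampling" entry above, the following is a minimal sketch of one plausible active-sampling rule: keep only the public samples whose averaged client soft labels have the highest entropy, so fewer soft-label rows need to be exchanged per round. The entropy criterion, the budget parameter, and the function names are assumptions for illustration, not the cited paper's exact procedure.

```python
import numpy as np

def select_distillation_samples(avg_soft_labels, budget):
    """Pick the `budget` public samples whose averaged client predictions are
    most uncertain (highest entropy). Higher-entropy samples are assumed to
    carry more distillation signal per transmitted soft label; this is one
    plausible criterion, not the cited paper's exact rule."""
    eps = 1e-12
    p = np.clip(avg_soft_labels, eps, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)      # per-sample entropy
    return np.argsort(entropy)[::-1][:budget]   # indices of the top-`budget` samples

# Toy usage: 5000 public samples, 10 classes, keep 500 soft-label rows per round.
rng = np.random.default_rng(1)
soft = rng.random((5000, 10)); soft /= soft.sum(1, keepdims=True)
keep = select_distillation_samples(soft, budget=500)
print("soft-label traffic reduced to", len(keep) / soft.shape[0], "of the full set")
```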
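As referenced in the FedFT entry above, the sketch below shows the general idea of frequency-space compression: transform a flattened parameter vector with the DCT and transmit only the leading low-frequency coefficients. The truncation rule and function names are illustrative assumptions, not FedFT's actual implementation.

```python
import numpy as np
from scipy.fft import dct, idct

def compress_params(flat_params, keep_ratio=0.1):
    """DCT-transform a flattened parameter vector and keep only the leading
    (low-frequency) coefficients. Illustrative sketch of frequency-space
    compression, not FedFT's exact scheme."""
    coeffs = dct(flat_params, norm="ortho")
    k = max(1, int(keep_ratio * coeffs.size))
    return coeffs[:k], flat_params.size

def decompress_params(kept_coeffs, original_size):
    full = np.zeros(original_size)
    full[:kept_coeffs.size] = kept_coeffs
    return idct(full, norm="ortho")

# Toy round-trip; a random vector has a flat spectrum, so the error is high here.
# Real parameter vectors are assumed to concentrate energy in fewer coefficients.
rng = np.random.default_rng(2)
w = rng.standard_normal(10_000)
kept, n = compress_params(w, keep_ratio=0.1)    # send ~10% of the values
w_hat = decompress_params(kept, n)
print("relative reconstruction error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```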