Federated Knowledge Distillation
- URL: http://arxiv.org/abs/2011.02367v1
- Date: Wed, 4 Nov 2020 15:56:13 GMT
- Title: Federated Knowledge Distillation
- Authors: Hyowoon Seo, Jihong Park, Seungeun Oh, Mehdi Bennis, Seong-Lyun Kim
- Abstract summary: Federated distillation (FD) is a distributed learning solution that only exchanges the model outputs whose dimensions are commonly much smaller than the model sizes.
This chapter provides a deep understanding of FD while demonstrating its communication efficiency and applicability to a variety of tasks.
The second part elaborates on a baseline implementation of FD for a classification task, and illustrates its performance in terms of accuracy and communication efficiency compared to federated learning (FL).
The third part presents two selected applications, namely FD over asymmetric uplink-and-downlink wireless channels and FD for reinforcement learning.
- Score: 42.87991207898215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed learning frameworks often rely on exchanging model parameters
across workers, instead of revealing their raw data. A prime example is
federated learning (FL) that exchanges the gradients or weights of each neural
network model. Under limited communication resources, however, such a method
becomes extremely costly particularly for modern deep neural networks having a
huge number of model parameters. In this regard, federated distillation (FD) is
a compelling distributed learning solution that only exchanges the model
outputs whose dimensions are commonly much smaller than the model sizes (e.g.,
10 labels in the MNIST dataset). The goal of this chapter is to provide a deep
understanding of FD while demonstrating its communication efficiency and
applicability to a variety of tasks. To this end, towards demystifying the
operational principle of FD, the first part of this chapter provides a novel
asymptotic analysis for two foundational algorithms of FD, namely knowledge
distillation (KD) and co-distillation (CD), by exploiting the theory of neural
tangent kernel (NTK). Next, the second part elaborates on a baseline
implementation of FD for a classification task, and illustrates its performance
in terms of accuracy and communication efficiency compared to FL. Lastly, to
demonstrate the applicability of FD to various distributed learning tasks and
environments, the third part presents two selected applications, namely FD over
asymmetric uplink-and-downlink wireless channels and FD for reinforcement
learning.
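To make the communication argument above concrete, the following is a minimal sketch (NumPy only; the model size, sample counts, and variable names are illustrative assumptions, not values from the chapter) of what a single worker uploads per round: its full parameter vector under FL versus per-label averaged logits under a common FD baseline for classification. The per-label averaging step is one standard instantiation of FD and is assumed here purely for illustration.

```python
# Minimal sketch (NumPy only; model size, sample counts, and names are
# illustrative assumptions) of the per-round uplink payload of federated
# learning (FL) versus federated distillation (FD) for a 10-class task
# such as MNIST, as described in the abstract.
import numpy as np

NUM_CLASSES = 10          # e.g., 10 labels in MNIST
NUM_PARAMS = 1_200_000    # hypothetical CNN size; modern models are far larger
NUM_SAMPLES = 5_000       # hypothetical number of local samples

rng = np.random.default_rng(0)

# FL: each worker uploads its full weight (or gradient) vector.
fl_payload = rng.standard_normal(NUM_PARAMS)

# FD: each worker uploads per-label averaged logits. `logits[i]` stands in
# for the model output on local sample i; here it is random for illustration.
logits = rng.standard_normal((NUM_SAMPLES, NUM_CLASSES))
labels = rng.integers(0, NUM_CLASSES, size=NUM_SAMPLES)
fd_payload = np.stack([logits[labels == c].mean(axis=0)
                       for c in range(NUM_CLASSES)])  # shape (10, 10)

print("FL uplink floats per round:", fl_payload.size)   # 1,200,000
print("FD uplink floats per round:", fd_payload.size)   # 100
```

In such a baseline, the server would typically average these per-label logit vectors across workers and send them back as soft targets for a local distillation regularizer, which is what keeps the exchanged payload independent of the model size.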
Related papers
- Federated Distillation: A Survey [73.08661634882195]
Federated Learning (FL) seeks to train a model collaboratively without sharing private training data from individual clients.
The integration of knowledge distillation into FL has been proposed, forming what is known as Federated Distillation (FD).
FD enables more flexible knowledge transfer between clients and the server, surpassing the mere sharing of model parameters.
arXiv Detail & Related papers (2024-04-02T03:42:18Z) - Logits Poisoning Attack in Federated Distillation [8.728629314547248]
We introduce FDLA, a poisoning attack method tailored for Federated Distillation (FD).
We demonstrate that FDLA effectively compromises client model accuracy, outperforming established baseline algorithms in this regard.
Our findings underscore the critical need for robust defense mechanisms in FD settings to mitigate such adversarial threats.
arXiv Detail & Related papers (2024-01-08T06:18:46Z) - Convergence Visualizer of Decentralized Federated Distillation with Reduced Communication Costs [3.2098126952615442]
Federated learning (FL) achieves collaborative learning without the need for data sharing, thus preventing privacy leakage.
This study solves two unresolved challenges of CMFD: (1) communication cost reduction and (2) visualization of model convergence.
arXiv Detail & Related papers (2023-12-19T07:23:49Z) - Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy [55.014926694758195]
Entropy and mutual information in neural networks provide rich information on the learning process.
We leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures.
We show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data.
arXiv Detail & Related papers (2023-12-04T01:32:42Z) - FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning [37.96957782129352]
We propose a finetuning framework tailored to heterogeneous multi-modal foundation models, called Federated Dual-Adapter Teacher (FedDAT).
FedDAT addresses data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for efficient knowledge transfer.
To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity.
arXiv Detail & Related papers (2023-08-21T21:57:01Z) - Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition [80.35852245488043]
We propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units.
RBF units capture local patterns shared by similar instances using an intermediate representation.
We show that it is the incorporation of local information that makes the proposed model competitive.
arXiv Detail & Related papers (2021-09-07T21:05:56Z) - A Generalizable Model-and-Data Driven Approach for Open-Set RFF Authentication [74.63333951647581]
Radio-frequency fingerprints (RFFs) are promising solutions for realizing low-cost physical layer authentication.
Machine learning-based methods have been proposed for RFF extraction and discrimination.
We propose a new end-to-end deep learning framework for extracting RFFs from raw received signals.
arXiv Detail & Related papers (2021-08-10T03:59:37Z) - A Physics-Informed Deep Learning Paradigm for Traffic State Estimation and Fundamental Diagram Discovery [3.779860024918729]
This paper contributes an improved paradigm, called physics-informed deep learning with a fundamental diagram learner (PIDL+FDL).
PIDL+FDL integrates ML terms into the model-driven component to learn a functional form of a fundamental diagram (FD), i.e., a mapping from traffic density to flow or velocity.
We demonstrate the use of PIDL+FDL to solve popular first-order and second-order traffic flow models and reconstruct the FD relation.
arXiv Detail & Related papers (2021-06-06T14:54:32Z) - Communication-Efficient Federated Distillation [14.10627556244287]
Communication constraints are one of the major challenges preventing the widespread adoption of Federated Learning systems.
Recently, Federated Distillation (FD), a new algorithmic paradigm for Federated Learning, emerged.
FD methods leverage ensemble distillation techniques and exchange model outputs, presented as soft labels on an unlabeled public data set (a toy sketch of this exchange appears after this list).
arXiv Detail & Related papers (2020-12-01T16:57:25Z) - A Transductive Multi-Head Model for Cross-Domain Few-Shot Learning [72.30054522048553]
We present a new method, Transductive Multi-Head Few-Shot learning (TMHFS), to address the Cross-Domain Few-Shot Learning challenge.
The proposed methods greatly outperform the strong baseline, fine-tuning, on four different target domains.
arXiv Detail & Related papers (2020-06-08T02:39:59Z)
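As a companion to the Communication-Efficient Federated Distillation entry above, the following is a hedged sketch, built from toy linear "models", random public data, and hypothetical sizes (none of which come from that paper), of the public-dataset flavor of FD it describes: clients upload soft labels computed on a shared unlabeled public set, the server averages them into an ensemble distillation target, and each client takes a distillation step toward that target.

```python
# Hedged sketch (toy linear "models", random public data, hypothetical sizes)
# of the public-dataset FD exchange described above: clients upload soft
# labels on a shared unlabeled set, the server averages them, and each
# client distills toward the averaged (ensemble) soft labels.
import torch
import torch.nn.functional as F

NUM_CLASSES, PUBLIC_SIZE, NUM_CLIENTS, FEAT_DIM = 10, 256, 5, 32
public_x = torch.randn(PUBLIC_SIZE, FEAT_DIM)     # shared unlabeled public set

clients = [torch.nn.Linear(FEAT_DIM, NUM_CLASSES) for _ in range(NUM_CLIENTS)]

# 1) Each client uploads only its soft labels on the public set (no weights).
with torch.no_grad():
    soft_labels = torch.stack(
        [F.softmax(model(public_x), dim=1) for model in clients])

# 2) Server aggregates by averaging, forming the ensemble distillation target.
teacher = soft_labels.mean(dim=0)                 # shape (PUBLIC_SIZE, 10)

# 3) Each client takes one distillation step toward the averaged soft labels.
for model in clients:
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    log_probs = F.log_softmax(model(public_x), dim=1)
    loss = F.kl_div(log_probs, teacher, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"distillation loss: {loss.item():.4f}")
```

Because only soft labels on the public set are exchanged, the uplink cost scales with the public set size and the number of classes rather than with the model size.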
This list is automatically generated from the titles and abstracts of the papers in this site.