Heterogeneous Federated Learning Using Knowledge Codistillation
- URL: http://arxiv.org/abs/2310.02549v1
- Date: Wed, 4 Oct 2023 03:17:26 GMT
- Title: Heterogeneous Federated Learning Using Knowledge Codistillation
- Authors: Jared Lichtarge, Ehsan Amid, Shankar Kumar, Tien-Ju Yang, Rohan Anil, and Rajiv Mathews
- Abstract summary: We propose a method that involves training a small model on the entire pool and a larger model on a subset of clients with higher capacity.
The models exchange information bidirectionally via knowledge distillation, utilizing an unlabeled dataset on a server without sharing parameters.
- Score: 23.895665011884102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Averaging, and many federated learning algorithm variants which
build upon it, have a limitation: all clients must share the same model
architecture. This results in unused modeling capacity on many clients, which
limits model performance. To address this issue, we propose a method that
involves training a small model on the entire pool and a larger model on a
subset of clients with higher capacity. The models exchange information
bidirectionally via knowledge distillation, utilizing an unlabeled dataset on a
server without sharing parameters. We present two variants of our method, which
improve upon federated averaging on image classification and language modeling
tasks. We show this technique can be useful even if only out-of-domain or
limited in-domain distillation data is available. Additionally, the
bi-directional knowledge distillation allows for domain transfer between the
models when different pool populations introduce domain shift.
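The bidirectional exchange described in the abstract can be illustrated with a minimal sketch. It assumes both models are classifiers over the same label set and that each is scored against the other's softened predictions on a shared batch of unlabeled server data using a KL-divergence loss. The abstract does not specify the loss, temperature, or training schedule, so the function names (`distillation_loss`, `codistillation_losses`) and the `temperature` parameter are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature (assumed knob)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Mean KL(teacher || student) over a batch of unlabeled examples."""
    per_example = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_example) / len(per_example)


def codistillation_losses(small_logits, large_logits, temperature=1.0):
    """Each model is treated as the teacher for the other.

    Only predictions on the shared unlabeled batch are exchanged;
    no model parameters cross between the two pools.
    """
    return {
        "small_from_large": distillation_loss(large_logits, small_logits, temperature),
        "large_from_small": distillation_loss(small_logits, large_logits, temperature),
    }


# Hypothetical logits from the two models on two unlabeled server examples.
small_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
large_logits = [[3.0, 0.0, -2.0], [0.0, 0.0, 1.0]]
losses = codistillation_losses(small_logits, large_logits)
```

In a full system, each loss would presumably be added to the corresponding model's own training objective. Because the two models interact only through their outputs, their architectures are free to differ, which is the constraint that Federated Averaging cannot relax.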
Related papers
- Cross-Domain Content Generation with Domain-Specific Small Language Models [3.2772349789781616]
This study explores methods to enable a small language model to produce coherent and relevant outputs for two different domains.
We find that utilizing custom tokenizers tailored to each dataset significantly enhances generation quality.
Our findings demonstrate that knowledge expansion with frozen layers is an effective method for small language models to generate domain-specific content.
(2024-09-19T21:45:13Z)
- Federated Learning with a Single Shared Image [25.313809311019696]
Federated Learning (FL) enables multiple machines to collaboratively train a machine learning model without sharing of private training data.
One popular method, FedDF, uses distillation to tackle this task with the use of a common, shared dataset.
In this paper, we introduce a new method that improves this knowledge distillation method to only rely on a single shared image.
(2024-06-18T14:26:09Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
(2022-12-19T20:46:43Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and back propagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
(2022-11-20T10:49:22Z)
- Domain Discrepancy Aware Distillation for Model Aggregation in Federated Learning [47.87639746826555]
We describe two challenges, server-to-client discrepancy and client-to-client discrepancy, brought to the aggregation model by the domain discrepancies.
We propose an adaptive knowledge aggregation algorithm FedD3A based on domain discrepancy aware distillation to lower the bound.
(2022-10-04T04:08:16Z)
- Federated Learning of Neural ODE Models with Different Iteration Counts [0.9444784653236158]
Federated learning is a distributed machine learning approach in which clients train models locally with their own data and upload them to a server so that their trained results are shared between them without uploading raw data to the server.
In this paper, we utilize Neural ODE based models for federated learning.
We show that our approach can reduce communication size by up to 92.4% compared with a baseline ResNet model using CIFAR-10 dataset.
(2022-08-19T17:57:32Z)
- HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression [53.90578309960526]
Large pre-trained language models (PLMs) have shown overwhelming performances compared with traditional neural network methods.
We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
(2021-10-16T11:23:02Z)
- FedKD: Communication Efficient Federated Learning via Knowledge Distillation [56.886414139084216]
Federated learning is widely used to learn intelligent models from decentralized data.
In federated learning, clients need to communicate their local model updates in each iteration of model learning.
We propose a communication efficient federated learning method based on knowledge distillation.
(2021-08-30T15:39:54Z)
- GAN Cocktail: mixing GANs without dataset access [18.664733153082146]
We tackle the problem of model merging, given two constraints that often come up in the real world.
In the first stage, we transform the weights of all the models to the same parameter space by a technique we term model rooting.
In the second stage, we merge the rooted models by averaging their weights and fine-tuning them for each specific domain, using only data generated by the original trained models.
(2021-06-07T17:59:04Z)
- Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients.
(2020-06-12T14:49:47Z)