Related papers: Heterogeneous Federated Learning Using Knowledge Codistillation

URL: http://arxiv.org/abs/2310.02549v1
Date: Wed, 4 Oct 2023 03:17:26 GMT
Title: Heterogeneous Federated Learning Using Knowledge Codistillation
Authors: Jared Lichtarge and Ehsan Amid and Shankar Kumar and Tien-Ju Yang and Rohan Anil and Rajiv Mathews
Abstract summary: We propose a method that involves training a small model on the entire pool and a larger model on a subset of clients with higher capacity. The models exchange information bidirectionally via knowledge distillation, utilizing an unlabeled dataset on a server without sharing parameters.
Score: 23.895665011884102
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Federated Averaging, and many federated learning algorithm variants which build upon it, have a limitation: all clients must share the same model architecture. This results in unused modeling capacity on many clients, which limits model performance. To address this issue, we propose a method that involves training a small model on the entire pool and a larger model on a subset of clients with higher capacity. The models exchange information bidirectionally via knowledge distillation, utilizing an unlabeled dataset on a server without sharing parameters. We present two variants of our method, which improve upon federated averaging on image classification and language modeling tasks. We show this technique can be useful even if only out-of-domain or limited in-domain distillation data is available. Additionally, the bi-directional knowledge distillation allows for domain transfer between the models when different pool populations introduce domain shift.
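The bidirectional exchange described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss, temperature, or training schedule, so everything below is an assumption: a temperature-scaled KL divergence (a common choice for knowledge distillation) is computed in both directions on the two models' predictions for the same unlabeled server batch. All function names and the example logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) over a batch of unlabeled examples."""
    losses = []
    for t, s in zip(teacher_logits, student_logits):
        p_teacher = softmax(t, temperature)
        p_student = softmax(s, temperature)
        losses.append(kl_divergence(p_teacher, p_student))
    return sum(losses) / len(losses)

def codistillation_losses(small_logits, large_logits, temperature=2.0):
    """Return (loss for the small model, loss for the large model).

    Each model acts as the teacher for the other; only predictions on the
    shared unlabeled batch are exchanged, never parameters.
    """
    loss_small = distillation_loss(large_logits, small_logits, temperature)
    loss_large = distillation_loss(small_logits, large_logits, temperature)
    return loss_small, loss_large

# Hypothetical logits from the two models on the same two unlabeled examples.
small_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
large_logits = [[1.5, 1.0, -0.5], [0.0, 0.4, 0.2]]
loss_small, loss_large = codistillation_losses(small_logits, large_logits)
```

In a full system each loss would presumably be combined with the model's own federated training objective; how the two are weighted or scheduled is not stated in the abstract.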

Related papers

Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging [23.44999968321367]
Soup-of-Experts can instantiate a model at test time for any domain weights with minimal computational cost and without re-training the model. We demonstrate how our approach obtains small specialized models on several language modeling tasks quickly.
arXiv Detail & Related papers (2025-02-03T20:33:20Z)
Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning [64.1745161657794]
Domain-Incremental Learning (DIL) involves the progressive adaptation of a model to new concepts across different domains. Recent advances in pre-trained models provide a solid foundation for DIL. However, learning new concepts often results in the catastrophic forgetting of pre-trained knowledge. We propose DUal ConsolidaTion (Duct) to unify and consolidate historical knowledge.
arXiv Detail & Related papers (2024-10-01T17:58:06Z)
Cross-Domain Content Generation with Domain-Specific Small Language Models [3.2772349789781616]
This study explores methods to enable a small language model to produce coherent and relevant outputs for two different domains. We find that utilizing custom tokenizers tailored to each dataset significantly enhances generation quality. Our findings demonstrate that knowledge expansion with frozen layers is an effective method for small language models to generate domain-specific content.
arXiv Detail & Related papers (2024-09-19T21:45:13Z)
Federated Learning with a Single Shared Image [25.313809311019696]
Federated Learning (FL) enables multiple machines to collaboratively train a machine learning model without sharing of private training data. One popular method, FedDF, uses distillation to tackle this task with the use of a common, shared dataset. In this paper, we introduce a new method that improves this knowledge distillation method to only rely on a single shared image.
arXiv Detail & Related papers (2024-06-18T14:26:09Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device). In FL, each data holder trains a model locally and releases it to a central server for aggregation. In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and back propagation). In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
Domain Discrepancy Aware Distillation for Model Aggregation in Federated Learning [47.87639746826555]
We describe two challenges, server-to-client discrepancy and client-to-client discrepancy, brought to the aggregation model by the domain discrepancies. We propose an adaptive knowledge aggregation algorithm FedD3A based on domain discrepancy aware distillation to lower the bound.
arXiv Detail & Related papers (2022-10-04T04:08:16Z)
Federated Learning of Neural ODE Models with Different Iteration Counts [0.9444784653236158]
Federated learning is a distributed machine learning approach in which clients train models locally with their own data and upload them to a server so that their trained results are shared between them without uploading raw data to the server. In this paper, we utilize Neural ODE based models for federated learning. We show that our approach can reduce communication size by up to 92.4% compared with a baseline ResNet model on the CIFAR-10 dataset.
arXiv Detail & Related papers (2022-08-19T17:57:32Z)
HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression [53.90578309960526]
Large pre-trained language models (PLMs) have shown overwhelming performances compared with traditional neural network methods. We propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information.
arXiv Detail & Related papers (2021-10-16T11:23:02Z)
FedKD: Communication Efficient Federated Learning via Knowledge Distillation [56.886414139084216]
Federated learning is widely used to learn intelligent models from decentralized data. In federated learning, clients need to communicate their local model updates in each iteration of model learning. We propose a communication efficient federated learning method based on knowledge distillation.
arXiv Detail & Related papers (2021-08-30T15:39:54Z)
GAN Cocktail: mixing GANs without dataset access [18.664733153082146]
We tackle the problem of model merging, given two constraints that often come up in the real world. In the first stage, we transform the weights of all the models to the same parameter space by a technique we term model rooting. In the second stage, we merge the rooted models by averaging their weights and fine-tuning them for each specific domain, using only data generated by the original trained models.
arXiv Detail & Related papers (2021-06-07T17:59:04Z)
Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model. In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side. We propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients.
arXiv Detail & Related papers (2020-06-12T14:49:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.