Ensemble Distillation for Robust Model Fusion in Federated Learning
- URL: http://arxiv.org/abs/2006.07242v3
- Date: Sat, 27 Mar 2021 16:31:56 GMT
- Title: Ensemble Distillation for Robust Model Fusion in Federated Learning
- Authors: Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi
- Abstract summary: Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients.
- Score: 72.61259487233214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Learning (FL) is a machine learning setting where many devices
collaboratively train a machine learning model while keeping the training data
decentralized. In most of the current training schemes the central model is
refined by averaging the parameters of the server model and the updated
parameters from the client side. However, directly averaging model parameters
is only possible if all models have the same structure and size, which could be
a restrictive constraint in many scenarios.
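For reference, the parameter-averaging baseline mentioned above amounts to a weighted average over identically structured parameter sets. A minimal sketch is shown below, assuming PyTorch-style state dicts and per-client example counts as weights (the function name and arguments are illustrative, not from the paper); it only works because every parameter key exists in every client model, which is exactly the restriction noted here.

```python
# Minimal sketch of FedAvg-style parameter averaging (illustrative only).
# It assumes every client model shares the exact same architecture, so all
# state dicts contain the same keys with tensors of the same shape.
from typing import Dict, List
import torch

def average_parameters(client_states: List[Dict[str, torch.Tensor]],
                       client_weights: List[float]) -> Dict[str, torch.Tensor]:
    """Weighted average of identically structured state dicts."""
    total = sum(client_weights)
    averaged = {}
    for name in client_states[0]:
        averaged[name] = sum(
            (w / total) * state[name].float()
            for state, w in zip(client_states, client_weights)
        )
    return averaged
```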
In this work we investigate more powerful and more flexible aggregation
schemes for FL. Specifically, we propose ensemble distillation for model
fusion, i.e. training the central classifier through unlabeled data on the
outputs of the models from the clients. This knowledge distillation technique
mitigates privacy risk and cost to the same extent as the baseline FL
algorithms, but allows flexible aggregation over heterogeneous client models
that can differ e.g. in size, numerical precision or structure. We show in
extensive empirical experiments on various CV/NLP datasets (CIFAR-10/100,
ImageNet, AG News, SST2) and settings (heterogeneous models/data) that the
server model can be trained much faster, requiring fewer communication rounds
than any existing FL technique so far.
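To make the fusion step concrete, the sketch below distills the ensemble of client models into the server model on unlabeled data, as described in the abstract. The averaged-logit teacher, KL-divergence loss, temperature, and optimizer settings are illustrative assumptions rather than the paper's exact recipe; note that the client models only need to produce outputs over the same label space, not share an architecture, which is what permits heterogeneous clients.

```python
# Sketch of ensemble distillation for model fusion (illustrative assumptions:
# averaged client logits as the teacher, KL-divergence loss, Adam optimizer).
import torch
import torch.nn.functional as F

def distill_server_model(server_model, client_models, unlabeled_loader,
                         temperature=1.0, lr=1e-3, steps=100, device="cpu"):
    server_model.to(device).train()
    for m in client_models:
        m.to(device).eval()
    optimizer = torch.optim.Adam(server_model.parameters(), lr=lr)

    step = 0
    for x in unlabeled_loader:            # unlabeled inputs only, no targets
        x = x.to(device)
        with torch.no_grad():             # teacher = average of client logits
            teacher_logits = torch.stack([m(x) for m in client_models]).mean(0)
        student_logits = server_model(x)
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean",
        ) * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= steps:
            break
    return server_model
```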
Related papers
- Enhancing One-Shot Federated Learning Through Data and Ensemble Co-Boosting [76.64235084279292]
One-shot Federated Learning (OFL) has become a promising learning paradigm, enabling the training of a global server model via a single communication round.
We introduce a novel framework, Co-Boosting, in which synthesized data and the ensemble model mutually enhance each other progressively.
arXiv Detail & Related papers (2024-02-23T03:15:10Z)
- NeFL: Nested Model Scaling for Federated Learning with System Heterogeneous Clients [44.89061671579694]
Federated learning (FL) enables distributed training while preserving data privacy, but stragglers (slow or incapable clients) can significantly slow down the total training time and degrade performance.
We propose nested federated learning (NeFL), a framework that efficiently divides deep neural networks into submodels using both depthwise and widthwise scaling.
NeFL achieves performance gains over baseline approaches, especially for the worst-case submodel.
arXiv Detail & Related papers (2023-08-15T13:29:14Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
Fine-tuned models are often readily available while their training data is not, which creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction [16.160943049655664]
FedRolex is a partial training approach that enables model-heterogeneous FL and can train a global server model larger than the largest client model.
We show that FedRolex outperforms state-of-the-art partial-training (PT) based model-heterogeneous FL methods.
arXiv Detail & Related papers (2022-12-03T06:04:11Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
- Multi-Model Federated Learning with Provable Guarantees [19.470024548995717]
Federated Learning (FL) is a variant of distributed learning where devices collaborate to learn a model without sharing their data with the central server or each other.
We refer to the process of training multiple independent models simultaneously in a federated setting using a common pool of clients as multi-model FL.
arXiv Detail & Related papers (2022-07-09T19:47:52Z)
- No One Left Behind: Inclusive Federated Learning over Heterogeneous Devices [79.16481453598266]
We propose InclusiveFL, a client-inclusive federated learning method that accommodates clients with heterogeneous computing capabilities.
The core idea of InclusiveFL is to assign models of different sizes to clients with different computing capabilities.
We also propose an effective method to share the knowledge among multiple local models with different sizes.
arXiv Detail & Related papers (2022-02-16T13:03:27Z)
- Federated learning with hierarchical clustering of local updates to improve training on non-IID data [3.3517146652431378]
We show that learning a single joint model is often not optimal in the presence of certain types of non-IID data.
We present a modification to FL by introducing a hierarchical clustering step (FL+HC); a minimal sketch of this clustering step is given after this list.
We show how FL+HC allows model training to converge in fewer communication rounds compared to FL without clustering.
arXiv Detail & Related papers (2020-04-24T15:16:01Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
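As referenced in the FL+HC entry above, hierarchical clustering of local updates can be sketched as grouping clients by the similarity of their flattened parameter deltas and then aggregating one model per cluster. The Ward linkage, Euclidean metric, and distance threshold below are illustrative assumptions, not the configuration used in that paper.

```python
# Sketch of FL+HC-style clustering of client updates (illustrative only).
# Clients whose flattened model updates are close to each other are grouped,
# and a separate model would then be aggregated per cluster.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_client_updates(client_updates, distance_threshold=1.0):
    """client_updates: list of 1-D numpy arrays (flattened parameter deltas)."""
    updates = np.stack(client_updates)
    links = linkage(updates, method="ward", metric="euclidean")
    labels = fcluster(links, t=distance_threshold, criterion="distance")
    clusters = {}
    for client_id, label in enumerate(labels):
        clusters.setdefault(label, []).append(client_id)
    return clusters  # e.g. {1: [0, 3], 2: [1, 2, 4]}
```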
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.