Ensemble Distillation for Robust Model Fusion in Federated Learning
- URL: http://arxiv.org/abs/2006.07242v3
- Date: Sat, 27 Mar 2021 16:31:56 GMT
- Title: Ensemble Distillation for Robust Model Fusion in Federated Learning
- Authors: Tao Lin, Lingjing Kong, Sebastian U. Stich, Martin Jaggi
- Abstract summary: Federated Learning (FL) is a machine learning setting where many devices collaboratively train a machine learning model.
In most of the current training schemes the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients.
- Score: 72.61259487233214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Learning (FL) is a machine learning setting where many devices
collaboratively train a machine learning model while keeping the training data
decentralized. In most of the current training schemes the central model is
refined by averaging the parameters of the server model and the updated
parameters from the client side. However, directly averaging model parameters
is only possible if all models have the same structure and size, which could be
a restrictive constraint in many scenarios.
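For reference, the parameter-averaging baseline mentioned above amounts to a weighted average over identically structured parameter sets. A minimal sketch is shown below, assuming PyTorch-style state dicts and per-client example counts as weights (the function name and arguments are illustrative, not from the paper); it only works because every parameter key exists in every client model, which is exactly the restriction noted here.

```python
# Minimal sketch of FedAvg-style parameter averaging (illustrative only).
# It assumes every client model shares the exact same architecture, so all
# state dicts contain the same keys with tensors of the same shape.
from typing import Dict, List
import torch

def average_parameters(client_states: List[Dict[str, torch.Tensor]],
                       client_weights: List[float]) -> Dict[str, torch.Tensor]:
    """Weighted average of identically structured state dicts."""
    total = sum(client_weights)
    averaged = {}
    for name in client_states[0]:
        averaged[name] = sum(
            (w / total) * state[name].float()
            for state, w in zip(client_states, client_weights)
        )
    return averaged
```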
In this work we investigate more powerful and more flexible aggregation
schemes for FL. Specifically, we propose ensemble distillation for model
fusion, i.e. training the central classifier through unlabeled data on the
outputs of the models from the clients. This knowledge distillation technique
mitigates privacy risk and cost to the same extent as the baseline FL
algorithms, but allows flexible aggregation over heterogeneous client models
that can differ e.g. in size, numerical precision or structure. We show in
extensive empirical experiments on various CV/NLP datasets (CIFAR-10/100,
ImageNet, AG News, SST2) and settings (heterogeneous models/data) that the
server model can be trained much faster, requiring fewer communication rounds
than any existing FL technique so far.
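To make the fusion step concrete, the sketch below distills the ensemble of client models into the server model on unlabeled data, as described in the abstract. The averaged-logit teacher, KL-divergence loss, temperature, and optimizer settings are illustrative assumptions rather than the paper's exact recipe; note that the client models only need to produce outputs over the same label space, not share an architecture, which is what permits heterogeneous clients.

```python
# Sketch of ensemble distillation for model fusion (illustrative assumptions:
# averaged client logits as the teacher, KL-divergence loss, Adam optimizer).
import torch
import torch.nn.functional as F

def distill_server_model(server_model, client_models, unlabeled_loader,
                         temperature=1.0, lr=1e-3, steps=100, device="cpu"):
    server_model.to(device).train()
    for m in client_models:
        m.to(device).eval()
    optimizer = torch.optim.Adam(server_model.parameters(), lr=lr)

    step = 0
    for x in unlabeled_loader:            # unlabeled inputs only, no targets
        x = x.to(device)
        with torch.no_grad():             # teacher = average of client logits
            teacher_logits = torch.stack([m(x) for m in client_models]).mean(0)
        student_logits = server_model(x)
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            F.softmax(teacher_logits / temperature, dim=1),
            reduction="batchmean",
        ) * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= steps:
            break
    return server_model
```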
Related papers
- Enhancing One-Shot Federated Learning Through Data and Ensemble Co-Boosting [76.64235084279292]
One-shot Federated Learning (OFL) has become a promising learning paradigm, enabling the training of a global server model via a single communication round.
We introduce a novel framework, Co-Boosting, in which synthesized data and the ensemble model mutually enhance each other progressively.
arXiv Detail & Related papers (2024-02-23T03:15:10Z)
- NeFL: Nested Model Scaling for Federated Learning with System Heterogeneous Clients [44.89061671579694]
Federated learning (FL) enables distributed training while preserving data privacy, but stragglers (slow or incapable clients) can significantly slow down the total training time and degrade performance.
We propose nested federated learning (NeFL), a framework that efficiently divides deep neural networks into submodels using both depthwise and widthwise scaling.
NeFL achieves performance gains over baseline approaches, especially for the worst-case submodel.
arXiv Detail & Related papers (2023-08-15T13:29:14Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
Fine-tuned models are often readily available while their training data is not, which creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction [16.160943049655664]
FedRolex is a partial training approach that enables model-heterogeneous FL and can train a global server model larger than the largest client model.
We show that FedRolex outperforms state-of-the-art partial-training (PT) based model-heterogeneous FL methods.
arXiv Detail & Related papers (2022-12-03T06:04:11Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
- Multi-Model Federated Learning with Provable Guarantees [19.470024548995717]
Federated Learning (FL) is a variant of distributed learning where devices collaborate to learn a model without sharing their data with the central server or each other.
We refer to the process of training multiple independent models simultaneously in a federated setting using a common pool of clients as multi-model FL.
arXiv Detail & Related papers (2022-07-09T19:47:52Z)
- No One Left Behind: Inclusive Federated Learning over Heterogeneous Devices [79.16481453598266]
We propose InclusiveFL, a client-inclusive federated learning method that accommodates clients with heterogeneous computing capabilities.
The core idea of InclusiveFL is to assign models of different sizes to clients with different computing capabilities.
We also propose an effective method to share the knowledge among multiple local models with different sizes.
arXiv Detail & Related papers (2022-02-16T13:03:27Z)
- Federated learning with hierarchical clustering of local updates to improve training on non-IID data [3.3517146652431378]
We show that learning a single joint model is often not optimal in the presence of certain types of non-IID data.
We present a modification to FL by introducing a hierarchical clustering step (FL+HC); a minimal sketch of this clustering step is given after this list.
We show how FL+HC allows model training to converge in fewer communication rounds compared to FL without clustering.
arXiv Detail & Related papers (2020-04-24T15:16:01Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
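As referenced in the FL+HC entry above, hierarchical clustering of local updates can be sketched as grouping clients by the similarity of their flattened parameter deltas and then aggregating one model per cluster. The Ward linkage, Euclidean metric, and distance threshold below are illustrative assumptions, not the configuration used in that paper.

```python
# Sketch of FL+HC-style clustering of client updates (illustrative only).
# Clients whose flattened model updates are close to each other are grouped,
# and a separate model would then be aggregated per cluster.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_client_updates(client_updates, distance_threshold=1.0):
    """client_updates: list of 1-D numpy arrays (flattened parameter deltas)."""
    updates = np.stack(client_updates)
    links = linkage(updates, method="ward", metric="euclidean")
    labels = fcluster(links, t=distance_threshold, criterion="distance")
    clusters = {}
    for client_id, label in enumerate(labels):
        clusters.setdefault(label, []).append(client_id)
    return clusters  # e.g. {1: [0, 3], 2: [1, 2, 4]}
```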
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.