Distilled One-Shot Federated Learning
- URL: http://arxiv.org/abs/2009.07999v3
- Date: Sun, 6 Jun 2021 06:55:46 GMT
- Title: Distilled One-Shot Federated Learning
- Authors: Yanlin Zhou, George Pu, Xiyao Ma, Xiaolin Li, Dapeng Wu
- Abstract summary: We propose Distilled One-Shot Federated Learning (DOSFL) to significantly reduce the communication cost while achieving comparable performance.
In just one round, each client distills their private dataset, sends the synthetic data (e.g. images or sentences) to the server, and collectively trains a global model.
- With this weight-less and gradient-less design, the total communication cost of DOSFL is up to three orders of magnitude lower than that of FedAvg.
- Score: 13.294757670979031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current federated learning algorithms take tens of communication rounds
transmitting unwieldy model weights under ideal circumstances and hundreds when
data is poorly distributed. Inspired by recent work on dataset distillation and
distributed one-shot learning, we propose Distilled One-Shot Federated Learning
(DOSFL) to significantly reduce the communication cost while achieving
comparable performance. In just one round, each client distills their private
dataset, sends the synthetic data (e.g. images or sentences) to the server, and
collectively trains a global model. The distilled data look like noise and are
useful only with the specific model weights they were distilled for, i.e., they
become useless once the model is updated. With this weight-less and
gradient-less design, the total communication cost of DOSFL is up to three
orders of magnitude lower than that of FedAvg, while preserving 93% to 99% of
the performance of a centralized counterpart.
Afterwards, clients can switch to traditional methods such as FedAvg to
fine-tune the last few percentage points and fit personalized local models to
their local datasets. Through comprehensive experiments, we show the accuracy
and communication performance of DOSFL on both vision and language tasks with
different models, including CNNs, LSTMs, and Transformers. We demonstrate that
an eavesdropping attacker cannot train a good model from the leaked distilled
data without knowing the initial model weights. DOSFL thus serves as an
inexpensive method to quickly converge on a performant pre-trained model with
less than 0.1% of the communication cost of traditional methods.
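To make the one-round protocol concrete, here is a minimal sketch of the idea, assuming a plain linear classifier, a single inner gradient step, and hard synthetic labels (the paper also distills soft labels and learning rates, so this is not the authors' implementation). All sizes, step counts, and names such as distill_client and server_train are illustrative.

```python
# Minimal sketch of the DOSFL idea (illustrative assumptions, not the paper's code):
# each client distills its private data into a few synthetic examples tied to a
# shared initialization, and the server trains on them in a single round.
import torch
import torch.nn.functional as F

D_IN, N_CLASSES = 32, 10                             # assumed feature/label sizes
SHARED_INIT = torch.randn(D_IN, N_CLASSES) * 0.01    # initialization broadcast by the server

def forward(w, x):
    # Linear classifier used purely for illustration.
    return x @ w

def distill_client(real_x, real_y, n_syn=10, outer_steps=200, inner_lr=0.1):
    """Learn synthetic examples such that one gradient step from SHARED_INIT
    on the synthetic data gives low loss on this client's real data."""
    syn_x = torch.randn(n_syn, D_IN, requires_grad=True)
    syn_y = torch.randint(0, N_CLASSES, (n_syn,))    # fixed hard labels
    opt = torch.optim.Adam([syn_x], lr=0.01)
    for _ in range(outer_steps):
        w0 = SHARED_INIT.clone().requires_grad_(True)
        inner_loss = F.cross_entropy(forward(w0, syn_x), syn_y)
        # Differentiate through the inner update (meta-gradient w.r.t. syn_x).
        (g,) = torch.autograd.grad(inner_loss, w0, create_graph=True)
        w1 = w0 - inner_lr * g
        outer_loss = F.cross_entropy(forward(w1, real_x), real_y)
        opt.zero_grad()
        outer_loss.backward()
        opt.step()
    return syn_x.detach(), syn_y                     # the entire upstream payload

def server_train(client_payloads, inner_lr=0.1):
    """Single communication round: apply each client's distilled step,
    starting from the same shared initialization."""
    w = SHARED_INIT.clone()
    for syn_x, syn_y in client_payloads:
        w.requires_grad_(True)
        loss = F.cross_entropy(forward(w, syn_x), syn_y)
        (g,) = torch.autograd.grad(loss, w)
        w = (w - inner_lr * g).detach()
    return w

# Example: two clients with random stand-in "private" data distill and upload.
clients = [(torch.randn(64, D_IN), torch.randint(0, N_CLASSES, (64,)))
           for _ in range(2)]
payloads = [distill_client(x, y) for x, y in clients]
global_w = server_train(payloads)
```

The property highlighted in the abstract is visible here: the synthetic tensors are optimized against SHARED_INIT, so they are only meaningful to a model starting from those weights, and each client uploads a handful of small tensors rather than full model weights or gradients.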
Related papers
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) to tackle this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Federated Learning on Non-iid Data via Local and Global Distillation [25.397058380098816]
We propose FedND: federated learning with noise distillation.
On the client side, we propose a self-distillation method to train the local model.
On the server side, we generate noisy samples for each client and use them to distill the other clients' models.
Experimental results show that the algorithm achieves the best performance and is more communication-efficient than state-of-the-art methods.
arXiv Detail & Related papers (2023-06-26T06:14:01Z) - SalientGrads: Sparse Models for Communication Efficient and Data Aware Distributed Federated Training [1.0413504599164103]
Federated learning (FL) enables the training of a model leveraging decentralized data in client sites while preserving privacy by not collecting data.
One of the significant challenges of FL is the limited computation and low communication bandwidth of resource-limited edge client nodes.
We propose Salient Grads, which simplifies sparse training by choosing a data-aware subnetwork before training.
arXiv Detail & Related papers (2023-04-15T06:46:37Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape of the original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z) - One-shot Federated Learning without Server-side Training [42.59845771101823]
One-shot federated learning is gaining popularity as a way to reduce communication cost between clients and the server.
Most of the existing one-shot FL methods are based on knowledge distillation; however, distillation-based approaches require an extra training phase and depend on publicly available datasets or generated pseudo samples.
In this work, we consider a novel and challenging cross-silo setting: performing a single round of parameter aggregation on the local models without server-side training.
arXiv Detail & Related papers (2022-04-26T01:45:37Z) - FedSynth: Gradient Compression via Synthetic Data in Federated Learning [14.87215762562876]
We propose a new scheme for upstream communication in which, instead of transmitting the model update, each client learns and transmits a lightweight synthetic dataset.
We find our method is comparable to or better than random-masking baselines on all three common federated learning benchmark datasets.
arXiv Detail & Related papers (2022-04-04T06:47:20Z) - ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
arXiv Detail & Related papers (2021-10-11T14:45:00Z) - FedKD: Communication Efficient Federated Learning via Knowledge
Distillation [56.886414139084216]
Federated learning is widely used to learn intelligent models from decentralized data.
In federated learning, clients need to communicate their local model updates in each iteration of model learning.
We propose a communication efficient federated learning method based on knowledge distillation.
arXiv Detail & Related papers (2021-08-30T15:39:54Z) - Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting in which many devices collaboratively train a model.
In most current training schemes, the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e., training the central classifier on unlabeled data using the outputs of the client models (a minimal sketch follows the list below).
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
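As a rough illustration of the ensemble-distillation fusion idea in the last entry (a sketch in its spirit, not the paper's exact procedure), the snippet below averages the client models' logits on unlabeled data and distills them into the central model; the model objects, unlabeled_loader, temperature, and optimizer settings are assumptions.

```python
# Sketch of ensemble distillation for model fusion: the server distills the
# averaged client predictions on unlabeled data into the central model.
import torch
import torch.nn.functional as F

def fuse_by_distillation(server_model, client_models, unlabeled_loader,
                         epochs=1, lr=1e-3, temperature=1.0):
    opt = torch.optim.Adam(server_model.parameters(), lr=lr)
    for m in client_models:
        m.eval()
    server_model.train()
    for _ in range(epochs):
        for x in unlabeled_loader:                  # batches of unlabeled inputs
            with torch.no_grad():
                # Ensemble teacher: average the client models' logits.
                teacher = torch.stack([m(x) for m in client_models]).mean(dim=0)
            student = server_model(x)
            loss = F.kl_div(F.log_softmax(student / temperature, dim=1),
                            F.softmax(teacher / temperature, dim=1),
                            reduction="batchmean") * temperature ** 2
            opt.zero_grad()
            loss.backward()
            opt.step()
    return server_model
```

In practice the unlabeled set can be a public or generated dataset, and the fused model is then broadcast back to clients as the new global model.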
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.