Distilled One-Shot Federated Learning
- URL: http://arxiv.org/abs/2009.07999v3
- Date: Sun, 6 Jun 2021 06:55:46 GMT
- Title: Distilled One-Shot Federated Learning
- Authors: Yanlin Zhou, George Pu, Xiyao Ma, Xiaolin Li, Dapeng Wu
- Abstract summary: We propose Distilled One-Shot Federated Learning (DOSFL) to significantly reduce the communication cost while achieving comparable performance.
In just one round, each client distills their private dataset, sends the synthetic data (e.g. images or sentences) to the server, and collectively trains a global model.
- With this weight-less and gradient-less design, the total communication cost of DOSFL is up to three orders of magnitude lower than that of FedAvg.
- Score: 13.294757670979031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current federated learning algorithms take tens of communication rounds
transmitting unwieldy model weights under ideal circumstances and hundreds when
data is poorly distributed. Inspired by recent work on dataset distillation and
distributed one-shot learning, we propose Distilled One-Shot Federated Learning
(DOSFL) to significantly reduce the communication cost while achieving
comparable performance. In just one round, each client distills their private
dataset, sends the synthetic data (e.g. images or sentences) to the server, and
collectively trains a global model. The distilled data look like noise and are
useful only with the specific model weights they were distilled for, i.e., they
become useless once the model is updated. With this weight-less and
gradient-less design, the total communication cost of DOSFL is up to three
orders of magnitude lower than that of FedAvg, while preserving 93% to 99% of
the performance of a centralized counterpart.
Afterwards, clients can switch to traditional methods such as FedAvg to
fine-tune the last few percentage points and fit personalized local models to
their local datasets. Through comprehensive experiments, we show the accuracy
and communication performance of DOSFL on both vision and language tasks with
different models, including CNNs, LSTMs, and Transformers. We demonstrate that
an eavesdropping attacker cannot train a good model from the leaked distilled
data without knowing the initial model weights. DOSFL thus serves as an
inexpensive method to quickly converge on a performant pre-trained model with
less than 0.1% of the communication cost of traditional methods.
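To make the one-round protocol concrete, here is a minimal sketch of the idea, assuming a plain linear classifier, a single inner gradient step, and hard synthetic labels (the paper also distills soft labels and learning rates, so this is not the authors' implementation). All sizes, step counts, and names such as distill_client and server_train are illustrative.

```python
# Minimal sketch of the DOSFL idea (illustrative assumptions, not the paper's code):
# each client distills its private data into a few synthetic examples tied to a
# shared initialization, and the server trains on them in a single round.
import torch
import torch.nn.functional as F

D_IN, N_CLASSES = 32, 10                             # assumed feature/label sizes
SHARED_INIT = torch.randn(D_IN, N_CLASSES) * 0.01    # initialization broadcast by the server

def forward(w, x):
    # Linear classifier used purely for illustration.
    return x @ w

def distill_client(real_x, real_y, n_syn=10, outer_steps=200, inner_lr=0.1):
    """Learn synthetic examples such that one gradient step from SHARED_INIT
    on the synthetic data gives low loss on this client's real data."""
    syn_x = torch.randn(n_syn, D_IN, requires_grad=True)
    syn_y = torch.randint(0, N_CLASSES, (n_syn,))    # fixed hard labels
    opt = torch.optim.Adam([syn_x], lr=0.01)
    for _ in range(outer_steps):
        w0 = SHARED_INIT.clone().requires_grad_(True)
        inner_loss = F.cross_entropy(forward(w0, syn_x), syn_y)
        # Differentiate through the inner update (meta-gradient w.r.t. syn_x).
        (g,) = torch.autograd.grad(inner_loss, w0, create_graph=True)
        w1 = w0 - inner_lr * g
        outer_loss = F.cross_entropy(forward(w1, real_x), real_y)
        opt.zero_grad()
        outer_loss.backward()
        opt.step()
    return syn_x.detach(), syn_y                     # the entire upstream payload

def server_train(client_payloads, inner_lr=0.1):
    """Single communication round: apply each client's distilled step,
    starting from the same shared initialization."""
    w = SHARED_INIT.clone()
    for syn_x, syn_y in client_payloads:
        w.requires_grad_(True)
        loss = F.cross_entropy(forward(w, syn_x), syn_y)
        (g,) = torch.autograd.grad(loss, w)
        w = (w - inner_lr * g).detach()
    return w

# Example: two clients with random stand-in "private" data distill and upload.
clients = [(torch.randn(64, D_IN), torch.randint(0, N_CLASSES, (64,)))
           for _ in range(2)]
payloads = [distill_client(x, y) for x, y in clients]
global_w = server_train(payloads)
```

The property highlighted in the abstract is visible here: the synthetic tensors are optimized against SHARED_INIT, so they are only meaningful to a model starting from those weights, and each client uploads a handful of small tensors rather than full model weights or gradients.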
Related papers
- Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) to tackle this data heterogeneity issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Federated Learning on Non-iid Data via Local and Global Distillation [25.397058380098816]
We propose FedND: federated learning with noise distillation.
On the client side, we propose a self-distillation method to train the local model.
On the server side, we generate noisy samples for each client and use them to distill the other clients' models.
Experimental results show that the algorithm achieves the best performance and is more communication-efficient than state-of-the-art methods.
arXiv Detail & Related papers (2023-06-26T06:14:01Z) - SalientGrads: Sparse Models for Communication Efficient and Data Aware Distributed Federated Training [1.0413504599164103]
Federated learning (FL) enables the training of a model leveraging decentralized data in client sites while preserving privacy by not collecting data.
One of the significant challenges of FL is the limited computation and low communication bandwidth of resource-limited edge client nodes.
We propose Salient Grads, which simplifies sparse training by choosing a data-aware subnetwork before training.
arXiv Detail & Related papers (2023-04-15T06:46:37Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape of the original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z) - One-shot Federated Learning without Server-side Training [42.59845771101823]
One-shot federated learning is gaining popularity as a way to reduce communication cost between clients and the server.
Most of the existing one-shot FL methods are based on knowledge distillation; however, distillation-based approaches require an extra training phase and depend on publicly available datasets or generated pseudo samples.
In this work, we consider a novel and challenging cross-silo setting: performing a single round of parameter aggregation on the local models without server-side training.
arXiv Detail & Related papers (2022-04-26T01:45:37Z) - FedSynth: Gradient Compression via Synthetic Data in Federated Learning [14.87215762562876]
We propose a new scheme for upstream communication in which, instead of transmitting the model update, each client learns and transmits a lightweight synthetic dataset.
We find our method is comparable to or better than random-masking baselines on all three common federated learning benchmark datasets.
arXiv Detail & Related papers (2022-04-04T06:47:20Z) - ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
arXiv Detail & Related papers (2021-10-11T14:45:00Z) - FedKD: Communication Efficient Federated Learning via Knowledge
Distillation [56.886414139084216]
Federated learning is widely used to learn intelligent models from decentralized data.
In federated learning, clients need to communicate their local model updates in each iteration of model learning.
We propose a communication efficient federated learning method based on knowledge distillation.
arXiv Detail & Related papers (2021-08-30T15:39:54Z) - Ensemble Distillation for Robust Model Fusion in Federated Learning [72.61259487233214]
Federated Learning (FL) is a machine learning setting in which many devices collaboratively train a model.
In most current training schemes, the central model is refined by averaging the parameters of the server model and the updated parameters from the client side.
We propose ensemble distillation for model fusion, i.e., training the central classifier on unlabeled data using the outputs of the client models (a minimal sketch follows the list below).
arXiv Detail & Related papers (2020-06-12T14:49:47Z)
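As a rough illustration of the ensemble-distillation fusion idea in the last entry (a sketch in its spirit, not the paper's exact procedure), the snippet below averages the client models' logits on unlabeled data and distills them into the central model; the model objects, unlabeled_loader, temperature, and optimizer settings are assumptions.

```python
# Sketch of ensemble distillation for model fusion: the server distills the
# averaged client predictions on unlabeled data into the central model.
import torch
import torch.nn.functional as F

def fuse_by_distillation(server_model, client_models, unlabeled_loader,
                         epochs=1, lr=1e-3, temperature=1.0):
    opt = torch.optim.Adam(server_model.parameters(), lr=lr)
    for m in client_models:
        m.eval()
    server_model.train()
    for _ in range(epochs):
        for x in unlabeled_loader:                  # batches of unlabeled inputs
            with torch.no_grad():
                # Ensemble teacher: average the client models' logits.
                teacher = torch.stack([m(x) for m in client_models]).mean(dim=0)
            student = server_model(x)
            loss = F.kl_div(F.log_softmax(student / temperature, dim=1),
                            F.softmax(teacher / temperature, dim=1),
                            reduction="batchmean") * temperature ** 2
            opt.zero_grad()
            loss.backward()
            opt.step()
    return server_model
```

In practice the unlabeled set can be a public or generated dataset, and the fused model is then broadcast back to clients as the new global model.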
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.