FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale
Neural Networks through Federated Learning
- URL: http://arxiv.org/abs/2208.05174v5
- Date: Mon, 7 Aug 2023 14:37:00 GMT
- Authors: Yuanyuan Chen, Zichen Chen, Pengcheng Wu, Han Yu
- Abstract summary: We propose the Federated Opportunistic Block Dropout (FedOBD) approach to train large-scale neural networks.
FedOBD decomposes large-scale models into semantic blocks so that FL participants can opportunistically upload quantized blocks.
Experiments show that FedOBD reduces the overall communication overhead by more than 88% compared to the best performing baseline approach.
- Score: 18.357577491590686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale neural networks possess considerable expressive power. They are
well-suited for complex learning tasks in industrial applications. However,
large-scale models pose significant challenges for training under the current
Federated Learning (FL) paradigm. Existing approaches for efficient FL training
often leverage model parameter dropout. However, manipulating individual model
parameters is not only inefficient in meaningfully reducing the communication
overhead when training large-scale FL models, but may also be detrimental to
the scaling efforts and model performance as shown by recent research. To
address these issues, we propose the Federated Opportunistic Block Dropout
(FedOBD) approach. The key novelty is that it decomposes large-scale models
into semantic blocks so that FL participants can opportunistically upload
quantized blocks, which are deemed to be significant towards training the
model, to the FL server for aggregation. Extensive experiments evaluating
FedOBD against four state-of-the-art approaches based on multiple real-world
datasets show that it reduces the overall communication overhead by more than
88% compared to the best performing baseline approach, while achieving the
highest test accuracy. To the best of our knowledge, FedOBD is the first
approach to perform dropout on FL models at the block level rather than at the
individual parameter level.
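The block-level idea in the abstract can be illustrated with a small sketch. The abstract does not say how block significance is measured or which quantizer is used, so everything below is an assumption made for illustration: the relative L2 change of a block after local training stands in for "significance", plain uniform 8-bit quantization stands in for the paper's quantizer, and the names `block_importance`, `select_blocks`, `quantize`, and `dequantize` are hypothetical.

```python
import math

def block_importance(old_block, new_block):
    """Assumed score: relative L2 change of a block after local training."""
    diff = math.sqrt(sum((n - o) ** 2 for o, n in zip(old_block, new_block)))
    norm = math.sqrt(sum(n * n for n in new_block))
    return diff / norm if norm > 0 else 0.0

def select_blocks(old_model, new_model, keep_fraction):
    """Names of the highest-scoring blocks, keeping about keep_fraction of them."""
    scores = {name: block_importance(old_model[name], new_model[name])
              for name in new_model}
    keep = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]

def quantize(block, bits=8):
    """Uniform quantization (an assumption); returns (codes, lo, step)."""
    lo, hi = min(block), max(block)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / step) for x in block]
    return codes, lo, step

def dequantize(codes, lo, step):
    """Server-side reconstruction of an uploaded block."""
    return [lo + c * step for c in codes]

# Toy model with three "blocks", each a flat list of parameters.
old = {"conv1": [0.10, 0.20], "conv2": [0.50, -0.50], "fc": [1.0, 1.0]}
new = {"conv1": [0.10, 0.21], "conv2": [0.90, -0.10], "fc": [1.0, 1.0]}

# The client uploads only the selected blocks, in quantized form.
upload = {name: quantize(new[name]) for name in select_blocks(old, new, 0.34)}
restored = {name: dequantize(*packed) for name, packed in upload.items()}
```

In this toy run only the block that changed most during local training is uploaded; the server would aggregate the dequantized blocks and keep its previous values for the rest. Dropping whole blocks rather than individual parameters means the upload needs no per-parameter index, which is one plausible reason the block-level approach saves communication.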
Related papers
- A Survey on Efficient Federated Learning Methods for Foundation Model Training [62.473245910234304]
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients.
In the wake of Foundation Models (FM), the reality is different for many deep learning applications.
We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications.
arXiv Detail & Related papers (2024-01-09T10:22:23Z)
- Straggler-resilient Federated Learning: Tackling Computation
Heterogeneity with Layer-wise Partial Model Training in Mobile Edge Network [4.1813760301635705]
We propose Federated Partial Model Training (FedPMT), where devices with smaller computational capabilities work on partial models and contribute to the global model.
As such, all devices in FedPMT prioritize the most crucial parts of the global model.
Empirical results show that FedPMT significantly outperforms the existing benchmark FedDrop.
arXiv Detail & Related papers (2023-11-16T16:30:04Z)
- Tunable Soft Prompts are Messengers in Federated Learning [55.924749085481544]
Federated learning (FL) enables multiple participants to collaboratively train machine learning models using decentralized data sources.
The lack of model privacy protection in FL has become a non-negligible challenge.
We propose a novel FL training approach that accomplishes information exchange among participants via tunable soft prompts.
arXiv Detail & Related papers (2023-11-12T11:01:10Z)
- NeFL: Nested Model Scaling for Federated Learning with System Heterogeneous Clients [44.89061671579694]
Federated learning (FL) enables distributed training while preserving data privacy, but stragglers (slow or incapable clients) can significantly increase the total training time and degrade performance.
We propose nested federated learning (NeFL), a framework that efficiently divides deep neural networks into submodels using both depthwise and widthwise scaling.
NeFL achieves performance gain, especially for the worst-case submodel compared to baseline approaches.
arXiv Detail & Related papers (2023-08-15T13:29:14Z)
- Guiding The Last Layer in Federated Learning with Pre-Trained Models [18.382057374270143]
Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data.
We show that fitting a classification head using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals.
arXiv Detail & Related papers (2023-06-06T18:02:02Z)
- Conquering the Communication Constraints to Enable Large Pre-Trained Models in Federated Learning [18.12162136918301]
Federated learning (FL) has emerged as a promising paradigm for enabling the collaborative training of models without centralized access to the raw data on local devices.
Recent state-of-the-art pre-trained models are getting more capable but also have more parameters.
Can we find a solution to enable those strong and readily-available pre-trained models in FL to achieve excellent performance while simultaneously reducing the communication burden?
Specifically, we systematically evaluate the performance of FedPEFT across a variety of client stability, data distribution, and differential privacy settings.
arXiv Detail & Related papers (2022-10-04T16:08:54Z)
- FedDM: Iterative Distribution Matching for Communication-Efficient
Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z)
- Fine-tuning Global Model via Data-Free Knowledge Distillation for
Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraints.
We propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG).
Our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z)
- No One Left Behind: Inclusive Federated Learning over Heterogeneous
Devices [79.16481453598266]
We propose InclusiveFL, a client-inclusive federated learning method to handle this problem.
The core idea of InclusiveFL is to assign models of different sizes to clients with different computing capabilities.
We also propose an effective method to share the knowledge among multiple local models with different sizes.
arXiv Detail & Related papers (2022-02-16T13:03:27Z)
- Towards Understanding Quality Challenges of the Federated Learning: A
First Look from the Lens of Robustness [4.822471415125479]
Federated learning (FL) aims to preserve users' data privacy while leveraging the entire dataset of all participants for training.
FL still tends to suffer from quality issues such as attacks or Byzantine faults.
This paper investigates the effectiveness of state-of-the-art (SOTA) robust FL techniques in the presence of attacks and faults.
arXiv Detail & Related papers (2022-01-05T02:06:39Z)
- UVeQFed: Universal Vector Quantization for Federated Learning [179.06583469293386]
Federated learning (FL) is an emerging approach to train such learning models without requiring the users to share their possibly private labeled data.
In FL, each user trains its copy of the learning model locally. The server then collects the individual updates and aggregates them into a global model.
We show that combining universal vector quantization methods with FL yields a decentralized training system in which the compression of the trained models induces only a minimum distortion.
arXiv Detail & Related papers (2020-06-05T07:10:22Z)