Memory-adaptive Depth-wise Heterogenous Federated Learning
- URL: http://arxiv.org/abs/2303.04887v2
- Date: Wed, 10 Jan 2024 18:03:01 GMT
- Title: Memory-adaptive Depth-wise Heterogenous Federated Learning
- Authors: Kai Zhang, Yutong Dai, Hongyi Wang, Eric Xing, Xun Chen, Lichao Sun
- Abstract summary: We introduce a memory-adaptive depth-wise learning solution in FL called FeDepth, which adaptively decomposes the full model into blocks according to the memory budgets of each client.
Our method outperforms state-of-the-art approaches, achieving 5% and more than 10% improvements in top-1 accuracy on CIFAR-10 and CIFAR-100, respectively.
- Score: 24.13198329419849
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning is a promising paradigm that allows multiple clients to
collaboratively train a model without sharing the local data. However, the
presence of heterogeneous devices in federated learning, such as mobile phones
and IoT devices with varying memory capabilities, limits the scale of the model
that can be trained and hence its performance. The mainstream approaches
to address memory limitations focus on width-slimming techniques, where
different clients train subnetworks with reduced widths locally and then the
server aggregates the subnetworks. The global model produced by these methods
suffers performance degradation because of the steps taken to reconcile the
varying subnetwork widths during aggregation. In this
paper, we introduce a memory-adaptive depth-wise learning solution in FL called
FeDepth, which adaptively decomposes the full model into blocks according to
the memory budgets of each client and trains blocks sequentially to obtain a
full inference model. Our method outperforms state-of-the-art approaches,
achieving 5% and more than 10% improvements in top-1 accuracy on CIFAR-10 and
CIFAR-100, respectively. We also demonstrate the effectiveness of depth-wise
fine-tuning on ViT. Our findings highlight the importance of memory-aware
techniques for federated learning with heterogeneous devices and the success of
the depth-wise training strategy in improving the global model's performance.
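The depth-wise idea described in the abstract can be illustrated with a short, hypothetical sketch: the model is split into consecutive blocks whose parameter counts fit a client's memory budget, and the blocks are trained one at a time with the already-trained prefix frozen. The greedy splitting rule and the per-block auxiliary classifier head below are simplifying assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of depth-wise, memory-budgeted block training.
# The greedy split rule and per-block auxiliary head are illustrative
# assumptions, not the FeDepth implementation.
import torch
import torch.nn as nn

def split_into_blocks(layers, budget_params):
    """Group consecutive layers into blocks whose parameter counts fit the
    client's memory budget (expressed here as a parameter count)."""
    blocks, current, count = [], [], 0
    for layer in layers:
        n = sum(p.numel() for p in layer.parameters())
        if current and count + n > budget_params:
            blocks.append(nn.Sequential(*current))
            current, count = [], 0
        current.append(layer)
        count += n
    blocks.append(nn.Sequential(*current))
    return blocks

def train_blocks_sequentially(blocks, loader, num_classes, epochs=1, lr=1e-2):
    """Train one block at a time; the frozen prefix only produces features,
    so peak training memory stays close to a single block's footprint."""
    for i, block in enumerate(blocks):
        # Infer this block's output width with a dummy forward pass.
        with torch.no_grad():
            x = next(iter(loader))[0]
            for prev in blocks[:i]:
                x = prev(x)
            feat_dim = block(x).flatten(1).shape[1]
        head = nn.Linear(feat_dim, num_classes)  # auxiliary head (assumption)
        opt = torch.optim.SGD(list(block.parameters()) + list(head.parameters()), lr=lr)
        for _ in range(epochs):
            for x, y in loader:
                with torch.no_grad():            # frozen, already-trained prefix
                    for prev in blocks[:i]:
                        x = prev(x)
                logits = head(block(x).flatten(1))
                loss = nn.functional.cross_entropy(logits, y)
                opt.zero_grad()
                loss.backward()
                opt.step()
    return nn.Sequential(*blocks)                # full model for inference
```

For example, `blocks = split_into_blocks(list(model.children()), budget)` followed by `train_blocks_sequentially(blocks, loader, num_classes=10)` returns a full inference model assembled from sequentially trained blocks.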
Related papers
- Embracing Federated Learning: Enabling Weak Client Participation via Partial Model Training [21.89214794178211]
In Federated Learning (FL), clients may have weak devices that cannot train the full model or even hold it in their memory space.
We propose EmbracingFL, a general FL framework that allows all available clients to join the distributed training.
Our empirical study shows that EmbracingFL consistently achieves high accuracy, as if all clients were strong, outperforming state-of-the-art width-reduction methods.
arXiv Detail & Related papers (2024-06-21T13:19:29Z)
- Federated Learning with Flexible Architectures [12.800116749927266]
This paper introduces Federated Learning with Flexible Architectures (FedFA), an FL training algorithm that allows clients to train models of different widths and depths.
FedFA incorporates the layer grafting technique to align clients' local architectures with the largest network architecture in the FL system during model aggregation.
arXiv Detail & Related papers (2024-06-14T09:44:46Z)
- Heterogeneous Federated Learning with Splited Language Model [22.65325348176366]
Federated Split Learning (FSL) is a promising distributed learning paradigm in practice.
In this paper, we harness Pre-trained Image Transformers (PITs) as the initial model, coined FedV, to accelerate the training process and improve model robustness.
We are the first to provide a systematic evaluation of FSL methods with PITs in real-world datasets, different partial device participations, and heterogeneous data splits.
arXiv Detail & Related papers (2024-03-24T07:33:08Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
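For reference, a toy Mixture-of-Experts adapter in the spirit of this entry might look as follows; the bottleneck size, soft routing, and residual placement are illustrative assumptions and do not reproduce the paper's design or its Distribution Discriminative Auto-Selector.

```python
# Hypothetical toy Mixture-of-Experts adapter (illustration only).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck adapter placed alongside a frozen backbone layer."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))

class MoEAdapter(nn.Module):
    """Routes each token's features through a soft mixture of adapter experts,
    so new experts can be added for new tasks while old ones stay frozen."""
    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(Adapter(dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x):                                        # x: (B, T, D)
        gates = torch.softmax(self.router(x), dim=-1)            # (B, T, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, T, D, E)
        mixed = (expert_out * gates.unsqueeze(-2)).sum(-1)       # gated mixture
        return x + mixed                                         # residual connection
```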
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Submodel Partitioning in Hierarchical Federated Learning: Algorithm Design and Convergence Analysis [15.311309249848739]
Hierarchical federated learning (FL) has demonstrated promising scalability advantages over the traditional "star-topology" architecture of federated learning.
In this paper, we propose hierarchical independent submodel training (HIST) over resource-constrained Internet of Things (IoT) devices.
The key idea behind HIST is that, in each round, the global model is partitioned into disjoint submodels that are distributed across different cells (a short sketch of this partitioning follows).
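As a rough illustration of this round-wise partitioning (a hypothetical sketch; the actual HIST partitioning rule and cell assignment may differ), each parameter tensor of the global model can be assigned to exactly one cell's submodel and the updated pieces merged back afterwards:

```python
# Hypothetical sketch of disjoint submodel partitioning per round.
# `global_state` is assumed to be a PyTorch-style state_dict of tensors.
import random

def partition_submodels(global_state, num_cells, round_seed):
    """Assign each parameter tensor of the global model to exactly one cell,
    producing disjoint submodels for this round."""
    rng = random.Random(round_seed)
    names = list(global_state.keys())
    rng.shuffle(names)
    submodels = [{} for _ in range(num_cells)]
    for i, name in enumerate(names):
        submodels[i % num_cells][name] = global_state[name].clone()
    return submodels

def merge_submodels(submodels):
    """Reassemble the global model from the locally updated, disjoint pieces."""
    merged = {}
    for sub in submodels:
        merged.update(sub)
    return merged
```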
arXiv Detail & Related papers (2023-10-27T04:42:59Z)
- FedYolo: Augmenting Federated Learning with Pretrained Transformers [61.56476056444933]
In this work, we investigate pretrained transformers (PTF) to achieve on-device learning goals.
We show that larger scale shrinks the accuracy gaps between alternative approaches and improves robustness.
Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF.
arXiv Detail & Related papers (2023-07-10T21:08:52Z)
- Adaptive Parameterization of Deep Learning Models for Federated Learning [85.82002651944254]
Federated Learning offers a way to train deep neural networks in a distributed fashion.
It incurs a communication overhead as the model parameters or gradients need to be exchanged regularly during training.
In this paper, we propose to utilise parallel Adapters for Federated Learning.
arXiv Detail & Related papers (2023-02-06T17:30:33Z)
- No One Left Behind: Inclusive Federated Learning over Heterogeneous Devices [79.16481453598266]
We propose InclusiveFL, a client-inclusive federated learning method to handle this problem.
The core idea of InclusiveFL is to assign models of different sizes to clients with different computing capabilities.
We also propose an effective method to share the knowledge among multiple local models with different sizes.
arXiv Detail & Related papers (2022-02-16T13:03:27Z)
- Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
- Accelerating Federated Learning over Reliability-Agnostic Clients in Mobile Edge Computing Systems [15.923599062148135]
Federated learning has emerged as a promising privacy-preserving approach to facilitating AI applications.
It remains a big challenge to optimize the efficiency and effectiveness of FL when it is integrated with the MEC architecture.
In this paper, a multi-layer federated learning protocol called HybridFL is designed for the MEC architecture.
arXiv Detail & Related papers (2020-07-28T17:35:39Z)