FedMS: Federated Learning with Mixture of Sparsely Activated Foundations Models
- URL: http://arxiv.org/abs/2312.15926v1
- Date: Tue, 26 Dec 2023 07:40:26 GMT
- Title: FedMS: Federated Learning with Mixture of Sparsely Activated Foundations Models
- Authors: Panlong Wu, Kangshuo Li, Ting Wang, and Fangxin Wang
- Abstract summary: We propose a novel two-stage federated learning algorithm called FedMS.
A global expert is trained in the first stage and a local expert is trained in the second stage to provide better personalization.
We conduct extensive experiments to verify the effectiveness of FedMS; the results show that FedMS outperforms other SOTA baselines by up to 55.25% under default settings.
- Score: 11.362085734837217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models (FMs) have shown great success in natural language
processing, computer vision, and multimodal tasks. FMs have a large number of model
parameters and thus require a substantial amount of data to optimize the model during
training. Federated learning has revolutionized machine learning by enabling
collaborative learning from decentralized data while still preserving clients' data
privacy. Despite the great benefits foundation models can gain from federated learning,
they face severe computation, communication, and statistical challenges. In this paper,
we propose a novel two-stage federated learning algorithm called FedMS. A global expert
is trained in the first stage and a local expert is trained in the second stage to
provide better personalization. We construct a Mixture of Foundation Models (MoFM) with
these two experts and design a gate neural network with an inserted gate adapter that
joins the aggregation in every communication round of the second stage. To further
adapt to edge computing scenarios with limited computational resources, we design a
novel Sparsely Activated LoRA (SAL) algorithm that freezes the pre-trained foundation
model parameters, inserts low-rank adaptation matrices into transformer blocks, and
activates them progressively during training. We conduct extensive experiments to
verify the effectiveness of FedMS; the results show that FedMS outperforms other SOTA
baselines by up to 55.25% under default settings.
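
The abstract outlines two mechanisms: a Mixture of Foundation Models in which a gate network (with an inserted gate adapter) weighs a global expert against a local expert, and Sparsely Activated LoRA, which freezes the backbone, inserts low-rank adapters into transformer blocks, and switches them on progressively. The PyTorch-style sketch below illustrates these ideas only; the module layout, gating granularity, and activation schedule are assumptions, not the FedMS implementation.

```python
# Illustrative sketch of the two ideas described in the abstract. All module
# names, the gating granularity, and the activation schedule are assumptions.
import torch
import torch.nn as nn


class SALLinear(nn.Module):
    """Frozen linear layer with a low-rank (LoRA) adapter that can be switched on."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():               # freeze pre-trained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank
        self.active = False                            # sparse activation flag

    def forward(self, x):
        out = self.base(x)
        if self.active:                                # apply adapter only when activated
            out = out + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        return out


def progressive_activation(sal_layers, round_idx, total_rounds):
    """Activate adapters gradually: more layers join as training proceeds."""
    n_active = int(len(sal_layers) * min(1.0, (round_idx + 1) / total_rounds))
    for i, layer in enumerate(sal_layers):
        layer.active = i < n_active


class MoFM(nn.Module):
    """Gate network mixing a global expert and a local expert.

    Experts are assumed to map token embeddings (B, T, D) to logits (B, C).
    """

    def __init__(self, global_expert: nn.Module, local_expert: nn.Module, feat_dim: int):
        super().__init__()
        self.global_expert = global_expert
        self.local_expert = local_expert
        self.gate = nn.Linear(feat_dim, 2)                 # two experts, two mixing weights
        self.gate_adapter = nn.Linear(feat_dim, feat_dim)  # trainable gate adapter

    def forward(self, x):
        h = x + self.gate_adapter(x)                          # adapter-refined gate input
        w = torch.softmax(self.gate(h.mean(dim=1)), dim=-1)   # per-sample weights (B, 2)
        return w[:, :1] * self.global_expert(x) + w[:, 1:] * self.local_expert(x)
```

In terms of the two stages, one would train and aggregate the global expert's adapters in stage one, then train the local expert and the gate in stage two, with the gate adapter joining the server aggregation each communication round, as the abstract describes.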
Related papers
- Leveraging Foundation Models for Multi-modal Federated Learning with Incomplete Modality [41.79433449873368]
We propose a novel multi-modal federated learning method, Federated Multi-modal contrastiVe training with Pre-trained completion (FedMVP)
FedMVP integrates large-scale pre-trained models to enhance federated training.
We demonstrate that the model achieves superior performance on two real-world image-text classification datasets.
arXiv Detail & Related papers (2024-06-16T19:18:06Z) - Heterogeneous Federated Learning with Splited Language Model [22.65325348176366]
Federated Split Learning (FSL) is a promising distributed learning paradigm in practice.
In this paper, we harness Pre-trained Image Transformers (PITs) as the initial model, coined FedV, to accelerate the training process and improve model robustness.
We are the first to provide a systematic evaluation of FSL methods with PITs on real-world datasets, under different levels of partial device participation, and with heterogeneous data splits.
arXiv Detail & Related papers (2024-03-24T07:33:08Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - A Survey on Efficient Federated Learning Methods for Foundation Model Training [62.473245910234304]
Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients.
In the wake of Foundation Models (FM), the reality is different for many deep learning applications.
We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications.
arXiv Detail & Related papers (2024-01-09T10:22:23Z) - Semi-Federated Learning: Convergence Analysis and Optimization of A Hybrid Learning Framework [70.83511997272457]
We propose a semi-federated learning (SemiFL) paradigm to leverage both the base station (BS) and devices for a hybrid implementation of centralized learning (CL) and FL.
We propose a two-stage algorithm to solve this intractable problem, in which we provide the closed-form solutions to the beamformers.
arXiv Detail & Related papers (2023-10-04T03:32:39Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specific auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients (a minimal sketch of such a client-local adaptive step appears after this list).
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning [37.96957782129352]
We propose a finetuning framework tailored to heterogeneous multi-modal foundation models, called Federated Dual-Adapter Teacher (FedDAT).
FedDAT addresses data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for efficient knowledge transfer.
To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity.
arXiv Detail & Related papers (2023-08-21T21:57:01Z) - FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z) - Supernet Training for Federated Image Classification under System Heterogeneity [15.2292571922932]
In this work, we propose a novel framework to consider both scenarios, namely Federation of Supernet Training (FedSup)
It is inspired by how averaging parameters in the model aggregation stage of Federated Learning (FL) is similar to weight-sharing in supernet training.
Under our framework, we present an efficient algorithm (E-FedSup) by sending the sub-model to clients in the broadcast stage for reducing communication costs and training overhead.
arXiv Detail & Related papers (2022-06-03T02:21:01Z) - An Expectation-Maximization Perspective on Federated Learning [75.67515842938299]
Federated learning describes the distributed training of models across multiple clients while keeping the data private on-device.
In this work, we view the server-orchestrated federated learning process as a hierarchical latent variable model where the server provides the parameters of a prior distribution over the client-specific model parameters.
We show that with simple Gaussian priors and a hard version of the well-known Expectation-Maximization (EM) algorithm, learning in such a model corresponds to FedAvg, the most popular algorithm for the federated learning setting (a minimal sketch of this correspondence appears after this list).
arXiv Detail & Related papers (2021-11-19T12:58:59Z) - FedAUX: Leveraging Unlabeled Auxiliary Data in Federated Learning [14.10627556244287]
Federated Distillation (FD) is a popular novel algorithmic paradigm for Federated Learning.
We propose FedAUX, which drastically improves performance by deriving maximum utility from the unlabeled auxiliary data.
Experiments on large-scale convolutional neural networks and transformer models demonstrate that the training performance of FedAUX exceeds SOTA FL baseline methods.
arXiv Detail & Related papers (2021-02-04T09:53:53Z)
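
The FedLALR entry above describes a heterogeneous local variant of AMSGrad in which each client adjusts its own learning rate. Below is a minimal sketch of one client-local AMSGrad-style parameter update with a per-client learning-rate scale standing in for FedLALR's auto-tuned schedule; the actual scheduling rule, hyperparameters, and convergence conditions are in the cited paper, so treat this purely as an illustration.

```python
import torch


def local_amsgrad_step(param, grad, state, base_lr, client_scale,
                       beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad-style update with a client-specific learning-rate scale.

    `client_scale` is a stand-in for FedLALR's per-client auto-tuned rate;
    the true scheduling rule is in the paper, so this is only illustrative.
    """
    m = state.setdefault("m", torch.zeros_like(param))
    v = state.setdefault("v", torch.zeros_like(param))
    v_hat = state.setdefault("v_hat", torch.zeros_like(param))

    m.mul_(beta1).add_(grad, alpha=1 - beta1)             # first moment
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)   # second moment
    state["v_hat"] = torch.maximum(v_hat, v)              # AMSGrad max trick

    lr = base_lr * client_scale                           # client-specific rate
    param.data.addcdiv_(m, state["v_hat"].sqrt() + eps, value=-lr)
    return state


# Example: each client keeps its own optimizer state and scale between local
# steps; the values below are arbitrary and only for illustration.
w = torch.zeros(4)
g = torch.randn(4)
client_state = {}
local_amsgrad_step(w, g, client_state, base_lr=1e-3, client_scale=0.7)
```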
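
The Expectation-Maximization entry above states that, with Gaussian priors over client-specific parameters and a hard version of EM, learning in the hierarchical model corresponds to FedAvg. The toy sketch below illustrates only the aggregation side of that correspondence: under an isotropic Gaussian prior centred at the server parameters, the server update that maximizes the weighted likelihood of the client parameters is their weighted average, i.e. the FedAvg rule. The weights and the reduction to a squared-distance objective are simplifying assumptions here; see the cited paper for the actual derivation.

```python
import torch

# Toy client-specific parameters theta_k (e.g. after local training) and
# client weights proportional to assumed local dataset sizes.
thetas = torch.tensor([[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]])
weights = torch.tensor([0.2, 0.3, 0.5])

# M-step under an isotropic Gaussian prior N(theta_k | mu, sigma^2 I):
# maximizing sum_k w_k * log N(theta_k | mu, sigma^2 I) over mu is equivalent
# to minimizing sum_k w_k * ||theta_k - mu||^2. Solve it by gradient descent...
mu = torch.zeros(2, requires_grad=True)
opt = torch.optim.SGD([mu], lr=0.5)
for _ in range(100):
    opt.zero_grad()
    loss = (weights * ((thetas - mu) ** 2).sum(dim=1)).sum()
    loss.backward()
    opt.step()

# ...and compare with the FedAvg aggregate, the weighted parameter average.
fedavg = (weights[:, None] * thetas).sum(dim=0) / weights.sum()
print(mu.detach(), fedavg)  # both are tensor([2.1000, 2.4000])
assert torch.allclose(mu.detach(), fedavg, atol=1e-3)
```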