SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models
- URL: http://arxiv.org/abs/2308.06522v1
- Date: Sat, 12 Aug 2023 10:33:57 GMT
- Title: SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models
- Authors: Sara Babakniya, Ahmed Roushdy Elkordy, Yahya H. Ezzeldin, Qingfeng
Liu, Kee-Bong Song, Mostafa El-Khamy, Salman Avestimehr
- Abstract summary: Federated Learning (FL) can benefit from distributed and private data of the FL edge clients for fine-tuning.
We propose a method called SLoRA, which overcomes the key limitations of LoRA in high heterogeneous data scenarios.
Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning.
- Score: 28.764782216513037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning via fine-tuning pre-trained transformer models has gained
significant success in delivering state-of-the-art results across various NLP
tasks. In the absence of centralized data, Federated Learning (FL) can benefit
from distributed and private data of the FL edge clients for fine-tuning.
However, due to the limited communication, computation, and storage
capabilities of edge devices and the huge sizes of popular transformer models,
efficient fine-tuning is crucial to make federated training feasible. This work
explores the opportunities and challenges associated with applying parameter
efficient fine-tuning (PEFT) methods in different FL settings for language
tasks. Specifically, our investigation reveals that as the data across users
becomes more diverse, the gap between fully fine-tuning the model and employing
PEFT methods widens. To bridge this performance gap, we propose a method called
SLoRA, which overcomes the key limitations of LoRA in high heterogeneous data
scenarios through a novel data-driven initialization technique. Our
experimental results demonstrate that SLoRA achieves performance comparable to
full fine-tuning, with significant sparse updates with approximately $\sim 1\%$
density while reducing training time by up to $90\%$.
Related papers
- Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent [15.463595798992621]
Large language models (LLMs) have revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks.
Existing solutions make the unrealistic assumption that the entire model is exchanged for training.
We introduce a novel method for the efficient training and fine-tuning of LLMs in FL, with minimal resource consumption.
arXiv Detail & Related papers (2024-06-17T03:49:44Z) - FeDeRA:Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition [7.229494183462913]
Despite exceptional performance after fine-tuning, pre-trained language models (PLMs) face significant challenges due to privacy concerns.
We consider federated learning (FL) to fine-tune PLMs in this paper.
One promising solution is to exploit parameter-efficient fine-tuning (PEFT) into FL, which trains a much smaller set of parameters than full parameter fine-tuning (FFT)
arXiv Detail & Related papers (2024-04-29T16:42:26Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
arXiv Detail & Related papers (2023-12-11T13:03:21Z) - Federated Learning of Large Language Models with Parameter-Efficient
Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally incurs the update of significant parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal
Heterogeneous Federated Learning [37.96957782129352]
We propose a finetuning framework tailored to heterogeneous multi-modal foundation models, called Federated Dual-Aadapter Teacher (Fed DAT)
Fed DAT addresses data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for an efficient knowledge transfer.
To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity.
arXiv Detail & Related papers (2023-08-21T21:57:01Z) - When Federated Learning Meets Pre-trained Language Models'
Parameter-Efficient Tuning Methods [22.16636947999123]
We introduce various parameter-efficient tuning (PETuning) methods into federated learning.
Specifically, we provide a holistic empirical study of representative PLMs tuning methods in FL.
Overall communication overhead can be significantly reduced by locally tuning and globally aggregating lightweight model parameters.
arXiv Detail & Related papers (2022-12-20T06:44:32Z) - FedDM: Iterative Distribution Matching for Communication-Efficient
Federated Learning [87.08902493524556]
Federated learning(FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z) - Acceleration of Federated Learning with Alleviated Forgetting in Local
Training [61.231021417674235]
Federated learning (FL) enables distributed optimization of machine learning models while protecting privacy.
We propose FedReg, an algorithm to accelerate FL with alleviated knowledge forgetting in the local training stage.
Our experiments demonstrate that FedReg not only significantly improves the convergence rate of FL, especially when the neural network architecture is deep.
arXiv Detail & Related papers (2022-03-05T02:31:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.