Communication-Efficient and Tensorized Federated Fine-Tuning of Large Language Models
- URL: http://arxiv.org/abs/2410.13097v1
- Date: Wed, 16 Oct 2024 23:50:39 GMT
- Title: Communication-Efficient and Tensorized Federated Fine-Tuning of Large Language Models
- Authors: Sajjad Ghiasvand, Yifan Yang, Zhiyu Xue, Mahnoosh Alizadeh, Zheng Zhang, Ramtin Pedarsani,
- Abstract summary: We introduce FedTT and FedTT+, methods for adapting Large Language Models.
FedTT is versatile and can be applied to both cross-silo FL and large-scale cross-device FL.
Our proposed methods successfully address data heterogeneity challenges and perform on par or even better than existing federated PEFT approaches.
- Score: 24.07770417615704
- License:
- Abstract: Parameter-efficient fine-tuning (PEFT) methods typically assume that Large Language Models (LLMs) are trained on data from a single device or client. However, real-world scenarios often require fine-tuning these models on private data distributed across multiple devices. Federated Learning (FL) offers an appealing solution by preserving user privacy, as sensitive data remains on local devices during training. Nonetheless, integrating PEFT methods into FL introduces two main challenges: communication overhead and data heterogeneity. In this paper, we introduce FedTT and FedTT+, methods for adapting LLMs by integrating tensorized adapters into client-side models' encoder/decoder blocks. FedTT is versatile and can be applied to both cross-silo FL and large-scale cross-device FL. FedTT+, an extension of FedTT tailored for cross-silo FL, enhances robustness against data heterogeneity by adaptively freezing portions of tensor factors, further reducing the number of trainable parameters. Experiments on BERT and LLaMA models demonstrate that our proposed methods successfully address data heterogeneity challenges and perform on par or even better than existing federated PEFT approaches while achieving up to 10$\times$ reduction in communication cost.
Related papers
- Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models [54.02863371927658]
Large Language Models (LLMs) have become indispensable in numerous real-world applications.
Ferret is the first first-order method with shared randomness.
It achieves high computational efficiency, reduced communication overhead, and fast convergence.
arXiv Detail & Related papers (2024-09-10T07:28:13Z) - FedFT: Improving Communication Performance for Federated Learning with Frequency Space Transformation [0.361593752383807]
We introduce FedFT (federated frequency-space transformation), a simple yet effective methodology for communicating model parameters in a Federated Learning setting.
FedFT uses Discrete Cosine Transform (DCT) to represent model parameters in frequency space, enabling efficient compression and reducing communication overhead.
We demonstrate the generalisability of the FedFT methodology on four datasets using comparative studies with three state-of-the-art FL baselines.
arXiv Detail & Related papers (2024-09-08T23:05:35Z) - Parametric Feature Transfer: One-shot Federated Learning with Foundation
Models [14.97955440815159]
In one-shot federated learning, clients collaboratively train a global model in a single round of communication.
This paper introduces FedPFT, a methodology that harnesses the transferability of foundation models to enhance both accuracy and communication efficiency in one-shot FL.
arXiv Detail & Related papers (2024-02-02T19:34:46Z) - Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions.
FedKSeed employs zeroth-order optimization with a finite set of random seeds.
It significantly reduces transmission requirements between the server and clients to just a few random seeds.
arXiv Detail & Related papers (2023-12-11T13:03:21Z) - Tunable Soft Prompts are Messengers in Federated Learning [55.924749085481544]
Federated learning (FL) enables multiple participants to collaboratively train machine learning models using decentralized data sources.
The lack of model privacy protection in FL becomes an unneglectable challenge.
We propose a novel FL training approach that accomplishes information exchange among participants via tunable soft prompts.
arXiv Detail & Related papers (2023-11-12T11:01:10Z) - FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language
Models [22.29061931122386]
Pre-trained language models (PLM) have revolutionized the NLP landscape, achieving stellar performances across diverse tasks.
This paper introduces Federated Black-box Prompt Tuning (FedBPT), a framework designed to address these challenges.
arXiv Detail & Related papers (2023-10-02T16:43:14Z) - FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal
Heterogeneous Federated Learning [37.96957782129352]
We propose a finetuning framework tailored to heterogeneous multi-modal foundation models, called Federated Dual-Aadapter Teacher (Fed DAT)
Fed DAT addresses data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for an efficient knowledge transfer.
To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity.
arXiv Detail & Related papers (2023-08-21T21:57:01Z) - SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models [28.764782216513037]
Federated Learning (FL) can benefit from distributed and private data of the FL edge clients for fine-tuning.
We propose a method called SLoRA, which overcomes the key limitations of LoRA in high heterogeneous data scenarios.
Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning.
arXiv Detail & Related papers (2023-08-12T10:33:57Z) - Federated Nearest Neighbor Machine Translation [66.8765098651988]
In this paper, we propose a novel federated nearest neighbor (FedNN) machine translation framework.
FedNN leverages one-round memorization-based interaction to share knowledge across different clients.
Experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg.
arXiv Detail & Related papers (2023-02-23T18:04:07Z) - FedDM: Iterative Distribution Matching for Communication-Efficient
Federated Learning [87.08902493524556]
Federated learning(FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z) - Fine-tuning Global Model via Data-Free Knowledge Distillation for
Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraint.
We propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG)
Our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.