Related papers: Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent

Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent

URL: http://arxiv.org/abs/2406.11187v2
Date: Fri, 19 Jul 2024 13:22:02 GMT
Title: Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Authors: Lin Wang, Zhichao Wang, Xiaoying Tang,
Abstract summary: Large language models (LLMs) have revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks. Existing solutions make the unrealistic assumption that the entire model is exchanged for training. We introduce a novel method for the efficient training and fine-tuning of LLMs in FL, with minimal resource consumption.
Score: 15.463595798992621
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The advent of large language models (LLMs) has revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks. However, the pre-training or fine-tuning of LLMs within a federated learning (FL) framework poses substantial challenges, including considerable computational and memory resource demands, as well as communication bottlenecks between servers and clients. Existing solutions either make the unrealistic assumption that the entire model is exchanged for training, or apply parameter-effective fine-tuning methods from centralized learning to train LLMs in FL which tend to underperform during training or fine-tuning stages due to the limited search subspace of parameter updating. In this paper, we introduce a novel method for the efficient training and fine-tuning of LLMs in FL, with minimal resource consumption. Our approach, termed FedCyBGD, utilizes Cycle Block Gradient Descent to periodically update the model. In particular, we design a compression scheme for FedCyBGD, aiming to further decrease the model download cost. It enables full parameter training in FL with only selected block updates and uploads, thereby reducing communication, computation, and memory costs. Our method achieves state-of-the-art performance for FL LLM training, while significantly reducing associated costs. Codes are provided here.

Related papers

Communication-Efficient Wireless Federated Fine-Tuning for Large-Scale AI Models [13.742950928229078]
Low-Rank Adaptation (LoRA) addresses these issues by training compact, low-rank matrices instead of fully fine-tuning large models. This paper introduces a wireless federated LoRA fine-tuning framework that optimize both learning performance and communication efficiency.
arXiv Detail & Related papers (2025-05-01T06:15:38Z)
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing [48.30406812516552]
We introduce the PLM, a Peripheral Language Model, developed through a co-design process that jointly optimize model architecture and edge system constraints. PLM employs a Multi-head Latent Attention mechanism and employs the squared ReLU activation function to encourage sparsity, thereby reducing peak memory footprint. evaluation results demonstrate that PLM outperforms existing small language models trained on publicly available data.
arXiv Detail & Related papers (2025-03-15T15:11:17Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and. Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting. LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
CELLM: An Efficient Communication in Large Language Models Training for Federated Learning [0.0]
This thesis aims to develop efficient training methods for large language models (LLMs) in Federated Learning (FL) First, we use low-rank adaptation (LoRA) to reduce the computational load of local model training. Second, we communicate sparse updates throughout training to significantly cut down on communication costs.
arXiv Detail & Related papers (2024-07-30T05:24:08Z)
R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [83.77114091471822]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML) A challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming. This is particularly pronounced for word embedding parameters in large language models (LLMs), which are crucial for language understanding. A physical layer framework is developed for resilient SFL with LLMs (R-SFLLM) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks [19.007090250576585]
BlockLLM is an approach inspired by block coordinate descent. It achieves state-of-the-art performance in both finetuning and pretraining tasks.
arXiv Detail & Related papers (2024-06-25T05:45:12Z)
Embracing Federated Learning: Enabling Weak Client Participation via Partial Model Training [21.89214794178211]
In Federated Learning (FL), clients may have weak devices that cannot train the full model or even hold it in their memory space. We propose EmbracingFL, a general FL framework that allows all available clients to join the distributed training. Our empirical study shows that EmbracingFL consistently achieves high accuracy as like all clients are strong, outperforming the state-of-the-art width reduction methods.
arXiv Detail & Related papers (2024-06-21T13:19:29Z)
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT [87.4910758026772]
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development. This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices.
arXiv Detail & Related papers (2024-02-26T18:59:03Z)
Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data. The training process of Large Language Models (LLMs) generally incurs the update of significant parameters. This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs [67.38165028487242]
We introduce Dynamic Sparse No Training (DSnoT), a training-free fine-tuning approach to fine-tune large language models (LLMs) Inspired by the Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs. Our paper offers fresh insights into how to fine-tune sparse LLMs in an efficient training-free manner and open new venues to scale the great potential of sparsity to LLMs.
arXiv Detail & Related papers (2023-10-13T07:38:52Z)
FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models [22.29061931122386]
Pre-trained language models (PLM) have revolutionized the NLP landscape, achieving stellar performances across diverse tasks. This paper introduces Federated Black-box Prompt Tuning (FedBPT), a framework designed to address these challenges.
arXiv Detail & Related papers (2023-10-02T16:43:14Z)
SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models [28.764782216513037]
Federated Learning (FL) can benefit from distributed and private data of the FL edge clients for fine-tuning. We propose a method called SLoRA, which overcomes the key limitations of LoRA in high heterogeneous data scenarios. Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning.
arXiv Detail & Related papers (2023-08-12T10:33:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.