Related papers: Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models

Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models

URL: http://arxiv.org/abs/2409.06277v2
Date: Wed, 11 Sep 2024 01:47:48 GMT
Title: Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models
Authors: Yao Shu, Wenyang Hu, See-Kiong Ng, Bryan Kian Hsiang Low, Fei Richard Yu,
Abstract summary: Large Language Models (LLMs) have become indispensable in numerous real-world applications. Ferret is the first first-order method with shared randomness. It achieves high computational efficiency, reduced communication overhead, and fast convergence.
Score: 54.02863371927658
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have become indispensable in numerous real-world applications. Unfortunately, fine-tuning these models at scale, especially in federated settings where data privacy and communication efficiency are critical, presents significant challenges. Existing methods often resort to parameter-efficient fine-tuning (PEFT) to mitigate communication overhead, but this typically comes at the cost of model accuracy. To address these limitations, we propose federated full-parameter tuning at scale for LLMs (Ferret), the first first-order method with shared randomness to enable scalable full-parameter tuning of LLMs across decentralized data sources while maintaining competitive model accuracy. Ferret accomplishes this through three aspects: (1) it employs widely applied first-order methods for efficient local updates; (2) it projects these updates into a low-dimensional space to considerably reduce communication overhead; and (3) it reconstructs local updates from this low-dimensional space with shared randomness to facilitate effective full-parameter global aggregation, ensuring fast convergence and competitive final performance. Our rigorous theoretical analyses and insights along with extensive experiments, show that Ferret significantly enhances the scalability of existing federated full-parameter tuning approaches by achieving high computational efficiency, reduced communication overhead, and fast convergence, all while maintaining competitive model accuracy. Our implementation is available at https://github.com/allen4747/Ferret.

Related papers

GradualDiff-Fed: A Federated Learning Specialized Framework for Large Language Model [0.0]
We introduce GradualDiff-Fed, an FL framework designed explicitly for large language models (LLMs)<n> GradualDiff-Fed reduces communication costs by transmitting only the difference of model weights rather than the entire model during training rounds.<n>Our evaluation demonstrates that GradualDiff-Fed achieves performance on par with centralized training while drastically reducing communication overhead.
arXiv Detail & Related papers (2025-06-23T22:03:21Z)
Communication-Efficient Wireless Federated Fine-Tuning for Large-Scale AI Models [13.742950928229078]
Low-Rank Adaptation (LoRA) addresses these issues by training compact, low-rank matrices instead of fully fine-tuning large models. This paper introduces a wireless federated LoRA fine-tuning framework that optimize both learning performance and communication efficiency.
arXiv Detail & Related papers (2025-05-01T06:15:38Z)
Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining [56.58170370127227]
We show that optimal learning rate follows a power-law relationship with both model parameters and data sizes, while optimal batch size scales primarily with data sizes. This work is the first work that unifies different model shapes and structures, such as Mixture-of-Experts models and dense transformers.
arXiv Detail & Related papers (2025-03-06T18:58:29Z)
Fisher Information-based Efficient Curriculum Federated Learning with Large Language Models [43.26028399395612]
We propose a Fisher Information-based Efficient Curriculum Federated Learning framework (FibecFed) with two novel methods. First, we propose a fisher information-based method to adaptively sample data within each device to improve the effectiveness of the FL fine-tuning process. Second, we dynamically select the proper layers for global aggregation and sparse parameters for local update with LoRA.
arXiv Detail & Related papers (2024-09-30T18:12:18Z)
FedLog: Personalized Federated Classification with Less Communication and More Flexibility [24.030147353437382]
Federated representation learning (FRL) aims to learn personalized federated models with effective feature extraction from local data. To reduce the overhead, we propose to share sufficient data summaries instead of raw model parameters.
arXiv Detail & Related papers (2024-07-11T09:40:29Z)
SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead [75.87007729801304]
SpaFL: a communication-efficient FL framework is proposed to optimize sparse model structures with low computational overhead. Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
arXiv Detail & Related papers (2024-06-01T13:10:35Z)
SA-FedLora: Adaptive Parameter Allocation for Efficient Federated Learning with LoRA Tuning [6.125512669585788]
We propose a Simulated Annealing-based Federated Learning with LoRA tuning (SA-FedLoRA) approach by reducing trainable parameters. Experimental results demonstrate that SA-FedLoRA is an efficient FL, achieving superior performance to FedAvg and significantly reducing communication parameters by up to 93.62%.
arXiv Detail & Related papers (2024-05-15T14:50:46Z)
Parametric Feature Transfer: One-shot Federated Learning with Foundation Models [14.97955440815159]
In one-shot federated learning, clients collaboratively train a global model in a single round of communication. This paper introduces FedPFT, a methodology that harnesses the transferability of foundation models to enhance both accuracy and communication efficiency in one-shot FL.
arXiv Detail & Related papers (2024-02-02T19:34:46Z)
Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data. One key challenge in federated learning is to handle non-identically distributed data across the clients. We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z)
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes [53.4856038354195]
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. FedKSeed employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds.
arXiv Detail & Related papers (2023-12-11T13:03:21Z)
Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data. The training process of Large Language Models (LLMs) generally incurs the update of significant parameters. This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
SLoRA: Federated Parameter Efficient Fine-Tuning of Language Models [28.764782216513037]
Federated Learning (FL) can benefit from distributed and private data of the FL edge clients for fine-tuning. We propose a method called SLoRA, which overcomes the key limitations of LoRA in high heterogeneous data scenarios. Our experimental results demonstrate that SLoRA achieves performance comparable to full fine-tuning.
arXiv Detail & Related papers (2023-08-12T10:33:57Z)
Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning [126.84770886628833]
Existing finetuning methods either tune all parameters of the pretrained model (full finetuning) or only tune the last linear layer (linear probing) We propose a new parameter-efficient finetuning method termed as SSF, representing that researchers only need to Scale and Shift the deep Features extracted by a pre-trained model to catch up with the performance full finetuning.
arXiv Detail & Related papers (2022-10-17T08:14:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.