Revisiting Federated Fine-Tuning: A Single Communication Round is Enough for Foundation Models
- URL: http://arxiv.org/abs/2412.04650v2
- Date: Thu, 06 Nov 2025 16:57:30 GMT
- Title: Revisiting Federated Fine-Tuning: A Single Communication Round is Enough for Foundation Models
- Authors: Ziyao Wang, Bowei Tian, Yexiao He, Zheyu Shen, Guoheng Sun, Yuhan Liu, Luyang Liu, Meng Liu, Ang Li
- Abstract summary: We show that a single round of aggregation yields a global model performance comparable to that achieved through multiple rounds of aggregation. Our experiments show that one-shot federated fine-tuning significantly reduces communication costs. It also has the potential to enable asynchronous aggregation, enhance privacy, and maintain performance consistency with multi-round federated fine-tuning.
- Score: 34.57875427501524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent advancement of foundation models (FMs) has increased the demand for fine-tuning these models on large-scale cross-domain datasets. To address this, federated fine-tuning has emerged, allowing FMs to be fine-tuned on distributed datasets across multiple devices while ensuring data privacy. However, the substantial parameter size of FMs and the multi-round communication in federated learning algorithms result in prohibitively high communication costs, challenging the practicality of federated fine-tuning. In this paper, we identify and analyze, both theoretically and empirically, that the traditional multi-round aggregation algorithms may not be necessary for federated fine-tuning of large FMs. Our experiments reveal that a single round of aggregation (i.e., one-shot federated fine-tuning) yields a global model performance comparable to that achieved through multiple rounds of aggregation. Through rigorous mathematical and empirical analyses, we demonstrate that large FMs, due to their extensive parameter sizes and pre-training on general tasks, achieve significantly lower training loss in one-shot federated fine-tuning compared to smaller models. Our extensive experiments show that one-shot federated fine-tuning significantly reduces communication costs. It also has the potential to enable asynchronous aggregation, enhance privacy, and maintain performance consistency with multi-round federated fine-tuning on both text generation and text-to-image generation tasks. Our findings provide insights that could revolutionize federated fine-tuning in practice, enhancing efficiency, reducing costs, and expanding accessibility for FMs.
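To make the one-shot protocol concrete, the sketch below shows the two steps the abstract describes: each client fine-tunes the pre-trained model locally, and the server then performs a single data-size-weighted average of the resulting parameters. The function name, the toy NumPy weights, and the client sizes are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def aggregate_one_shot(client_params, client_sizes):
    """Single weighted average of the clients' locally fine-tuned parameters."""
    total = sum(client_sizes)
    return {
        name: sum((n / total) * params[name]
                  for params, n in zip(client_params, client_sizes))
        for name in client_params[0]
    }

# Toy stand-ins for locally fine-tuned copies of a pre-trained model.
rng = np.random.default_rng(0)
pretrained = {"w": rng.normal(size=(4, 4)), "b": np.zeros(4)}
clients = [
    {k: v + 0.01 * rng.normal(size=v.shape) for k, v in pretrained.items()}
    for _ in range(3)
]

# One communication round: upload once, average once, done.
global_model = aggregate_one_shot(clients, client_sizes=[1200, 800, 500])
print(global_model["w"].shape, global_model["b"].shape)
```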
Related papers
- Diversity Over Quantity: A Lesson From Few Shot Relation Classification [62.66895901654023]
We show that training on a diverse set of relations significantly enhances a model's ability to generalize to unseen relations. We introduce REBEL-FS, a new FSRC benchmark that incorporates an order of magnitude more relation types than existing datasets.
arXiv Detail & Related papers (2024-12-06T21:41:01Z) - Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices and transferred to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z) - Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models [54.02863371927658]
Large Language Models (LLMs) have become indispensable in numerous real-world applications.
Ferret is the first first-order method with shared randomness for federated full-parameter tuning of LLMs.
It achieves high computational efficiency, reduced communication overhead, and fast convergence.
arXiv Detail & Related papers (2024-09-10T07:28:13Z) - Robust and Communication-Efficient Federated Domain Adaptation via Random Features [9.561648314302232]
Federated domain adaptation (FDA) emerges as a powerful approach for adapting models across distributed clients whose data come from different domains.
RF-TCA is an enhancement to the standard Transfer Component Analysis approach that significantly accelerates computation without compromising theoretical and empirical performance.
We present extensive experiments to showcase the superior performance and robustness (to network conditions) of FedRF-TCA.
arXiv Detail & Related papers (2023-11-08T13:46:58Z) - FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication [11.254610576923204]
We propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS).
The key idea is a modality selection criterion for each device, which weighs (i) the impact of the modality, measured by Shapley value analysis, against (ii) the size of the modality's model as a proxy for communication overhead.
Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x.
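As a rough illustration of such a criterion, the sketch below greedily selects modalities by impact-per-megabyte under a per-round upload budget. The greedy rule, the budget, and all numbers are assumptions made for illustration and are not the exact weighting used by FedMFS.

```python
def select_modalities(shapley, model_size_mb, budget_mb):
    """Greedily pick modalities with the best impact-per-megabyte until the
    per-round upload budget is exhausted (an assumed selection rule)."""
    ranked = sorted(shapley, key=lambda m: shapley[m] / model_size_mb[m], reverse=True)
    chosen, used = [], 0.0
    for m in ranked:
        if used + model_size_mb[m] <= budget_mb:
            chosen.append(m)
            used += model_size_mb[m]
    return chosen

# Toy example with made-up numbers for three sensor modalities.
shapley = {"imu": 0.30, "audio": 0.15, "video": 0.40}
sizes   = {"imu": 2.0,  "audio": 5.0,  "video": 40.0}
print(select_modalities(shapley, sizes, budget_mb=10.0))  # -> ['imu', 'audio']
```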
arXiv Detail & Related papers (2023-10-10T22:23:27Z) - FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal
Heterogeneous Federated Learning [37.96957782129352]
We propose a finetuning framework tailored to heterogeneous multi-modal foundation models, called Federated Dual-Adapter Teacher (FedDAT).
FedDAT addresses data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for efficient knowledge transfer.
To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity.
arXiv Detail & Related papers (2023-08-21T21:57:01Z) - Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner.
Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server.
This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck.
arXiv Detail & Related papers (2023-02-24T08:41:19Z) - Conquering the Communication Constraints to Enable Large Pre-Trained Models in Federated Learning [18.12162136918301]
Federated learning (FL) has emerged as a promising paradigm for enabling the collaborative training of models without centralized access to the raw data on local devices.
Recent state-of-the-art pre-trained models are getting more capable but also have more parameters.
Can we find a solution to enable those strong and readily-available pre-trained models in FL to achieve excellent performance while simultaneously reducing the communication burden?
Our answer is FedPEFT, which fine-tunes and communicates only a small portion of the model's parameters; specifically, we systematically evaluate its performance across a variety of client stability, data distribution, and differential privacy settings.
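As a rough sketch of this parameter-efficient idea, the example below freezes a backbone and uploads only its bias terms, so the per-round payload is a tiny fraction of the full model. The bias-only choice, the helper names, and the toy model are assumptions for illustration rather than FedPEFT's actual configuration.

```python
import torch
import torch.nn as nn

def freeze_backbone(model):
    """Freeze everything except bias terms (one simple PEFT choice)."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")

def trainable_subset(model):
    """Only the parameters a client would actually upload to the server."""
    return {n: p.detach().clone() for n, p in model.named_parameters() if p.requires_grad}

def aggregate(updates):
    """Plain average of the small uploaded tensors."""
    return {k: torch.stack([u[k] for u in updates]).mean(dim=0) for k in updates[0]}

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
freeze_backbone(model)
payload = trainable_subset(model)

# Simulate three clients uploading slightly different bias updates.
client_payloads = [{k: v + 0.01 * torch.randn_like(v) for k, v in payload.items()}
                   for _ in range(3)]
global_biases = aggregate(client_payloads)

full = sum(p.numel() for p in model.parameters())
sent = sum(t.numel() for t in payload.values())
print(f"uploaded {sent}/{full} parameters ({100 * sent / full:.2f}% of the model)")
```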
arXiv Detail & Related papers (2022-10-04T16:08:54Z) - FedDM: Iterative Distribution Matching for Communication-Efficient
Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z) - Boosting Factorization Machines via Saliency-Guided Mixup [125.15872106335692]
We present MixFM, inspired by Mixup, to generate auxiliary training data to boost factorization machines (FMs).
We also put forward a novel Factorization Machine powered by Saliency-guided Mixup (denoted as SMFM).
arXiv Detail & Related papers (2022-06-17T09:49:00Z) - DRFLM: Distributionally Robust Federated Learning with Inter-client
Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z) - Low-Latency Federated Learning over Wireless Channels with Differential
Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server.
In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
arXiv Detail & Related papers (2021-06-20T13:51:18Z) - Communication-Efficient Federated Learning with Compensated
Overlap-FedAvg [22.636184975591004]
Federated learning allows a model to be trained on the combined data of multiple clients without the clients sharing their datasets.
We propose Overlap-FedAvg, a framework that overlaps the model training phase with the model uploading and downloading phases.
Overlap-FedAvg is further developed with a hierarchical computing strategy, a data compensation mechanism, and a Nesterov accelerated gradient (NAG) algorithm.
arXiv Detail & Related papers (2020-12-12T02:50:09Z) - FedFMC: Sequential Efficient Federated Learning on Non-iid Data [0.0]
FedFMC (Fork-Merge-Consolidate) is a method that forks devices into updating different global models, then merges and consolidates the separate models into one.
We show that FedFMC substantially improves upon earlier approaches to non-iid data in the federated learning context, without using a globally shared subset of data or increasing communication costs.
arXiv Detail & Related papers (2020-06-19T02:36:17Z) - Ternary Compression for Communication-Efficient Federated Learning [17.97683428517896]
Federated learning provides a potential solution to privacy-preserving and secure machine learning.
We propose a ternary federated averaging protocol (T-FedAvg) to reduce the upstream and downstream communication of federated learning systems.
Our results show that the proposed T-FedAvg is effective in reducing communication costs and can even achieve slightly better performance on non-IID data.
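As a rough illustration of the ternary idea, the sketch below quantizes a model update to {-1, 0, +1} times a per-tensor scale, shrinking 32-bit floats to roughly 2 bits per value. The threshold and scale rule is a common ternary-weight heuristic assumed for illustration and is not necessarily the exact scheme used by T-FedAvg.

```python
import numpy as np

def ternarize(update, threshold_ratio=0.7):
    """Quantize to sign(x) * scale where |x| exceeds a threshold, else 0."""
    delta = threshold_ratio * np.mean(np.abs(update))   # sparsifying threshold
    mask = np.abs(update) > delta
    scale = np.abs(update[mask]).mean() if mask.any() else 0.0
    ternary = np.sign(update) * mask                     # entries in {-1, 0, +1}
    return ternary.astype(np.int8), float(scale)

def dequantize(ternary, scale):
    """Reconstruct an approximate update from the ternary codes and scale."""
    return ternary.astype(np.float32) * scale

rng = np.random.default_rng(0)
update = rng.normal(scale=0.01, size=(1024,))
codes, scale = ternarize(update)
approx = dequantize(codes, scale)
print(f"nonzero entries: {np.count_nonzero(codes)}/1024, scale = {scale:.4f}")
```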
arXiv Detail & Related papers (2020-03-07T11:55:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.