Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning
- URL: http://arxiv.org/abs/2506.05977v1
- Date: Fri, 06 Jun 2025 10:59:11 GMT
- Title: Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning
- Authors: Yujia Huo, Jianchun Liu, Hongli Xu, Zhenguo Ma, Shilong Wang, Liusheng Huang,
- Abstract summary: Federated fine-tuning (FedFT) of large language models (LLMs) has emerged as a promising solution for adapting models to distributed data environments.<n>We propose FedBE, a novel FedFT framework that integrates an adaptive transformer block expansion mechanism with a dynamic trainable-block allocation strategy.<n>We show that FedBE achieves 12-74% higher accuracy retention on general tasks after fine-tuning and a model convergence acceleration ratio of 1.9-3.1x without degrading the accuracy of downstream tasks.
- Score: 25.121545962121907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated fine-tuning (FedFT) of large language models (LLMs) has emerged as a promising solution for adapting models to distributed data environments while ensuring data privacy. Existing FedFT methods predominantly utilize parameter-efficient fine-tuning (PEFT) techniques to reduce communication and computation overhead. However, they often fail to adequately address the catastrophic forgetting, a critical challenge arising from continual adaptation in distributed environments. The traditional centralized fine-tuning methods, which are not designed for the heterogeneous and privacy-constrained nature of federated environments, struggle to mitigate this issue effectively. Moreover, the challenge is further exacerbated by significant variation in data distributions and device capabilities across clients, which leads to intensified forgetting and degraded model generalization. To tackle these issues, we propose FedBE, a novel FedFT framework that integrates an adaptive transformer block expansion mechanism with a dynamic trainable-block allocation strategy. Specifically, FedBE expands trainable blocks within the model architecture, structurally separating newly learned task-specific knowledge from the original pre-trained representations. Additionally, FedBE dynamically assigns these trainable blocks to clients based on their data distributions and computational capabilities. This enables the framework to better accommodate heterogeneous federated environments and enhances the generalization ability of the model.Extensive experiments show that compared with existing federated fine-tuning methods, FedBE achieves 12-74% higher accuracy retention on general tasks after fine-tuning and a model convergence acceleration ratio of 1.9-3.1x without degrading the accuracy of downstream tasks.
Related papers
- FedP3E: Privacy-Preserving Prototype Exchange for Non-IID IoT Malware Detection in Cross-Silo Federated Learning [5.7494612007431805]
We propose FedP3E, a novel FL framework that supports indirect cross-client representation sharing while maintaining data privacy.<n>We evaluate FedP3E on the N-BaIoT dataset under realistic cross-silo scenarios with varying degrees of data imbalance.
arXiv Detail & Related papers (2025-07-09T20:07:35Z) - FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors [50.131271229165165]
Federated Learning (FL) has emerged as a promising framework for distributed machine learning.<n>Data heterogeneity resulting from differences across user behaviors, preferences, and device characteristics poses a significant challenge for federated learning.<n>We propose Adaptive Weight Aggregation (FedAWA), a novel method that adaptively adjusts aggregation weights based on client vectors during the learning process.
arXiv Detail & Related papers (2025-03-20T04:49:40Z) - Invariant Federated Learning for Edge Intelligence: Mitigating Heterogeneity and Asynchrony via Exit Strategy and Invariant Penalty [10.54196990763149]
This paper provides an invariant federated learning system for resource-constrained edge intelligence.<n>It can mitigate the impact of heterogeneous and asynchrony via exit strategy and invariant penalty.<n>It shows our system can enhance In-Distribution performance and outperform the state-of-the-art algorithm in Out-Of-Distribution generalization.
arXiv Detail & Related papers (2025-03-08T10:47:27Z) - FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling [12.067872131025231]
Federated Learning (FL) enables collaborative model training across distributed clients without data sharing.<n>Current methods use dynamic pruning to improve efficiency by periodically adjusting sparse model topologies while maintaining sparsity.<n>We propose Federated Robust pruning via Thompson Sampling (FedRTS), a novel framework designed to develop robust sparse models.
arXiv Detail & Related papers (2025-01-31T13:26:22Z) - Robust Federated Learning in the Face of Covariate Shift: A Magnitude Pruning with Hybrid Regularization Framework for Enhanced Model Aggregation [1.519321208145928]
Federated Learning (FL) offers a promising framework for individuals aiming to collaboratively develop a shared model.<n> variations in data distribution among clients can profoundly affect FL methodologies, primarily due to instabilities in the aggregation process.<n>We propose a novel FL framework, combining individual parameter pruning and regularization techniques to improve the robustness of individual clients' models to aggregate.
arXiv Detail & Related papers (2024-12-19T16:22:37Z) - Client Contribution Normalization for Enhanced Federated Learning [4.726250115737579]
Mobile devices, including smartphones and laptops, generate decentralized and heterogeneous data.
Federated Learning (FL) offers a promising alternative by enabling collaborative training of a global model across decentralized devices without data sharing.
This paper focuses on data-dependent heterogeneity in FL and proposes a novel approach leveraging mean latent representations extracted from locally trained models.
arXiv Detail & Related papers (2024-11-10T04:03:09Z) - Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomaly and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected in edge devices contain user privacy.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - Addressing Data Heterogeneity in Federated Learning with Adaptive Normalization-Free Feature Recalibration [1.33512912917221]
Federated learning is a decentralized collaborative training paradigm that preserves stakeholders' data ownership while improving performance and generalization.
We propose Adaptive Normalization-free Feature Recalibration (ANFR), an architecture-level approach that combines weight standardization and channel attention.
arXiv Detail & Related papers (2024-10-02T20:16:56Z) - GIFD: A Generative Gradient Inversion Method with Feature Domain
Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose textbfGradient textbfInversion over textbfFeature textbfDomains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z) - Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z) - Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated
Learning Framework [82.36466358313025]
We propose a primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and biasness of the global model.
Experiments based on (semi-supervised) image classification tasks demonstrate superiority of FedVRA over the existing schemes.
arXiv Detail & Related papers (2022-12-03T03:27:51Z) - Fine-tuning Global Model via Data-Free Knowledge Distillation for
Non-IID Federated Learning [86.59588262014456]
Federated Learning (FL) is an emerging distributed learning paradigm under privacy constraint.
We propose a data-free knowledge distillation method to fine-tune the global model in the server (FedFTG)
Our FedFTG significantly outperforms the state-of-the-art (SOTA) FL algorithms and can serve as a strong plugin for enhancing FedAvg, FedProx, FedDyn, and SCAFFOLD.
arXiv Detail & Related papers (2022-03-17T11:18:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.