FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge
- URL: http://arxiv.org/abs/2508.18663v1
- Date: Tue, 26 Aug 2025 04:09:18 GMT
- Title: FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge
- Authors: Gang Hu, Yinglei Teng, Pengfei Wu, Nan Wang
- Abstract summary: Federated Learning (FL) offers a compelling solution through Federated Fine-Tuning (FFT). We propose FFT-MoE, a novel FFT framework that replaces LoRA with sparse Mixture-of-Experts (MoE) adapters. FFT-MoE consistently outperforms state-of-the-art FFT baselines in generalization performance and training efficiency.
- Score: 7.976167864455345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As FMs drive progress toward Artificial General Intelligence (AGI), fine-tuning them under privacy and resource constraints has become increasingly critical, particularly when high-quality training data resides on distributed edge devices. Federated Learning (FL) offers a compelling solution through Federated Fine-Tuning (FFT), which enables collaborative model adaptation without sharing raw data. Recent approaches incorporate Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) to reduce computational overhead. However, LoRA-based FFT faces two major limitations in heterogeneous FL environments: structural incompatibility across clients with varying LoRA configurations, and limited adaptability to non-IID data distributions, which hinders convergence and generalization. To address these challenges, we propose FFT-MoE, a novel FFT framework that replaces LoRA with sparse Mixture-of-Experts (MoE) adapters. Each client trains a lightweight gating network to selectively activate a personalized subset of experts, enabling fine-grained adaptation to local resource budgets while preserving aggregation compatibility. To further combat the expert load imbalance caused by device and data heterogeneity, we introduce a heterogeneity-aware auxiliary loss that dynamically regularizes the routing distribution to ensure expert diversity and balanced utilization. Extensive experiments spanning both IID and non-IID conditions demonstrate that FFT-MoE consistently outperforms state-of-the-art FFT baselines in generalization performance and training efficiency.
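The listing carries no reference code, so the following is only a minimal PyTorch sketch of the mechanism the abstract describes: a lightweight gating network routing tokens to top-k sparse low-rank expert adapters, with a Switch-Transformer-style load-balancing term standing in for the paper's heterogeneity-aware auxiliary loss. The module layout, the `rank` and `top_k` hyperparameters, and the loss form are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a sparse MoE adapter layer; all names and
# hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAdapter(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, rank: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Lightweight per-client gating network over all experts.
        self.gate = nn.Linear(d_model, n_experts)
        # Each expert is a low-rank (bottleneck) adapter, LoRA-like in cost.
        self.down = nn.ModuleList(nn.Linear(d_model, rank, bias=False) for _ in range(n_experts))
        self.up = nn.ModuleList(nn.Linear(rank, d_model, bias=False) for _ in range(n_experts))

    def forward(self, x: torch.Tensor):
        # x: (tokens, d_model). Route each token to its top-k experts only.
        probs = F.softmax(self.gate(x), dim=-1)        # (tokens, n_experts)
        weights, idx = probs.topk(self.top_k, dim=-1)  # sparse activation
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.down)):
                mask = idx[:, k] == e
                if mask.any():
                    h = self.up[e](self.down[e](x[mask]))
                    out[mask] += weights[mask, k].unsqueeze(-1) * h
        # Switch-Transformer-style load-balancing loss (an assumed stand-in
        # for the paper's heterogeneity-aware loss): penalize the product of
        # each expert's token fraction and its mean routing probability.
        frac = F.one_hot(idx[:, 0], probs.size(-1)).float().mean(0)
        aux_loss = (frac * probs.mean(0)).sum() * probs.size(-1)
        return x + out, aux_loss
```

In a federated round, each client would train only its gate and activated experts while the server aggregates experts shared across clients, which is what would keep heterogeneous clients aggregation-compatible.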
Related papers
- Adaptive Dual-Weighting Framework for Federated Learning via Out-of-Distribution Detection [53.45696787935487]
Federated Learning (FL) enables collaborative model training across large-scale distributed service nodes. In real-world service-oriented deployments, data generated by heterogeneous users, devices, and application scenarios are inherently non-IID. We propose FLood, a novel FL framework inspired by out-of-distribution (OOD) detection.
arXiv Detail & Related papers (2026-02-01T05:54:59Z) - Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration [56.074760766965085]
PRISM is a dynamics-aware framework that arbitrates data based on its degree of cognitive conflict with the model's existing knowledge. Our findings suggest that disentangling data based on internal optimization regimes is crucial for scalable and robust agent alignment.
arXiv Detail & Related papers (2026-01-12T05:43:20Z) - Closer to Reality: Practical Semi-Supervised Federated Learning for Foundation Model Adaptation [56.36237936346563]
Foundation models (FMs) exhibit remarkable generalization but require adaptation to downstream tasks. Due to data privacy regulations, cloud-based FMs cannot directly access private edge data. We introduce Practical Semi-Supervised Federated Learning (PSSFL), where edge devices hold only unlabeled, low-resolution data. Our work paves the way for scalable and privacy-preserving FM adaptation in federated scenarios.
arXiv Detail & Related papers (2025-08-22T17:47:02Z) - FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE [21.860699562235776]
FLAME is a novel federated learning framework based on the Sparse Mixture-of-Experts (SMoE) architecture. It retains full (uncompressed) global LoRA matrices and achieves client-side adaptability by varying the number of activated experts per client. It tackles the resulting challenges through a lightweight rescaling mechanism and an activation-aware aggregation scheme.
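The summary only names FLAME's aggregation scheme; one plausible reading of "activation-aware aggregation", sketched below under that assumption, is to weight each expert's client updates by how often each client actually activated that expert (the function and its signature are hypothetical, not FLAME's published algorithm).

```python
# Hypothetical activation-aware aggregation: average each expert's client
# updates weighted by that client's activation count for the expert.
from typing import Dict, List
import torch

def aggregate_experts(client_updates: List[Dict[str, torch.Tensor]],
                      activation_counts: List[Dict[str, int]]) -> Dict[str, torch.Tensor]:
    agg: Dict[str, torch.Tensor] = {}
    for name in client_updates[0]:
        total = sum(counts.get(name, 0) for counts in activation_counts)
        if total == 0:
            continue  # no client activated this expert this round
        agg[name] = sum(
            update[name] * (counts.get(name, 0) / total)
            for update, counts in zip(client_updates, activation_counts)
        )
    return agg
```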
arXiv Detail & Related papers (2025-06-19T21:02:19Z) - Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning [25.121545962121907]
Federated fine-tuning (FedFT) of large language models (LLMs) has emerged as a promising solution for adapting models to distributed data environments. We propose FedBE, a novel FedFT framework that integrates an adaptive transformer block expansion mechanism with a dynamic trainable-block allocation strategy. We show that FedBE achieves 12-74% higher accuracy retention on general tasks after fine-tuning and a model convergence acceleration ratio of 1.9-3.1x without degrading the accuracy of downstream tasks.
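FedBE's expansion mechanism is not specified in this summary; a minimal sketch of the general block-expansion idea, assuming LLaMA-Pro-style identity-initialized blocks and hypothetical projection-layer names (`out_proj`, `down_proj`), might look like this:

```python
# Hypothetical block expansion: insert identity-initialized copies of blocks
# and freeze the originals, so only the new blocks train on client data.
import copy
import torch.nn as nn

def zero_out_residual_branches(block: nn.Module) -> None:
    # Assumed helper: zero the final linear layer of each residual branch so
    # the new block initially computes an identity mapping.
    for name, mod in block.named_modules():
        if isinstance(mod, nn.Linear) and name.endswith(("out_proj", "down_proj")):
            nn.init.zeros_(mod.weight)
            if mod.bias is not None:
                nn.init.zeros_(mod.bias)

def expand_blocks(blocks: nn.ModuleList, insert_after: list) -> nn.ModuleList:
    expanded = []
    for i, blk in enumerate(blocks):
        blk.requires_grad_(False)              # freeze pretrained block
        expanded.append(blk)
        if i in insert_after:
            new_blk = copy.deepcopy(blk)       # same shape as its neighbor
            zero_out_residual_branches(new_blk)  # start as identity mapping
            new_blk.requires_grad_(True)       # only new blocks are trainable
            expanded.append(new_blk)
    return nn.ModuleList(expanded)
```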
arXiv Detail & Related papers (2025-06-06T10:59:11Z) - Over-the-Air Fair Federated Learning via Multi-Objective Optimization [52.295563400314094]
We propose an over-the-air fair federated learning algorithm (OTA-FFL) to train fair FL models. Experiments demonstrate the superiority of OTA-FFL in achieving fairness and robust performance.
arXiv Detail & Related papers (2025-01-06T21:16:51Z) - An Architecture Built for Federated Learning: Addressing Data Heterogeneity through Adaptive Normalization-Free Feature Recalibration [0.3481075494213406]
We propose Adaptive Normalization-free Feature Recalibration (ANFR) to combat heterogeneous data in Federated Learning (FL). ANFR combines weight standardization and channel attention to produce learnable scaling factors for feature maps. Experiments show ANFR consistently outperforms established baselines across various aggregation methods.
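As a rough illustration of the two ingredients named above, the sketch below combines a weight-standardized convolution (the normalization-free part) with squeeze-and-excite-style channel attention that emits per-channel scaling factors; the layer shapes and reduction factor are assumptions rather than ANFR's exact design.

```python
# Illustrative sketch only: weight standardization replacing batch norm,
# plus channel attention that recalibrates feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    def forward(self, x):
        # Standardize each filter to zero mean / unit variance.
        w = self.weight
        w = (w - w.mean(dim=(1, 2, 3), keepdim=True)) / (w.std(dim=(1, 2, 3), keepdim=True) + 1e-5)
        return F.conv2d(x, w, self.bias, self.stride, self.padding, self.dilation, self.groups)

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        scale = self.fc(x.mean(dim=(2, 3)))  # squeeze: global average pool
        return x * scale[:, :, None, None]   # excite: per-channel scaling
```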
arXiv Detail & Related papers (2024-10-02T20:16:56Z) - Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning [9.198799314774437]
Federated learning (FL) offers privacy-preserving decentralized machine learning, optimizing models at edge clients without sharing private data.
Foundation models (FMs) have gained traction in the artificial intelligence (AI) community due to their exceptional performance across various tasks.
We present an adaptive framework for Resource-aware Federated Foundation Models (RaFFM) to address these challenges.
arXiv Detail & Related papers (2023-09-30T04:31:53Z) - Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
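The summary implies a constrained stochastic program; a schematic formulation consistent with it, with all notation assumed rather than taken from the paper, is:

```latex
% Schematic joint design problem (notation assumed): choose client scheduling
% a_t, resource allocation p_t, and local-epoch counts E_t to minimize the
% global loss under long-term energy and per-round latency constraints.
\begin{aligned}
\min_{\{a_t,\,p_t,\,E_t\}} \quad & F(w_T) = \sum_{k=1}^{K} \frac{n_k}{n}\, F_k(w_T) \\
\text{s.t.} \quad & \limsup_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} E^{\mathrm{energy}}_k(a_t, p_t, E_t) \le \bar{E}_k, \quad \forall k, \\
& L_t(a_t, p_t, E_t) \le L^{\max}, \quad \forall t.
\end{aligned}
```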
arXiv Detail & Related papers (2023-08-04T04:18:01Z) - Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated Learning Framework [82.36466358313025]
We propose a primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and bias of the global model.
Experiments based on (semi-supervised) image classification tasks demonstrate superiority of FedVRA over the existing schemes.
arXiv Detail & Related papers (2022-12-03T03:27:51Z) - FedFM: Anchor-based Feature Matching for Data Heterogeneity in Federated Learning [91.74206675452888]
We propose a novel method FedFM, which guides each client's features to match shared category-wise anchors.
To achieve higher efficiency and flexibility, we propose a FedFM variant, called FedFM-Lite, in which clients communicate with the server less frequently, reducing synchronization rounds and communication bandwidth costs.
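As a hedged illustration of the anchor-matching idea (the paper's exact loss may differ), a regularizer that pulls each client feature toward its shared category-wise anchor can be written as:

```python
# Illustrative anchor-matching regularizer in the spirit of FedFM: penalize
# the distance between each sample's feature and its class anchor.
import torch
import torch.nn.functional as F

def anchor_matching_loss(features: torch.Tensor,  # (batch, dim) client features
                         labels: torch.Tensor,    # (batch,) class indices
                         anchors: torch.Tensor    # (classes, dim) shared anchors
                         ) -> torch.Tensor:
    # Mean squared L2 distance to the category-wise anchor of each sample.
    return F.mse_loss(features, anchors[labels])
```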
arXiv Detail & Related papers (2022-10-14T08:11:34Z) - Disentangled Federated Learning for Tackling Attributes Skew via Invariant Aggregation and Diversity Transferring [104.19414150171472]
Attribute skew diverts current federated learning (FL) frameworks from consistent optimization directions among clients.
We propose disentangled federated learning (DFL) to disentangle the domain-specific and cross-invariant attributes into two complementary branches.
Experiments verify that DFL achieves higher performance, better interpretability, and a faster convergence rate than SOTA FL methods.
arXiv Detail & Related papers (2022-06-14T13:12:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.