FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE
- URL: http://arxiv.org/abs/2506.16600v2
- Date: Mon, 14 Jul 2025 21:49:53 GMT
- Title: FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE
- Authors: Khiem Le, Tuan Tran, Ting Hua, Nitesh V. Chawla
- Abstract summary: FLAME is a novel federated learning framework based on the Sparse Mixture-of-Experts (SMoE) architecture. It retains full (uncompressed) global LoRA matrices and achieves client-side adaptability by varying the number of activated experts per client. It tackles the challenges that SMoE introduces in federated learning (mismatched output magnitude under partial expert activation and imbalanced expert training quality across clients) through a lightweight rescaling mechanism and an activation-aware aggregation scheme.
- Score: 21.860699562235776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing resource-adaptive LoRA federated fine-tuning methods enable clients to fine-tune models using compressed versions of global LoRA matrices, in order to accommodate various compute resources across clients. This compression requirement will lead to suboptimal performance due to information loss. To address this, we propose FLAME, a novel federated learning framework based on the Sparse Mixture-of-Experts (SMoE) architecture. Unlike prior approaches, FLAME retains full (uncompressed) global LoRA matrices and achieves client-side adaptability by varying the number of activated experts per client. However, incorporating SMoE into federated learning introduces unique challenges, specifically, the mismatch in output magnitude from partial expert activation and the imbalance in expert training quality across clients. FLAME tackles these challenges through a lightweight rescaling mechanism and an activation-aware aggregation scheme. Empirical results across diverse computational settings demonstrate that FLAME consistently outperforms existing methods, providing a robust and effective solution for resource-adaptive federated learning.
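The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption made for illustration: the function names, the rescaling factor k_full / k_active, and the use of per-client activation counts as aggregation weights are hypothetical and may differ from what the paper actually does.

```python
import numpy as np

def topk_gates(logits, k):
    """Softmax over all experts, then zero out all but the k largest gates."""
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    keep = np.argsort(probs)[-k:]  # indices of the k largest gates
    gates = np.zeros_like(probs)
    gates[keep] = probs[keep]
    return gates

def smoe_lora_delta(x, A, B, logits, k_active, k_full):
    """Gated sum of LoRA expert outputs, rescaled for partial activation.

    A has shape (E, r, d_in) and B has shape (E, d_out, r).
    ASSUMPTION: the factor k_full / k_active is a stand-in for the paper's
    rescaling mechanism, not its actual formula.
    """
    gates = topk_gates(logits, k_active)
    out = np.zeros(B.shape[1])
    for e in np.flatnonzero(gates):
        out += gates[e] * (B[e] @ (A[e] @ x))
    return (k_full / k_active) * out

def activation_aware_average(expert_params, activation_counts):
    """Per-expert weighted mean over clients.

    expert_params: (C, E, P) flattened parameters per client and expert.
    activation_counts: (C, E) times each client activated each expert.
    ASSUMPTION: weighting by raw activation counts is illustrative only.
    Experts that no client activated come out as zeros here; a real system
    would more likely keep the previous global value for them.
    """
    w = activation_counts.astype(float)
    totals = w.sum(axis=0, keepdims=True)  # shape (1, E)
    w = np.divide(w, totals, out=np.zeros_like(w), where=totals > 0)
    return np.einsum("ce,cep->ep", w, expert_params)
```

In this sketch a client with a small compute budget would call smoe_lora_delta with a small k_active, while the server would call activation_aware_average once per round on the returned expert parameters.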
Related papers
- Adaptive Dual-Weighting Framework for Federated Learning via Out-of-Distribution Detection [53.45696787935487]
Federated Learning (FL) enables collaborative model training across large-scale distributed service nodes.
In real-world service-oriented deployments, data generated by heterogeneous users, devices, and application scenarios are inherently non-IID.
We propose FLood, a novel FL framework inspired by out-of-distribution (OOD) detection.
arXiv Detail & Related papers (2026-02-01T05:54:59Z) - HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts [26.55877320740609]
We propose HFedMoE, a heterogeneous MoE-based FL fine-tuning framework that customizes a subset of experts to each client.
HFedMoE identifies the expert importance based on its contributions to fine-tuning performance.
It then adaptively selects a subset of experts from an information bottleneck perspective to align with each client's computing budget.
arXiv Detail & Related papers (2026-01-02T05:56:11Z) - Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution [76.66229730098759]
In real-world image super-resolution (Real-ISR), existing approaches mainly rely on fine-tuning pre-trained diffusion models.
We propose a Mixture-of-Ranks (MoR) architecture for single-step image super-resolution.
We introduce a fine-grained expert partitioning strategy that treats each rank in LoRA as an independent expert.
arXiv Detail & Related papers (2025-11-20T04:11:44Z) - FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge [7.976167864455345]
Federated Learning (FL) offers a compelling solution through Federated Fine-Tuning (FFT).
We propose FFT-MoE, a novel FFT framework that replaces LoRA with sparse Mixture-of-Experts (MoE) adapters.
FFT-MoE consistently outperforms state-of-the-art FFT baselines in generalization performance and training efficiency.
arXiv Detail & Related papers (2025-08-26T04:09:18Z) - Resource-Aware Aggregation and Sparsification in Heterogeneous Ensemble Federated Learning [0.9176056742068811]
Federated learning (FL) enables distributed training with private client data.
Current ensemble-based FL methods fall short in capturing diversity of model predictions.
We propose SHEFL, a global ensemble-based FL framework suited for clients with diverse computational capacities.
arXiv Detail & Related papers (2025-08-12T01:40:46Z) - Federated Sketching LoRA: A Flexible Framework for Heterogeneous Collaborative Fine-Tuning of LLMs [37.03583502049329]
Fine-tuning large language models (LLMs) on resource-constrained clients remains a challenging problem.
Recent works have fused low-rank adaptation (LoRA) techniques with federated fine-tuning to mitigate challenges associated with client model sizes and data scarcity.
We propose federated sketching LoRA, which leverages a sketching mechanism to enable clients to update submatrices of global LoRA modules maintained by the server.
arXiv Detail & Related papers (2025-01-31T18:44:35Z) - Client-Centric Federated Adaptive Optimization [78.30827455292827]
Federated Learning (FL) is a distributed learning paradigm where clients collaboratively train a model while keeping their own data private.
We propose Client-Centric Federated Adaptive Optimization, a class of novel federated optimization approaches.
arXiv Detail & Related papers (2025-01-17T04:00:50Z) - Over-the-Air Fair Federated Learning via Multi-Objective Optimization [52.295563400314094]
We propose an over-the-air fair federated learning algorithm (OTA-FFL) to train fair FL models.
Experiments demonstrate the superiority of OTA-FFL in achieving fairness and robust performance.
arXiv Detail & Related papers (2025-01-06T21:16:51Z) - LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement [5.162783756846019]
Foundation models (FMs) achieve strong performance across diverse tasks with task-specific fine-tuning.
Methods like Low-Rank Adaptation (LoRA) reduce this cost by introducing low-rank matrices for tuning fewer parameters.
LoRA-FAIR maintains computational and communication efficiency, yielding superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2024-11-22T14:19:01Z) - Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z) - FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts [4.412721048192925]
We present FedMoE, an efficient personalized Federated Learning framework to address data heterogeneity.
FedMoE is composed of two fine-tuning stages. In the first stage, FedMoE simplifies the problem by conducting a search based on observed activation patterns.
In the second stage, these submodels are distributed to clients for further training and returned to the server for aggregation.
arXiv Detail & Related papers (2024-08-21T03:16:12Z) - Embracing Federated Learning: Enabling Weak Client Participation via Partial Model Training [21.89214794178211]
In Federated Learning (FL), clients may have weak devices that cannot train the full model or even hold it in their memory space.
We propose EmbracingFL, a general FL framework that allows all available clients to join the distributed training.
Our empirical study shows that EmbracingFL consistently achieves high accuracy, as if all clients were strong, outperforming the state-of-the-art width reduction methods.
arXiv Detail & Related papers (2024-06-21T13:19:29Z) - An Element-Wise Weights Aggregation Method for Federated Learning [11.9232569348563]
This paper introduces an innovative Element-Wise Weights Aggregation Method for Federated Learning (EWWA-FL).
EWWA-FL aggregates local weights to the global model at the level of individual elements, allowing each participating client to make element-wise contributions to the learning process.
By taking into account the unique dataset characteristics of each client, EWWA-FL enhances the robustness of the global model to different datasets.
arXiv Detail & Related papers (2024-04-24T15:16:06Z) - Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources [31.041608465716575]
Federated Learning (FL) has recently been applied to the parameter-efficient fine-tuning of Large Language Models (LLMs).
This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning.
arXiv Detail & Related papers (2024-02-18T08:32:59Z) - Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection [19.284989473603627]
We propose a novel Balanced Modality Selection framework for multi-modal federated learning (MFL).
We show that local training with a certain single modality may contribute more to the global model than training with all local modalities.
Our experiments on audio-visual, colored-gray, and front-back datasets showcase the superiority of BMSFed over baselines.
arXiv Detail & Related papers (2023-12-31T05:37:27Z) - Beyond ADMM: A Unified Client-variance-reduced Adaptive Federated Learning Framework [82.36466358313025]
We propose a primal-dual FL algorithm, termed FedVRA, that allows one to adaptively control the variance-reduction level and bias of the global model.
Experiments based on (semi-supervised) image classification tasks demonstrate superiority of FedVRA over the existing schemes.
arXiv Detail & Related papers (2022-12-03T03:27:51Z) - FL Games: A Federated Learning Framework for Distribution Shifts [71.98708418753786]
Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server.
We propose FL GAMES, a game-theoretic framework for federated learning that learns causal features that are invariant across clients.
arXiv Detail & Related papers (2022-10-31T22:59:03Z) - Efficient Split-Mix Federated Learning for On-Demand and In-Situ Customization [107.72786199113183]
Federated learning (FL) provides a distributed learning framework for multiple participants to learn collaboratively without sharing raw data.
In this paper, we propose a novel Split-Mix FL strategy for heterogeneous participants that, once training is done, provides in-situ customization of model sizes and robustness.
arXiv Detail & Related papers (2022-03-18T04:58:34Z)