Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts
- URL: http://arxiv.org/abs/2506.00965v1
- Date: Sun, 01 Jun 2025 11:24:43 GMT
- Title: Unlocking Personalized Knowledge in Federated Large Language Model: The Power of Mixture of Experts
- Authors: Fan Liu, Bikang Pan, Zhongyi Wang, Xi Yao, Xiaoying Tang, Jingya Wang, Ye Shi
- Abstract summary: We propose FLEx, a novel federated learning framework specifically tailored for large language models (LLMs). FLEx efficiently personalizes by pruning the global MoE model to keep only one expert per client, and employs an adaptive gating mechanism to reintegrate these personalized experts into the pre-trained MoE layers. The personalized experts are trained with local data and stored locally on each client, while the shared modules are aggregated globally.
- Score: 14.713865726974761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Mixture of Experts (MoE) architecture has emerged as a prominent strategy for scaling large language models (LLMs), effectively leveraging sparse activation and facilitating task-specific personalization. However, current federated learning (FL) approaches are primarily designed for dense models, making them unable to directly exploit the sparsity inherent in MoE architectures. Treating MoE models as dense networks in federated scenarios results in excessive communication overhead and computational costs, undermining the potential for personalized knowledge sharing. To address these challenges, we propose FLEx (Federated LLMs with Personalized Experts), a novel federated learning framework explicitly tailored for MoE-based LLMs. FLEx efficiently personalizes by pruning the global MoE model to keep only one expert per client, and employs an adaptive gating mechanism to reintegrate these personalized experts into the pre-trained MoE layers, ensuring the original backbone architecture remains unchanged. These personalized experts are trained with local data and stored locally on each client, while the shared modules are aggregated globally. Extensive evaluations on diverse instruction-based datasets under non-IID conditions consistently demonstrate that FLEx outperforms existing federated baselines. Our code is available at https://anonymous.4open.science/r/FLEx-8F12.
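To make the mechanism above concrete, here is a minimal sketch of how a frozen pre-trained MoE layer could be paired with one personalized expert and an adaptive gate. The class name, the `pretrained_moe.experts` attribute, the sigmoid gate, and the copy-one-expert initialization are illustrative assumptions, not the authors' actual implementation (see the linked repository for that).

```python
# Minimal, illustrative sketch of FLEx-style personalization (assumptions noted above).
import copy
import torch
import torch.nn as nn

class PersonalizedMoELayer(nn.Module):
    """Frozen pre-trained MoE layer plus one locally trained expert,
    re-integrated through a learned adaptive gate (illustrative only)."""

    def __init__(self, pretrained_moe: nn.Module, hidden_dim: int, keep_expert: int = 0):
        super().__init__()
        self.pretrained_moe = pretrained_moe              # shared backbone, kept frozen
        for p in self.pretrained_moe.parameters():
            p.requires_grad_(False)
        # "Prune" the global MoE down to a single expert for this client:
        # here, a trainable copy of one pre-trained expert (assumed attribute layout).
        self.personal_expert = copy.deepcopy(pretrained_moe.experts[keep_expert])
        for p in self.personal_expert.parameters():
            p.requires_grad_(True)
        # Adaptive gate: decides, per token, how much of the personalized
        # expert's output to blend back into the unchanged MoE output.
        self.adaptive_gate = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shared_out = self.pretrained_moe(x)               # original backbone path
        personal_out = self.personal_expert(x)            # personalized path
        g = torch.sigmoid(self.adaptive_gate(x))          # shape: (..., 1)
        return shared_out + g * personal_out
```

Under the protocol described in the abstract, `personal_expert` and `adaptive_gate` would stay on the client and be trained only on local data, while the globally shared modules would be sent to the server for aggregation each round.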
Related papers
- PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning [14.681194790227085]
Federated learning (FL) has gained widespread attention for its privacy-preserving and collaborative learning capabilities. Personalized federated learning addresses this issue by dividing the model into a globally shared part and a locally private part. We propose the PM-MoE architecture, which integrates a mixture of personalized modules with energy-based denoising of those personalized modules.
arXiv Detail & Related papers (2025-02-01T07:20:21Z) - Personalized Federated Fine-Tuning for LLMs via Data-Driven Heterogeneous Model Architectures [15.645254436094055]
Federated Learning (FL) enables collaborative fine-tuning of Large Language Models without accessing raw data. We propose FedAMoLE, a lightweight personalized FL framework that enables data-driven heterogeneous model architectures. Experiments show that FedAMoLE improves client-side performance by an average of 5.14% compared to existing approaches.
arXiv Detail & Related papers (2024-11-28T13:20:38Z) - Personalized Hierarchical Split Federated Learning in Wireless Networks [24.664469755746463]
We propose a personalized hierarchical split federated learning (PHSFL) algorithm specially designed to achieve better personalization performance. We first perform an extensive theoretical analysis to understand the impact of model splitting and hierarchical model aggregation on the global model. Once the global model is trained, we fine-tune each client's model to obtain its personalized model.
arXiv Detail & Related papers (2024-11-09T02:41:53Z) - FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts [4.412721048192925]
We present FedMoE, an efficient personalized Federated Learning framework designed to address data heterogeneity.
FedMoE is composed of two fine-tuning stages. In the first stage, FedMoE simplifies the problem by conducting a search based on observed activation patterns.
In the second stage, these submodels are distributed to clients for further training and then returned to the server for aggregation.
arXiv Detail & Related papers (2024-08-21T03:16:12Z) - Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts [49.950419707905944]
We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts.
Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data.
Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
arXiv Detail & Related papers (2024-06-17T19:06:54Z) - Multi-Level Additive Modeling for Structured Non-IID Federated Learning [54.53672323071204]
We train models organized in a multi-level structure, called Multi-level Additive Models (MAM), for better knowledge sharing across heterogeneous clients.
In federated MAM (FeMAM), each client is assigned at most one model per level, and its personalized prediction sums the outputs of the models assigned to it across all levels, as sketched after this summary.
Experiments show that FeMAM surpasses existing clustered FL and personalized FL methods in various non-IID settings.
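As a rough illustration of that additive prediction, the snippet below sums the outputs of the models assigned to one client (at most one per level); the function name and interface are assumptions made for illustration, not the paper's code.

```python
# Illustrative sketch of a FeMAM-style additive prediction (assumed interface).
import torch
import torch.nn as nn

def additive_prediction(x: torch.Tensor, assigned_models: list[nn.Module]) -> torch.Tensor:
    """Personalized prediction = sum of the outputs of the models assigned
    to this client, with at most one model taken from each level."""
    outputs = [model(x) for model in assigned_models]
    return torch.stack(outputs, dim=0).sum(dim=0)
```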
arXiv Detail & Related papers (2024-05-26T07:54:53Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - MAP: Model Aggregation and Personalization in Federated Learning with Incomplete Classes [49.22075916259368]
In some real-world applications, data samples are usually distributed on local devices.
In this paper, we focus on a special kind of non-I.I.D. scenario in which clients own incomplete classes.
Our proposed algorithm, named MAP, can simultaneously achieve the aggregation and personalization goals in FL.
arXiv Detail & Related papers (2024-04-14T12:22:42Z) - FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts [48.78037006856208]
FedJETs is a novel solution that uses a Mixture-of-Experts (MoE) framework within a Federated Learning (FL) setup.
Our method leverages the diversity of the clients to train specialized experts on different subsets of classes, and a gating function to route each input to the most relevant expert(s), as sketched after this summary.
Our approach can improve accuracy by up to 18% in state-of-the-art FL settings, while maintaining competitive zero-shot performance.
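As a bare-bones illustration of that routing idea, the sketch below scores the experts with a learned router and mixes only the top-k of them per input; the module name, top-k choice, and per-sample loop are assumptions made for clarity, not FedJETs' actual design.

```python
# Illustrative top-k expert routing (design details are assumptions).
import torch
import torch.nn as nn

class TopKRouting(nn.Module):
    def __init__(self, hidden_dim: int, experts: list[nn.Module], k: int = 2):
        super().__init__()
        self.router = nn.Linear(hidden_dim, len(experts))  # scores each expert
        self.experts = nn.ModuleList(experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.router(x)                             # (batch, n_experts)
        weights, idx = torch.topk(scores, self.k, dim=-1)   # keep the k best experts
        weights = torch.softmax(weights, dim=-1)            # normalize their weights
        outputs = []
        for b in range(x.size(0)):                          # simple per-sample dispatch
            xb = x[b:b + 1]
            yb = sum(
                weights[b, slot] * self.experts[idx[b, slot].item()](xb)
                for slot in range(self.k)
            )
            outputs.append(yb)
        return torch.cat(outputs, dim=0)
```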
arXiv Detail & Related papers (2023-06-14T15:47:52Z) - Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning [22.310090483499035]
Federated learning (FL) enables edge-devices to collaboratively learn a model without disclosing their private data to a central aggregating server.
Most existing FL algorithms require models of identical architecture to be deployed across the clients and server.
We propose a novel ensemble knowledge transfer method named Fed-ET, in which small models are trained on clients and then used to train a larger model at the server.
arXiv Detail & Related papers (2022-04-27T05:18:32Z) - Efficient Split-Mix Federated Learning for On-Demand and In-Situ Customization [107.72786199113183]
Federated learning (FL) provides a distributed learning framework for multiple participants to learn collaboratively without sharing raw data.
In this paper, we propose a novel Split-Mix FL strategy for heterogeneous participants that, once training is done, provides in-situ customization of model sizes and robustness.
arXiv Detail & Related papers (2022-03-18T04:58:34Z) - Federated Mutual Learning [65.46254760557073]
Federated Mutual Learning (FML) allows clients to train a generalized model collaboratively and a personalized model independently.
The experiments show that FML can achieve better performance than alternatives in typical federated learning settings.
arXiv Detail & Related papers (2020-06-27T09:35:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.