Related papers: FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment

FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment

URL: http://arxiv.org/abs/2512.23070v1
Date: Sun, 28 Dec 2025 20:32:13 GMT
Title: FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment
Authors: Boyang Zhang, Xiaobing Chen, Songyang Zhang, Shuai Zhang, Xiangwei Zhou, Mingxuan Sun,
Abstract summary: Mixture-of-Experts (MoE) models enable scalable neural networks through conditional computation.<n>Our approach introduces client-expert fitness scores that quantify the expert suitability for local datasets through training feedback.<n>Our comprehensive experiments on three different datasets demonstrate the superior performance of the proposed FLEX-MoE.
Score: 38.27527504479237
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Mixture-of-Experts (MoE) models enable scalable neural networks through conditional computation. However, their deployment with federated learning (FL) faces two critical challenges: 1) resource-constrained edge devices cannot store full expert sets, and 2) non-IID data distributions cause severe expert load imbalance that degrades model performance. To this end, we propose \textbf{FLEX-MoE}, a novel federated MoE framework that jointly optimizes expert assignment and load balancing under limited client capacity. Specifically, our approach introduces client-expert fitness scores that quantify the expert suitability for local datasets through training feedback, and employs an optimization-based algorithm to maximize client-expert specialization while enforcing balanced expert utilization system-wide. Unlike existing greedy methods that focus solely on personalization while ignoring load imbalance, our FLEX-MoE is capable of addressing the expert utilization skew, which is particularly severe in FL settings with heterogeneous data. Our comprehensive experiments on three different datasets demonstrate the superior performance of the proposed FLEX-MoE, together with its ability to maintain balanced expert utilization across diverse resource-constrained scenarios.

Related papers

A Replicate-and-Quantize Strategy for Plug-and-Play Load Balancing of Sparse Mixture-of-Experts LLMs [64.8510381475827]
Sparse Mixture-of-Experts (SMoE) architectures are increasingly used to scale large language models efficiently.<n>SMoE models often suffer from severe load imbalance across experts, where a small subset of experts receives most tokens while others are underutilized.<n>We present a systematic analysis of expert routing during inference and identify three findings: (i) load imbalance persists and worsens with larger batch sizes, (ii) selection frequency does not reliably reflect expert importance, and (iii) overall expert workload and importance can be estimated using a small calibration set.
arXiv Detail & Related papers (2026-02-23T15:11:16Z)
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts [74.40169987564724]
Expert parallelism (EP) is designed to scale MoE models by distributing experts across multiple devices.<n>Under extreme imbalance, EP can funnel a disproportionate number of tokens to a small number of experts, leading to compute- and memory-bound failures.<n>We propose Least-Loaded Expert Parallelism (LLEP), a novel EP algorithm that dynamically reroutes excess tokens and associated expert parameters from overloaded devices to underutilized ones.
arXiv Detail & Related papers (2026-01-23T18:19:15Z)
HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts [26.55877320740609]
We propose HFedMoE, a heterogeneous MoE-based FL fine-tuning framework that customizes a subset of experts to each client.<n> HFedMoE identifies the expert importance based on its contributions to fine-tuning performance.<n>It then adaptively selects a subset of experts from an information bottleneck perspective to align with each client's computing budget.
arXiv Detail & Related papers (2026-01-02T05:56:11Z)
Federated Fine-Tuning of Sparsely-Activated Large Language Models on Resource-Constrained Devices [41.84571097603175]
Federated fine-tuning of large language models (LLMs) is challenging due to their massive computational requirements and the resource constraints of participants.<n>We propose FLUX, a system designed to enable fine-tuning of MoE-based LLMs across participants with constrained computing resources.<n>FLUX significantly outperforms existing methods, achieving up to 4.75X speedup in time-to-accuracy.
arXiv Detail & Related papers (2025-08-26T14:39:00Z)
FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge [7.976167864455345]
Federated Learning (FL) offers a compelling solution through Federated Fine-Tuning (FFT)<n>We propose FFT MoE, a novel FFT framework that replaces LoRA with sparse Mixture of Experts (MoE) adapters.<n>MoE consistently outperforms state of the art FFT baselines in generalization performance and training efficiency.
arXiv Detail & Related papers (2025-08-26T04:09:18Z)
FLAME: Towards Federated Fine-Tuning Large Language Models Through Adaptive SMoE [21.860699562235776]
FLAME is a novel federated learning framework based on the Sparse Mixture-of-Experts (SMoE) architecture.<n>It retains full (uncompressed) global LoRA matrices and achieves client-side adaptability by varying the number of activated experts per client.<n>It tackles these challenges through a lightweight rescaling mechanism and an activation-aware aggregation scheme.
arXiv Detail & Related papers (2025-06-19T21:02:19Z)
FLEx: Personalized Federated Learning for Mixture-of-Experts LLMs via Expert Grafting [40.23842164423827]
Federated instruction tuning of large language models (LLMs) is challenged by significant data heterogeneity across clients.<n>We propose FLEx, a novel framework that leverages pretrained MoE-based LLMs for efficient personalization.<n>For personalization, we introduce a novel expert grafting mechanism that leverages dynamic sparsity to construct a client-specific expert from selected components of pretrained experts.
arXiv Detail & Related papers (2025-06-01T11:24:43Z)
Convergence Rates for Softmax Gating Mixture of Experts [78.3687645289918]
Mixture of experts (MoE) has emerged as an effective framework to advance the efficiency and scalability of machine learning models.<n>Central to the success of MoE is an adaptive softmax gating mechanism which takes responsibility for determining the relevance of each expert to a given input and then dynamically assigning experts their respective weights.<n>We perform a convergence analysis of parameter estimation and expert estimation under the MoE equipped with the standard softmax gating or its variants, including a dense-to-sparse gating and a hierarchical softmax gating.
arXiv Detail & Related papers (2025-03-05T06:11:24Z)
Federated Fine-Tuning of LLMs: Framework Comparison and Research Directions [59.5243730853157]
Federated learning (FL) provides a privacy-preserving solution for fine-tuning pre-trained large language models (LLMs) using distributed private datasets.<n>This article conducts a comparative analysis of three advanced federated LLM (FedLLM) frameworks that integrate knowledge distillation (KD) and split learning (SL) to mitigate these issues.
arXiv Detail & Related papers (2025-01-08T11:37:06Z)
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts [63.67734699877724]
MoE++ is a general and heterogeneous MoE framework that integrates both Feed-Forward Network(FFN) and zero-computation experts. MoE++ achieves better performance while delivering 1.1-2.1x expert forward throughput compared to a vanilla MoE model of the same size.
arXiv Detail & Related papers (2024-10-09T18:01:27Z)
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings [41.98633628526484]
Mixture-of-Experts (MoEs) achieve scalability by dynamically activating subsets of their components.<n>Motivated by inference costs and data heterogeneity, we study how joint training of gating functions and experts can allocate domain-specific expertise.
arXiv Detail & Related papers (2023-06-14T15:47:52Z)
MoEC: Mixture of Expert Clusters [93.63738535295866]
Sparsely Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead. MoE converts dense layers into sparse experts, and utilizes a gated routing network to make experts conditionally activated. However, as the number of experts grows, MoE with outrageous parameters suffers from overfitting and sparse data allocation.
arXiv Detail & Related papers (2022-07-19T06:09:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.