Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
- URL: http://arxiv.org/abs/2306.08586v3
- Date: Tue, 03 Jun 2025 16:07:58 GMT
- Title: Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
- Authors: Yehya Farhat, Hamza ElMokhtar Shili, Fangshuo Liao, Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Robert Sim, Dimitrios Dimitriadis, Anastasios Kyrillidis
- Abstract summary: Mixture-of-Experts (MoEs) achieve scalability by dynamically activating subsets of their components. Motivated by inference costs and data heterogeneity, we study how joint training of gating functions and experts can allocate domain-specific expertise.
- Score: 41.98633628526484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixture-of-Experts (MoEs) achieve scalability by dynamically activating subsets of their components. Yet, understanding how expertise emerges through joint training of gating mechanisms and experts remains incomplete, especially in scenarios without clear task partitions. Motivated by inference costs and data heterogeneity, we study how joint training of gating functions and experts can dynamically allocate domain-specific expertise across multiple underlying data distributions. As an outcome of our framework, we develop an instance tailored specifically to decentralized training scenarios, introducing Dynamically Decentralized Orchestration of MoEs, or DDOME. DDOME leverages the heterogeneity arising from distributional shifts across decentralized data sources to specialize experts dynamically. By integrating a pretrained common expert to inform a gating function, DDOME achieves personalized expert subset selection on the fly, facilitating just-in-time personalization. We empirically validate DDOME in a Federated Learning (FL) context: DDOME attains from a 4% up to a 24% accuracy improvement over state-of-the-art FL baselines on image and text classification tasks, while maintaining competitive zero-shot generalization capabilities. Furthermore, we provide theoretical insights confirming that joint gating-expert training is critical for achieving meaningful expert specialization.
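The abstract describes the mechanism only at a high level. As a minimal sketch of what "a pretrained common expert informing a gating function" with per-input expert subset selection could look like, assuming a PyTorch setup with hypothetical module names and dimensions (this is not the authors' DDOME implementation):

```python
import torch
import torch.nn as nn

class GatedMoE(nn.Module):
    """Hypothetical sketch, not the authors' implementation: a frozen,
    pretrained common expert produces features that inform a gating
    network, which routes each input to a sparse top-k subset of experts."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int,
                 n_experts: int = 8, k: int = 2):
        super().__init__()
        # Stand-in for a pretrained common expert; frozen during MoE training.
        self.common = nn.Linear(d_in, d_hidden)
        for p in self.common.parameters():
            p.requires_grad = False
        self.gate = nn.Linear(d_hidden, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_out))
            for _ in range(n_experts)
        )
        self.k = k
        self.d_out = d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Routing scores come from the common expert's representation, so
        # expert selection reflects which distribution the input resembles.
        scores = torch.softmax(self.gate(self.common(x)), dim=-1)  # (B, E)
        weights, idx = torch.topk(scores, self.k, dim=-1)          # (B, k)
        out = x.new_zeros(x.size(0), self.d_out)
        # Dispatch each input to its selected experts, weighted by the gate.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

In a sketch like this, gradients flow jointly into the gate and the selected experts while the common expert stays frozen, loosely mirroring the joint gating-expert training the paper argues is critical for specialization.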
Related papers
- Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach [52.79991638077892]
This article highlights a critical yet underexplored concept: the absence of robust quantitative strategies for dynamic client-expert alignment. We propose a conceptual system design for intelligent client-expert alignment that incorporates dynamic fitness scoring, global expert load monitoring, and client capacity profiling.
arXiv Detail & Related papers (2025-07-08T05:30:37Z) - Keep the General, Inject the Specific: Structured Dialogue Fine-Tuning for Knowledge Injection without Catastrophic Forgetting [24.67373225584835]
Large Vision-Language Models have demonstrated impressive, versatile capabilities through extensive multimodal pre-training. These models struggle with a fundamental dilemma: direct adaptation approaches that inject domain-specific knowledge often trigger catastrophic forgetting of foundational visual-linguistic abilities. We introduce Structured Dialogue Fine-Tuning (SDFT), an approach that effectively injects domain-specific knowledge while minimizing catastrophic forgetting.
arXiv Detail & Related papers (2025-04-27T18:04:02Z) - Global Group Fairness in Federated Learning via Function Tracking [8.879649041822779]
We introduce a function-tracking scheme for the global fairness regularizer based on a Maximum Mean Discrepancy (MMD). This scheme seamlessly integrates into most federated learning algorithms while preserving rigorous convergence guarantees.
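For reference, the MMD between two sample sets can be estimated empirically with a kernel. The sketch below is a generic biased estimator (an illustrative helper, not the paper's function-tracking scheme):

```python
import torch

def rbf_mmd2(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased empirical estimate of the squared MMD between samples
    x (n, d) and y (m, d) under an RBF kernel. Illustrative only."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances, then the Gaussian kernel.
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2.0 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()
```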
arXiv Detail & Related papers (2025-03-19T12:42:37Z) - Client-Centric Federated Adaptive Optimization [78.30827455292827]
Federated Learning (FL) is a distributed learning paradigm where clients collaboratively train a model while keeping their own data private.
We propose Client-Centric Federated Adaptive Optimization, a class of novel federated optimization approaches.
arXiv Detail & Related papers (2025-01-17T04:00:50Z) - Flexible and Adaptable Summarization via Expertise Separation [59.26639426529827]
A proficient summarization model should exhibit both flexibility and adaptability.
We propose MoeSumm, a Mixture-of-Expert Summarization architecture.
Our model's distinct separation of general and domain-specific summarization abilities grants it notable flexibility and adaptability.
arXiv Detail & Related papers (2024-06-08T05:31:19Z) - MAP: Model Aggregation and Personalization in Federated Learning with Incomplete Classes [49.22075916259368]
In some real-world applications, data samples are distributed across local devices.
In this paper, we focus on a special kind of non-IID scenario in which clients own incomplete classes.
Our proposed algorithm named MAP could simultaneously achieve the aggregation and personalization goals in FL.
arXiv Detail & Related papers (2024-04-14T12:22:42Z) - Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study [65.11303133775857]
Mixture-of-Experts (MoE) computation amalgamates predictions from several specialized sub-models (referred to as experts).
Sparse MoE selectively engages only a limited number of experts, or even just one, significantly reducing overhead while empirically preserving, and sometimes even enhancing, performance.
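Concretely, the standard sparse MoE output (a textbook formulation from the MoE literature, not specific to this paper) is

\[
y(x) = \sum_{i \in \mathrm{TopK}(g(x))} g_i(x)\, f_i(x),
\]

where the \(f_i\) are the expert networks, \(g(x)\) is the gating distribution (typically a softmax over router logits), and \(k\) is much smaller than the number of experts; \(k = 1\) recovers the "just one expert" case mentioned above.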
arXiv Detail & Related papers (2024-03-26T05:48:02Z) - Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts [58.220879689376744]
Reinforcement learning (RL) is a powerful approach for acquiring a good-performing policy.
We propose Diverse Skill Learning (Di-SkilL) for learning diverse skills.
We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills.
arXiv Detail & Related papers (2024-03-11T17:49:18Z) - Advocating for the Silent: Enhancing Federated Generalization for Non-Participating Clients [38.804196122833645]
This paper unveils an information-theoretic generalization framework for Federated Learning. It quantifies generalization errors by evaluating the information entropy of local distributions. Inspired by our derived generalization bounds, we introduce a weighted aggregation approach and two client selection strategies.
arXiv Detail & Related papers (2023-10-11T03:39:56Z) - Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning [40.16581292336117]
In many applications of federated learning (FL), clients desire models that are personalized using their local data, yet are also robust in the sense that they retain general global knowledge.
It is critical to understand how to navigate this personalization vs robustness trade-off when designing federated systems.
arXiv Detail & Related papers (2023-10-06T23:46:33Z) - Efficient Personalized Federated Learning via Sparse Model-Adaptation [47.088124462925684]
Federated Learning (FL) aims to train machine learning models for multiple clients without sharing their own private data.
We propose pFedGate for efficient personalized FL by adaptively and efficiently learning sparse local models.
We show that pFedGate achieves superior global accuracy, individual accuracy and efficiency simultaneously over state-of-the-art methods.
arXiv Detail & Related papers (2023-05-04T12:21:34Z) - Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
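As a rough illustration of the GMM machinery the summary refers to (generic sklearn usage, not FedGMM's federated fitting procedure), fitting a mixture to hypothetical client features and using per-sample log-likelihood for novel-sample detection might look like:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical client features; FedGMM's actual federated fitting
# procedure is not reproduced here.
rng = np.random.default_rng(0)
client_features = rng.normal(size=(500, 16))

gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
gmm.fit(client_features)

# Responsibilities give a soft assignment of each sample to a component,
# one way to personalize per-client behavior.
responsibilities = gmm.predict_proba(client_features)  # (500, 4)

# Low log-likelihood under the fitted mixture flags potential novel samples,
# one form of the uncertainty quantification mentioned in the summary.
log_lik = gmm.score_samples(client_features)           # (500,)
novel = log_lik < np.quantile(log_lik, 0.05)
```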
arXiv Detail & Related papers (2023-05-01T20:04:46Z) - PGFed: Personalize Each Client's Global Objective for Federated Learning [7.810284483002312]
We propose a novel personalized FL framework that enables each client to personalize its own global objective.
To avoid massive (O(N^2)) communication overhead and potential privacy leakage, each client's risk is estimated through a first-order approximation for other clients' adaptive risk aggregation.
Our experiments on four datasets under different federated settings show consistent improvements of PGFed over previous state-of-the-art methods.
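Reading "first-order approximation" in the usual Taylor-expansion sense (an assumption; PGFed's exact estimator may differ), client i's risk under client j's objective would be approximated as

\[
R_j(\theta_i) \approx R_j(\theta_j) + \nabla R_j(\theta_j)^\top (\theta_i - \theta_j),
\]

so each client only needs other clients' locally computed risks and gradients, rather than evaluating every model on every client's data, which is what avoids the O(N^2) cost.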
arXiv Detail & Related papers (2022-12-02T21:16:39Z) - Exploiting Personalized Invariance for Better Out-of-distribution Generalization in Federated Learning [13.246981646250518]
This paper presents a general dual-regularized learning framework to explore personalized invariance, in contrast to existing personalized federated learning methods.
We show that our method is superior over the existing federated learning and invariant learning methods, in diverse out-of-distribution and Non-IID data cases.
arXiv Detail & Related papers (2022-11-21T08:17:03Z) - Personalizing or Not: Dynamically Personalized Federated Learning with Incentives [37.42347737911428]
Personalized federated learning (FL) enables learning personalized models without sharing private data.
We introduce the personalization rate, measured as the fraction of clients willing to train personalized models, into federated settings and propose DyPFL.
This technique incentivizes clients to participate in personalizing local models while allowing the adoption of the global model when it performs better.
arXiv Detail & Related papers (2022-08-12T09:51:20Z) - Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering [15.219936378115218]
Orchestra is a novel unsupervised federated learning technique that exploits the federation's hierarchy to orchestrate a distributed clustering task.
We show that Orchestra's algorithmic pipeline guarantees good generalization performance under a linear probe.
arXiv Detail & Related papers (2022-05-23T17:59:03Z) - PFA: Privacy-preserving Federated Adaptation for Effective Model Personalization [6.66389628571674]
Federated learning (FL) has become a prevalent distributed machine learning paradigm with improved privacy.
This paper introduces a new concept called federated adaptation, which adapts the trained model in a federated manner to achieve better personalization results.
We propose PFA, a framework to accomplish Privacy-preserving Federated Adaptation.
arXiv Detail & Related papers (2021-03-02T08:07:34Z) - Toward Understanding the Influence of Individual Clients in Federated Learning [52.07734799278535]
Federated learning allows clients to jointly train a global model without sending their private data to a central server.
We define a new notion called Influence, quantify this influence over the model parameters, and propose an effective and efficient method to estimate this metric.
arXiv Detail & Related papers (2020-12-20T14:34:36Z) - Personalized Federated Learning with First Order Model Optimization [76.81546598985159]
We propose an alternative to federated learning, where each client federates with other relevant clients to obtain a stronger model per client-specific objectives.
We do not assume knowledge of underlying data distributions or client similarities, and allow each client to optimize for arbitrary target distributions of interest.
Our method outperforms existing alternatives, while also enabling new features for personalized FL such as transfer outside of local data distributions.
arXiv Detail & Related papers (2020-12-15T19:30:29Z) - Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.