The Rise of Sparse Mixture-of-Experts: A Survey from Algorithmic Foundations to Decentralized Architectures and Vertical Domain Applications
- URL: http://arxiv.org/abs/2602.08019v1
- Date: Sun, 08 Feb 2026 15:39:10 GMT
- Title: The Rise of Sparse Mixture-of-Experts: A Survey from Algorithmic Foundations to Decentralized Architectures and Vertical Domain Applications
- Authors: Dong Pan, Bingtao Li, Yongsheng Zheng, Jiren Ma, Victor Fei
- Abstract summary: The Mixture of Experts (MoE) architecture has evolved into a powerful approach for scaling deep learning models to more parameters at comparable cost. MoE models activate only a subset of experts based on a routing network. MoE not only enhances downstream applications such as natural language processing, computer vision, and multimodal learning across various horizontal domains, but also exhibits broad applicability across vertical domains.
- Score: 2.4589447945128455
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The sparse Mixture of Experts (MoE) architecture has evolved into a powerful approach for scaling deep learning models to more parameters at comparable computational cost. As an important branch of large language models (LLMs), MoE models activate only a subset of experts based on a routing network. This sparse conditional computation mechanism significantly improves computational efficiency, paving a promising path toward greater scalability and cost-efficiency. It not only enhances downstream applications such as natural language processing, computer vision, and multimodal learning in various horizontal domains, but also exhibits broad applicability across vertical domains. Despite the growing popularity and application of MoE models across various domains, recent advancements of MoE in many important fields have not been systematically explored. Existing surveys on MoE suffer from limitations such as narrow coverage or insufficient exploration of key areas. This survey seeks to fill these gaps. In this paper, we first examine the foundational principles of MoE, with an in-depth exploration of its core components: the routing network and the expert network. Subsequently, we extend beyond the centralized paradigm to the decentralized paradigm, which unlocks the immense untapped potential of decentralized infrastructure, enables democratization of MoE development for broader communities, and delivers greater scalability and cost-efficiency. Furthermore, we explore its vertical domain applications. Finally, we identify key challenges and promising future research directions. To the best of our knowledge, this survey is currently the most comprehensive review in the field of MoE. We aim for this article to serve as a valuable resource for both researchers and practitioners, enabling them to navigate and stay up-to-date with the latest advancements.
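The sparse conditional computation the abstract describes can be made concrete with a short sketch. Below is a minimal top-k MoE layer in PyTorch: a routing network scores every token, and only the k highest-scoring expert networks run on it, so activated compute stays roughly constant while total parameters grow with the number of experts. This is an illustrative sketch, not the survey's own code; all names (`TopKMoE`, `n_experts`, the expert MLP shape) are assumptions.

```python
# Minimal sketch of a top-k sparse MoE layer (illustrative, not from the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # the routing network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model); each token is routed independently.
        logits = self.router(x)                      # (n_tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)   # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)         # renormalize their scores
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Only k of n_experts MLPs run per token: compute scales with k,
# parameter count scales with n_experts.
layer = TopKMoE(d_model=64)
y = layer(torch.randn(10, 64))
print(y.shape)  # torch.Size([10, 64])
```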
Related papers
- Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks [108.15756345836901]
We provide a comprehensive review of multimodal spatial reasoning tasks with large models. We review advances in embodied AI, including vision-language navigation and action models. We consider emerging modalities such as audio and egocentric video, which contribute to novel spatial understanding through new sensors.
arXiv Detail & Related papers (2025-10-29T17:55:43Z) - Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [96.1872246747684]
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies. Traditional methods relying on hardware sensors like LiDAR are often limited by high costs, low resolution, and environmental sensitivity, which restricts their applicability in real-world scenarios. Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either low-capacity model architectures or reliance on domain-specific, small-scale datasets.
arXiv Detail & Related papers (2025-07-15T17:59:59Z) - General-Reasoner: Advancing LLM Reasoning Across All Domains [64.70599911897595]
Reinforcement learning (RL) has recently demonstrated strong potential in enhancing the reasoning capabilities of large language models (LLMs). We propose General-Reasoner, a novel training paradigm designed to enhance LLM reasoning capabilities across diverse domains. We train a series of models and evaluate them on a wide range of datasets covering domains such as physics, chemistry, finance, and electronics.
arXiv Detail & Related papers (2025-05-20T17:41:33Z) - Taming the Titans: A Survey of Efficient LLM Inference Serving [33.65474967178607]
Large Language Models (LLMs) for Generative AI have achieved remarkable progress. The substantial memory overhead caused by their vast number of parameters, combined with the high computational demands of the attention mechanism, poses significant challenges. Recent advancements, driven by groundbreaking research, have significantly accelerated progress in this field.
arXiv Detail & Related papers (2025-04-28T12:14:02Z) - Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains [92.36624674516553]
Reinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing the mathematical reasoning and coding performance of large language models (LLMs). We investigate the effectiveness and scalability of RLVR across diverse real-world domains, including medicine, chemistry, psychology, economics, and education. We utilize a generative scoring technique that yields soft, model-based reward signals to overcome the limitations posed by binary verification.
arXiv Detail & Related papers (2025-03-31T08:22:49Z) - A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications [7.414857515253022]
We introduce the basic design of MoE, including gating functions, expert networks, routing mechanisms, training strategies, and system design. We then explore the algorithm design of MoE in important machine learning paradigms such as continual learning, meta-learning, multi-task learning, and reinforcement learning.
arXiv Detail & Related papers (2025-03-10T10:08:55Z) - Low-Rank Adaptation for Foundation Models: A Comprehensive Review [56.341827242332194]
Low-Rank Adaptation (LoRA) has emerged as a highly promising approach for mitigating these challenges. This survey provides the first comprehensive review of LoRA techniques beyond large language models to general foundation models.
arXiv Detail & Related papers (2024-12-31T09:38:55Z) - Enhancing LLM Reasoning with Reward-guided Tree Search [95.06503095273395]
Implementing o1-like reasoning approaches is challenging, and researchers have been making various attempts to advance this open area of research. We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
arXiv Detail & Related papers (2024-11-18T16:15:17Z) - MoDEM: Mixture of Domain Expert Models [23.846823652305027]
We propose a novel approach to enhancing the performance and efficiency of large language models (LLMs).
We introduce a system that utilizes a BERT-based router to direct incoming prompts to the most appropriate domain expert model (a minimal sketch of this model-level routing appears after this list).
Our research demonstrates that this approach can significantly outperform general-purpose models of comparable size.
arXiv Detail & Related papers (2024-10-09T23:52:54Z) - A Survey on Mixture of Experts in Large Language Models [11.801185267119298]
The mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal overhead. Despite its growing prevalence, the literature on MoE still lacks a systematic and comprehensive review. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE.
arXiv Detail & Related papers (2024-06-26T16:34:33Z) - A Survey of Resource-efficient LLM and Multimodal Foundation Models [22.23967603206849]
Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion models, and multimodal models, are revolutionizing the entire machine learning lifecycle.
However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources.
This survey delves into the critical importance of such research, examining both algorithmic and systemic aspects.
arXiv Detail & Related papers (2024-01-16T03:35:26Z)
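The MoDEM entry above describes routing at the model level rather than the token level: a lightweight classifier assigns each whole prompt to one domain expert model. Below is a minimal sketch of that idea under stated assumptions; `DomainRouter`, the domain names, and the stub models are hypothetical and not taken from the paper, and the toy keyword classifier stands in for the fine-tuned BERT router it describes.

```python
# Hypothetical sketch of model-level domain routing in the spirit of MoDEM:
# a small classifier picks a domain per prompt, and that domain's expert
# model serves the whole request. All names here are illustrative.
from typing import Callable, Dict

class DomainRouter:
    def __init__(self, classify: Callable[[str], str],
                 experts: Dict[str, Callable[[str], str]],
                 fallback: Callable[[str], str]):
        self.classify = classify      # stand-in for a fine-tuned BERT classifier
        self.experts = experts        # domain name -> expert model
        self.fallback = fallback      # general-purpose model for unknown domains

    def generate(self, prompt: str) -> str:
        domain = self.classify(prompt)
        expert = self.experts.get(domain, self.fallback)
        return expert(prompt)

# Toy usage with stub "models" (plain functions standing in for LLMs):
router = DomainRouter(
    classify=lambda p: "math" if any(ch.isdigit() for ch in p) else "general",
    experts={"math": lambda p: f"[math expert] {p}"},
    fallback=lambda p: f"[general model] {p}",
)
print(router.generate("What is 17 * 24?"))   # routed to the math expert
print(router.generate("Summarize this memo."))  # routed to the general model
```

The design choice mirrors the contrast with the token-level MoE layer sketched earlier: routing whole prompts trades fine-grained expert mixing for the ability to use independently trained, separately hosted expert models.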