A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
- URL: http://arxiv.org/abs/2503.07137v3
- Date: Fri, 18 Apr 2025 02:53:01 GMT
- Title: A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
- Authors: Siyuan Mu, Sen Lin
- Abstract summary: We introduce the basic design of MoE, including gating functions, expert networks, routing mechanisms, training strategies, and system design. We then explore the algorithm design of MoE in important machine learning paradigms such as continual learning, meta-learning, multi-task learning, and reinforcement learning.
- Score: 7.414857515253022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial intelligence (AI) has achieved astonishing successes in many domains, especially with the recent breakthroughs in the development of foundational large models. These large models, leveraging their extensive training data, provide versatile solutions for a wide range of downstream tasks. However, as modern datasets become increasingly diverse and complex, the development of large AI models faces two major challenges: (1) the enormous consumption of computational resources and deployment difficulties, and (2) the difficulty in fitting heterogeneous and complex data, which limits the usability of the models. Mixture of Experts (MoE) models have recently attracted much attention in addressing these challenges, by dynamically selecting and activating the most relevant sub-models to process input data. It has been shown that MoEs can significantly improve model performance and efficiency with fewer resources, particularly excelling in handling large-scale, multimodal data. Given the tremendous potential MoE has demonstrated across various domains, it is urgent to provide a comprehensive summary of recent advancements of MoEs in many important fields. Existing surveys on MoE have their limitations, e.g., being outdated or lacking discussion on certain key areas, and we aim to address these gaps. In this paper, we first introduce the basic design of MoE, including gating functions, expert networks, routing mechanisms, training strategies, and system design. We then explore the algorithm design of MoE in important machine learning paradigms such as continual learning, meta-learning, multi-task learning, and reinforcement learning. Additionally, we summarize theoretical studies aimed at understanding MoE and review its applications in computer vision and natural language processing. Finally, we discuss promising future research directions.
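The basic MoE design described above (a gating function that routes each input to a small subset of expert networks) can be illustrated with a minimal sketch. The class name, dimensions, and two-layer MLP experts below are illustrative assumptions, not the survey's notation; this shows only the generic top-k routing pattern, where each token activates k of n experts and combines their outputs with renormalized gate weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class TopKMoE:
    """Minimal top-k routed MoE layer (illustrative sketch, not from the survey)."""

    def __init__(self, d_model, d_hidden, n_experts, k):
        self.k = k
        # gating function: a single linear map from token to expert logits
        self.gate = rng.standard_normal((d_model, n_experts)) * 0.02
        # each expert is a tiny two-layer MLP
        self.w1 = rng.standard_normal((n_experts, d_model, d_hidden)) * 0.02
        self.w2 = rng.standard_normal((n_experts, d_hidden, d_model)) * 0.02

    def __call__(self, x):
        # x: (n_tokens, d_model)
        logits = x @ self.gate                              # (n_tokens, n_experts)
        topk = np.argsort(logits, axis=-1)[:, -self.k:]     # top-k expert indices per token
        # renormalize gate weights over the selected experts only
        probs = softmax(np.take_along_axis(logits, topk, axis=-1))
        out = np.zeros_like(x)
        for t in range(x.shape[0]):          # only k experts run per token:
            for slot in range(self.k):       # this is the source of MoE's efficiency
                e = topk[t, slot]
                h = np.maximum(x[t] @ self.w1[e], 0.0)      # expert MLP with ReLU
                out[t] += probs[t, slot] * (h @ self.w2[e])
        return out

layer = TopKMoE(d_model=8, d_hidden=16, n_experts=4, k=2)
y = layer(rng.standard_normal((5, 8)))
print(y.shape)  # (5, 8)
```

With k much smaller than the number of experts, total parameter count grows with n_experts while per-token compute stays roughly constant, which is the capacity-versus-cost trade-off the abstract refers to.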
Related papers
- The 1st EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval [49.587042083937426]
We propose the first EReL@MIR workshop at the Web Conference 2025, inviting participants to explore novel solutions.
This workshop aims to provide a platform for both academic and industry researchers to engage in discussions, share insights, and foster collaboration.
arXiv Detail & Related papers (2025-04-21T01:10:59Z) - Exploring Embodied Multimodal Large Models: Development, Datasets, and Future Directions [16.78870612041548]
Embodied multimodal large models (EMLMs) have gained significant attention in recent years due to their potential to bridge the gap between perception, cognition, and action in complex, real-world environments.
This comprehensive review explores the development of such models, including Large Language Models (LLMs), Large Vision Models (LVMs), and other models.
arXiv Detail & Related papers (2025-02-21T09:41:27Z) - Mixture of Experts (MoE): A Big Data Perspective [34.785207813971134]
Mixture of experts (MoE) has shown excellent performance and broad application prospects. This paper systematically elaborates on the principles, techniques, and applications of MoE in big data processing.
arXiv Detail & Related papers (2025-01-18T20:17:31Z) - HEMM: Holistic Evaluation of Multimodal Foundation Models [91.60364024897653]
Multimodal foundation models can holistically process text alongside images, video, audio, and other sensory modalities.
It is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains.
arXiv Detail & Related papers (2024-07-03T18:00:48Z) - From Efficient Multimodal Models to World Models: A Survey [28.780451336834876]
Multimodal Large Models (MLMs) are becoming a significant research focus combining powerful language models with multimodal learning.
This review explores the latest developments and challenges in MLMs, emphasizing their potential in achieving artificial general intelligence.
arXiv Detail & Related papers (2024-06-27T15:36:43Z) - A Survey on Mixture of Experts [11.801185267119298]
The mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal overhead.
This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of MoE.
arXiv Detail & Related papers (2024-06-26T16:34:33Z) - On the Challenges and Opportunities in Generative AI [157.96723998647363]
We argue that current large-scale generative AI models exhibit several fundamental shortcomings that hinder their widespread adoption across domains. We aim to provide researchers with insights for exploring fruitful research directions, thus fostering the development of more robust and accessible generative AI solutions.
arXiv Detail & Related papers (2024-02-28T15:19:33Z) - A Survey of Resource-efficient LLM and Multimodal Foundation Models [22.23967603206849]
Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and multimodal models, are revolutionizing the entire machine learning lifecycle.
However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources.
This survey delves into the critical importance of such research, examining both algorithmic and systemic aspects.
arXiv Detail & Related papers (2024-01-16T03:35:26Z) - MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning [53.90744622542961]
Reasoning in mathematical domains remains a significant challenge for small language models (LMs).
We introduce a new method that exploits existing mathematical problem datasets with diverse annotation styles.
Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches.
arXiv Detail & Related papers (2023-07-16T05:41:53Z) - Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute and performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z) - DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations [119.1953397679783]
We focus on advancing the state-of-the-art in interpreting multimodal models.
Our proposed approach, DIME, enables accurate and fine-grained analysis of multimodal models.
arXiv Detail & Related papers (2022-03-03T20:52:47Z) - High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning [112.51498431119616]
This paper studies efficient representation learning for high-modality scenarios involving a large set of diverse modalities.
A single model, HighMMT, scales up to 10 modalities (text, image, audio, video, sensors, proprioception, speech, time-series, sets, and tables) and 15 tasks from 5 research areas.
arXiv Detail & Related papers (2022-03-02T18:56:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.