CompeteSMoE -- Effective Training of Sparse Mixture of Experts via
Competition
- URL: http://arxiv.org/abs/2402.02526v1
- Date: Sun, 4 Feb 2024 15:17:09 GMT
- Title: CompeteSMoE -- Effective Training of Sparse Mixture of Experts via
Competition
- Authors: Quang Pham, Giang Do, Huy Nguyen, TrungTin Nguyen, Chenghao Liu, Mina
Sartipi, Binh T. Nguyen, Savitha Ramasamy, Xiaoli Li, Steven Hoi, Nhat Ho
- Abstract summary: Sparse mixture of experts (SMoE) offers an appealing solution to scale up the model complexity beyond the means of increasing the network's depth or width.
We propose a competition mechanism to address this fundamental challenge of representation collapse.
By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator.
- Score: 52.2034494666179
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sparse mixture of experts (SMoE) offers an appealing solution to scale up the
model complexity beyond the means of increasing the network's depth or width.
However, effective training of SMoE has proven to be challenging due to the
representation collapse issue, which causes parameter redundancy and limited
representation potentials. In this work, we propose a competition mechanism to
address this fundamental challenge of representation collapse. By routing
inputs only to experts with the highest neural response, we show that, under
mild assumptions, competition enjoys the same convergence rate as the optimal
estimator. We further propose CompeteSMoE, an effective and efficient algorithm
to train large language models by deploying a simple router that predicts the
competition outcomes. Consequently, CompeteSMoE enjoys strong performance gains
from the competition routing policy while having low computation overheads. Our
extensive empirical evaluations on two transformer architectures and a wide
range of tasks demonstrate the efficacy, robustness, and scalability of
CompeteSMoE compared to state-of-the-art SMoE strategies.
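The competition mechanism described in the abstract can be illustrated with a minimal PyTorch sketch. It assumes, for illustration only, that the "neural response" is scored by the norm of each expert's output and that the lightweight router is trained to imitate the competition outcome with a mean-squared-error term; the class and variable names are hypothetical and this is not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CompetitionMoE(nn.Module):
    # Sketch of competition-based routing: every expert responds to the input,
    # the strongest responses win, and a cheap linear router is trained to
    # predict those winners so the full competition can be skipped at inference.
    def __init__(self, d_model, d_hidden, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (tokens, d_model). All experts are evaluated here for clarity;
        # a real SMoE layer dispatches each token only to its winning experts.
        logits = self.router(x)                                     # (tokens, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (tokens, E, d)
        # Assumed response score: L2 norm of each expert's output.
        response = outputs.norm(dim=-1)                             # (tokens, E)
        # Auxiliary loss: the router learns to predict the competition outcome.
        router_loss = F.mse_loss(logits, response.detach())
        weights, idx = response.topk(self.top_k, dim=-1)            # winners
        weights = weights.softmax(dim=-1)
        chosen = outputs.gather(1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
        return (chosen * weights.unsqueeze(-1)).sum(dim=1), router_loss

At inference, the learned router logits would replace the response scores in the top-k selection, so only the selected experts need to be evaluated and the overhead of the full competition is avoided.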
Related papers
- SimSMoE: Solving Representational Collapse via Similarity Measure [34.20340688374905]
Sparse mixture of experts (SMoE) has emerged as an effective approach for scaling large language models while keeping a constant computational cost.
We present Similarity-based Sparse Mixture of Experts (SimSMoE), a novel similarity-based algorithm that guarantees a solution to the representation collapse issue.
arXiv Detail & Related papers (2024-06-22T16:10:45Z)
- SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts [49.01990048827639]
We introduce SEER-MoE, a framework for reducing both the memory footprint and compute requirements of pre-trained MoE models.
The first stage prunes the total number of experts using heavy-hitters counting as guidance, while the second stage employs a regularization-based fine-tuning strategy to recover the accuracy lost to pruning (a count-based pruning sketch appears after this list).
Our empirical studies demonstrate the effectiveness of our method, resulting in a sparse MoE model optimized for inference efficiency with minimal accuracy trade-offs.
arXiv Detail & Related papers (2024-04-07T22:13:43Z)
- Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer [59.43462055143123]
The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning.
In this study, we shed light on the homogeneous representation problem, wherein experts in the MoE fail to specialize and lack diversity.
We propose an alternating training strategy that encourages each expert to update in a direction orthogonal to the subspace spanned by the other experts (see the projection sketch after this list).
arXiv Detail & Related papers (2023-10-15T07:20:28Z)
- Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO [50.58083807719749]
We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions.
This competition targets robustness and generalization in multi-agent systems.
We will open-source our benchmark including the environment wrapper, baselines, a visualization tool, and selected policies for further research.
arXiv Detail & Related papers (2023-08-30T07:16:11Z)
- Building Robust Ensembles via Margin Boosting [98.56381714748096]
In adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks.
We develop an algorithm for learning an ensemble with maximum margin.
We show that our algorithm not only outperforms existing ensembling techniques, but also large models trained in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-07T14:55:58Z)
- A portfolio-based analysis method for competition results [0.8680676599607126]
I will describe a portfolio-based analysis method which can give complementary insights into the performance of participating solvers in a competition.
The method is demonstrated on the results of the MiniZinc Challenges and new insights gained from the portfolio viewpoint are presented.
arXiv Detail & Related papers (2022-05-30T20:20:45Z)
- Continual Competitive Memory: A Neural System for Online Task-Free Lifelong Learning [91.3755431537592]
We propose a novel form of unsupervised learning, continual competitive memory (CCM).
The resulting neural system is shown to offer an effective approach for combating catastrophic forgetting in online continual classification problems.
We demonstrate that the proposed CCM system not only outperforms other competitive learning neural models but also yields performance that is competitive with several modern, state-of-the-art lifelong learning approaches.
arXiv Detail & Related papers (2021-06-24T20:12:17Z)
- Towards robust and domain agnostic reinforcement learning competitions [12.731614722371376]
Reinforcement learning competitions have formed the basis for standard research benchmarks.
Despite this, a majority of challenges suffer from the same fundamental problems.
We present a new framework of competition design that promotes the development of algorithms that overcome these barriers.
arXiv Detail & Related papers (2021-06-07T16:15:46Z)
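As referenced in the SEER-MoE entry above, expert pruning guided by routing counts can be sketched briefly. The sketch below simply keeps the most frequently selected ("heavy hitter") experts of one MoE layer on a calibration set; the function and argument names are hypothetical and this is not the paper's implementation, which also relies on a regularization-based fine-tuning stage to recover accuracy.

import torch

@torch.no_grad()
def prune_experts_by_usage(router, calibration_tokens, keep, top_k=2):
    # calibration_tokens: (N, d_model) hidden states fed to one MoE layer's router.
    logits = router(calibration_tokens)                 # (N, num_experts)
    chosen = logits.topk(top_k, dim=-1).indices         # experts each token routes to
    counts = torch.bincount(chosen.flatten(), minlength=logits.size(-1))
    keep_ids = counts.topk(keep).indices.sort().values  # the "heavy hitters"
    return keep_ids, counts

The surviving experts (and the matching rows of the router's weight matrix) are retained; everything else is dropped before fine-tuning.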
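The orthogonal-update idea referenced in the entry on diversifying MoE representations can also be sketched: project one expert's proposed update onto the orthogonal complement of the subspace spanned by the other experts' representations. This is only an illustration of the projection step under assumed shapes, not the paper's optimizer.

import torch

def orthogonal_component(update, others):
    # update: (d,) proposed update direction for one expert.
    # others: (k, d) vectors spanning the other experts' subspace.
    q, _ = torch.linalg.qr(others.T)        # (d, k): orthonormal basis of the span
    return update - q @ (q.T @ update)      # remove the in-span component

Applied in an alternating fashion, each expert's update keeps only the component that the other experts do not already cover, which encourages specialization.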