Fugu-MT Paper Translation (Abstract): Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts

Paper abstract: Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts

arxiv url: http://arxiv.org/abs/2509.21892v1
Date: Fri, 26 Sep 2025 05:29:19 GMT
Status: Translation complete
Last updated in system: 2025-09-29 20:57:54.201428
Title: Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
Title (reference translation): Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
Authors: Naibin Gu, Zhenyu Zhang, Yuchen Feng, Yilong Chen, Peng Fu, Zheng Lin, Shuohuan Wang, Yu Sun, Hua Wu, Weiping Wang, Haifeng Wang
Abstract summary: Mixture-of-Experts (MoE) models typically fix the number of activated experts at $k$ for both training and inference. The paper introduces Elastic Mixture-of-Experts (EMoE), a novel training framework.
Reference score (independently calculated attention level): 43.63398524449102
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mixture-of-Experts (MoE) models typically fix the number of activated experts $k$ at both training and inference. Intuitively, activating more experts at inference $k'$ (where $k'> k$) means engaging a larger set of model parameters for the computation and thus is expected to improve performance. However, contrary to this intuition, we find the scaling range to be so narrow that performance begins to degrade rapidly after only a slight increase in the number of experts. Further investigation reveals that this degradation stems from a lack of learned collaboration among experts. To address this, we introduce Elastic Mixture-of-Experts (EMoE), a novel training framework that enables MoE models to scale the number of activated experts at inference without incurring additional training overhead. By simultaneously training experts to collaborate in diverse combinations and encouraging the router for high-quality selections, EMoE ensures robust performance across computational budgets at inference. We conduct extensive experiments on various MoE settings. Our results show that EMoE significantly expands the effective performance-scaling range, extending it to as much as 2-3$\times$ the training-time $k$, while also pushing the model's peak performance to a higher level.
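Abstract (reference translation): Mixture-of-Experts (MoE) models typically fix the number of activated experts at $k$ for both training and inference. Intuitively, activating more experts at inference, $k'$ (where $k' > k$), engages a larger set of model parameters in the computation and is therefore expected to improve performance. Contrary to this intuition, however, the scaling range turns out to be very narrow: performance begins to degrade rapidly after only a slight increase in the number of experts. Further investigation shows that this degradation stems from a lack of learned collaboration among experts. To address this, we introduce Elastic Mixture-of-Experts (EMoE), a novel training framework. By simultaneously training experts to collaborate in diverse combinations and encouraging the router to make high-quality selections, EMoE ensures robust performance across computational budgets at inference. Extensive experiments were conducted on various MoE settings. The results show that EMoE significantly expands the effective performance-scaling range, extending it to as much as 2-3$\times$ the training-time $k$, while also pushing the model's peak performance to a higher level.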
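
To make the notion of "activating $k'$ experts at inference" concrete, the following is a minimal sketch of generic top-k MoE routing in which `k` is a call-time argument. It is not the EMoE training method, which the abstract does not specify in detail. The names `top_k_route` and `moe_forward`, the toy scalar experts, and the choice to renormalize softmax weights over the selected experts are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts (hypothetical helper).

    Returns (expert_index, weight) pairs. Weights are softmax
    probabilities renormalized over the selected experts, which is
    one common convention and an assumption here.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    probs = softmax(router_logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = order[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the k selected experts."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Toy setup: four scalar "experts" and one token's router logits.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, 1.0, -1.0]

y_k1 = moe_forward(1.0, experts, logits, k=1)  # training-time k (assumed)
y_k3 = moe_forward(1.0, experts, logits, k=3)  # larger k' at inference
```

In a standard MoE, the experts added when moving from `k=1` to `k=3` were never trained to contribute jointly to the output, which is the collaboration gap the abstract attributes the degradation to.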
