Fugu-MT Paper Translation (Summary): Mixture of Heterogeneous Grouped Experts for Language Modeling

Paper summary: Mixture of Heterogeneous Grouped Experts for Language Modeling

arXiv URL: http://arxiv.org/abs/2604.23108v2
Date: Tue, 28 Apr 2026 02:47:36 GMT
Status: Translation complete
In-system update date: 2026-04-29 14:06:43.826641
Title: Mixture of Heterogeneous Grouped Experts for Language Modeling
Authors: Zhicheng Ma, Xiang Liu, Zhaoxiang Liu, Ning Wang, Yi Shen, Kai Wang, Shuming Shi, Shiguo Lian
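Abstract summary: Large Language Models (LLMs) based on Mixture-of-Experts (MoE) are a key element in industrial applications because they can scale performance efficiently. Standard MoEs enforce uniform expert sizes, creating a rigidity that fails to align computational cost with varying token-level complexity. This paper therefore proposes Mixture of Heterogeneous Grouped Experts (MoHGE), which introduces a two-level routing mechanism to enable flexible, resource-aware expert combinations.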
Reference score (site-computed attention score): 19.29654468661715
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large Language Models (LLMs) based on Mixture-of-Experts (MoE) are pivotal in industrial applications for their ability to scale performance efficiently. However, standard MoEs enforce uniform expert sizes, creating a rigidity that fails to align computational costs with varying token-level complexity. While heterogeneous expert architectures attempt to address this by diversifying expert sizes, they often suffer from significant system-level challenges, specifically unbalanced GPU utilization and inefficient parameter utilization, which hinder practical deployment. To bridge the gap between theoretical heterogeneity and robust industrial application, we propose Mixture of Heterogeneous Grouped Experts (MoHGE), which introduces a two-level routing mechanism to enable flexible, resource-aware expert combinations. To optimize inference efficiency, we propose a Group-Wise Auxiliary Loss, which dynamically steers tokens to the most parameter-efficient expert groups based on task difficulty. To address the critical deployment challenge of GPU load balancing, we introduce an All-size Group-decoupling Allocation strategy coupled with an Intra-Group Experts Auxiliary Loss. These mechanisms collectively ensure uniform computation distribution across GPUs. Extensive evaluations demonstrate that MoHGE matches the performance of MoE architectures while reducing the total parameters by approximately 20% and maintaining balanced GPU utilization. Our work establishes a scalable paradigm for resource-efficient MoE design, offering a practical solution for optimizing inference costs in real-world scenarios. The code is publicly available at https://github.com/UnicomAI/MoHGE.
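The abstract does not give the routing equations, so the following Python sketch only illustrates the general idea of two-level routing under stated assumptions: a router first picks one expert group (assumed top-1), then picks the top-k experts inside that group, with softmax gates renormalized over the selected experts. `ExpertGroup`, `two_level_route`, `active_params`, and `group_cost_penalty` are hypothetical names. The cost penalty is only a guess at how a group-wise loss could favor cheaper groups; it is not the paper's Group-Wise Auxiliary Loss.

```python
import math
from dataclasses import dataclass

@dataclass
class ExpertGroup:
    # Hypothetical: all experts in one group share a single FFN hidden size.
    hidden_size: int
    num_experts: int

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def two_level_route(token, group_router, expert_routers, k=2):
    """Route one token: choose a group (level 1), then top-k experts in it (level 2).

    token:          the token's hidden state as a list of floats
    group_router:   one router vector per group
    expert_routers: expert_routers[g] holds one router vector per expert in group g
    """
    # Level 1 (assumed top-1): score every group and keep the highest-scoring one.
    group_scores = [dot(token, w) for w in group_router]
    g = max(range(len(group_scores)), key=group_scores.__getitem__)
    # Level 2: score only the experts inside the chosen group and keep the top k.
    scores = [dot(token, w) for w in expert_routers[g]]
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([scores[i] for i in top])
    return g, list(zip(top, gates))

def active_params(group, d_model, k=2):
    # Assumed cost: k experts, each with an up- and a down-projection.
    return k * 2 * d_model * group.hidden_size

def group_cost_penalty(group_probs, groups, d_model, k=2):
    """Hypothetical penalty: expected active parameters per token under the
    level-1 group probabilities, normalized by the most expensive group."""
    costs = [active_params(grp, d_model, k) for grp in groups]
    largest = max(costs)
    return sum(p * c / largest for p, c in zip(group_probs, costs))

groups = [ExpertGroup(hidden_size=256, num_experts=4),
          ExpertGroup(hidden_size=1024, num_experts=4)]
token = [1.0, 0.0]
group_router = [[0.5, 0.0], [0.1, 0.0]]
expert_routers = [
    [[0.2, 0.0], [0.9, 0.0], [0.4, 0.0], [0.1, 0.0]],  # small-expert group
    [[0.3, 0.0], [0.1, 0.0], [0.8, 0.0], [0.6, 0.0]],  # large-expert group
]
g, chosen = two_level_route(token, group_router, expert_routers, k=2)
# → g == 0, chosen ≈ [(1, 0.622), (2, 0.378)]
penalty = group_cost_penalty([1.0, 0.0], groups, d_model=64)  # → 0.25
```

In a real training framework such a penalty would be computed on differentiable router probabilities and added to the language-modeling loss with a small weight; how MoHGE ties its group-wise loss to task difficulty is not described in the abstract.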
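The GPU-balancing ideas in the abstract can be illustrated with a second sketch, which shows one plausible reading rather than the authors' implementation. `allocate_all_sizes` (hypothetical) places experts from every size group on every GPU in round-robin order, so each device holds the same mix of small and large experts. `intra_group_balance_loss` swaps in the well-known Switch Transformer load-balancing loss, N · Σ f_i · P_i, applied within a single group, as a stand-in for the paper's Intra-Group Experts Auxiliary Loss, whose exact form the abstract does not give.

```python
def allocate_all_sizes(groups, num_gpus):
    """Hypothetical all-size placement: every GPU gets experts from every group.

    groups: list of (hidden_size, num_experts) tuples, one per size group.
    Returns placement[gpu] = list of (group_index, expert_index).
    """
    placement = [[] for _ in range(num_gpus)]
    for g, (_, n) in enumerate(groups):
        # Assumption: each group's expert count divides evenly across GPUs.
        if n % num_gpus:
            raise ValueError(f"group {g} has {n} experts, not divisible by {num_gpus} GPUs")
        for e in range(n):
            placement[e % num_gpus].append((g, e))
    return placement

def gpu_param_load(placement, groups, d_model):
    # Assumed FFN cost per expert: up- and down-projection, 2 * d_model * hidden_size.
    return [sum(2 * d_model * groups[g][0] for g, _ in experts) for experts in placement]

def intra_group_balance_loss(assignments, probs):
    """Switch-Transformer-style balance loss, restricted to one expert group.

    assignments: top-1 expert index chosen for each token in this group
    probs:       per-token router probabilities over this group's experts
    """
    n = len(probs[0])
    t = len(assignments)
    f = [assignments.count(i) / t for i in range(n)]          # fraction of tokens per expert
    p = [sum(row[i] for row in probs) / t for i in range(n)]  # mean router probability per expert
    return n * sum(fi * pi for fi, pi in zip(f, p))

groups = [(256, 4), (1024, 4)]
placement = allocate_all_sizes(groups, num_gpus=2)
loads = gpu_param_load(placement, groups, d_model=64)                   # → [327680, 327680]
balanced = intra_group_balance_loss([0, 1], [[0.5, 0.5], [0.5, 0.5]])   # → 1.0
collapsed = intra_group_balance_loss([0, 0], [[1.0, 0.0], [1.0, 0.0]])  # → 2.0
```

With perfectly uniform routing the loss equals 1.0, and it grows toward N (the number of experts in the group) as routing collapses onto a single expert, which is what makes it usable as a balancing penalty.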
