Fugu-MT Paper Translation (Abstract): ExFusion: Efficient Transformer Training via Multi-Experts Fusion

Paper abstract: ExFusion: Efficient Transformer Training via Multi-Experts Fusion

arxiv url: http://arxiv.org/abs/2603.27965v1
Date: Mon, 30 Mar 2026 02:40:20 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:45.197215
Title: ExFusion: Efficient Transformer Training via Multi-Experts Fusion
Title (reference translation): ExFusion: Efficient Transformer Training via Multi-Experts Fusion
Authors: Jiacheng Ruan, Daize Dong, Xiaoye Qu, Tong Zhu, Ting Liu, Yuzhuo Fu, Yu Cheng, Suncheng Xiang
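Abstract summary: Mixture-of-Experts (MoE) models substantially improve performance by increasing the capacity of dense architectures. However, directly training MoE models requires considerable computational resources and adds overhead in parameter storage and deployment. This paper proposes ExFusion, a novel pre-training approach that improves the efficiency of Transformer training through multi-expert fusion.
Reference score (independently computed attention score): 44.08657544416735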
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Mixture-of-Experts (MoE) models substantially improve performance by increasing the capacity of dense architectures. However, directly training MoE models requires considerable computational resources and introduces extra overhead in parameter storage and deployment. Therefore, it is critical to develop an approach that leverages the multi-expert capability of MoE to enhance performance while incurring minimal additional cost. To this end, we propose a novel pre-training approach, termed ExFusion, which improves the efficiency of Transformer training through multi-expert fusion. Specifically, during the initialization phase, ExFusion upcycles the feed-forward network (FFN) of the Transformer into a multi-expert configuration, where each expert is assigned a weight for later parameter fusion. During training, these weights allow multiple experts to be fused into a single unified expert equivalent to the original FFN, which is subsequently used for forward computation. As a result, ExFusion introduces multi-expert characteristics into the training process while incurring only marginal computational cost compared to standard dense training. After training, the learned weights are used to integrate multi-experts into a single unified expert, thereby eliminating additional overhead in storage and deployment. Extensive experiments on a variety of computer vision and natural language processing tasks demonstrate the effectiveness of the proposed method.
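Abstract (reference translation): Mixture-of-Experts (MoE) models greatly improve performance by increasing the capacity of dense architectures. However, training MoE models directly requires considerable computational resources and adds overhead to parameter storage and deployment. It is therefore important to develop an approach that exploits the multi-expert capability of MoE to improve performance at minimal additional cost. To this end, the paper proposes a new pre-training method called ExFusion, which makes Transformer training more efficient through multi-expert fusion. Specifically, in the initialization phase, ExFusion upcycles the Transformer's feed-forward network (FFN) into a multi-expert configuration in which each expert is assigned a weight for later parameter fusion. During training, these weights let the multiple experts be fused into a single unified expert equivalent to the original FFN, which is then used for the forward computation. As a result, ExFusion brings multi-expert characteristics into the training process while adding only marginal computational cost relative to standard dense training. After training, the learned weights are used to merge the experts into a single unified expert, which eliminates the additional storage and deployment overhead. Extensive experiments on a variety of computer vision and natural language processing tasks demonstrate the effectiveness of the proposed method.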
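
The fusion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how experts are initialized, how the per-expert weights are normalized, or which activation the FFN uses. The sketch assumes that each expert starts as a copy of the original FFN, that the weights are softmax-normalized logits, and that the FFN is a two-layer ReLU network without biases. All function names (`upcycle`, `fuse`, `ffn`) are hypothetical. Under those assumptions, the fused expert reproduces the original FFN at initialization, and the forward pass always costs one FFN regardless of the number of experts.

```python
import math
import random


def softmax(logits):
    """Normalize fusion logits into coefficients that sum to 1 (an assumption)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def ffn(x, w1, w2):
    """Two-layer feed-forward network with ReLU and no biases (an assumption)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def upcycle(w1, w2, num_experts):
    """Turn one FFN into several experts by copying its parameters (an assumption)."""
    return [([r[:] for r in w1], [r[:] for r in w2]) for _ in range(num_experts)]


def fuse_matrix(mats, coeffs):
    """Weighted sum of same-shaped matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [
        [sum(c * m[i][j] for c, m in zip(coeffs, mats)) for j in range(cols)]
        for i in range(rows)
    ]


def fuse(experts, logits):
    """Fuse all experts into one FFN-shaped parameter set."""
    coeffs = softmax(logits)
    fused_w1 = fuse_matrix([e[0] for e in experts], coeffs)
    fused_w2 = fuse_matrix([e[1] for e in experts], coeffs)
    return fused_w1, fused_w2


random.seed(0)
d_model, d_hidden = 4, 8
w1 = [[random.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_model)]
w2 = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(d_hidden)]
x = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(2)]

experts = upcycle(w1, w2, num_experts=3)
fusion_logits = [0.0, 0.0, 0.0]  # would be learned during training

# Each forward pass uses only the fused expert, so it costs one FFN.
fused_w1, fused_w2 = fuse(experts, fusion_logits)
y_dense = ffn(x, w1, w2)
y_fused = ffn(x, fused_w1, fused_w2)

max_diff = max(
    abs(a - b) for ra, rb in zip(y_dense, y_fused) for a, b in zip(ra, rb)
)
print("fused expert matches original FFN at init:", max_diff < 1e-9)
```

In an actual training loop the experts and the fusion logits would both receive gradients through the fused parameters, so the experts would drift apart over time. Calling `fuse` once after training would then give the single set of FFN parameters that is stored and deployed.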


Related papers