Fugu-MT Paper Translation (Summary): Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe

Paper Summary: Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe

arxiv url: http://arxiv.org/abs/2512.01252v1
Date: Mon, 01 Dec 2025 03:52:31 GMT
Status: Translation complete
Last updated in system: 2025-12-02 19:46:34.67775
Title: Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe
Title (reference translation): Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe
Authors: Yahui Liu, Yang Yue, Jingyuan Zhang, Chenxi Sun, Yang Zhou, Wencong Zeng, Ruiming Tang, Guorui Zhou,
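Abstract summary: Recent work on Diffusion MoE models has focused mainly on developing more sophisticated routing mechanisms. Inspired by the MoE design paradigms established in large language models (LLMs), we identify a set of architectural factors that are key to building efficient Diffusion MoE models. We propose novel architectures that can be applied efficiently to both latent-space and pixel-space diffusion frameworks.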
Reference score (site-computed attention score): 51.26601054313749
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent efforts on Diffusion Mixture-of-Experts (MoE) models have primarily focused on developing more sophisticated routing mechanisms. However, we observe that the underlying architectural configuration space remains markedly under-explored. Inspired by the MoE design paradigms established in large language models (LLMs), we identify a set of crucial architectural factors for building effective Diffusion MoE models, including DeepSeek-style expert modules, alternative intermediate widths, varying expert counts, and enhanced attention positional encodings. Our systematic study reveals that carefully tuning these configurations is essential for unlocking the full potential of Diffusion MoE models, often yielding gains that exceed those achieved by routing innovations alone. Through extensive experiments, we present novel architectures that can be efficiently applied to both latent and pixel-space diffusion frameworks, which provide a practical and efficient training recipe that enables Diffusion MoE models to surpass strong baselines while using equal or fewer activated parameters. All code and models are publicly available at: https://github.com/yhlleo/EfficientMoE.
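Abstract (reference translation): Recent research on Diffusion Mixture-of-Experts (MoE) models has focused mainly on developing more sophisticated routing mechanisms. However, the underlying architectural configuration space remains markedly under-explored. Inspired by the MoE design paradigms established in large language models (LLMs), we identify a set of architectural factors that are important for building Diffusion MoE models. Our systematic study shows that carefully tuning these configurations is essential for unlocking the full potential of Diffusion MoE models. Through extensive experiments, we propose novel architectures that can be applied efficiently to both latent-space and pixel-space diffusion frameworks. All code and models are publicly available at https://github.com/yhlleo/EfficientMoE.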
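The abstract names DeepSeek-style expert modules, intermediate width, and expert count as key design knobs, and compares models at equal or fewer activated parameters. The following is a minimal NumPy sketch of what those terms mean, assuming the DeepSeekMoE layout: a few always-on shared experts plus many narrow routed experts chosen by top-k softmax gating. It is not the EfficientMoE implementation; the class names (`MLPExpert`, `DeepSeekStyleMoE`), the GELU MLP experts, the omission of biases, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sketch of a DeepSeek-style MoE feed-forward layer (not the EfficientMoE code).
# Biases are omitted so that parameter counts stay easy to check by hand.


def gelu(x):
    # tanh approximation of GELU, commonly used in DiT-style MLPs
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class MLPExpert:
    """One expert: a d_model -> d_hidden -> d_model two-layer MLP (assumed form)."""

    def __init__(self, d_model, d_hidden, rng):
        self.w_in = rng.normal(0.0, d_model**-0.5, (d_model, d_hidden))
        self.w_out = rng.normal(0.0, d_hidden**-0.5, (d_hidden, d_model))

    def __call__(self, x):
        return gelu(x @ self.w_in) @ self.w_out

    def num_params(self):
        return self.w_in.size + self.w_out.size


class DeepSeekStyleMoE:
    """Shared experts process every token; each token also visits its top_k routed experts."""

    def __init__(self, d_model, d_expert, n_shared, n_routed, top_k, seed=0):
        rng = np.random.default_rng(seed)
        self.shared = [MLPExpert(d_model, d_expert, rng) for _ in range(n_shared)]
        self.routed = [MLPExpert(d_model, d_expert, rng) for _ in range(n_routed)]
        self.w_router = rng.normal(0.0, d_model**-0.5, (d_model, n_routed))
        self.top_k = top_k

    def __call__(self, x):
        # x: (num_tokens, d_model), e.g. flattened latent or pixel patches
        out = np.zeros_like(x)
        for expert in self.shared:
            out += expert(x)
        probs = softmax(x @ self.w_router, axis=-1)  # (num_tokens, n_routed)
        top_idx = np.argsort(-probs, axis=-1)[:, : self.top_k]  # (num_tokens, top_k)
        for e_idx, expert in enumerate(self.routed):
            rows = np.nonzero((top_idx == e_idx).any(axis=-1))[0]  # tokens routed here
            if rows.size == 0:
                continue
            # Gate = full-softmax probability of the selected expert, without
            # renormalizing over the top_k (one common choice; implementations differ).
            out[rows] += probs[rows, e_idx][:, None] * expert(x[rows])
        return out

    def activated_params(self):
        # Weights touched by a single token: router + shared experts + top_k routed experts.
        per_routed = self.routed[0].num_params() if self.routed else 0
        shared = sum(e.num_params() for e in self.shared)
        return self.w_router.size + shared + self.top_k * per_routed

    def total_params(self):
        return self.w_router.size + sum(e.num_params() for e in self.shared + self.routed)


# A dense FFN with 4x width (64 -> 256 -> 64) has 2 * 64 * 256 = 32768 weights.
# Splitting that budget into narrow 64-wide experts (1 shared + top-3 of 16 routed)
# keeps expert weights per token at 4 * (2 * 64 * 64) = 32768, plus a 64 x 16 router.
layer = DeepSeekStyleMoE(d_model=64, d_expert=64, n_shared=1, n_routed=16, top_k=3)
tokens = np.random.default_rng(1).normal(size=(8, 64))
y = layer(tokens)
print(y.shape)                   # (8, 64)
print(layer.activated_params())  # 33792
print(layer.total_params())      # 140288
```

With these hypothetical settings, the MoE layer activates about as many expert weights per token as the dense 4x FFN (33,792 vs 32,768; the difference is the router) while holding roughly 4.3x more total parameters. Varying `d_expert`, `n_routed`, and `top_k` corresponds to the kind of intermediate-width and expert-count sweep the abstract describes.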

Related Papers