Fugu-MT paper translation (abstract): Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models

Paper summary: Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models

arxiv url: http://arxiv.org/abs/2604.01622v1
Date: Thu, 02 Apr 2026 05:01:36 GMT
Status: translation complete
Last updated in system: 2026-04-03 14:21:10.366512
Title: Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models
Title (reference translation): Expert-choice routing that enables adaptive computation in diffusion language models
Authors: Shuibai Zhang, Caspian Zhuang, Chihan Cui, Zhihan Yang, Fred Zhangzhi Peng, Yanxin Zhang, Haoyue Bai, Zack Jia, Yang Zhou, Guanhua Chen, Ming Liu,
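Abstract summary: We show that expert-choice (EC) routing is better suited to diffusion language models than token-choice routing. We introduce timestep-dependent expert capacity. The results establish EC routing as a superior paradigm for DLM MoE models.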
Reference score (independently computed attention score): 11.628969213956502
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion language models (DLMs) enable parallel, non-autoregressive text generation, yet existing DLM mixture-of-experts (MoE) models inherit token-choice (TC) routing from autoregressive systems, leading to load imbalance and rigid computation allocation. We show that expert-choice (EC) routing is a better fit for DLMs: it provides deterministic load balancing by design, yielding higher throughput and faster convergence than TC. Building on the property that EC capacity is externally controllable, we introduce timestep-dependent expert capacity, which varies expert allocation according to the denoising step. We find that allocating more capacity to low-mask-ratio steps consistently achieves the best performance under matched FLOPs, and provide a mechanistic explanation: tokens in low-mask-ratio contexts exhibit an order-of-magnitude higher learning efficiency, so concentrating compute on these steps yields the largest marginal return. Finally, we show that existing pretrained TC DLMs can be retrofitted to EC by replacing only the router, achieving faster convergence and improved accuracy across diverse downstream tasks. Together, these results establish EC routing as a superior paradigm for DLM MoE models and demonstrate that computation in DLMs can be treated as an adaptive policy rather than a fixed architectural constant. Code is available at https://github.com/zhangshuibai/EC-DLM.
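Abstract (reference translation): Diffusion language models (DLMs) enable parallel, non-autoregressive text generation, but existing DLM mixture-of-experts (MoE) models inherit token-choice (TC) routing from autoregressive systems, which leads to load imbalance and rigid computation allocation. We show that expert-choice (EC) routing is better suited to DLMs: it provides deterministic load balancing by design and yields higher throughput and faster convergence than TC. Building on the fact that EC capacity is externally controllable, we introduce timestep-dependent expert capacity, which varies expert allocation with the denoising step. Allocating more capacity to low-mask-ratio steps consistently achieves the best performance under matched FLOPs, and we give a mechanistic explanation: tokens in low-mask-ratio contexts show an order-of-magnitude higher learning efficiency, so concentrating compute on those steps yields the largest marginal return. Finally, we show that existing pretrained TC DLMs can be retrofitted to EC by replacing only the router, achieving faster convergence and improved accuracy across diverse downstream tasks. These results establish EC routing as a superior paradigm for DLM MoE models and show that computation in DLMs can be treated as an adaptive policy rather than a fixed architectural constant. Code is available at https://github.com/zhangshuibai/EC-DLM.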
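
The two routing schemes contrasted in the abstract, and the idea of a capacity that depends on the denoising step, can be illustrated with a small sketch. This is not the authors' implementation (their code is at the repository linked above); it is a pure-Python illustration under stated assumptions. The function names, the linear schedule, and the parameters `min_k` and `max_k` are all hypothetical. The only properties taken from the abstract are that under EC each expert selects a fixed number of tokens, so load is balanced by construction, and that more capacity is given to low-mask-ratio steps.

```python
import random


def expert_choice_route(scores, capacity):
    """Expert-choice routing: each expert picks its top-`capacity` tokens.

    scores[t][e] is the router affinity of token t for expert e.
    Returns one sorted list of token indices per expert. Every expert
    receives exactly `capacity` tokens, so load is balanced by design.
    """
    num_tokens = len(scores)
    num_experts = len(scores[0])
    capacity = min(capacity, num_tokens)
    assignment = []
    for e in range(num_experts):
        ranked = sorted(range(num_tokens), key=lambda t: scores[t][e], reverse=True)
        assignment.append(sorted(ranked[:capacity]))
    return assignment


def token_choice_load(scores, top_k):
    """Token-choice routing, for contrast: each token picks its top_k experts.

    Returns the number of tokens landing on each expert, which is generally
    uneven because nothing constrains how many tokens choose the same expert.
    """
    num_experts = len(scores[0])
    load = [0] * num_experts
    for row in scores:
        chosen = sorted(range(num_experts), key=lambda e: row[e], reverse=True)[:top_k]
        for e in chosen:
            load[e] += 1
    return load


def capacity_for_step(mask_ratio, num_tokens, num_experts, min_k=1.0, max_k=3.0):
    """Hypothetical linear schedule for timestep-dependent capacity.

    The average number of experts per token, k, falls linearly from max_k
    (mask_ratio = 0) to min_k (mask_ratio = 1). The linear form and the
    default bounds are assumptions made for illustration only.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must lie in [0, 1]")
    k = max_k - (max_k - min_k) * mask_ratio
    return max(1, round(k * num_tokens / num_experts))


if __name__ == "__main__":
    random.seed(0)
    num_tokens, num_experts = 64, 8
    scores = [[random.random() for _ in range(num_experts)] for _ in range(num_tokens)]

    print("TC tokens per expert:", token_choice_load(scores, top_k=2))
    for mask_ratio in (0.1, 0.5, 0.9):
        cap = capacity_for_step(mask_ratio, num_tokens, num_experts)
        routed = expert_choice_route(scores, cap)
        print(f"mask_ratio={mask_ratio}: EC tokens per expert =", [len(r) for r in routed])
```

In this sketch, the average k over uniformly sampled mask ratios is (min_k + max_k) / 2, so a schedule with bounds 1 and 3 would be compared against a fixed capacity with k = 2. Whether this corresponds to the matched-FLOPs comparison in the paper is an assumption.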

Related papers