Fugu-MT Paper Translation (Abstract): Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers

Paper overview: Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers

arxiv url: http://arxiv.org/abs/2605.16941v1
Date: Sat, 16 May 2026 11:27:40 GMT
Status: Translation complete
System update date: 2026-05-19 17:57:47.309586
Title: Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers
Title (reference translation): Roll Out and Roll Back: Diffusion LLMs Are Their Own Efficiency Teachers
Authors: Fanqin Zeng, Feng Hong, Geng Yu, Huangjie Zheng, Xiaofeng Cao, Ya Zhang, Bo Han, Yanfeng Wang, Jiangchao Yao
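Abstract summary: Wide-In, Narrow-Out (WINO) is a training-free decoding algorithm that enables revokable parallel generation. WINO+ injects the verified denoising trajectories produced by WINO into model parameters, aligning training with efficient inference. Experiments on LLaDA and MMaDA show that WINO improves both quality and efficiency, while WINO+ further strengthens this progression.
Reference score (independently computed attention metric): 76.15132587294862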
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion Large Language Models (DLLMs) promise fast parallel generation, yet open-source DLLMs still face a severe quality-speed trade-off: accelerating decoding by revealing multiple tokens often causes substantial quality degradation. We attribute this dilemma to a train-inference mismatch amplified by irreversible decoding. While training reconstructs tokens from randomly corrupted states, efficient inference requires an adaptive denoising order, where easier tokens are revealed earlier and context-dependent ones are deferred. This view motivates two complementary methods: an inference-time method that makes parallel decoding revokable, and a training-time extension that distills the reliable order exposed by this revokable process. Accordingly, we first propose Wide-In, Narrow-Out (WINO), a training-free decoding algorithm that enables revokable parallel generation. WINO aggressively drafts multiple tokens, verifies generated tokens with enriched global context, and re-masks unreliable ones for later refinement. Building on this discovered order, we further introduce WINO+, which injects the verified denoising trajectories produced by WINO into model parameters, aligning training with efficient inference. Experiments on LLaDA and MMaDA show that WINO improves both quality and efficiency, while WINO+ further strengthens this progression. On GSM8K, WINO improves accuracy from 73.24% to 75.82% with a 6.10x step reduction, and WINO+ further achieves 76.58% with a 6.83x reduction. On Flickr30K, WINO+ reaches a 16.22x step reduction with improved CIDEr. These results demonstrate that DLLMs can serve as their own efficiency teachers by first discovering reliable denoising orders through revokable decoding and then learning to follow them for faster generation. Code is available at https://github.com/Feng-Hong/WINO-DLLM/tree/WINO-plus.
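Abstract (reference translation): Diffusion Large Language Models (DLLMs) promise fast parallel generation, but open-source DLLMs still face a severe quality-speed trade-off. We attribute this dilemma to a train-inference mismatch amplified by irreversible decoding. Training reconstructs tokens from randomly corrupted states, whereas efficient inference requires an adaptive denoising order. This view motivates two complementary methods: an inference-time method that makes parallel decoding revokable, and a training-time extension that distills the reliable order exposed by this revokable process. We therefore first propose Wide-In, Narrow-Out (WINO), a training-free decoding algorithm that enables revokable parallel generation. WINO aggressively drafts multiple tokens, verifies generated tokens with enriched global context, and re-masks unreliable ones for later refinement. We further introduce WINO+, which injects the verified denoising trajectories produced by WINO into model parameters, aligning training with efficient inference. Experiments on LLaDA and MMaDA show that WINO improves both quality and efficiency, while WINO+ further strengthens this progression. On GSM8K, WINO improves accuracy from 73.24% to 75.82% with a 6.10x step reduction, and WINO+ reaches 76.58% with a 6.83x reduction. On Flickr30K, WINO+ achieves a 16.22x step reduction with improved CIDEr. These results suggest that DLLMs can serve as their own efficiency teachers by first discovering reliable denoising orders through revokable decoding and then learning to follow them for faster generation. Code is available at https://github.com/Feng-Hong/WINO-DLLM/tree/WINO-plus.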
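
The draft, verify, and re-mask loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_predict`, `toy_verify`, `revokable_decode`, the thresholds `tau_draft` and `tau_verify`, and the hand-written confidence rules are all hypothetical stand-ins for a real DLLM forward pass. The sketch only shows the control flow of revokable parallel decoding: a lenient threshold admits many drafted tokens at once, and a stricter check with fuller context revokes the unreliable ones.

```python
MASK = None  # placeholder for a masked position


def context_ratio(seq, i):
    """Fraction of positions other than i that are currently revealed."""
    others = [t for j, t in enumerate(seq) if j != i]
    return sum(t is not MASK for t in others) / (len(others) or 1)


def toy_predict(seq, i, target):
    """Hypothetical model call: (token, confidence) for masked position i.

    Odd positions play the role of context-dependent tokens: with little
    context the toy model guesses wrongly but fairly confidently.
    """
    ctx = context_ratio(seq, i)
    if i % 2 == 1 and ctx < 0.3:
        return "?", 0.65
    return target[i], 0.6 + 0.4 * ctx


def toy_verify(seq, i, target):
    """Hypothetical re-scoring of an already revealed token at position i."""
    ctx = context_ratio(seq, i)
    if seq[i] == target[i]:
        return 0.6 + 0.4 * ctx  # correct tokens gain support from context
    return 0.65 - ctx           # wrong tokens lose support as context grows


def revokable_decode(target, tau_draft=0.6, tau_verify=0.7, max_steps=50):
    """Draft many tokens per step, then re-mask those that fail verification."""
    seq = [MASK] * len(target)
    steps = 0
    while MASK in seq and steps < max_steps:
        steps += 1
        masked = [i for i, t in enumerate(seq) if t is MASK]
        preds = {i: toy_predict(seq, i, target) for i in masked}
        # Draft: lenient threshold, so several tokens are revealed at once.
        drafted = [i for i in masked if preds[i][1] >= tau_draft]
        if not drafted:  # always make progress on at least one position
            drafted = [max(masked, key=lambda i: preds[i][1])]
        for i in drafted:
            seq[i] = preds[i][0]
        # Verify: stricter threshold, scored against the enriched context.
        revoked = [i for i, t in enumerate(seq)
                   if t is not MASK and toy_verify(seq, i, target) < tau_verify]
        for i in revoked:
            seq[i] = MASK  # re-mask for later refinement
    return seq, steps


if __name__ == "__main__":
    decoded, n_steps = revokable_decode(list("diffusion"))
    print("".join(decoded), n_steps)
```

On this toy input the first step drafts all nine positions, verification revokes the four wrong guesses, and the second step fills them in correctly, so decoding finishes in 2 steps where revealing one token per step would take 9. The numbers are properties of the toy rules only and say nothing about the step reductions reported in the paper.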


Related papers