Fugu-MT Paper Translation (Abstract): Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding

Paper summary: Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding

arxiv url: http://arxiv.org/abs/2602.06412v1
Date: Fri, 06 Feb 2026 06:08:51 GMT
Status: translation complete
Last updated in system: 2026-02-09 22:18:26.254497
Title: Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
Title (reference translation): Stopping Computation for Converged Tokens in Masked Diffusion-LM Decoding
Authors: Daisuke Oba, Danushka Bollegala, Masahiro Kaneko, Naoaki Okazaki
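Abstract summary: Masked Diffusion Language Models generate sequences via iterative sampling that progressively unmasks tokens. We propose SureLock, which locks an unmasked position once its posterior has stabilized across steps. This reduces the dominant per-iteration computational cost from $O(N^2d)$ to $O(MNd)$, where $N$ is the sequence length, $M$ is the number of unlocked token positions, and $d$ is the model dimension.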
Reference score (independently computed attention score): 46.61138996670135
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Masked Diffusion Language Models generate sequences via iterative sampling that progressively unmasks tokens. However, they still recompute the attention and feed-forward blocks for every token position at every step -- even when many unmasked tokens are essentially fixed, resulting in substantial waste in compute. We propose SureLock: when the posterior at an unmasked position has stabilized across steps (our sure condition), we lock that position -- thereafter skipping its query projection and feed-forward sublayers -- while caching its attention keys and values so other positions can continue to attend to it. This reduces the dominant per-iteration computational cost from $O(N^2d)$ to $O(MNd)$ where $N$ is the sequence length, $M$ is the number of unlocked token positions, and $d$ is the model dimension. In practice, $M$ decreases as the iteration progresses, yielding substantial savings. On LLaDA-8B, SureLock reduces algorithmic FLOPs by 30--50% relative to the same sampler without locking, while maintaining comparable generation quality. We also provide a theoretical analysis to justify the design rationale of SureLock: monitoring only the local KL at the lock step suffices to bound the deviation in final token probabilities. Our code will be available at https://daioba.github.io/surelock .
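Abstract (reference translation): Masked Diffusion Language Models generate sequences via iterative sampling that progressively unmasks tokens. However, they still recompute the attention and feed-forward blocks for every token position at every step, even when many unmasked tokens are essentially fixed, which wastes substantial compute. We propose SureLock: when the posterior at an unmasked position has stabilized across steps (the sure condition), that position is locked. Its query projection and feed-forward sublayers are skipped from then on, while its attention keys and values are cached so that other positions can continue to attend to it. This reduces the dominant per-iteration computational cost from $O(N^2d)$ to $O(MNd)$, where $N$ is the sequence length, $M$ is the number of unlocked token positions, and $d$ is the model dimension. In practice, $M$ decreases as the iterations progress, yielding substantial savings. On LLaDA-8B, SureLock reduces algorithmic FLOPs by 30-50% relative to the same sampler without locking, while maintaining comparable generation quality. We also provide a theoretical analysis to justify SureLock's design rationale: monitoring only the local KL at the lock step suffices to bound the deviation in final token probabilities. Our code will be available at https://daioba.github.io/surelock.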
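
The locking rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`kl_divergence`, `update_locks`, `relative_cost`), the KL direction, and the threshold value are all assumptions made for illustration, since the abstract states only that a position is locked once its posterior has stabilized and that the local KL at the lock step is monitored. The last helper evaluates the ratio of the two stated costs, $O(MNd) / O(N^2d) = M/N$.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def update_locks(prev_probs, curr_probs, unmasked, locked, threshold=1e-3):
    """Return the set of locked positions after one decoding step.

    Assumption (hypothetical rule, not taken from the paper): a position is
    locked when it is already unmasked and the KL divergence between its
    posteriors at consecutive steps falls below `threshold`.
    """
    new_locked = set(locked)
    for pos in unmasked:
        if pos in new_locked:
            continue  # locked positions stay locked
        if kl_divergence(curr_probs[pos], prev_probs[pos]) < threshold:
            new_locked.add(pos)
    return new_locked


def relative_cost(seq_len, num_unlocked):
    """Ratio of the dominant per-iteration costs: (M * N * d) / (N^2 * d) = M / N."""
    return num_unlocked / seq_len


# Toy posteriors over a 3-token vocabulary at 3 positions (made-up numbers).
prev = [[0.7, 0.2, 0.1], [0.98, 0.01, 0.01], [0.4, 0.3, 0.3]]
curr = [[0.5, 0.4, 0.1], [0.98, 0.01, 0.01], [0.4, 0.3, 0.3]]
unmasked = {0, 1}  # position 2 is still masked, so it is never a lock candidate

locked = update_locks(prev, curr, unmasked, locked=set())
print(sorted(locked))  # [1]
print(relative_cost(seq_len=len(curr), num_unlocked=len(curr) - len(locked)))
```

In this toy run only position 1 is locked: position 0 is unmasked but its posterior is still changing, and position 2 is stable but still masked. With one of three positions locked, the dominant cost term scales by $M/N = 2/3$ under the stated complexity.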

Related papers