Fugu-MT Paper Translation (Abstract): LESS Is More: Mutual-Stability Sampling for Diffusion Language Models

Paper overview: LESS Is More: Mutual-Stability Sampling for Diffusion Language Models

arxiv url: http://arxiv.org/abs/2606.16908v1
Date: Mon, 15 Jun 2026 16:15:45 GMT
Status: Translation complete
Last updated in system: 2026-06-16 16:21:34.757619
Title: LESS Is More: Mutual-Stability Sampling for Diffusion Language Models
Title (reference translation): LESS: Mutual-Stability Sampling for Diffusion Language Models
Authors: Amr Mohamed, Guokan Shang, Michalis Vazirgiannis
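Abstract summary: Diffusion large language models (dLLMs) offer a promising alternative to autoregressive decoding. We present LESS, a training-free, model-agnostic adaptive sampler that treats token commitment as an online stopping problem. LESS improves average accuracy over strong training-free adaptive samplers while using 72.1% fewer reverse steps than fixed-budget decoding.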
Reference score (independently computed attention score): 23.94639050546374
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion large language models (dLLMs) offer a promising alternative to autoregressive decoding by iteratively refining masked sequences, enabling parallel token updates and bidirectional conditioning. Their practical efficiency, however, is limited by sampling procedures that execute a fixed number of reverse denoising steps selected before decoding, spending computation on already-stable positions and sometimes committing unstable ones too early. We present \textsc{LESS}, a training-free, model-agnostic adaptive sampler that treats token commitment as an online stopping problem. \textsc{LESS} implements mutual-stability sampling through a joint stability rule that makes a masked position eligible for unmasking only when its top-1 prediction has high confidence, its top-1 token persists across recent reverse steps, and its predictive distribution is stable under top-$K$ inter-step Jensen--Shannon divergence. We evaluate \textsc{LESS} on Dream-7B, LLaDA-8B, and LLaDA-1.5-8B, covering full-sequence diffusion and semi-autoregressive blockwise sampling regimes, across seven benchmarks spanning general knowledge, math, and code. \textsc{LESS} improves average accuracy over strong training-free adaptive samplers while using $72.1\%$ fewer reverse steps than fixed-budget decoding. Since each reverse step requires a Transformer forward pass, these step-count reductions translate into fewer forward evaluations, lower measured wall-clock latency, and lower estimated inference compute.
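Abstract (reference translation): Diffusion large language models (dLLMs) offer a promising alternative to autoregressive decoding by iteratively refining masked sequences. Their practical efficiency, however, is limited by sampling procedures that run a fixed number of reverse denoising steps chosen before decoding, spending computation on positions that are already stable and sometimes committing unstable ones too early. This paper presents LESS, a training-free, model-agnostic adaptive sampler that treats token commitment as an online stopping problem. LESS implements mutual-stability sampling through a joint stability rule: a masked position becomes eligible for unmasking only when its top-1 prediction has high confidence, its top-1 token persists across recent reverse steps, and its predictive distribution is stable under the top-K inter-step Jensen-Shannon divergence. LESS is evaluated on Dream-7B, LLaDA-8B, and LLaDA-1.5-8B, covering full-sequence diffusion and semi-autoregressive blockwise sampling regimes, across seven benchmarks spanning general knowledge, math, and code. LESS improves average accuracy over strong training-free adaptive samplers while using 72.1% fewer reverse steps than fixed-budget decoding. Because each reverse step requires a Transformer forward pass, these step-count reductions translate into fewer forward evaluations, lower measured wall-clock latency, and lower estimated inference compute.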
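
The joint stability rule in the abstract combines three checks per masked position: top-1 confidence, top-1 persistence across recent reverse steps, and a small top-K inter-step Jensen-Shannon divergence. The following is a minimal sketch of how such a rule might be evaluated, not the authors' implementation. The function names, the thresholds (`conf_min`, `persist`, `k`, `jsd_max`), and the choice to compute the divergence on the renormalized union of both steps' top-K tokens are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def top_k_indices(probs, k):
    """Indices of the k largest probabilities."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) of two aligned distributions."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def top_k_jsd(prev, curr, k):
    """JSD restricted to the union of both steps' top-k tokens (assumed definition)."""
    support = sorted(set(top_k_indices(prev, k)) | set(top_k_indices(curr, k)))
    def restrict(dist):
        mass = [dist[i] for i in support]
        total = sum(mass)
        return [x / total for x in mass]
    return js_divergence(restrict(prev), restrict(curr))

def is_eligible(history, conf_min=0.9, persist=2, k=3, jsd_max=0.01):
    """Decide whether one masked position may be unmasked.

    history: per-step predictive distributions for the position, oldest first.
    All thresholds are hypothetical placeholders, not values from the paper.
    """
    if len(history) < max(persist, 2):
        return False
    curr = history[-1]
    top1 = max(range(len(curr)), key=lambda i: curr[i])
    confident = curr[top1] >= conf_min
    persistent = all(
        max(range(len(d)), key=lambda i: d[i]) == top1 for d in history[-persist:]
    )
    stable = top_k_jsd(history[-2], curr, k) <= jsd_max
    return confident and persistent and stable

# Toy 4-token vocabulary: one settled position, one that just flipped, one unsure.
settled = [[0.05, 0.92, 0.02, 0.01], [0.04, 0.93, 0.02, 0.01]]
flipped = [[0.60, 0.30, 0.10, 0.00], [0.02, 0.95, 0.02, 0.01]]
unsure = [[0.50, 0.40, 0.10, 0.00], [0.50, 0.40, 0.10, 0.00]]
print(is_eligible(settled), is_eligible(flipped), is_eligible(unsure))
```

In this sketch, only the settled position passes all three checks. The flipped position is confident but changed its top-1 token, and the unsure position is stable but not confident, which illustrates why the rule requires the conditions jointly.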

Related papers