Fugu-MT Paper Translation (Abstract): Diffusion LMs Can Approximate Optimal Infilling Lengths Implicitly

Paper summary: Diffusion LMs Can Approximate Optimal Infilling Lengths Implicitly

arxiv url: http://arxiv.org/abs/2602.00476v1
Date: Sat, 31 Jan 2026 03:00:21 GMT
Status: Translation complete
Last updated in system: 2026-02-03 19:28:33.211162
Title: Diffusion LMs Can Approximate Optimal Infilling Lengths Implicitly
Title (reference translation): Diffusion LMs can implicitly approximate optimal infilling lengths
Authors: Hengchang Liu, Zhao Yang, Bing Su
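Abstract summary: Diffusion language models (DLMs) provide a bidirectional generation framework naturally suited to infilling. This paper shows that DLMs have an inherent ability to discover the correct infilling length. The training-free method CAL lets DLMs approximate the optimal length through an efficient search before formal decoding.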
Reference score (independently computed attention measure): 16.576341843767352
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion language models (DLMs) provide a bidirectional generation framework naturally suited for infilling, yet their performance is constrained by the pre-specified infilling length. In this paper, we reveal that DLMs possess an inherent ability to discover the correct infilling length. We identify two key statistical phenomena in the first-step denoising confidence: a local Oracle Peak that emerges near the ground-truth length and a systematic Length Bias that often obscures this signal. By leveraging this signal and calibrating the bias, our training-free method CAL (Calibrated Adaptive Length) enables DLMs to approximate the optimal length through an efficient search before formal decoding. Empirical evaluations demonstrate that CAL improves Pass@1 by up to 47.7% over fixed-length baselines and 40.5% over chat-based adaptive methods in code infilling, while boosting BLEU-2 and ROUGE-L by up to 8.5% and 9.9% in text infilling. These results demonstrate that CAL paves the way for robust DLM infilling without requiring any specialized training. Code is available at https://github.com/NiuHechang/Calibrated_Adaptive_Length.
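Abstract (reference translation): Diffusion language models (DLMs) provide a bidirectional generation framework naturally suited to infilling, but their performance is constrained by the pre-specified infilling length. This paper shows that DLMs have an inherent ability to discover the correct infilling length. It identifies two key statistical phenomena in the first-step denoising confidence: a local Oracle Peak that emerges near the ground-truth length, and a systematic Length Bias that often obscures this signal. By leveraging the signal and calibrating the bias, the training-free method CAL (Calibrated Adaptive Length) lets DLMs approximate the optimal length through an efficient search before formal decoding. In empirical evaluations, CAL improves Pass@1 by up to 47.7% over fixed-length baselines and 40.5% over chat-based adaptive methods in code infilling, and raises BLEU-2 and ROUGE-L by up to 8.5% and 9.9% in text infilling. These results suggest that CAL paves the way for robust DLM infilling without any specialized training. The code is available at https://github.com/NiuHechang/Calibrated_Adaptive_Length.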
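The idea in the abstract (score candidate lengths by first-step confidence, correct for a length-dependent bias, then pick the best length before decoding) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how CAL calibrates the bias or searches, so the subtraction used here, the function names `calibrated_length_search`, `toy_confidence`, and `toy_bias`, and all numeric values are hypothetical. In particular, the toy bias is assumed to favor shorter lengths purely for illustration.

```python
from typing import Callable, Sequence


def calibrated_length_search(
    confidence_at: Callable[[int], float],
    candidate_lengths: Sequence[int],
    bias_at: Callable[[int], float],
) -> int:
    """Return the candidate length with the highest bias-corrected confidence.

    Assumption: calibration is modeled as a plain subtraction of an
    estimated length bias. The real CAL procedure may differ.
    """
    if not candidate_lengths:
        raise ValueError("candidate_lengths must not be empty")
    calibrated = {n: confidence_at(n) - bias_at(n) for n in candidate_lengths}
    return max(calibrated, key=calibrated.get)


# Toy stand-ins for a real DLM (hypothetical numbers, not from the paper).
TRUE_LENGTH = 12


def toy_bias(n: int) -> float:
    """Assumed systematic trend: confidence drops as the length grows."""
    return 0.9 - 0.03 * n


def toy_confidence(n: int) -> float:
    """Bias trend plus a local bump around the ground-truth length."""
    distance = abs(n - TRUE_LENGTH)
    bump = 0.15 if distance == 0 else 0.05 if distance == 1 else 0.0
    return toy_bias(n) + bump


candidates = range(4, 21)
raw_choice = max(candidates, key=toy_confidence)
calibrated_choice = calibrated_length_search(toy_confidence, candidates, toy_bias)
print("raw argmax:", raw_choice)
print("calibrated argmax:", calibrated_choice)
```

In this toy setting the raw confidence favors the shortest candidate because the assumed bias dominates the bump, while the bias-corrected score recovers the length at which the bump was placed. With a real model, `confidence_at` would wrap a single denoising step over a masked span of the given length, and the bias estimate would have to come from the model itself.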


Related papers