Fugu-MT Paper Translation (Abstract): Semantic DLM+: Improving Diffusion Language Models through Bias-variance Trade-off in Transition Kernel Design

Paper overview: Semantic DLM+: Improving Diffusion Language Models through Bias-variance Trade-off in Transition Kernel Design

arxiv url: http://arxiv.org/abs/2606.15327v1
Date: Sat, 13 Jun 2026 14:41:55 GMT
Status: Translation complete
Last updated in system: 2026-06-16 16:21:33.357164
Title: Semantic DLM+: Improving Diffusion Language Models through Bias-variance Trade-off in Transition Kernel Design
Authors: Keyue Jiang, Yuxiang Wang, Yanan Zhao, Xiang Yu, Qifang Zhao, Bohan Tang, Baojian Zhou, Yanghua Xiao, Lin Qu, Xiaoxiao Xu
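Abstract summary: Diffusion Language Models (DLMs) have demonstrated strong scaling capacity as alternatives to autoregressive language models. This paper studies the sensitivity of DLM performance to the choice of transition kernel through a principled analysis of generalization error and identifies three critical factors. It proposes SemDLM+, which adds a global transition and a semantic-frequency penalty during sampling.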
Reference score (independently computed attention measure): 59.05127237532803
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Diffusion Language Models (DLMs) have demonstrated strong scaling capacity as alternatives to autoregressive language models. However, their performance is highly sensitive to the choice of transition kernels, and poorly designed kernels can lead to issues like training instability, slow convergence, and biased sampling. In this paper, we study this sensitivity through a principled analysis of generalization error and identify three critical factors: asymptotic bias (difficulty in approximating the posterior distribution), exposure bias (error propagation during sampling), and optimization variance induced by kernel dispersion. We further compare different transition kernels: masking diffusion yields sparse and easier posterior-approximation targets, while uniform diffusion provides stronger sampling-side repair but induces harder approximation. Motivated by this trade-off, we revisit a previously overlooked variant, semantic DLM (SemDLM), where the transition kernel corrupts tokens to neighborhoods that are semantically similar. Our theory suggests that SemDLM can serve as a plausible middle ground by reducing the posterior approximation difficulty of uniform diffusion while retaining repair ability. However, we find that SemDLM suffers from a semantic basin problem, where sampling repeatedly stays within a semantic region and produces low-diversity text. To address this, we propose SemDLM+, which adds a global transition and a semantic-frequency penalty during sampling. Experiments on LM1B and OpenWebText show that SemDLM+ improves training dynamics and achieves competitive language modeling and generation quality with satisfactory diversity.
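The three kernel families compared in the abstract (masking, uniform, and semantic-neighborhood corruption) can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the single-step corruption probability `beta`, the use of cosine similarity with `k` nearest neighbors to define "semantically similar" tokens, the function names, and the use of mean row entropy as a rough proxy for kernel dispersion. The abstract does not specify the form of the global transition in SemDLM+; `with_global_transition` shows one possible reading, a mixture with a uniform jump, and the semantic-frequency penalty is not modeled at all.

```python
import numpy as np

def masking_kernel(vocab_size, beta):
    """Tokens move to an absorbing [MASK] state (last index) with probability beta."""
    n = vocab_size + 1
    q = (1.0 - beta) * np.eye(n)
    q[:, -1] += beta
    q[-1] = 0.0
    q[-1, -1] = 1.0  # [MASK] stays [MASK]
    return q

def uniform_kernel(vocab_size, beta):
    """With probability beta, resample the token uniformly from the vocabulary."""
    jump = np.full((vocab_size, vocab_size), 1.0 / vocab_size)
    return (1.0 - beta) * np.eye(vocab_size) + beta * jump

def semantic_kernel(embeddings, beta, k):
    """With probability beta, move uniformly to one of the k nearest tokens
    by cosine similarity (an assumed definition of semantic neighborhood)."""
    v = embeddings.shape[0]
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a token is not its own neighbor
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    q = (1.0 - beta) * np.eye(v)
    rows = np.repeat(np.arange(v), k)
    q[rows, neighbours.ravel()] += beta / k
    return q

def with_global_transition(q, gamma):
    """Hypothetical mixture: follow q with probability 1 - gamma,
    otherwise jump to a uniformly random token."""
    v = q.shape[0]
    return (1.0 - gamma) * q + gamma * np.full((v, v), 1.0 / v)

def mean_row_entropy(q):
    """Average entropy (in nats) of the rows of q, used here as a dispersion proxy."""
    safe = np.where(q > 0, q, 1.0)  # log(1) = 0 for zero entries
    return float(-(q * np.log(safe)).sum(axis=1).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(50, 8))  # toy embeddings, not from a trained model
    kernels = {
        "masking": masking_kernel(50, 0.5),
        "semantic": semantic_kernel(emb, 0.5, 4),
        "uniform": uniform_kernel(50, 0.5),
    }
    for name, q in kernels.items():
        print(name, mean_row_entropy(q))
```

In this toy setting the semantic kernel spreads probability over only `k` neighbors, so its rows are more dispersed than the masking kernel's and less dispersed than the uniform kernel's, which is consistent with the "middle ground" framing in the abstract but says nothing about the paper's actual theory or results.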
