Fugu-MT Paper Translation (Abstract): Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

Paper overview: Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

arxiv url: http://arxiv.org/abs/2605.11854v2
Date: Mon, 18 May 2026 06:27:28 GMT
Status: Translation complete
Last updated in system: 2026-05-19 17:57:45.70065
Title: Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models
Title (reference translation): Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models
Authors: Kecheng Chen, Ziru Liu, Xijia Tao, Hui Liu, Yibing Liu, Xinyu Fu, Shi Wu, Suiyun Zhang, Dandan Tu, Lingpeng Kong, Rui Liu, Haoliang Li
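Abstract summary: Diffusion Language Models (DLMs) offer stronger global awareness and highly parallel generation. Post-training DLMs with standard Negative Evidence Lower Bound (NELBO)-based supervised fine-tuning is inefficient. This work proposes a self-distilled trajectory-based post-training framework that aligns training with the easy-to-hard structure of inference.
Reference score (independently computed attention metric): 65.89572755202245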
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion Language Models (DLMs) have recently emerged as a promising alternative to autoregressive language models, offering stronger global awareness and highly parallel generation. However, post-training DLMs with standard Negative Evidence Lower Bound (NELBO)-based supervised fine-tuning remains inefficient: training reconstructs randomly masked tokens in a single step, whereas inference follows a confidence-guided, multi-step easy-to-hard denoising trajectory. Recent trajectory-based self-distillation methods exploit such inference trajectories mainly for sampling-step compression and acceleration, often improving decoding efficiency without substantially enhancing the model's underlying capability, and may even degrade performance under full diffusion decoding. In this work, we ask whether self-distilled trajectories can be used not merely for faster inference, but for genuine knowledge acquisition. Although these trajectories lie on the pretrained DLM's own distributional manifold and thus offer a potentially lower optimization barrier, we find that naively fine-tuning on them with standard NELBO objectives yields only marginal gains. To address this limitation, we propose Trajectory-Aligned optimization via Boltzmann Modeling (TABOM), a self-distilled trajectory-based post-training framework that aligns training with the easy-to-hard structure of inference. TABOM models the inference unmasking preference as a Boltzmann distribution over predictive entropies and derives a tractable pairwise ranking objective to align the model's certainty ordering with the observed decoding trajectory. Empirically, TABOM achieves substantial gains in new domains, expands the effective knowledge boundary of DLMs, and significantly mitigates catastrophic forgetting compared with standard SFT.
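Abstract (reference translation): Diffusion Language Models (DLMs) have recently emerged as a promising alternative to autoregressive language models, offering stronger global awareness and highly parallel generation. However, post-training DLMs with supervised fine-tuning based on the standard Negative Evidence Lower Bound (NELBO) is inefficient: training reconstructs randomly masked tokens in a single step, whereas inference follows a confidence-guided, multi-step, easy-to-hard denoising trajectory. Recent trajectory-based self-distillation methods use such inference trajectories mainly to compress and accelerate sampling steps; they often improve decoding efficiency without substantially improving the model's underlying capability, and may even degrade performance under full diffusion decoding. This work asks whether self-distilled trajectories can be used not only for faster inference but also for genuine knowledge acquisition. These trajectories lie on the pretrained DLM's own distributional manifold and thus offer a potentially lower optimization barrier, yet naively fine-tuning on them with standard NELBO objectives yields only marginal gains. To address this limitation, the authors propose Trajectory-Aligned optimization via Boltzmann Modeling (TABOM), a self-distilled trajectory-based post-training framework that aligns training with the easy-to-hard structure of inference. TABOM models the inference unmasking preference as a Boltzmann distribution over predictive entropies and derives a tractable pairwise ranking objective that aligns the model's certainty ordering with the observed decoding trajectory. Empirically, TABOM achieves substantial gains in new domains, expands the effective knowledge boundary of DLMs, and significantly mitigates catastrophic forgetting compared with standard SFT.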
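
Illustrative sketch (not from the paper): the abstract states that TABOM treats the unmasking preference as a Boltzmann distribution over predictive entropies and derives a pairwise ranking objective from it, but it does not give the formulas. The Python below shows one plausible reading under stated assumptions: position i is unmasked with probability proportional to exp(-H_i / T), and for any pair where i was unmasked before j the loss is the negative log of the two-item Boltzmann probability, which equals -log sigmoid((H_j - H_i) / T). The function names, the `temperature` parameter, and the all-pairs averaging are hypothetical choices made for illustration, not the authors' definitions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def boltzmann_unmask_probs(entropies, temperature=1.0):
    """Assumed Boltzmann form: p(i) is proportional to exp(-H_i / T).

    Lower entropy (higher certainty) means a higher chance of being
    unmasked next. Computed as a numerically stable softmax.
    """
    logits = [-h / temperature for h in entropies]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(entropies, unmask_step, temperature=1.0):
    """Hypothetical pairwise objective implied by the Boltzmann form.

    For each pair (i, j) where position i was unmasked at an earlier
    step than j, penalize -log sigmoid((H_j - H_i) / T), which is
    softplus of the negated margin. Averaged over all such pairs.
    """
    losses = []
    n = len(entropies)
    for i in range(n):
        for j in range(n):
            if unmask_step[i] < unmask_step[j]:
                margin = (entropies[j] - entropies[i]) / temperature
                losses.append(softplus(-margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy example: three masked positions with increasingly uncertain predictions.
dists = [
    [0.97, 0.01, 0.01, 0.01],
    [0.40, 0.30, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
H = [entropy(d) for d in dists]
probs = boltzmann_unmask_probs(H)
aligned = pairwise_ranking_loss(H, unmask_step=[0, 1, 2])   # easy-to-hard order
reversed_ = pairwise_ranking_loss(H, unmask_step=[2, 1, 0])  # hard-to-easy order
```

In this toy setup the loss is lower when the recorded unmasking order runs from low entropy to high entropy than when it runs the other way, which is the alignment between certainty ordering and decoding trajectory that the abstract describes. How the paper weights pairs, sets the temperature, or combines this term with NELBO is not stated in the abstract and is left open here.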

Related papers