Fugu-MT Paper Translation (Summary): Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment

Paper summary: Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment

arxiv url: http://arxiv.org/abs/2605.06885v1
Date: Thu, 07 May 2026 19:35:48 GMT
Status: Translation complete
Last updated in system: 2026-05-11 19:43:38.583583
Title: Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment
Authors: Fred Zhangzhi Peng, Alexis Fox, Anru R. Zhang, Alexander Tong
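Abstract summary: Diffusion language models (DLMs) have recently demonstrated capabilities that complement standard autoregressive (AR) models. We ask whether the internal representation geometry learned by next-token prediction can be explicitly preserved during AR-to-DLM conversion. We introduce REPR-ALIGN, a representation alignment objective that adapts a bidirectional masked diffusion model to reuse representations from a pretrained AR model.
Reference score (attention score computed independently by the site): 46.75006425771645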
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion language models (DLMs) have recently demonstrated capabilities that complement standard autoregressive (AR) models, particularly in non-sequential generation and bidirectional editing. Although recent work has shown that pretrained autoregressive checkpoints can be converted into diffusion language models, existing recipes primarily transfer parameters through continued denoising training with objective- and attention-level modifications. We instead ask whether the internal representation geometry learned by next-token prediction can be explicitly preserved during AR-to-DLM conversion. We hypothesize that much of the semantic structure learned by AR pretraining can transfer across generation orders, and thus DLM training should be viewed as relearning the decoding path rather than relearning language representations. To investigate this, we introduce REPR-ALIGN, a representation alignment objective that adapts a bidirectional masked diffusion model to reuse representations from a pretrained AR model of identical architecture. Concretely, we align the hidden states of the DLM to the frozen AR model at every layer using cosine similarity, while optimizing the standard masked denoising objective. This simple alignment, with no adapters and no architectural changes beyond the attention mask, yields up to 4x training acceleration in our setting and is particularly effective in low-data regimes. Our results suggest that linguistic representations can transfer across generation order, and that representation alignment provides a simple and effective technique for training diffusion language models. Code is available at https://github.com/pengzhangzhi/Open-dLLM.
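The combined objective described in the abstract (per-layer cosine alignment to a frozen AR model, optimized together with masked denoising) can be illustrated with a minimal sketch. This is not the authors' implementation from the linked repository. The function names, the `1 - cosine` form of the alignment term, the plain averaging over layers and positions, the unweighted cross-entropy, and the `align_weight` coefficient are all assumptions made for illustration, since the abstract does not specify them. It uses only the Python standard library and nested lists in place of tensors.

```python
import math

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)

def alignment_loss(dlm_hidden, ar_hidden):
    """Mean of (1 - cosine similarity) over every layer and token position.

    Both arguments are nested lists shaped [layer][position][hidden_dim].
    ar_hidden comes from the frozen AR model and is treated as a constant.
    (Assumed form: the abstract only says "cosine similarity at every layer".)
    """
    total, count = 0.0, 0
    for dlm_layer, ar_layer in zip(dlm_hidden, ar_hidden):
        for h_dlm, h_ar in zip(dlm_layer, ar_layer):
            total += 1.0 - cosine_similarity(h_dlm, h_ar)
            count += 1
    return total / max(count, 1)

def masked_denoising_loss(logits, targets, masked):
    """Cross-entropy averaged over masked positions only.

    logits: [position][vocab], targets: [position] token ids,
    masked: [position] booleans marking which tokens were masked out.
    (Assumption: no timestep-dependent weighting is applied here.)
    """
    total, count = 0.0, 0
    for pos_logits, target, is_masked in zip(logits, targets, masked):
        if not is_masked:
            continue
        m = max(pos_logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in pos_logits))
        total += log_z - pos_logits[target]
        count += 1
    return total / max(count, 1)

def repr_align_loss(logits, targets, masked, dlm_hidden, ar_hidden,
                    align_weight=1.0):
    """Denoising loss plus weighted alignment loss (align_weight is hypothetical)."""
    return (masked_denoising_loss(logits, targets, masked)
            + align_weight * alignment_loss(dlm_hidden, ar_hidden))

# Toy example: one layer, two positions, hidden size 2, vocabulary size 2.
ar_hidden = [[[1.0, 0.0], [0.0, 1.0]]]
dlm_hidden = [[[2.0, 0.0], [1.0, 0.0]]]   # first aligned, second orthogonal
logits = [[0.0, 0.0], [0.0, 0.0]]
targets = [0, 1]
masked = [True, False]

print(alignment_loss(dlm_hidden, ar_hidden))   # 0.5
print(repr_align_loss(logits, targets, masked, dlm_hidden, ar_hidden))
```

In the toy example the first position is perfectly aligned (scale does not matter for cosine similarity) and the second is orthogonal, so the alignment term is 0.5. The denoising term is log 2, because the single masked position has uniform logits over two tokens. In a real training loop the AR hidden states would be computed without gradients, and only the DLM would be updated.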
