Fugu-MT Paper Translation (Abstract): Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement

Paper overview: Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement

arxiv url: http://arxiv.org/abs/2605.14368v1
Date: Thu, 14 May 2026 04:47:54 GMT
Status: Translation complete
Last updated in system: 2026-05-15 21:45:34.625875
Title: Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement
Title (reference translation): Where should diffusion enter a language model? Geometry-guided hidden-state replacement
Authors: Injin Kong, Hyoungjoon Lee, Yohan Jo
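Abstract summary: DiHAL is a geometry-guided diffusion-transformer hybrid. It scores layers and selects a diffusion-friendly hidden-state interface. It replaces the lower transformer prefix with a diffusion bridge while retaining the upper layers and the original LM head.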
Reference score (independently computed attention score): 12.612647781309098
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Continuous diffusion language models lag behind autoregressive transformers, partly because diffusion is applied in spaces poorly suited to language denoising and token recovery. We propose DiHAL, a geometry-guided diffusion-transformer hybrid that asks where diffusion should enter a pretrained transformer. DiHAL scores layers with geometry-based proxies, selects a diffusion-friendly hidden-state interface, and replaces the lower transformer prefix with a diffusion bridge while retaining the upper layers and original LM head. By reconstructing the selected-layer hidden state rather than tokens, DiHAL avoids direct continuous-to-discrete recovery. Experiments on 8B-scale backbones show that the geometry score predicts effective shallow insertion layers under a fixed bridge-training protocol and that hidden-state recovery improves over continuous diffusion baselines in a diagnostic comparison matching the diffusion/recovery training budget. These results suggest that hidden-state geometry helps identify where diffusion-based replacement is feasible inside pretrained language models.
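Abstract (reference translation): Continuous diffusion language models lag behind autoregressive transformers in part because diffusion is applied in spaces ill-suited to language denoising and token recovery. This work proposes DiHAL, a geometry-guided diffusion-transformer hybrid that asks where diffusion should enter a pretrained transformer. DiHAL scores layers with geometry-based proxies, selects a diffusion-friendly hidden-state interface, and replaces the lower transformer prefix with a diffusion bridge while keeping the upper layers and the original LM head. By reconstructing the hidden state at the selected layer instead of tokens, DiHAL avoids direct continuous-to-discrete recovery. Experiments on 8B-scale backbones show that, under a fixed bridge-training protocol, the geometry score predicts effective shallow insertion layers, and that hidden-state recovery improves over continuous diffusion baselines in a diagnostic comparison with a matched diffusion/recovery training budget. These results suggest that hidden-state geometry helps identify where diffusion-based replacement is feasible inside pretrained language models.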
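
The pipeline described in the abstract (score layers, pick an interface layer, replace the lower prefix with a diffusion bridge, keep the upper layers and LM head) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the function names `select_interface_layer` and `hybrid_forward` are hypothetical, the geometry scores are taken as given inputs because the abstract does not define the proxies, and the bridge is reduced to a single callable rather than an actual iterative denoiser.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def select_interface_layer(scores: Sequence[float], max_depth: int) -> int:
    """Return the 1-based index of the best-scoring layer among the first max_depth.

    The scores stand in for the paper's geometry-based proxies; how they are
    computed is NOT specified here and is an assumption left to the caller.
    """
    candidates = range(1, max_depth + 1)  # layer k means "output of layer k"
    return max(candidates, key=lambda k: scores[k - 1])


def hybrid_forward(
    bridge: Callable[[Sequence[int]], Vector],
    layers: Sequence[Layer],
    lm_head: Callable[[Vector], Vector],
    k: int,
    token_ids: Sequence[int],
) -> Vector:
    """Run bridge -> retained upper layers (k+1..L) -> original LM head."""
    hidden = bridge(token_ids)  # stands in for the embedding plus layers 1..k
    for layer in layers[k:]:    # upper layers are kept unchanged
        hidden = layer(hidden)
    return lm_head(hidden)


if __name__ == "__main__":
    # Toy stand-ins: four "layers" that each add 1 to every coordinate.
    toy_layers: List[Layer] = [lambda h: [x + 1.0 for x in h] for _ in range(4)]
    toy_scores = [0.2, 0.9, 0.4, 0.1]  # hypothetical per-layer scores
    k = select_interface_layer(toy_scores, max_depth=3)
    out = hybrid_forward(lambda ids: [float(len(ids))], toy_layers, lambda h: h, k, [5, 6, 7])
    print(k, out)
```

Restricting the candidates with `max_depth` mirrors the abstract's focus on shallow insertion layers; in a real system the bridge would be trained to reconstruct the hidden state at layer `k`, so the retained upper layers see inputs close to what the original prefix would have produced.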

Related papers