Fugu-MT paper translation (overview): $D^2$-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing

Paper overview: $D^2$-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing

arxiv url: http://arxiv.org/abs/2605.25893v1
Date: Mon, 25 May 2026 14:22:21 GMT
Status: Translation complete
Last updated (in system): 2026-05-26 19:50:20.329043
Title: $D^2$-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing
Authors: Aoxi Liu, Yupeng Chen, James Oldfield, Guanzhe Hong, Junchi Yu, Baoyuan Wu, Philip Torr, Adel Bibi,
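Abstract summary: Safety monitoring for diffusion large language models (D-LLMs) remains largely unexplored. We propose $D^2$-Monitor, a bi-level safety monitor for D-LLMs. $D^2$-Monitor adopts a lightweight probe as an always-on monitor to jointly estimate hesitation and perform base classification.
Reference score (site-computed attention score): 63.49501120848927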
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the emergence of diffusion large language models (D-LLMs) as an alternative to autoregressive large language models (AR-LLMs), safety monitoring for D-LLMs remains largely unexplored. Unlike AR-LLMs, D-LLMs generate text through a multi-step denoising process, exposing intermediate hidden representations that may contain safety-relevant information unavailable in standard single-step monitoring setups. Motivated by the suitability of lightweight probes for always-on monitoring, we analyze which trajectory-level signals best indicate when such probes are likely to struggle. We find that the most informative signal is safety hesitation: intermediate hidden states repeatedly falling within a small margin of the probe's decision boundary. The number of such hesitation steps in a D-LLM's trajectory predicts probe failure effectively, providing a proxy for sample difficulty. Building on this analysis, we propose $D^2$-Monitor, a bi-level safety monitor for D-LLMs. $D^2$-Monitor adopts a lightweight probe as an always-on monitor to jointly estimate hesitation and perform base classification. When the hesitation level exceeds a threshold, a more expressive but computationally heavier probe is activated. This dynamic routing mechanism allocates monitoring resources efficiently at test time. Evaluated on 3 datasets (WildguardMix, ToxicChat, OpenAI-Moderation) across 4 D-LLMs, $D^2$-Monitor achieves state-of-the-art performance with a compact parameter footprint ($\leq$ 0.85M parameters), and exhibits the best trade-off between effectiveness and efficiency relative to 8 baselines.
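The abstract describes hesitation-aware routing only at a high level. The Python sketch below illustrates the idea under assumptions that are not taken from the paper: the lightweight probe is modeled as a hypothetical linear head (`LinearProbe`) whose logit sign gives the safety label; a "hesitation step" is a denoising step whose logit magnitude falls below a hypothetical `margin`; the base classification uses the last hidden state of the trajectory; and `heavy_probe` is a hypothetical stand-in for the heavier, more expressive probe. It is a sketch of the routing logic, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class LinearProbe:
    """Hypothetical lightweight probe: a linear head over one hidden state."""

    weights: Sequence[float]
    bias: float

    def logit(self, h: Vector) -> float:
        # Positive logit = "unsafe", negative = "safe"; 0 is the decision boundary.
        return sum(w * x for w, x in zip(self.weights, h)) + self.bias

    def predict_unsafe(self, h: Vector) -> bool:
        return self.logit(h) >= 0.0


def count_hesitation(probe: LinearProbe, trajectory: Sequence[Vector], margin: float) -> int:
    """Number of denoising steps whose hidden state lies within `margin` of the boundary."""
    return sum(1 for h in trajectory if abs(probe.logit(h)) < margin)


def route(
    probe: LinearProbe,
    heavy_probe: Callable[[Sequence[Vector]], bool],
    trajectory: Sequence[Vector],
    margin: float,
    threshold: int,
) -> Tuple[str, bool]:
    """Return (which_probe, is_unsafe) for one trajectory of intermediate hidden states."""
    hesitation = count_hesitation(probe, trajectory, margin)
    if hesitation > threshold:
        # Many near-boundary steps: treat as a hard sample and escalate.
        return "heavy", heavy_probe(trajectory)
    # Few near-boundary steps: keep the always-on probe's base classification.
    # Assumption: classify from the last (most denoised) hidden state.
    return "light", probe.predict_unsafe(trajectory[-1])


def stub_heavy_probe(trajectory: Sequence[Vector]) -> bool:
    # Hypothetical stand-in for the heavier, more expressive probe.
    return False


# Usage with toy 2-d "hidden states" (hypothetical values).
probe = LinearProbe(weights=[1.0, -1.0], bias=0.0)
easy = [[2.0, 0.0], [3.0, 0.5], [4.0, 1.0]]              # logits 2.0, 2.5, 3.0
hard = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.1], [0.2, 0.4]]  # all logits within 0.5 of 0
print(route(probe, stub_heavy_probe, easy, margin=0.5, threshold=2))  # ('light', True)
print(route(probe, stub_heavy_probe, hard, margin=0.5, threshold=2))  # ('heavy', False)
```

The design point the sketch captures is that the cheap probe runs on every step anyway, so counting near-boundary steps costs almost nothing extra; only trajectories whose hesitation count exceeds the threshold pay for the heavier probe.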


Related papers