Fugu-MT Paper Translation (Summary): GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization

Paper Summary: GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization

arxiv url: http://arxiv.org/abs/2605.07399v2
Date: Mon, 11 May 2026 06:29:59 GMT
Status: Translation complete
Last updated (in system): 2026-05-12 19:24:01.351775
Title: GPO-V: Jailbreak Diffusion Vision Language Model by Global Probability Optimization
Title (reference translation): GPO-V: Jailbreaking Diffusion Vision-Language Models via Global Probability Optimization
Authors: Yu Pan, Andi Zhang, Yi Wang, Sibei Yang, Wenjie Wang
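Abstract summary: Diffusion Vision-Language Models (dVLMs) have shown remarkable efficacy in multimodal tasks by departing from the traditional autoregressive generation paradigm. To exploit the latent attack surface exposed by their progressive refinement process, we propose Global Probability Optimization (GPO), a general jailbreak paradigm designed specifically for the denoising trajectory of masked diffusion models. GPO-V is the first visual-modality jailbreak framework developed for dVLMs.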
Reference score (independently calculated attention score): 38.17733373188058
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion Vision-Language Models (dVLMs), built upon the non-causal foundations of Diffusion Large Language Models (dLLMs), have demonstrated remarkable efficacy in multimodal tasks by departing from the traditional autoregressive generation paradigm. While dVLMs appear inherently robust against conventional jailbreak tactics, which we categorize as Fixed Prefix Optimization (FPO) (e.g., anchoring responses with "Sure, here is"), this perceived resilience is deceptive. Our investigation into the safety landscape of dVLMs reveals a unique refusal pattern: Immediate Refusal and Progressive Refusal. We find that while FPO-based attacks often fail by triggering the latter, the progressive refinement process itself uncovers a novel, latent attack surface. To exploit this vulnerability, we propose Global Probability Optimization (GPO), a general jailbreak paradigm designed specifically for the denoising trajectory of masked diffusion models. Unlike prefix-based methods, GPO manipulates the global generative dynamics to bypass guardrails in diffusion language models. Building on this, we introduce GPO-V, the first visual-modality jailbreak framework tailored for dVLMs. Empirical results demonstrate that GPO-V produces stealthy perturbations with exceptional cross-model transferability, revealing a critical security gap in non-sequential generative architectures. Our findings underscore the critical urgency of addressing safety alignment in dVLMs. These results necessitate an immediate and fundamental re-evaluation of current defense paradigms to mitigate the unique risks of diffusion-based generation. Our code is available at: https://anonymous.4open.science/r/GPO-V-0250.
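Abstract (reference translation): Diffusion Vision-Language Models (dVLMs) are built on the non-causal foundations of Diffusion Large Language Models (dLLMs). By breaking away from the traditional autoregressive generation paradigm, they have shown remarkable efficacy in multimodal tasks. dVLMs appear inherently robust against conventional jailbreak tactics, which we categorize as Fixed Prefix Optimization (FPO) (e.g., anchoring responses with "Sure, here is"), but this resilience is deceptive. Investigating the safety landscape of dVLMs revealed two distinctive refusal patterns: Immediate Refusal and Progressive Refusal. FPO-based attacks often fail by triggering the latter, yet the progressive refinement process itself exposes a new, latent attack surface. To exploit this vulnerability, we propose Global Probability Optimization (GPO), a general jailbreak paradigm designed specifically for the denoising trajectory of masked diffusion models. Unlike prefix-based methods, GPO manipulates the global generative dynamics to bypass guardrails in diffusion language models. GPO-V is the first visual-modality jailbreak framework developed for dVLMs. Experimental results show that GPO-V generates stealthy perturbations with exceptional cross-model transferability, revealing a critical security gap in non-sequential generative architectures. This work highlights the urgency of addressing safety alignment in dVLMs. These results call for an immediate and fundamental re-evaluation of current defense paradigms to mitigate the unique risks of diffusion-based generation. Our code is available at https://anonymous.4open.science/r/GPO-V-0250.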

Related Papers