Fugu-MT Paper Translation (Summary): STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models

Paper summary: STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models

arxiv url: http://arxiv.org/abs/2509.26473v1
Date: Tue, 30 Sep 2025 16:22:04 GMT
Status: Translation complete
Last updated in system: 2025-10-01 14:45:00.2
Title: STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models
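Authors: Shaoxiong Guo, Tianyi Du, Lijun Li, Yuyao Wu, Jie Li, Jing Shao
Abstract summary: The authors identify a vulnerability arising from the generation-understanding coupling in Unified Multimodal understanding and generation Models (UMMs). They propose STaR-Attack, the first multi-turn jailbreak attack framework that exploits unique safety weaknesses of UMMs without semantic drift.
Reference score (independently computed attention metric): 26.85057724101928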
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Unified Multimodal understanding and generation Models (UMMs) have demonstrated remarkable capabilities in both understanding and generation tasks. However, we identify a vulnerability arising from the generation-understanding coupling in UMMs. Attackers can use the generative function to craft an information-rich adversarial image and then leverage the understanding function to absorb it in a single pass, which we call Cross-Modal Generative Injection (CMGI). Current attack methods based on malicious instructions are often limited to a single modality and rely on prompt rewriting that introduces semantic drift, leaving the unique vulnerabilities of UMMs unexplored. We propose STaR-Attack, the first multi-turn jailbreak attack framework that exploits unique safety weaknesses of UMMs without semantic drift. Specifically, our method defines a malicious event that is strongly correlated with the target query within a spatio-temporal context. Using the three-act narrative theory, STaR-Attack generates the pre-event and post-event scenes while concealing the malicious event as the hidden climax. When executing the attack strategy, the opening two rounds exploit the UMM's generative ability to produce images for these scenes. Subsequently, an image-based question guessing and answering game is introduced by exploiting the understanding capability. STaR-Attack embeds the original malicious question among benign candidates, forcing the model to select and answer the most relevant one given the narrative context. Extensive experiments show that STaR-Attack consistently surpasses prior approaches, achieving up to 93.06% attack success rate (ASR) on Gemini-2.0-Flash and surpassing the strongest prior baseline, FlipAttack. Our work uncovers a critical yet underexplored vulnerability and highlights the need for safety alignment in UMMs.


Related papers