Fugu-MT Paper Translation (Summary): Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models

Paper Summary: Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models

arxiv url: http://arxiv.org/abs/2604.05853v2
Date: Wed, 08 Apr 2026 04:16:14 GMT
Status: Translation complete
Last updated in system: 2026-04-09 14:06:05.167042
Title: Reading Between the Pixels: An Inscriptive Jailbreak Attack on Text-to-Image Models
Authors: Zonghao Ying, Haowen Dai, Lianyu Hu, Zonglei Jing, Quanchen Zou, Yaodong Yang, Aishan Liu, Xianglong Liu
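Abstract summary: Modern text-to-image (T2I) models can now render legible, paragraph-length text. We identify and formalize the inscriptive jailbreak, in which an adversary coerces a T2I system into generating images that contain harmful textual payloads. We propose Etch, a black-box attack framework that decomposes the adversarial prompt into three functional layers.
Reference score (site-computed attention score): 31.243185346527255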
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern text-to-image (T2I) models can now render legible, paragraph-length text, enabling a fundamentally new class of misuse. We identify and formalize the inscriptive jailbreak, where an adversary coerces a T2I system into generating images containing harmful textual payloads (e.g., fraudulent documents) embedded within visually benign scenes. Unlike traditional depictive jailbreaks that elicit visually objectionable imagery, inscriptive attacks weaponize the text-rendering capability itself. Because existing jailbreak techniques are designed for coarse visual manipulation, they struggle to bypass multi-stage safety filters while maintaining character-level fidelity. To expose this vulnerability, we propose Etch, a black-box attack framework that decomposes the adversarial prompt into three functionally orthogonal layers: semantic camouflage, visual-spatial anchoring, and typographic encoding. This decomposition reduces joint optimization over the full prompt space to tractable sub-problems, which are iteratively refined through a zero-order loop. In this process, a vision-language model critiques each generated image, localizes failures to specific layers, and prescribes targeted revisions. Extensive evaluations across 7 models on 2 benchmarks demonstrate that Etch achieves an average attack success rate of 65.57% (peaking at 91.00%), significantly outperforming existing baselines. Our results reveal a critical blind spot in current T2I safety alignment and underscore the urgent need for typography-aware multimodal defense mechanisms.


Related Papers