Fugu-MT Paper Translation (Summary): AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents

Paper Summary: AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents

arxiv url: http://arxiv.org/abs/2510.04257v1
Date: Sun, 05 Oct 2025 15:46:56 GMT
Status: Translation complete
Last updated (in system): 2025-10-07 16:52:59.545182
Title: AgentTypo: Adaptive Typographic Prompt Injection Attacks against Black-box Multimodal Agents
Title (reference translation): AgentTypo: Adaptive Typographic Prompt Injection against Black-box Multimodal Agents
Authors: Yanjie Li, Yiming Cao, Dong Wang, Bin Xiao
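Abstract summary: We introduce AgentTypo, a framework that mounts adaptive typographic prompt injection by embedding optimized text into webpage images. Our ATPI algorithm maximizes prompt reconstruction by substituting captioners, while minimizing human detectability via a stealth loss. We also develop AgentTypo-pro, a multi-LLM system that iteratively refines injection prompts using evaluation feedback and retrieves past examples for continual learning.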
Reference score (attention score computed in-house): 22.88469633141419
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Multimodal agents built on large vision-language models (LVLMs) are increasingly deployed in open-world settings but remain highly vulnerable to prompt injection, especially through visual inputs. We introduce AgentTypo, a black-box red-teaming framework that mounts adaptive typographic prompt injection by embedding optimized text into webpage images. Our automatic typographic prompt injection (ATPI) algorithm maximizes prompt reconstruction by substituting captioners while minimizing human detectability via a stealth loss, with a Tree-structured Parzen Estimator guiding black-box optimization over text placement, size, and color. To further enhance attack strength, we develop AgentTypo-pro, a multi-LLM system that iteratively refines injection prompts using evaluation feedback and retrieves successful past examples for continual learning. Effective prompts are abstracted into generalizable strategies and stored in a strategy repository, enabling progressive knowledge accumulation and reuse in future attacks. Experiments on the VWA-Adv benchmark across Classifieds, Shopping, and Reddit scenarios show that AgentTypo significantly outperforms the latest image-based attacks such as AgentAttack. On GPT-4o agents, our image-only attack raises the success rate from 0.23 to 0.45, with consistent results across GPT-4V, GPT-4o-mini, Gemini 1.5 Pro, and Claude 3 Opus. In image+text settings, AgentTypo achieves 0.68 ASR, also outperforming the latest baselines. Our findings reveal that AgentTypo poses a practical and potent threat to multimodal agents and highlight the urgent need for effective defense.
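The ATPI step described in the abstract amounts to black-box optimization of a scalarized objective (prompt reconstruction minus a weighted stealth loss) over text placement, size, and color, guided by a Tree-structured Parzen Estimator. The following is a simplified, stdlib-only sketch of that search pattern under loudly stated assumptions. `toy_objective` is a made-up closed-form stand-in for the paper's captioner-based reconstruction score and stealth loss. Color is collapsed into a single hypothetical `contrast` value. The TPE itself uses a fixed bandwidth, independent per-dimension Parzen estimators, and clipping to bounds, rather than the adaptive bandwidths and priors of production samplers such as Optuna's `TPESampler`. This is not the authors' implementation.

```python
import math
import random

# Hypothetical search space (illustrative only): normalized text position,
# font size in pixels, and one "contrast" value standing in for text colour.
SPACE = {
    "x": (0.0, 1.0),
    "y": (0.0, 1.0),
    "font_size": (8.0, 48.0),
    "contrast": (0.0, 1.0),
}


def toy_objective(p, stealth_weight=0.5):
    """Made-up stand-in for the black-box objective; returns a loss to minimize.

    reconstruction: pretend a captioner recovers more text that is larger and
    higher-contrast. stealth: pretend humans notice such text more, especially
    near the image centre. loss = -(reconstruction - stealth_weight * stealth).
    """
    strength = (p["font_size"] - 8.0) / 40.0 * p["contrast"]
    reconstruction = 1.0 - math.exp(-3.0 * strength)
    max_dist = math.hypot(0.5, 0.5)
    centrality = 1.0 - math.hypot(p["x"] - 0.5, p["y"] - 0.5) / max_dist
    stealth = strength * (0.5 + 0.5 * centrality)
    return -(reconstruction - stealth_weight * stealth)


def _log_parzen(value, centers, sigma):
    """Log density of an equal-weight Gaussian mixture (Parzen estimator)."""
    exps = [-0.5 * ((value - c) / sigma) ** 2 for c in centers]
    m = max(exps)  # log-sum-exp trick for numerical stability
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi) * len(centers))
    return m + math.log(sum(math.exp(e - m) for e in exps)) - log_norm


def tpe_minimize(objective, space, n_trials=60, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    """Simplified TPE: returns the best (params, loss) pair observed."""
    if n_startup < 2:
        raise ValueError("need at least two startup trials to split good/bad")
    rng = random.Random(seed)
    history = []  # (params, loss) for every evaluated trial

    for t in range(n_trials):
        if t < n_startup:
            # Warm-up: uniform random sampling of the whole space.
            params = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        else:
            # Split past trials into the best gamma fraction ("good") and the rest.
            ranked = sorted(history, key=lambda h: h[1])
            n_good = min(len(ranked) - 1, max(1, math.ceil(gamma * len(ranked))))
            good, bad = ranked[:n_good], ranked[n_good:]
            best_ratio, params = -math.inf, None
            for _ in range(n_candidates):
                cand, log_ratio = {}, 0.0
                for k, (lo, hi) in space.items():
                    sigma = 0.1 * (hi - lo)  # fixed bandwidth (simplification)
                    good_vals = [p[k] for p, _ in good]
                    bad_vals = [p[k] for p, _ in bad]
                    # Sample from l(x), the Parzen density of good trials, clipped to bounds.
                    v = min(hi, max(lo, rng.gauss(rng.choice(good_vals), sigma)))
                    cand[k] = v
                    log_ratio += (_log_parzen(v, good_vals, sigma)
                                  - _log_parzen(v, bad_vals, sigma))
                # Keep the candidate with the largest l(x) / g(x).
                if log_ratio > best_ratio:
                    best_ratio, params = log_ratio, cand
        history.append((params, objective(params)))
    return min(history, key=lambda h: h[1])


if __name__ == "__main__":
    best_params, best_loss = tpe_minimize(toy_objective, SPACE)
    print(best_params, "objective:", round(-best_loss, 3))
```

`tpe_minimize(toy_objective, SPACE)` returns the best `(params, loss)` pair found, and the achieved objective is `-loss`. The characteristic TPE step is drawing candidates from the good-trial density l(x) and keeping the one with the highest ratio l(x)/g(x). This concentrates later trials where good observations are dense relative to bad ones, without needing gradients from the black-box target.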
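The abstract states that AgentTypo-pro abstracts effective prompts into generalizable strategies, stores them in a strategy repository, and retrieves successful past examples in later attacks. As an illustration of that accumulate-and-reuse data structure, here is a minimal sketch under explicit assumptions. The `Strategy` fields, the success-count bookkeeping, and the tag-based Jaccard ranking are all hypothetical. The abstract does not say how the paper's retrieval works (an embedding similarity search would be a common alternative), and strategy contents are kept as opaque placeholder strings.

```python
from dataclasses import dataclass, field


@dataclass
class Strategy:
    """An abstracted strategy with a running success record (hypothetical schema)."""
    description: str
    tags: frozenset
    successes: int = 0
    attempts: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0


@dataclass
class StrategyRepository:
    """Accumulates strategies across attacks and returns the most relevant ones."""
    strategies: dict = field(default_factory=dict)

    def record(self, description: str, tags, succeeded: bool) -> Strategy:
        # Create the entry on first sight, then update its attempt statistics.
        entry = self.strategies.get(description)
        if entry is None:
            entry = Strategy(description, frozenset(tags))
            self.strategies[description] = entry
        entry.attempts += 1
        entry.successes += int(succeeded)
        return entry

    def retrieve(self, query_tags, k: int = 3) -> list:
        # Rank by Jaccard overlap with the query tags, then by success rate,
        # and return only strategies that have succeeded at least once.
        query = frozenset(query_tags)

        def score(s: Strategy):
            union = query | s.tags
            overlap = len(query & s.tags) / len(union) if union else 0.0
            return (overlap, s.success_rate)

        ranked = sorted(self.strategies.values(), key=score, reverse=True)
        return [s for s in ranked if s.successes > 0][:k]
```

For example, after recording outcomes tagged with scenario names such as `"shopping"` or `"reddit"`, calling `retrieve({"shopping"})` returns previously successful strategies from the most similar scenarios first. Filtering out strategies that never succeeded mirrors the abstract's emphasis on reusing *successful* past examples.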


Related Papers