Fugu-MT 論文翻訳(概要): If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

論文の概要: If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

arxiv url: http://arxiv.org/abs/2604.19844v1
Date: Tue, 21 Apr 2026 11:27:30 GMT
ステータス: 翻訳完了
システム内更新日: 2026-04-23 15:36:10.643566
Title: If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems
Title（参考訳）: もしあなたがサインを待っているなら...それは違うかもしれない!視覚-言語エージェントシステムへの視覚注入からの信頼境界の融合を緩和する
Authors: Jiamin Chang, Minhui Xue, Ruoxi Sun, Shuchao Pang, Salil S. Kanhere, Hammond Pearce,
Abstract要約: 環境信号は、エージェントの挙動に影響を与えるべきバンド内信号である。同様の信号は、誤解を招く視覚注射として動作させることもできる。現在のLVLMベースのエージェントは、このトレードオフを確実にバランスすることができない。意思決定から認識を分離する多エージェント防衛フレームワークを提案する。
参考スコア（独自算出の注目度）: 23.899383110296622
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in embodied Vision-Language Agentic Systems (VLAS), powered by large vision-language models (LVLMs), enable AI systems to perceive and reason over real-world scenes. Within this context, environmental signals such as traffic lights are essential in-band signals that can and should influence agent behavior. However, similar signals could also be crafted to operate as misleading visual injections, overriding user intent and posing security risks. This duality creates a fundamental challenge: agents must respond to legitimate environmental cues while remaining robust to misleading ones. We refer to this tension as trust boundary confusion. To study this behavior, we design a dual-intent dataset and evaluation framework, through which we show that current LVLM-based agents fail to reliably balance this trade-off, either ignoring useful signals or following harmful ones. We systematically evaluate 7 LVLM agents across multiple embodied settings under both structure-based and noise-based visual injections. To address these vulnerabilities, we propose a multi-agent defense framework that separates perception from decision-making to dynamically assess the reliability of visual inputs. Our approach significantly reduces misleading behaviors while preserving correct responses and provides robustness guarantees under adversarial perturbations. The code of the evaluation framework and artifacts are made available at https://anonymous.4open.science/r/Visual-Prompt-Inject.
Abstract（参考訳）: 大規模視覚言語モデル(LVLM)を利用した視覚言語エージェントシステム(VLAS)の最近の進歩により、AIシステムは現実世界のシーンを知覚し、推論することができる。この文脈の中では、信号機のような環境信号は、エージェントの動作に影響を与え、影響を及ぼすために必要な帯域内信号である。しかし、同様の信号は、誤解を招くビジュアルインジェクション、ユーザの意図を覆い、セキュリティ上のリスクを生じさせるものとして動作させることもできる。この二重性は、基本的な課題を生み出します。エージェントは正当な環境基準に応答し、誤解を招くものに対して堅牢なままでいなければなりません。我々はこの緊張を信頼境界の混乱と呼ぶ。この振る舞いを研究するために、我々は、現在のLVLMベースのエージェントが、有用な信号を無視したり、有害な信号に従うことなく、確実にこのトレードオフのバランスをとることができないことを示すデュアルインテントデータセットと評価フレームワークを設計する。構造ベースおよびノイズベースビジュアルインジェクションを用いて,複数の実施環境において7つのLVLMエージェントを系統的に評価した。これらの脆弱性に対処するため,視覚入力の信頼性を動的に評価するために,認識と意思決定を分離するマルチエージェント・ディフェンス・フレームワークを提案する。提案手法は, 正しい応答を保ちつつ, 誤伝行動を大幅に低減し, 対向的摂動下での堅牢性を保証する。評価フレームワークとアーティファクトのコードはhttps://anonymous.4open.science/r/Visual-Prompt-Injectで公開されている。

論文の概要: If you're waiting for a sign... that might not be it! Mitigating Trust Boundary Confusion from Visual Injections on Vision-Language Agentic Systems

関連論文リスト