Fugu-MT Paper Translation (Abstract): REALM: A Unified Red-Teaming Benchmark for Physical-World VLMs

Paper overview: REALM: A Unified Red-Teaming Benchmark for Physical-World VLMs

arxiv url: http://arxiv.org/abs/2606.23892v1
Date: Mon, 22 Jun 2026 19:41:57 GMT
Status: Translation complete
Last updated in system: 2026-06-24 22:16:48.645603
Title: REALM: A Unified Red-Teaming Benchmark for Physical-World VLMs
Authors: Yifei Zhao, Qian Lou, Mengxin Zheng
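Abstract summary: Vision-language models (VLMs) are increasingly used as perception-reasoning backbones for embodied intelligence in safety-critical systems. Many red-teaming methods have been developed to probe VLM vulnerabilities, but their evaluation is fragmented across datasets, metrics, and threat models. This work introduces REALM, to the authors' knowledge the first unified red-teaming benchmark for physical-world VLMs.
Reference score (independently computed attention measure): 18.815997579213317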
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-language models (VLMs) are increasingly used as perception-reasoning backbones for embodied intelligence in safety-critical physical systems, where perception or reasoning errors can lead to unsafe decisions or actions. Although many red-teaming methods have been developed to probe VLM vulnerabilities, their evaluation remains fragmented across datasets, metrics, and threat models, making direct comparison difficult and obscuring whether observed differences arise from stronger attacks, more vulnerable models, or incompatible evaluation settings. Existing chatbot-centric red-teaming benchmarks mainly standardize jailbreak and content-safety evaluation, but they do not systematically capture physically grounded functional failures or cover red-teaming methods that target physical-world VLMs. This raises the key challenge of comparing diverse attack methods under a unified protocol while targeting the same scenario-specific failures. We introduce REALM, to our knowledge the first unified red-teaming benchmark for physical-world VLMs. REALM integrates 12 red-teaming methods, 3 model-agnostic defenses, and 13 VLMs under a practical black-box threat model with shared datasets and metrics. To align adversarial objectives across attack families, REALM introduces an agentic target-generation pipeline that constructs shared, scenario-specific, and physically grounded attack objectives for each scene, enabling fair comparison of diverse red-teaming methods under aligned adversarial goals. Our evaluation shows that text and typographic injection attacks induce the most failures, multimodal co-optimization yields the strongest visual-perturbation transfer, single-pass attacks approach iterative methods at much lower cost, and model scale alone does not confer adversarial robustness. Code is available at https://github.com/UCF-ML-Research/REALM.
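The comparison the abstract describes (many attacks scored against many models with shared metrics) can be pictured with a minimal sketch. Everything here is an assumption for illustration: the abstract does not name its metrics, so a plain attack success rate is used, and the attack names, model names, and records are made-up toy data, not REALM results or the REALM API.

```python
from collections import defaultdict


def success_rates(records):
    """Return the fraction of successful attempts per (attack, model) pair.

    `records` is an iterable of (attack, model, succeeded) tuples, one per
    scene. This record layout is hypothetical, not REALM's format.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for attack, model, succeeded in records:
        totals[(attack, model)] += 1
        hits[(attack, model)] += int(bool(succeeded))
    return {key: hits[key] / totals[key] for key in totals}


def mean_rate_per_attack(rates):
    """Average each attack's success rate over all models it was run on."""
    by_attack = defaultdict(list)
    for (attack, _model), rate in rates.items():
        by_attack[attack].append(rate)
    return {attack: sum(vals) / len(vals) for attack, vals in by_attack.items()}


# Toy records, invented purely to exercise the functions above.
records = [
    ("text_injection", "model_a", True),
    ("text_injection", "model_a", True),
    ("text_injection", "model_b", False),
    ("text_injection", "model_b", True),
    ("visual_perturbation", "model_a", False),
    ("visual_perturbation", "model_a", True),
    ("visual_perturbation", "model_b", False),
    ("visual_perturbation", "model_b", False),
]

rates = success_rates(records)
per_attack = mean_rate_per_attack(rates)
```

Scoring every attack against the same scenes and the same success criterion is what makes the per-attack averages comparable; with differing datasets or criteria, a gap between two attacks could reflect the evaluation setup rather than the attack itself.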

