Fugu-MT Paper Translation (Abstract): RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents

Paper overview: RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents

arxiv url: http://arxiv.org/abs/2605.19328v1
Date: Tue, 19 May 2026 04:07:24 GMT
Status: Translation complete
Last updated in system: 2026-05-20 15:03:09.112807
Title: RoboJailBench: Benchmarking Adversarial Attacks and Defenses in Embodied Robotic Agents
Title (reference translation): RoboJailBench: A Benchmark of Adversarial Attacks and Defenses for Robotic Agents
Authors: Doguhuan Yeke, Yanming Zhou, Leo Y. Lin, Hongyu Cai, Antonio Bianchi, Z. Berkay Celik
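Abstract summary: A new class of embodied AI systems is integrated into physical platforms such as robots and autonomous vehicles. Previous research has introduced jailbreak attacks and defenses for embodied AI. Existing benchmarks and evaluation frameworks either target traditional chat-based models or focus on non-adversarial safety evaluation for embodied AI. This paper addresses this gap with RoboJailBench, which consists of three core components.
Reference score (independently computed attention score): 14.945227570112882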
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in Vision-Language Models (VLMs) facilitate a new class of embodied AI systems, where these models are integrated into physical platforms, e.g. robots and autonomous vehicles, to interpret visual scenes and execute natural language commands in diverse environments. Previous research has introduced jailbreak attacks and defenses for embodied AI. Their evaluations, however, rely on ad-hoc datasets, limited metrics, and emphasize attack success while neglecting the trade-off between security and the ability to follow benign commands. Existing benchmarks and evaluation frameworks either target traditional chat-based models or focus on non-adversarial safety evaluation for embodied AI; neither captures the adversarial risks, inputs, consequences, and evaluation criteria necessary for jailbreak attacks in embodied AI systems. In this paper, we address this gap with RoboJailBench, which consists of three core components. We establish a security taxonomy derived from ISO standards, regulatory rules, and documented incidents. This effort yields 18 categories of security violation consequences for embodied AI. We introduce an intent contrast dataset pipeline that augments existing datasets with paired adversarial and benign goals to measure both security and utility. Lastly, we provide an evolving repository with standardized metrics and a unified process for assessing and integrating new attacks and defenses. With this benchmark, we construct a new taxonomy-balanced dataset and augment five existing datasets. We integrate four attacks and two defenses to evaluate their performance on leading embodied VLMs. This benchmark provides the first standardized evaluation framework for jailbreak attacks in embodied AI and supports future research. We release our code, datasets, and artifacts, and maintain a leaderboard at https://purseclab.github.io/benchmark-for-robotics-security.
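Abstract (reference translation): Recent advances in Vision-Language Models (VLMs) enable a new class of embodied AI systems, in which these models are integrated into physical platforms such as robots and autonomous vehicles to interpret visual scenes and execute natural language commands in diverse environments. Previous research has introduced jailbreak attacks and defenses for embodied AI. However, those evaluations rely on ad-hoc datasets and limited metrics, and they emphasize attack success while neglecting the trade-off between security and the ability to follow benign commands. Existing benchmarks and evaluation frameworks either target traditional chat-based models or focus on non-adversarial safety evaluation for embodied AI. This paper addresses this gap with RoboJailBench, which consists of three core components. We establish a security taxonomy derived from ISO standards, regulatory rules, and documented incidents, which yields 18 categories of security violation consequences for embodied AI. We introduce an intent contrast dataset pipeline that augments existing datasets with paired adversarial and benign goals to measure both security and utility. Finally, we provide an evolving repository with standardized metrics and a unified process for evaluating and integrating new attacks and defenses. With this benchmark, we construct a new taxonomy-balanced dataset and augment five existing datasets. We integrate four attacks and two defenses to evaluate their performance on leading embodied VLMs. This benchmark provides the first standardized evaluation framework for jailbreak attacks in embodied AI and supports future research. We release our code, datasets, and artifacts, and maintain a leaderboard at https://purseclab.github.io/benchmark-for-robotics-security.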


Related papers